Wednesday, November 16, 2011

Using Solr from Ruby/Rails in a Master/Slave Setup

Apache Solr is a high-performance, enterprise-grade search server built on top of the Lucene search library. Clients interact with it through a REST-like HTTP API, and documents can be sent to it as XML, JSON or binary.
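As an illustration of the REST-like API, a search is just an HTTP GET on Solr's select handler, with q holding the query and wt choosing the response format. The sketch below assumes Solr's default example layout (http://localhost:8983/solr/select) and a hypothetical title field; solr_select_url is an illustrative helper, not part of any library.

```ruby
require 'uri'

# Hypothetical helper: builds a Solr select URL for a full-text query.
# The "/solr/select" path matches Solr's example setup and is an assumption.
def solr_select_url(host, port, query, format: 'json', rows: 10)
  # q = the query, wt = response writer (format), rows = page size
  params = URI.encode_www_form(q: query, wt: format, rows: rows)
  URI::HTTP.build(host: host, port: port, path: '/solr/select', query: params)
end

url = solr_select_url('localhost', 8983, 'title:ruby')
puts url
```

Fetching the result is then a plain Net::HTTP.get(url) call against a running Solr server; Sunspot hides all of this behind its own Ruby API.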

Ruby/Rails applications talk to this search server for full-text search through the Sunspot library (the sunspot and sunspot_rails gems). There is a nice tutorial on using Solr in a Rails project with Sunspot.

Solr can run as a standalone search server, or as a set of master/slave instances working together, where the slaves poll the master to sync their data.

A Solr instance can be configured either as a Master or as a Slave.
The configuration file 'solrconfig.xml' needs a new RequestHandler for replication, configured:

{for Slave} :: to poll its Master's address at regular intervals
{for Master} :: to replicate committed changes, and to send the listed configuration files when a Slave asks for them

For a detailed reference: http://wiki.apache.org/solr/SolrReplication
For optimized ssh/rsync-based replication:
http://wiki.apache.org/solr/CollectionDistribution

You can also use a single configuration file for both roles: the 'Replication' handler holds both a Master and a Slave section, each with an 'enable' child node set to 'true' or 'false' depending on whether the node acts as Master or Slave.
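To keep the example in Ruby, here is a small template that prints that shared handler for a given role; the XML it prints is what goes into solrconfig.xml. The parameter names (enable, replicateAfter, confFiles, masterUrl, pollInterval) come from Solr's ReplicationHandler as described on the SolrReplication wiki page, while the host name, file list and poll interval are example values and replication_handler is a hypothetical helper.

```ruby
# Hypothetical helper: renders the same Replication handler for every node,
# flipping only the two "enable" flags according to the node's role.
# masterUrl, replicateAfter, confFiles and pollInterval are example values.
def replication_handler(role, master_url: 'http://masterSolr.mydomain.com:8983/solr/replication')
  master_enabled = (role == :master)
  slave_enabled = (role == :slave)

  <<~XML
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="enable">#{master_enabled}</str>
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
      <lst name="slave">
        <str name="enable">#{slave_enabled}</str>
        <str name="masterUrl">#{master_url}</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>
  XML
end

# Print the fragment for a slave node's solrconfig.xml.
puts replication_handler(:slave)
```

Keeping one template and toggling only the flags means master and slave never drift apart in the rest of their replication settings.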

For a single standalone Solr instance, Sunspot reads $ProjectRoot/config/sunspot.yml:

production:
  solr:
    hostname: standaloneSolr.mydomain.com
    port: 8983
    log_level: WARNING

For a master/slave Solr setup, $ProjectRoot/config/sunspot.yml looks like this:

production:
  solr:
    hostname: slaveSolr.mydomain.com
    port: 8983

    master_hostname: masterSolr.mydomain.com
    master_port: 8983
    log_level: WARNING
  master_solr:
    hostname: masterSolr.mydomain.com
    port: 8983
    log_level: WARNING

If you have more than one slave, put them behind a load balancer and use the load balancer's DNS entry as the slave 'hostname' (under 'solr:').

Also, the 'master_hostname' and 'master_port' fields under 'solr:' are not mandatory; they are supposed to be picked up from the 'master_solr:' block. However, in some cases the master configuration was observed not to be picked up unless these fields were set explicitly.
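To see which endpoint each block of the master/slave sunspot.yml describes, the sketch below loads that same config with Ruby's standard YAML library and derives a read URL from 'solr:' and a write URL from 'master_solr:'. It is a simplified illustration, not Sunspot's own config loader; the "/solr" path is an assumption (Sunspot's usual default when no path is configured) and solr_url is a hypothetical helper.

```ruby
require 'yaml'

# The master/slave sunspot.yml from this post, inlined so the sketch runs standalone.
CONFIG = <<~YAML
  production:
    solr:
      hostname: slaveSolr.mydomain.com
      port: 8983
      master_hostname: masterSolr.mydomain.com
      master_port: 8983
      log_level: WARNING
    master_solr:
      hostname: masterSolr.mydomain.com
      port: 8983
      log_level: WARNING
YAML

# Hypothetical helper: builds a base URL from one config block, assuming "/solr" as the path.
def solr_url(block)
  "http://#{block.fetch('hostname')}:#{block.fetch('port')}/solr"
end

env = YAML.safe_load(CONFIG).fetch('production')
read_url = solr_url(env.fetch('solr'))                                 # slave (or load balancer)
write_url = solr_url(env.fetch('master_solr') { env.fetch('solr') })  # master, falling back to 'solr:'

puts "reads  -> #{read_url}"
puts "writes -> #{write_url}"
```

With the standalone sunspot.yml there is no 'master_solr:' block, so both URLs fall back to the same single instance.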

By default, with this setup Sunspot makes the Ruby/Rails application write only to the Master and read only from the Slave.
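Conceptually, this routing is a proxy that forwards write calls (index, remove, commit) to a master session and search calls to a slave session; Sunspot handles it internally through a session proxy. The sketch below is a simplified, hypothetical stand-in (MasterSlaveRouter and RecordingSession are made-up names), not Sunspot's code, and uses recording stubs instead of real Solr sessions so it runs on its own.

```ruby
# Stub standing in for a real Solr session: it records each call it receives.
class RecordingSession
  attr_reader :name, :calls

  def initialize(name)
    @name = name
    @calls = []
  end

  %i[index remove commit search].each do |operation|
    define_method(operation) do |*_args|
      @calls << operation
      "#{name}:#{operation}"
    end
  end
end

# Hypothetical router: writes go to the master, reads go to the slave.
class MasterSlaveRouter
  WRITE_OPERATIONS = %i[index remove commit].freeze
  READ_OPERATIONS = %i[search].freeze

  def initialize(master, slave)
    @master = master
    @slave = slave
  end

  WRITE_OPERATIONS.each do |operation|
    define_method(operation) { |*args| @master.public_send(operation, *args) }
  end

  READ_OPERATIONS.each do |operation|
    define_method(operation) { |*args| @slave.public_send(operation, *args) }
  end
end

master = RecordingSession.new('master')
slave = RecordingSession.new('slave')
router = MasterSlaveRouter.new(master, slave)

router.index(:post)
router.commit
router.search(:post)

p master.calls # => [:index, :commit]
p slave.calls  # => [:search]
```

Because slaves only pick up changes on their next poll, a document written to the master may not show up in searches until the slave has replicated it.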
