Re: Replicating Lucene Index with out SOLR

mark harwood Thu, 28 Aug 2008 03:22:01 -0700

>> You don't need to copy the whole index every time
>> if you do incremental  indexing/updates and don't optimize the index



But with replication at 5-minute intervals, does this not quickly lead to a very
fragmented index?

There seems to be a fundamental conflict when building replication systems
based entirely on the Lucene file format:
* In the interests of good search performance, the index should ideally be a
small number of large files (which is what MergePolicy/optimize are all about
maintaining).
* However, in the interests of minimising replication network traffic, the ideal
is a large number of small files.
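
To make the trade-off concrete, here is a rough sketch of how much data a
file-level copy such as rsync would move between two snapshots. It assumes
index files are effectively write-once, so a file whose name already exists on
the replica does not need to be sent again. The class name, file names and
sizes are invented for illustration and are not taken from any real index.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: estimates what a file-level copy would transfer. */
class SnapshotDiff {

    /**
     * Sums the sizes of files present in the current snapshot but absent from
     * the previous one. Assumes write-once files: same name, same content.
     */
    static long bytesToTransfer(Map<String, Long> previous, Map<String, Long> current) {
        long total = 0;
        for (Map.Entry<String, Long> file : current.entrySet()) {
            if (!previous.containsKey(file.getKey())) {
                total += file.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Snapshot already on the replica (sizes in MB, made up).
        Map<String, Long> previous = new HashMap<>();
        previous.put("_1.cfs", 1000L);
        previous.put("_2.cfs", 10L);
        previous.put("segments_3", 1L);

        // After an incremental update: one small new segment is added.
        Map<String, Long> incremental = new HashMap<>();
        incremental.put("_1.cfs", 1000L);
        incremental.put("_2.cfs", 10L);
        incremental.put("_3.cfs", 5L);
        incremental.put("segments_4", 1L);

        // After an optimize: everything is rewritten into one large segment.
        Map<String, Long> optimized = new HashMap<>();
        optimized.put("_4.cfs", 1015L);
        optimized.put("segments_5", 1L);

        System.out.println(bytesToTransfer(previous, incremental)); // prints 6
        System.out.println(bytesToTransfer(previous, optimized));   // prints 1016
    }
}
```

In this made-up case the incremental snapshot costs a few MB to ship, while
the optimized one costs the whole index again, which is the conflict described
above.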

I've previously built replication systems in which each server pulls
deltas, in the form of insert/update/delete records, from a database and uses
IndexWriter locally to apply these sets of changes. Obviously
this duplicates the analyzing/indexing effort across replicas. It does mean,
though, that the content being transferred is not restricted by the design of
the Lucene file format: it uses minimal network traffic and places no
restrictions on the IndexWriter merge policies I may choose to use to optimise
search speed.
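
A minimal sketch of that delta-pulling idea follows. It is not the code from
those systems: the Delta record, the sequence-number scheme and the class name
are all assumptions, and a plain Map stands in for the local Lucene index so
the example runs on its own. In a real replica, the insert/update branch would
probably map to IndexWriter.addDocument or IndexWriter.updateDocument, and the
delete branch to IndexWriter.deleteDocuments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical replica that applies change records pulled from a shared log. */
class DeltaReplica {

    enum Op { INSERT, UPDATE, DELETE }

    /** One change record, e.g. a row in a database change table (assumed schema). */
    static final class Delta {
        final long seq;       // monotonically increasing sequence number
        final Op op;
        final String id;      // unique document key
        final String content; // raw content to index; unused for DELETE

        Delta(long seq, Op op, String id, String content) {
            this.seq = seq;
            this.op = op;
            this.id = id;
            this.content = content;
        }
    }

    // Stand-in for the local index; a real replica would hold an IndexWriter.
    private final Map<String, String> localIndex = new HashMap<>();
    private long lastAppliedSeq = 0;

    /**
     * Applies every delta newer than the last one seen and returns how many
     * were applied. Assumes the log is sorted by ascending sequence number.
     */
    int apply(List<Delta> changeLog) {
        int applied = 0;
        for (Delta d : changeLog) {
            if (d.seq <= lastAppliedSeq) {
                continue; // already applied on an earlier pull
            }
            switch (d.op) {
                case INSERT:
                case UPDATE:
                    localIndex.put(d.id, d.content);
                    break;
                case DELETE:
                    localIndex.remove(d.id);
                    break;
            }
            lastAppliedSeq = d.seq;
            applied++;
        }
        return applied;
    }

    String get(String id) { return localIndex.get(id); }

    int size() { return localIndex.size(); }

    public static void main(String[] args) {
        List<Delta> log = new ArrayList<>();
        log.add(new Delta(1, Op.INSERT, "doc1", "first version"));
        log.add(new Delta(2, Op.INSERT, "doc2", "to be removed"));
        log.add(new Delta(3, Op.UPDATE, "doc1", "second version"));
        log.add(new Delta(4, Op.DELETE, "doc2", null));

        DeltaReplica replica = new DeltaReplica();
        System.out.println(replica.apply(log)); // prints 4
        System.out.println(replica.apply(log)); // prints 0, nothing new to apply
    }
}
```

Each replica only needs to remember the last sequence number it applied, so
the pull interval can be as short as 5 minutes without any effect on how its
local index is merged or optimized.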

Keen to explore the pros and cons of these different replication schemes.

Cheers,
Mark



--- On Thu, 28/8/08, rahul_k123 <[EMAIL PROTECTED]> wrote:

> From: rahul_k123 <[EMAIL PROTECTED]>
> Subject: Re: Replicating Lucene Index with out SOLR
> To: java-user@lucene.apache.org
> Date: Thursday, 28 August, 2008, 6:47 AM
> Can I make use of the Solr scripts for this purpose?
> 
> 
> The snapinstaller runs on the slave after a snapshot has been pulled from
> the master. This signals the local Solr server to open a new index reader,
> then auto-warming of the cache(s) begins (in the new reader), while other
> requests continue to be served by the original index reader.
> 
> How can I achieve the above in my case?
> 
> 
> Otis Gospodnetic wrote:
> > 
> > You don't need to copy the whole index every time if you do incremental
> > indexing/updates and don't optimize the index before copying.  If you use
> > rsync for copying the index, only the new/modified files will be copied.
> > This is what the Solr replication scripts do, too.
> > 
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > 
> > 
> > 
> > ----- Original Message ----
> >> From: rahul_k123 <[EMAIL PROTECTED]>
> >> To: [EMAIL PROTECTED]
> >> Sent: Wednesday, August 27, 2008 11:36:07 PM
> >> Subject: Re: Replicating Lucene Index with out SOLR
> >> 
> >> 
> >> Currently we index at regular intervals on A.
> >> 
> >> -copy the index
> >>      Copying the whole index every time?
> >> 
> >> Currently I am investigating how I can make use of the SOLR replication
> >> scripts to achieve this.
> >> 
> >> 
> >> Is there anyone who did this without SOLR before?
> >> 
> >> 
> >> Thanks
> >> 
> >> 
> >> 
> >> Otis Gospodnetic wrote:
> >> > 
> >> > Hi,
> >> > 
> >> > You may want to ask on the java-user list (more subscribers), which I'm
> >> > CC-ing, so we can continue discussion there.
> >> > I think you will have to implement your own logic that runs on A and
> >> > does something like this:
> >> > 
> >> > - stop adding new docs
> >> > - call commit on the IndexWriter
> >> > 
> >> > - copy the index
> >> > - resume indexing
> >> > 
> >> > Otis
> >> > --
> >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >> > 
> >> > 
> >> > 
> >> > ----- Original Message ----
> >> >> From: rahul_k123 
> >> >> To: [EMAIL PROTECTED]
> >> >> Sent: Thursday, August 28, 2008 1:34:41 AM
> >> >> Subject: Replicating Lucene Index with out SOLR
> >> >> 
> >> >> 
> >> >> I have the following requirement.
> >> >> 
> >> >> Right now we have multiple indexes serving our web application. Our
> >> >> indexes are around 30 GB in size.
> >> >> 
> >> >> We want to replicate the index data so that we can use the replicas to
> >> >> distribute the search load.
> >> >> 
> >> >> This is what we need ideally.
> >> >> 
> >> >> A – (supports writes and reads)
> >> >> 
> >> >> A1 – Replicated index (supports reads). We want to synchronize this
> >> >> every 5 mins.
> >> >> 
> >> >> 
> >> >> 
> >> >> Any help is appreciated.   We are not using SOLR.
> >> >> 
> >> >> I am also interested in knowing the best way to scale my application
> >> >> by adding more boxes for search if our load increases.
> >> >> 
> >> >> Thanks.
> >> >> 
> >> >> -- 
> >> >> View this message in context: 
> >> >> 
> >>
> http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19191752p19191752.html
> >> >> Sent from the Lucene - General mailing
> list archive at Nabble.com.
> >> > 
> >> > 
> >> > 
> >> 
> >> -- 
> >> View this message in context: 
> >>
> http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19191752p19193670.html
> >> Sent from the Lucene - General mailing list
> archive at Nabble.com.
> > 
> > 
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> [EMAIL PROTECTED]
> > For additional commands, e-mail:
> [EMAIL PROTECTED]
> > 
> > 
> > 
> 
> -- 
> View this message in context:
> http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19193696p19194576.html
> Sent from the Lucene - Java Users mailing list archive at
> Nabble.com.
> 
> 

