I had a crack at whipping up something along these lines during a one-day hackathon we held here at work, using ActiveMQ as the bus between the 'co-ordinator' (queen bee) and the 'worker' bees. The indexing work was segmented into jobs on a work queue, and the workers fed the relatively small index chunks back along a result queue, which the co-ordinator then merged in.
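
In case it's useful, the worker side boiled down to roughly the sketch below. I'm writing this from memory rather than pasting the hackathon code, so treat the broker URL, queue names and the 'jobId' message property as placeholders; it also assumes a JMS 1.1-style API and the Lucene 1.4-era Field.Text() helper, and the ActiveMQ package name depends on which release you're on.

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

// Worker bee: pull a job off the work queue, build a small sub-index on
// local disk, then report the chunk's location back on the result queue.
public class WorkerBee {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory =
            new ActiveMQConnectionFactory("tcp://localhost:61616"); // placeholder broker URL
        Connection conn = factory.createConnection();
        conn.start();
        Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);

        MessageConsumer jobs =
            session.createConsumer(session.createQueue("index.jobs"));
        MessageProducer results =
            session.createProducer(session.createQueue("index.results"));

        while (true) {
            TextMessage job = (TextMessage) jobs.receive();
            String jobId = job.getStringProperty("jobId"); // placeholder property
            String chunkDir = "/tmp/chunk-" + jobId;

            // Build the sub-index for this chunk (one document per line of
            // the message body, just to keep the example small).
            IndexWriter writer =
                new IndexWriter(chunkDir, new StandardAnalyzer(), true);
            String[] lines = job.getText().split("\n");
            for (int i = 0; i < lines.length; i++) {
                Document doc = new Document();
                doc.add(Field.Text("body", lines[i]));
                writer.addDocument(doc);
            }
            writer.close();

            // Tell the queen bee where to pick up the finished chunk.
            TextMessage done = session.createTextMessage(chunkDir);
            done.setStringProperty("jobId", jobId);
            results.send(done);
        }
    }
}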

The tough part, from observing the outcome, is knowing what the chunk size should be, because in the end the co-ordinator needs to merge all the sub-indexes into one, and for a large index that's not an insignificant amount of time. You also need some bookkeeping to work out whether a 'job' has not been completed in time (perhaps because a worker failed) and decide whether it should be resubmitted (in theory JMS with transactions would help there, but then you have a throughput problem with that too).
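
The queen bee's merge-and-bookkeeping half looked roughly like this. Again it's only a from-memory sketch: RESUBMIT_AFTER_MS, the two maps and the method names are made up for illustration, not what we actually ran.

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Queen bee side: hand out jobs, fold finished chunks into the master
// index, and resubmit anything that looks like it was lost by a worker.
public class QueenBee {
    private static final long RESUBMIT_AFTER_MS = 5 * 60 * 1000; // arbitrary guess

    private final Map jobText = new HashMap();     // jobId -> job payload
    private final Map submittedAt = new HashMap(); // jobId -> submit time (Long)

    // Carve off a chunk of work and remember when it was handed out.
    public void submit(String jobId, String payload,
                       Session session, MessageProducer jobQueue) throws Exception {
        TextMessage job = session.createTextMessage(payload);
        job.setStringProperty("jobId", jobId);
        jobQueue.send(job);
        jobText.put(jobId, payload);
        submittedAt.put(jobId, new Long(System.currentTimeMillis()));
    }

    // Called for each message arriving on the result queue.
    public void onResult(TextMessage done, IndexWriter master) throws Exception {
        String jobId = done.getStringProperty("jobId");
        submittedAt.remove(jobId);
        jobText.remove(jobId);

        // The expensive bit: merging the worker's sub-index into the master.
        // For a large master index this dominates the total time, which is
        // why the chunk size matters so much.
        Directory chunk = FSDirectory.getDirectory(done.getText(), false);
        master.addIndexes(new Directory[] { chunk });
    }

    // Crude bookkeeping sweep: anything unanswered for too long goes back
    // on the work queue.
    public void resubmitStale(Session session, MessageProducer jobQueue) throws Exception {
        long now = System.currentTimeMillis();
        for (Iterator it = submittedAt.entrySet().iterator(); it.hasNext();) {
            Map.Entry e = (Map.Entry) it.next();
            long sentAt = ((Long) e.getValue()).longValue();
            if (now - sentAt > RESUBMIT_AFTER_MS) {
                String jobId = (String) e.getKey();
                TextMessage retry = session.createTextMessage((String) jobText.get(jobId));
                retry.setStringProperty("jobId", jobId);
                jobQueue.send(retry);
                e.setValue(new Long(now)); // reset the clock so we don't resubmit in a tight loop
            }
        }
    }
}

A sweep like resubmitStale() would just run on a timer; the awkward part is that a slow worker and a dead worker look identical, which is why JMS transactions are tempting despite the throughput cost.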

I'd love to see something like this work really well, and perhaps be generalized a bit more. I do like the simplicity of the SEDA principles.

cheers,

Paul Smith


On 14/07/2005, at 11:50 PM, Peter Gelderbloem wrote:

I am currently looking into building a similar system and came across
this architecture:
http://www.eecs.harvard.edu/~mdw/proj/seda/

I am just reading up on it now. Does anyone have experience building a
Lucene system based on this architecture? Any advice would be greatly
appreciated.

Peter Gelderbloem

-----Original Message-----
From: Luke Francl [mailto:[EMAIL PROTECTED]
Sent: 13 May 2005 22:04
To: java-user@lucene.apache.org
Subject: Re: Best Practices for Distributing Lucene Indexing and
Searching

On Tue, 2005-03-01 at 19:23, Chris Hostetter wrote:

I don't really consider reading/writing to an NFS-mounted FSDirectory
to be viable for the very reasons you listed; but I haven't really found
any evidence of problems if you take the approach that a single "writer"
node indexes to local disk, which is NFS mounted by all of your other
nodes for doing queries. Concurrent updates/queries may still not be
safe (I'm not sure), but you could have the writer node "clone" the
entire index into a new directory, apply the updates, and then signal
the other nodes to stop using the old FSDirectory and start using the
new one.


Thanks to everyone who contributed advice to my question about how to
distribute a Lucene index across a cluster.

I'm about to start on the implementation and I wanted to clarify
something about using NFS that Chris wrote about above.

There are many warnings about indexing on an NFS file system, but is it
safe to have a single node index, while the other nodes use the file
system in read-only mode?

On a related note, our software is cross-platform and needs to work on
Windows as well. Are there any known problems with having a
read-only index shared over SMB?

Using a shared file system is preferable to me because it's easier, but if it's necessary I will write the code to copy the index to each node.
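
Just to check my understanding, I'm picturing the clone/update/swap idea roughly like the sketch below. It's purely illustrative: the 'current.txt' pointer file on the shared mount is just my guess at one way to do the "signal", not an established Lucene recipe, and the plain file copy stands in for whatever cloning mechanism makes sense.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

// Writer node: clone the live index into a fresh directory, apply the
// pending updates there, then publish the new directory's name so the
// read-only query nodes know to re-open their searchers against it.
public class CloneAndSwap {

    public static void applyUpdates(File currentIndex, File indexRoot,
                                    Document[] pendingDocs) throws Exception {
        File next = new File(indexRoot, "index-" + System.currentTimeMillis());
        copy(currentIndex, next); // clone the whole index first

        IndexWriter writer =
            new IndexWriter(next.getPath(), new StandardAnalyzer(), false);
        for (int i = 0; i < pendingDocs.length; i++) {
            writer.addDocument(pendingDocs[i]); // apply the batched updates
        }
        writer.optimize();
        writer.close();

        // The "signal": a pointer file on the shared mount that the query
        // nodes poll before opening a new IndexSearcher. A JMS topic or an
        // rsync trigger would do just as well.
        FileWriter pointer = new FileWriter(new File(indexRoot, "current.txt"));
        pointer.write(next.getName());
        pointer.close();
    }

    // Plain recursive file copy, so no third-party libraries are needed.
    private static void copy(File src, File dst) throws IOException {
        if (src.isDirectory()) {
            dst.mkdirs();
            String[] children = src.list();
            for (int i = 0; i < children.length; i++) {
                copy(new File(src, children[i]), new File(dst, children[i]));
            }
        } else {
            InputStream in = new FileInputStream(src);
            OutputStream out = new FileOutputStream(dst);
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
            in.close();
            out.close();
        }
    }
}

Does that roughly match what you had in mind?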

Thanks,
Luke Francl

