Hello,

We are implementing a distributed searcher and indexer based on Lucene. I cannot share its code, but I can offer some hints based on our experience.
What we did, basically, is have several machines index documents and build small Lucene indexes. We hacked :-) Lucene's IndexWriter so that all segment names start with a prefix unique to each small index part. Then, when adding a part to the actual index, we simply copy the new segment into the folder with the other segments and register it in such a way that the optimize() function is never called. This makes adding a new segment very unintrusive for the searcher. Optimization is scheduled to happen at night.

The index is divided into parts located on different physical machines, and this is where the complexity begins. We do not try to normalize term weights across the machines, assuming that with large quantities of data they will be distributed more or less evenly. The problem still exists, though, and we will probably have to think about how to handle it in the future.

The documents are distributed across the machines randomly, so merging the results becomes a bit of a headache :-)

Best Regards,

Andrew Schetinin
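P.S. Since the result-merging step is the painful part, here is a minimal sketch (plain Java, not our actual code) of how per-shard hit lists could be combined into one global top-N list. The Hit class and shard ids are made up for illustration, and the raw scores are compared as-is, i.e. without the cross-shard normalization we skip, as described above.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ShardResultMerger {

    /** A single hit as returned by one shard: shard id, document id, raw score. */
    public static class Hit {
        final String shardId;
        final int docId;
        final float score;

        Hit(String shardId, int docId, float score) {
            this.shardId = shardId;
            this.docId = docId;
            this.score = score;
        }
    }

    /**
     * Merge per-shard hit lists into a single list of the N highest-scoring hits.
     * Raw scores from different shards are compared directly (no normalization).
     */
    public static List<Hit> mergeTopN(List<List<Hit>> perShardHits, int n) {
        // Keep the N best hits seen so far; the queue head is the weakest of them.
        PriorityQueue<Hit> best = new PriorityQueue<>(Comparator.comparingDouble(h -> h.score));
        for (List<Hit> shardHits : perShardHits) {
            for (Hit hit : shardHits) {
                best.offer(hit);
                if (best.size() > n) {
                    best.poll(); // drop the current weakest hit
                }
            }
        }
        // Drain the queue and sort so the strongest hit comes first.
        List<Hit> merged = new ArrayList<>(best);
        merged.sort((a, b) -> Float.compare(b.score, a.score));
        return merged;
    }
}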