Hello,

We are implementing a distributed searcher and indexer based on Lucene. I cannot share its code, but I can offer some hints based on our experience.
What we did, basically, is have several machines index documents and build small Lucene indexes. We hacked :-) Lucene's IndexWriter so that all segment names start with a prefix unique to each small index part. Then, when adding a part to the actual index, we simply copy the new segment into the folder with the other segments and register it in such a way that the optimize() function is never called. This makes adding a new segment very unintrusive for the searcher. Optimization is scheduled to happen at night.

The index is divided into parts located on different physical machines, and this is where the complexity begins. We do not try to normalize term weights across the machines, assuming that with large quantities of data they will be distributed more or less evenly. The problem still exists, though, and we will probably have to think about how to handle it in the future.

The documents are distributed across the machines randomly, so merging the results becomes a bit of a headache :-)

Best Regards,

Andrew Schetinin
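P.S. Since the result-merging step is the painful part, here is a minimal sketch (plain Java, not our actual code) of how per-shard hit lists could be combined into one global top-N list. The Hit class and shard ids are made up for illustration, and the raw scores are compared as-is, i.e. without the cross-shard normalization we skip, as described above.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ShardResultMerger {

    /** A single hit as returned by one shard: shard id, document id, raw score. */
    public static class Hit {
        final String shardId;
        final int docId;
        final float score;

        Hit(String shardId, int docId, float score) {
            this.shardId = shardId;
            this.docId = docId;
            this.score = score;
        }
    }

    /**
     * Merge per-shard hit lists into a single list of the N highest-scoring hits.
     * Raw scores from different shards are compared directly (no normalization).
     */
    public static List<Hit> mergeTopN(List<List<Hit>> perShardHits, int n) {
        // Keep the N best hits seen so far; the queue head is the weakest of them.
        PriorityQueue<Hit> best = new PriorityQueue<>(Comparator.comparingDouble(h -> h.score));
        for (List<Hit> shardHits : perShardHits) {
            for (Hit hit : shardHits) {
                best.offer(hit);
                if (best.size() > n) {
                    best.poll(); // drop the current weakest hit
                }
            }
        }
        // Drain the queue and sort so the strongest hit comes first.
        List<Hit> merged = new ArrayList<>(best);
        merged.sort((a, b) -> Float.compare(b.score, a.score));
        return merged;
    }
}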