Hi, (I am using Lucene 2.0.0)
I have been looking for a way to use stable IDs with Lucene. The reason is that I want to efficiently store and retrieve information outside of Lucene for filtering search results. It looks like this would require most of Lucene to be rewritten, so I gave up on that approach.

I have a new idea: I want the document IDs to change only at a specific moment instead of whenever Lucene chooses to change them. This way the document IDs remain stable and I can use them in the external data. I want to merge the segments of the index at a specific moment because updating the external data to match the new document IDs is too expensive to do continuously. At the moment I merge the segments, which causes the document IDs to change, I can also update my external data so the correct data stays attached to the correct Lucene document ID. If I understand correctly, merging only shifts document IDs to remove the IDs of deleted documents, so I can apply the same shift to the external data by getting the set of deleted documents before the merge.

I have already set 'mergeFactor' and 'maxBufferedDocs' to very high values so that all documents of a batch are kept in RAM. The problem I am facing is that the IndexWriter merges the segments in RAM with the segments on disk when I close it. What I need instead is for the IndexWriter to create a new segment on disk containing the data from the segment(s) in RAM. This way the document IDs of the existing disk segments are not affected.

Creating a new segment instead of merging with the existing ones will also result in many segments on disk, each with a variable number of documents, but I believe the IndexReader/IndexSearcher can handle this. I only have to make sure that the number of segments does not become too high (i.e. merge regularly), because this might cause 'too many open files' errors.
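A minimal sketch of the ID shifting described above, assuming that a merge keeps the surviving documents in their original order and only closes the gaps left by deleted documents (this is the assumption stated in this post, not something verified against Lucene 2.0.0). The class DocIdRemapper and its methods are hypothetical and not part of Lucene, and the external data is represented as a plain String array purely for illustration. The set of deleted IDs could presumably be collected before the merge by looping up to IndexReader.maxDoc() and calling IndexReader.isDeleted(int).

```java
import java.util.BitSet;

/** Hypothetical helper, not part of Lucene. */
public class DocIdRemapper {

    /**
     * Builds an old-ID to new-ID table. Deleted documents map to -1;
     * every surviving document is shifted down by the number of
     * deleted documents that precede it.
     */
    public static int[] buildRemap(BitSet deleted, int maxDoc) {
        int[] remap = new int[maxDoc];
        int nextId = 0;
        for (int oldId = 0; oldId < maxDoc; oldId++) {
            remap[oldId] = deleted.get(oldId) ? -1 : nextId++;
        }
        return remap;
    }

    /**
     * Applies the same shift to external per-document data
     * (assumed here to be an array indexed by document ID).
     */
    public static String[] compact(String[] external, int[] remap) {
        int live = 0;
        for (int newId : remap) {
            if (newId >= 0) {
                live++;
            }
        }
        String[] result = new String[live];
        for (int oldId = 0; oldId < remap.length; oldId++) {
            if (remap[oldId] >= 0) {
                result[remap[oldId]] = external[oldId];
            }
        }
        return result;
    }
}
```

The remap table has to be built from the deletions as they stand just before the merge, and the external data compacted in the same step, so that both sides change IDs at the same moment.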
So my questions are: is there a way to prevent the IndexWriter from merging, forcing it to create a new segment for each indexing batch? And if so, will it still be possible to merge the disk segments when I want to?

Kind regards,

Johan Stuyts
Hippo