Greg Gershman wrote:
I'm trying to delete a large number of documents (~15 million) from a large index (30+ million documents). I've started with an optimized index, and a list of docIds (our own unique identifier for a document, not a Lucene doc number) to pass to the IndexReader.delete(Term t) method. I've had a few different problems. ... Any ideas? I'm really confused, and the only other option I can think of is to reindex the documents I need, which would take much longer than deleting the ones I don't.
Maybe it would be useful to take a step back up the tree of abstractions here and reexamine why you're deleting such a large fraction of your index, particularly if you're doing it on a regular basis. For example, is there a chronological or other "natural" break in the data such that you could make 2 indexes with ~15M docs each in the first place, then just delete a few index *files* instead of 15M documents, one at a time?
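A rough sketch of that layout, in plain Java with no Lucene calls, assuming a date-based break in the data. The class name, the "old" and "current" directory names, and the cutoff date are all hypothetical; in a real setup each directory would hold its own Lucene index, and searching both together would need something like MultiSearcher, depending on your Lucene version.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper: one sub-directory per partition under a common root.
final class PartitionedIndexSketch {

    // Decide at indexing time which partition a document belongs to.
    // Documents dated before the cutoff go to "old", the rest to "current".
    static Path partitionFor(Path root, LocalDate docDate, LocalDate cutoff) {
        return root.resolve(docDate.isBefore(cutoff) ? "old" : "current");
    }

    // Drop a whole partition by removing its directory and every file in it,
    // instead of deleting its documents one at a time.
    static void dropPartition(Path partitionDir) throws IOException {
        if (!Files.exists(partitionDir)) {
            return; // nothing to drop
        }
        try (Stream<Path> walk = Files.walk(partitionDir)) {
            // Reverse order puts files before the directory that contains them.
            Iterable<Path> deepestFirst = walk.sorted(Comparator.reverseOrder())::iterator;
            for (Path p : deepestFirst) {
                Files.delete(p);
            }
        }
    }
}
```

With this split, expiring the older half of the data is a single dropPartition(root.resolve("old")) call, and the "current" index is never touched. Make sure no reader or writer still has the dropped index open when you do it.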
--MDC