Greg Gershman wrote:

I'm trying to delete a large number of documents
(~15million) from a a large index (30+ million
documents).  I've started with an optimized index, and
a list of docIds (our own unique identifier for a
document, not a Lucene doc number) to pass to the
IndexReader.delete(Term t) method.  I've had a few
different problems.
...
Any ideas?  I'm really confused, and the only other
option I can think of is to reindex the documents I
need, which would take much longer than deleting the
ones I dont.

Maybe it would be useful to take a step back up the tree of abstractions here and reexamine why you're deleting such a large fraction of your index, particularly if you're doing it on a regular basis. For example, is there a chronological or other "natural" break in the data such that you could make 2 indexes with ~15M docs each in the first place, then just delete a few index *files* instead of 15M documents, one at a time?

--MDC

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to