No problem; this is not meant to be a regular operation, rather it's a (hopefully) one-time thing till the index can be restructured.
The data is chronological in nature, deleting everything before a specific point in time. The index is optimized, so is it possible to remove specific files? I'm open to other suggestions as to how to approach this. Also, I neglected to mention I'm using version 1.4.3. Greg --- "Michael D. Curtin" <[EMAIL PROTECTED]> wrote: > Greg Gershman wrote: > > > I'm trying to delete a large number of documents > > (~15million) from a a large index (30+ million > > documents). I've started with an optimized index, > and > > a list of docIds (our own unique identifier for a > > document, not a Lucene doc number) to pass to the > > IndexReader.delete(Term t) method. I've had a few > > different problems. > > ... > > Any ideas? I'm really confused, and the only > other > > option I can think of is to reindex the documents > I > > need, which would take much longer than deleting > the > > ones I dont. > > Maybe it would be useful to take a step back up the > tree of abstractions here > and reexamine why you're deleting such a large > fraction of your index, > particularly if you're doing it on a regular basis. > For example, is there a > chronological or other "natural" break in the data > such that you could make 2 > indexes with ~15M docs each in the first place, then > just delete a few index > *files* instead of 15M documents, one at a time? > > --MDC > > --------------------------------------------------------------------- > To unsubscribe, e-mail: > [EMAIL PROTECTED] > For additional commands, e-mail: > [EMAIL PROTECTED] > > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]