I'm trying to delete a large number of documents (~15million) from a a large index (30+ million documents). I've started with an optimized index, and a list of docIds (our own unique identifier for a document, not a Lucene doc number) to pass to the IndexReader.delete(Term t) method. I've had a few different problems.
The following code is inside the loop that iterates through the document IDs: try { Term t = new Term("docID", String.valueOf(docID)); deletedCount+=indexReader.delete(t); } catch (Exception e) { System.out.println("Error while deleting docID#" + docID); e.printStackTrace(); } In order to commit the deletions, I also close and reopen the IndexReader periodically. At first I was reopening the IndexReader after every 500K documents deleted. The problem was that after ~60-75K deletions, the delete call began to throw a NullPointerException: Error while deleting docID#27136356 java.lang.NullPointerException at java.lang.String.compareTo(String.java:402) at org.apache.lucene.index.Term.compareTo(Term.java:76) at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:143) at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:132) at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:51) at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:364) at org.apache.lucene.index.IndexReader.delete(IndexReader.java:449) at IndexEraser.main(IndexEraser.java:32) After a little fiddling around, I tried reducing the interval between reopens to 5000, and most of the NullPointerExceptions went away. A test search of the resulting, unoptimized index worked fine. I then optimized the index to reduce the size of the index. Now, instead of getting data back for many of the results, I get a null value. Any ideas? I'm really confused, and the only other option I can think of is to reindex the documents I need, which would take much longer than deleting the ones I dont. Thanks! Greg Gershman __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]