A few days ago someone on this list asked how to efficiently "update" documents in the index, i.e., delete the old version of a document (found by some unique id field) and add the new version. The problem was that opening and closing the IndexReader and IndexWriter after each document was inefficient (using IndexModifier doesn't help here, because it does the same thing behind the scenes). I was also interested in doing the same thing myself.
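For reference, the per-document update I am referring to looks more or less like this. This is only a rough sketch: the class and method names (NaiveUpdater, update), the indexDir/idField parameters and so on are just for illustration, and it assumes a Lucene 1.9/2.0-style API (IndexReader.deleteDocuments(Term)), the same API generation as the code further below:

    import java.io.IOException;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    // Illustration only: "update" one document the straightforward way by
    // deleting the old version (found by its unique id) and adding the new
    // one, reopening the reader and writer for every single document.
    class NaiveUpdater {
        void update(String indexDir, Document doc, String idField,
                    Analyzer analyzer) throws IOException {
            // Delete whatever is currently stored under this unique id.
            IndexReader reader = IndexReader.open(indexDir);
            reader.deleteDocuments(new Term(idField, doc.get(idField)));
            reader.close();

            // Re-open a writer just to add the new version, then close it.
            IndexWriter writer = new IndexWriter(indexDir, analyzer, false);
            writer.addDocument(doc);
            writer.close();
        }
    }

The repeated open/close cycles are exactly the cost the approach below tries to avoid.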
People suggested doing the deletes immediately and buffering the document additions in memory for later. This is doable, but I wanted to avoid buffering the new documents (which are potentially large) in memory myself, and instead let Lucene do whatever buffering it wishes inside IndexWriter. I also did not like the idea that for some period of time searches would not return the updated document, because the old version had already been deleted while the new version was not yet indexed.

I therefore came up with the following solution, and I'll be happy to hear comments about it (especially if you think this solution is broken in some way or my assumptions are wrong).

The idea is basically this: when I want to replace a document, I immediately add the new document (with IndexWriter.addDocument) to the open IndexWriter. I also save the document's unique id term in a vector "idsReplaced" of terms we will deal with later:

    private Vector idsReplaced = new Vector();

    public void replaceDocument(Document document, String idfield,
                                Analyzer analyzer) throws IOException {
        // Add the new version right away; the old version stays searchable
        // until the index is flushed.
        indexwriter.addDocument(document, analyzer);
        // Remember the unique id so the old version(s) can be deleted later.
        idsReplaced.add(new Term(idfield, document.get(idfield)));
    }

Now, when I want to flush the index, I close the IndexWriter to make sure all the new documents were added, and then for each id in the idsReplaced vector I remove all but the last document with that id. The trick here is that IndexReader.termDocs(term) returns the matching documents ordered by internal document number, and documents added later get a higher number (I hope this is actually true... it seems to be so in my experiments), so we can delete all but the last matching document for the same id. The code looks something like this:

    // Call this after doing indexwriter.close();
    private void doDelete() throws IOException {
        if (idsReplaced.isEmpty())
            return;
        IndexReader ir = IndexReader.open(indexDir);
        for (Iterator i = idsReplaced.iterator(); i.hasNext();) {
            Term term = (Term) i.next();
            TermDocs docs = ir.termDocs(term);
            // Matches come back in increasing internal doc number, so the
            // last one is the newest version; delete every earlier match.
            int doctodelete = -1;
            while (docs.next()) {
                if (doctodelete >= 0)   // >= 0, since doc numbers start at 0
                    ir.deleteDocument(doctodelete);
                doctodelete = docs.doc();
            }
            docs.close();
        }
        idsReplaced.clear();
        ir.close();
    }

I did not test this idea too much, but in some initial experiments I tried, it seems to work.

--
Nadav Har'El
[EMAIL PROTECTED]
+972-4-829-6326