Update a bunch of documents

Carsten Schnober Thu, 11 Apr 2013 08:46:38 -0700

Hi,
I have the following scenario: I have a very large index (I'm testing
with around 200,000 documents, but it should scale to many millions),
and I want to perform a search on a certain field.
Based on the results of that search, I would like to manipulate a
different field in all the matching documents.
The only approach I have come up with so far is to collect the IDs of
the matching documents with a Collector, iterate over them, load the
Document objects with IndexReader.document(docid), and manipulate them
one by one. Finally, I would delete all the documents matching the
initial query with IndexWriter.deleteDocuments(Query query) and write
the edited ones with IndexWriter.addDocuments(Iterable<? extends
Iterable<? extends IndexableField>> docs).
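
The collect, edit, delete and re-add workflow above can be sketched in plain Java. This is a minimal model, not Lucene code: a hypothetical in-memory list of field maps stands in for the index, list positions stand in for doc IDs, and a `Predicate` stands in for the `Query`. It makes the cost visible: every matching document is copied and written back in full, even though only one field changes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of the workflow; each document is a map of field name -> stored value.
class UpdateBunchSketch {

    static List<Map<String, String>> rewriteMatching(List<Map<String, String>> index,
                                                     Predicate<Map<String, String>> query,
                                                     String field, String newValue) {
        // Step 1: like the Collector, gather the IDs (here: list positions) of matching documents.
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < index.size(); docId++) {
            if (query.test(index.get(docId))) {
                hits.add(docId);
            }
        }

        // Step 2: load each matching document and edit a copy; the whole document is copied
        // even though only one field changes.
        List<Map<String, String>> edited = new ArrayList<>();
        for (int docId : hits) {
            Map<String, String> copy = new HashMap<>(index.get(docId));
            copy.put(field, newValue);
            edited.add(copy);
        }

        // Step 3: like deleteDocuments(query), drop every document that matches the query.
        // This happens before the re-add, because the edited copies still match the query.
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> doc : index) {
            if (!query.test(doc)) {
                result.add(doc);
            }
        }

        // Step 4: like addDocuments(edited), append the edited copies.
        result.addAll(edited);
        return result;
    }
}
```

In a real Lucene index, IndexReader.document(docid) returns only the stored fields, so any field that is indexed but not stored would be missing from the re-added documents.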


However, the iteration seems to be very time-consuming, as it can cover
large portions of the indexed documents, and I wonder if there is a
smarter way to perform the document manipulation. The change is limited
to one field only (not the one on which the query is typically
performed!); shouldn't that help?

Thanks!
Carsten

-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
