Parallel incremental indexing (http://issues.apache.org/jira/browse/LUCENE-1879) is one way to solve this.
Mike

On Thu, Jan 14, 2010 at 4:27 AM, Babak Farhang <farh...@gmail.com> wrote:
>> Reading that trail, I wish the original poster gave up on his idea (
>
> Err, that should have read..
>
> "Reading that trail, I wish the original poster hadn't given up on his idea"
>
>
> On Thu, Jan 14, 2010 at 2:23 AM, Babak Farhang <farh...@gmail.com> wrote:
>> Hi,
>>
>> I've been thinking about how to update a single field of a document
>> without touching its other fields. This is an old problem and I was
>> considering a solution along the lines of Andrzej Bialecki's post to
>> the dev list back in '07:
>>
>>
>> <quote http://markmail.org/message/tbkgmnilhvrt6bii >
>>
>> I have the following scenario: I want to use ParallelReader to
>> maintain parts of the index that are changing quickly, and where
>> changes are limited to specific fields only.
>>
>> Let's say I have a "main" index (many fields, slowly changing, large
>> updates), and an "aux" index (fast changing, usually single doc and
>> single field updates). I'd like to "replace" documents in the "aux"
>> index - that is, delete one doc and add another - but in a way that
>> doesn't change the internal document numbers, so that I can keep the
>> mapping required by ParallelReader intact.
>>
>> I think this is possible to achieve by using a FilterIndexReader,
>> which keeps a map of updated documents, and re-maps old doc ids to the
>> new ones on the fly.
>>
>> From time to time I'd like to optimize the "aux" index to get rid of
>> deleted docs. At this time I need to figure out how to preserve the
>> old->new mapping during the optimization.
>>
>> So, here's the question: is this scenario feasible? If so, then in the
>> trunk/ version of Lucene, is there any way to figure out (predictably)
>> how internal document numbers are reassigned after calling optimize()?
>>
>> </quote>
>>
>>
>> Reading that trail, I wish the original poster gave up on his idea (
>> http://markmail.org/message/tbkgmnilhvrt6bii#query:+page:1+mid:kn77zpiu43kd2ufn+state:results
>> )
>>
>>
>> <quote>
>> Thanks for the input - for now I gave up on this, after discovering
>> that I would have no way to ensure in TermDocs.skipTo() that document
>> id-s are monotonically increasing (which seems to be a part of the
>> contract).
>> </quote>
>>
>> I imagine if Andrzej's proposed FilterIndexReader maintains 2 sorted
>> (ordered) maps, one from internal document-ids to "view" document-ids,
>> and another mapping from "view" document-ids to internal
>> document-ids, then things like skipTo() can be implemented reasonably
>> efficiently. Only the mapped ids are maintained in these structures.
>> (Also note that a mapped "view" document-id represents an internally
>> deleted document with that id.)
>>
>> And if we can find a way to merge the segments of this "aux" index
>> along whenever the segments of its associated "main" index are merged
>> or optimized (such that the [internal] doc-ids in the merged aux index
>> end up getting sync'ed with those of the trunk), then there shouldn't
>> be all that many doc-ids to map anyway (if we merge frequently
>> enough).
>>
>> So to go back to Andrzej's question: is there any way to figure out
>> (predictably) how internal document numbers [in the main index] are
>> reassigned after calling optimize()? How does LUCENE-847, as Doug
>> Cutting suggests in that trail, help?
>>
>> Sorry if that was long-winded, had to start somewhere ;)
>>
>> -Babak
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
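A minimal sketch of the "two sorted maps" idea from the quoted message, in plain Java with no Lucene dependency. Every name here (`DocIdRemapSketch`, `Remapper`, `ViewPostings`, `remap`, `skipTo`) is hypothetical and is not Lucene API; a term's postings are modeled as a sorted `int[]` of internal doc ids, and it assumes a re-added aux doc always lands at an internal id at or beyond the view's `maxDoc`. It illustrates how view ids can still come out in increasing order, which is the `TermDocs.skipTo()` contract Andrzej was worried about: hits that keep their own id stream in internal order, while moved hits are found only through the (small) maps and merged in by view id.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not Lucene API: stable "view" doc ids over an aux index
// whose docs are updated by delete + re-add at a new internal id.
public class DocIdRemapSketch {

    static final class Remapper {
        // ASSUMPTION: re-added docs always get internal ids >= viewMaxDoc.
        final int viewMaxDoc;
        // Only remapped ids are stored, in both directions, as sorted maps.
        final TreeMap<Integer, Integer> viewToInternal = new TreeMap<>();
        final TreeMap<Integer, Integer> internalToView = new TreeMap<>();

        Remapper(int viewMaxDoc) { this.viewMaxDoc = viewMaxDoc; }

        // View doc `view` now lives at internal id `internal`.
        void remap(int view, int internal) {
            Integer previous = viewToInternal.put(view, internal);
            if (previous != null) internalToView.remove(previous); // older copy is now dead
            internalToView.put(internal, view);
        }

        // An internal id shows through unchanged unless its view id was remapped
        // (the original is then deleted) or it lies past the view's maxDoc.
        boolean visibleAsItself(int internal) {
            return internal < viewMaxDoc && !viewToInternal.containsKey(internal);
        }
    }

    // Walks a sorted internal postings list in ascending view-id order.
    static final class ViewPostings {
        private final int[] internal;
        private final Remapper map;
        private final NavigableSet<Integer> movedHits = new TreeSet<>(); // view ids
        private int pos = 0;       // cursor for hits that keep their own id
        private int current = -1;
        private boolean exhausted = false;

        ViewPostings(int[] sortedInternal, Remapper map) {
            this.internal = sortedInternal;
            this.map = map;
            // O(m log n) for m remapped docs: only the mapped ids are examined.
            for (Map.Entry<Integer, Integer> e : map.internalToView.entrySet()) {
                if (Arrays.binarySearch(internal, e.getKey()) >= 0) movedHits.add(e.getValue());
            }
        }

        // First view id >= target and beyond the current one, or -1 when exhausted.
        int skipTo(int target) {
            if (exhausted) return -1;
            target = Math.max(target, current + 1); // never move backwards
            int idx = Arrays.binarySearch(internal, pos, internal.length, target);
            pos = idx >= 0 ? idx : -idx - 1;
            while (pos < internal.length && !map.visibleAsItself(internal[pos])) pos++;
            int fromSelf = pos < internal.length ? internal[pos] : Integer.MAX_VALUE;
            Integer fromMoved = movedHits.ceiling(target);
            int best = Math.min(fromSelf, fromMoved == null ? Integer.MAX_VALUE : fromMoved);
            if (best == Integer.MAX_VALUE) { exhausted = true; return -1; }
            current = best;
            return best;
        }

        int next() { return skipTo(current + 1); }
    }

    public static void main(String[] args) {
        // Hypothetical scenario: the view has 5 docs; doc 1 is updated twice, doc 3 once.
        Remapper m = new Remapper(5);
        m.remap(1, 5);
        m.remap(3, 6);
        m.remap(1, 7);                       // internal 5 is now a dead copy
        int[] postings = {0, 1, 2, 5, 6, 7}; // internal ids containing some term
        ViewPostings p = new ViewPostings(postings, m);
        StringBuilder out = new StringBuilder();
        for (int d = p.next(); d != -1; d = p.next()) out.append(d).append(' ');
        System.out.println(out.toString().trim()); // prints "0 1 2 3"
    }
}
```

Keeping both directions as sorted maps is what makes this cheap: deciding whether an internal hit is hidden is a lookup in `viewToInternal`, and the moved hits can be pulled out in view order from `internalToView` without scanning the whole aux index.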
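On the question of predicting how doc ids are reassigned by a merge or optimize(): a minimal sketch, assuming the merge copies the surviving documents of the merged segments in their existing order and simply squeezes out the deleted ones. This is believed to match how Lucene merged adjacent segments at the time, but it is an assumption here, not something confirmed in the thread. Under that assumption the old->new mapping is just a running count of live docs, so an "aux" index compacted with the same deletions as the "main" index would stay aligned with it. `MergeDocMapSketch` and `compactionDocMap` are hypothetical names.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch: old -> new doc id map for an order-preserving merge
// that drops deleted docs. ASSUMPTION: surviving docs keep their relative order.
public class MergeDocMapSketch {

    // The segments being merged are treated as one concatenated id range [0, maxDoc).
    static int[] compactionDocMap(int maxDoc, BitSet deleted) {
        int[] map = new int[maxDoc];
        int nextNewId = 0;
        for (int oldId = 0; oldId < maxDoc; oldId++) {
            // Deleted docs vanish (-1); live docs are renumbered densely, in order.
            map[oldId] = deleted.get(oldId) ? -1 : nextNewId++;
        }
        return map;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(1);
        deleted.set(3);
        int[] map = compactionDocMap(6, deleted);
        System.out.println(Arrays.toString(map)); // prints [0, -1, 1, -1, 2, 3]
        // A doc id remembered before the merge (e.g. a key in a remapping table)
        // is carried over by looking it up; -1 means that doc was dropped.
        System.out.println(map[4]); // prints 2
    }
}
```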