Re: incremental document field update

Michael McCandless Thu, 14 Jan 2010 02:39:57 -0800

Parallel incremental indexing
(http://issues.apache.org/jira/browse/LUCENE-1879) is one way to solve
this.


Mike

On Thu, Jan 14, 2010 at 4:27 AM, Babak Farhang <farh...@gmail.com> wrote:
>> Reading that trail, I wish the original poster gave up on his idea (
>
> Err, that should have read..
>
> "Reading that trail, I wish the original poster hadn't given up on his idea"
>
>
> On Thu, Jan 14, 2010 at 2:23 AM, Babak Farhang <farh...@gmail.com> wrote:
>> Hi,
>>
>> I've been thinking about how to update a single field of a document
>> without touching its other fields. This is an old problem and I was
>> considering a solution along the lines of Andrzej Bialecki's post to
>> the dev list back in '07:
>>
>>
>> <quote  http://markmail.org/message/tbkgmnilhvrt6bii >
>>
>> I have the following scenario: I want to use ParallelReader to
>> maintain parts of the index that are changing quickly, and where
>> changes are limited to specific fields only.
>>
>> Let's say I have a "main" index (many fields, slowly changing, large
>> updates), and an "aux" index (fast changing, usually single doc and
>> single field updates). I'd like to "replace" documents in the "aux"
>> index - that is, delete one doc and add another - but in a way that
>> doesn't change the internal document numbers, so that I can keep the
>> mapping required by ParallelReader intact.
>>
>> I think this is possible to achieve by using a FilterIndexReader,
>> which keeps a map of updated documents, and re-maps old doc ids to the
>> new ones on the fly.
>>
>> From time to time I'd like to optimize the "aux" index to get rid of
>> deleted docs. At this time I need to figure out how to preserve the
>> old->new mapping during the optimization.
>>
>> So, here's the question: is this scenario feasible? If so, then in the
>> trunk/ version of Lucene, is there any way to figure out (predictably)
>> how internal document numbers are reassigned after calling optimize()
>> ?
>>
>> </quote>
>>
>>
>> Reading that trail, I wish the original poster gave up on his idea (
>> http://markmail.org/message/tbkgmnilhvrt6bii#query:+page:1+mid:kn77zpiu43kd2ufn+state:results
>> )
>>
>>
>> <quote>
>> Thanks for the input - for now I gave up on this, after discovering
>> that I would have no way to ensure in TermDocs.skipTo() that document
>> id-s are monotonically increasing (which seems to be a part of the
>> contract).
>> </quote>
>>
>> I imagine if Andrzej's proposed FilterIndexReader maintains 2 sorted
>> (ordered) maps, one from internal document-ids to "view" document-ids,
>> and another mapping from "view" document-ids to internal
>> document-ids, then things like skipTo() can be implemented reasonably
>> efficiently. Only the mapped ids are maintained in these structures.
>> (Also note that a mapped "view" document-id represents an internally
>> deleted document with that id.)
>>
>> And if we can find a way to merge the segments of this "aux" index
>> along whenever the segments of its associated "main" index are merged
>> or optimized (such that the [internal] doc-ids in the merged aux index
>> end up getting sync'ed with those of the main index), then there shouldn't
>> be all that many doc-ids to map anyway (if we merge frequently
>> enough).
>>
>> So to go back to Andrzej's question: is there any way to figure out
>> (predictably) how internal document numbers [in the main index] are
>> reassigned after calling optimize() ? How does LUCENE-847, as Doug
>> Cutting suggests in that trail, help?
>>
>> Sorry if that was long-winded, had to start somewhere ;)
>>
>> -Babak
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
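
The two-map bookkeeping described in the quoted message (one sorted map from "view" document-ids to internal document-ids, one in the other direction, holding only the remapped ids) could be sketched roughly as below. This is only an illustration under stated assumptions: DocIdRemap and its method names are hypothetical and not part of Lucene, and the sketch covers the id translation only, not a full FilterIndexReader or TermDocs implementation.

```java
import java.util.TreeMap;

/** Hypothetical helper, not a Lucene class: tracks remapped document ids only. */
final class DocIdRemap {
    // "view" id -> internal id of the document that replaced it
    private final TreeMap<Integer, Integer> viewToInternal = new TreeMap<>();
    // internal id of a replacement document -> "view" id it is shown under
    private final TreeMap<Integer, Integer> internalToView = new TreeMap<>();

    /** Record that the document seen as viewId now lives at newInternalId. */
    void remap(int viewId, int newInternalId) {
        Integer previous = viewToInternal.put(viewId, newInternalId);
        if (previous != null) {
            // An earlier replacement has itself been replaced; forget it.
            internalToView.remove(previous);
        }
        internalToView.put(newInternalId, viewId);
    }

    /** Unmapped ids translate to themselves. */
    int toInternal(int viewId) {
        Integer mapped = viewToInternal.get(viewId);
        return mapped != null ? mapped : viewId;
    }

    /** Unmapped ids translate to themselves. */
    int toView(int internalId) {
        Integer mapped = internalToView.get(internalId);
        return mapped != null ? mapped : internalId;
    }

    /** Smallest remapped "view" id >= target, or -1 if there is none. */
    int nextMappedViewId(int target) {
        Integer key = viewToInternal.ceilingKey(target);
        return key != null ? key : -1;
    }
}
```

A skipTo(target) in "view" space would presumably combine nextMappedViewId(target) with the ordinary postings of the unmapped documents to find the next candidate in increasing order. Note that toView() returns an internally deleted id unchanged, so a caller would still need to skip deleted documents itself.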
