Re: incremental document field update

2010-01-17 Babak Farhang
Thanks Mike! This is pretty cool.. So LUCENE-1879 takes care of aligning (syncing) doc-ids across parallel index / segment merges. Missing is the machinery for updating a field (or fields) in a parallel slave index: to do this the appropriate segment in the slave index must somehow be rewritten.
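Why the slave segment must be rewritten can be shown with a small model. This is a minimal sketch under stated assumptions, not Lucene's API and not the LUCENE-1879 code: the hypothetical classes `SlaveSegment` and `ParallelView` treat a segment as a write-once array of one field's values, indexed by the same doc IDs as the master. Changing one value means writing a whole new segment.

```java
import java.util.Arrays;

/**
 * Hypothetical model (not Lucene API): a slave segment holds one field's
 * values indexed by doc ID, aligned with the master segment's doc IDs.
 * Segments are write-once, so an update produces a new segment.
 */
final class SlaveSegment {
    private final String[] values; // values[docId] = field value for that doc

    SlaveSegment(String[] values) {
        this.values = values.clone(); // defensive copy keeps this segment immutable
    }

    String get(int docId) {
        return values[docId];
    }

    int maxDoc() {
        return values.length;
    }

    /** Changing a single doc's value rewrites the entire segment. */
    SlaveSegment rewriteWith(int docId, String newValue) {
        String[] copy = Arrays.copyOf(values, values.length); // every value is written again
        copy[docId] = newValue;
        return new SlaveSegment(copy);
    }
}

/** Hypothetical parallel view: doc N in the master is doc N in the slave. */
final class ParallelView {
    private final String[] masterBody;
    private final SlaveSegment slave;

    ParallelView(String[] masterBody, SlaveSegment slave) {
        if (masterBody.length != slave.maxDoc()) {
            throw new IllegalArgumentException("doc IDs are not aligned");
        }
        this.masterBody = masterBody.clone();
        this.slave = slave;
    }

    /** Joins the master's field and the slave's field for the same doc ID. */
    String describe(int docId) {
        return masterBody[docId] + " | " + slave.get(docId);
    }
}
```

The cost of `rewriteWith` grows with the segment's size, not with the number of changed docs. That is the gap the rest of this thread discusses.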

Using TermDocs.seek vs. IndexReader.termDocs()

2010-01-17 Shai Erera
Hi, I remember a while ago a discussion around the efficiency of TermDocs.seek and how it is inefficient and it's better to call IndexReader.termDocs instead (actually someone was proposing to remove seek entirely from the interface because of that). I've looked at FieldCacheImpl's ByteCache.create

Re: incremental document field update

2010-01-17 Michael McCandless
On Sun, Jan 17, 2010 at 4:33 AM, Babak Farhang wrote:
> Thanks Mike!  This is pretty cool..
>
> So LUCENE-1879 takes care of aligning (syncing) doc-ids across
> parallel index / segment merges. Missing is the machinery for
> updating a field (or fields) in a parallel slave index: to do this the

Re: Using TermDocs.seek vs. IndexReader.termDocs()

2010-01-17 Michael McCandless
On Sun, Jan 17, 2010 at 5:01 AM, Shai Erera wrote:
> I remember a while ago a discussion around the efficiency of TermDocs.seek
> and how it is inefficient and it's better to call IndexReader.termDocs
> instead (actually someone was proposing to remove seek entirely from the
> interface because o

Re: Using TermDocs.seek vs. IndexReader.termDocs()

2010-01-17 Shai Erera
Oh right, I confused TermEnum.skipTo w/ TermDocs.seek. Thanks for reminding me of that. BTW, the flex implementation looks really useful. I like that I won't need to compare the field anymore. Looking forward to it.

Thanks
Shai

On Sun, Jan 17, 2010 at 12:24 PM, Michael McCandless < luc...@mikemc

Re: incremental document field update

2010-01-17 Babak Farhang
> So the idea is, I can change the field for only a few docs in a
> massive index, and the amount of "work" done, and bytes written, is in
> proportion only to how many docs were changed?

Exactly. We append auxiliary data to the parallel segment and delay rewriting the segment to when it'll be mer
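The append-then-delay idea can be sketched as follows. This is a hypothetical in-memory model, not Babak's actual code or any Lucene class: `AppendOnlySlaveSegment` leaves the original values untouched, appends each update as a small record, lets readers see the newest record for a doc, and pays for the full rewrite only once, when the segment is merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model: a write-once segment plus appended auxiliary update records. */
final class AppendOnlySlaveSegment {
    /** One appended record: doc docId now has this value. */
    private static final class Update {
        final int docId;
        final String value;

        Update(int docId, String value) {
            this.docId = docId;
            this.value = value;
        }
    }

    private final String[] base;                                  // original values, never modified
    private final List<Update> log = new ArrayList<>();           // appended auxiliary data
    private final Map<Integer, String> latest = new HashMap<>();  // newest value per updated doc

    AppendOnlySlaveSegment(String[] base) {
        this.base = base.clone();
    }

    /** Work and bytes written are proportional to this one doc, not the segment size. */
    void update(int docId, String value) {
        if (docId < 0 || docId >= base.length) {
            throw new IndexOutOfBoundsException("docId " + docId);
        }
        log.add(new Update(docId, value));
        latest.put(docId, value); // a later update to the same doc wins
    }

    /** Readers consult the appended updates first, then fall back to the base. */
    String get(int docId) {
        return latest.containsKey(docId) ? latest.get(docId) : base[docId];
    }

    int pendingUpdates() {
        return log.size();
    }

    /** At merge time the full rewrite happens once, folding in all pending updates. */
    String[] foldForMerge() {
        String[] merged = Arrays.copyOf(base, base.length);
        for (Update u : log) {
            merged[u.docId] = u.value; // replay in append order, so the last update wins
        }
        return merged;
    }
}
```

The design choice here is to let reads pay a small lookup cost so that small updates stay cheap. The full-segment cost moves into a merge that would rewrite the segment anyway.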

Re: incremental document field update

2010-01-17 Michael McCandless
On Sun, Jan 17, 2010 at 7:45 AM, Babak Farhang wrote:
>> So the idea is, I can change the field for only a few docs in a
>> massive index, and the amount of "work" done, and bytes written, is in
>> proportion only to how many docs were changed?
>
> Exactly. We append auxiliary data to the parallel

Re: incremental document field update

2010-01-17 Babak Farhang
> Got it. You'd presumably have to add a generation to this file, so
> that multiple sessions of updates + commit write to private files
> ("write once")? And then the reader merges all of them.

Actually, I hadn't considered concurrent update/commit semantics; I was thinking more along a single
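Mike's generation suggestion can be illustrated with a small model. This is a hypothetical in-memory sketch, not a Lucene file format: `GenerationalUpdates` stands in for the per-generation files. Each commit session produces a new, never-modified generation, and a reader merges all of them, with newer generations taking precedence.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model: each committed update session becomes a write-once generation. */
final class GenerationalUpdates {
    private final String[] base;                                         // original segment values
    private final List<Map<Integer, String>> generations = new ArrayList<>(); // list index = generation number

    GenerationalUpdates(String[] base) {
        this.base = base.clone();
    }

    /** Commits one session's updates as a new generation and returns its number. */
    int commit(Map<Integer, String> sessionUpdates) {
        // Copy and freeze the session, so a committed generation is never modified ("write once").
        generations.add(Collections.unmodifiableMap(new HashMap<>(sessionUpdates)));
        return generations.size() - 1;
    }

    /** Reader view: merges all generations, newest first, then falls back to the base. */
    String get(int docId) {
        for (int gen = generations.size() - 1; gen >= 0; gen--) {
            Map<Integer, String> g = generations.get(gen);
            if (g.containsKey(docId)) {
                return g.get(docId);
            }
        }
        return base[docId];
    }

    /** Read-only access to one committed generation. */
    Map<Integer, String> generation(int gen) {
        return generations.get(gen);
    }
}
```

Because committed generations are never changed, a reader opened after generation N can keep using generations 0..N while a later session writes N+1. This model is single-threaded, though, and does not address the concurrent update/commit semantics Babak says he had not yet considered.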