Re: incremental document field update

2010-01-21 Thread Babak Farhang
> OK; this approach (modifying an already written & possible in-use (by
> an IndexReader) file) would be problematic for Lucene...

If you have N slots, there would have to be N-1 commits + an Nth commit in progress while reading the "entry-count block" for there to be the possibility of a bad rea

Re: incremental document field update

2010-01-20 Thread Michael McCandless
On Tue, Jan 19, 2010 at 10:45 PM, Babak Farhang wrote:
>> I see -- so your file format allows you to append to the same file
>> without affecting prior readers?  We never do that in Lucene today
>> (all files are "write once").
>
> Yes. For the most part it only appends. The exception is when the

Re: incremental document field update

2010-01-19 Thread Babak Farhang
> I see -- so your file format allows you to append to the same file
> without affecting prior readers? We never do that in Lucene today
> (all files are "write once").

Yes. For the most part it only appends. The exception is when the log's entry count is updated (when the appends actually "commi
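The append-then-publish idea described in this message can be illustrated with a minimal sketch. It is not the poster's actual file format (the message is cut off before the details): the single count header, the fixed-size `(doc_id, value)` entry, and the function names are all assumptions made for illustration. Lucene itself keeps its files write once, as the replies in this thread point out, and this sketch does nothing to protect a reader that reads the header while it is being rewritten.

```python
import os
import struct

HEADER = struct.Struct(">I")  # committed entry count (assumed 4-byte header)
ENTRY = struct.Struct(">II")  # hypothetical fixed-size entry: (doc_id, value)


def create_log(path):
    """Create an empty log whose header says zero entries are committed."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(0))


def append_and_commit(path, entries):
    """Append entries, then publish them by rewriting the entry count."""
    with open(path, "r+b") as f:
        (count,) = HEADER.unpack(f.read(HEADER.size))
        # Write after the last committed entry. Readers ignore this
        # region until the count in the header covers it.
        f.seek(HEADER.size + count * ENTRY.size)
        for doc_id, value in entries:
            f.write(ENTRY.pack(doc_id, value))
        f.flush()
        os.fsync(f.fileno())
        # The only in-place write: update the committed entry count.
        f.seek(0)
        f.write(HEADER.pack(count + len(entries)))
        f.flush()
        os.fsync(f.fileno())


def read_committed(path):
    """Return only the entries covered by the committed count."""
    with open(path, "rb") as f:
        (count,) = HEADER.unpack(f.read(HEADER.size))
        data = f.read(count * ENTRY.size)
    return [ENTRY.unpack_from(data, i * ENTRY.size) for i in range(count)]
```

A reader that opened the file before a commit keeps seeing the old count and therefore the old set of entries; only the header write is not append-only.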

Re: incremental document field update

2010-01-19 Thread Michael McCandless
On Tue, Jan 19, 2010 at 1:32 AM, Babak Farhang wrote:
>> This is about multiple sessions with the writer.  Ie, open writer,
>> update a few docs, close.  Do the same again, but, that 2nd session
>> cannot overwrite the same files from the first one, since readers may
>> have those files open.  The

Re: incremental document field update

2010-01-18 Thread Babak Farhang
> This is about multiple sessions with the writer. Ie, open writer,
> update a few docs, close. Do the same again, but, that 2nd session
> cannot overwrite the same files from the first one, since readers may
> have those files open. The "write once" model gives Lucene its
> transactional semant

Re: incremental document field update

2010-01-18 Thread Michael McCandless
On Mon, Jan 18, 2010 at 12:35 AM, Babak Farhang wrote:
>> Got it. You'd presumably have to add a generation to this file, so
>> that multiple sessions of updates + commit write to private files
>> ("write once")? And then the reader merges all of them.
>
> Actually, I hadn't considered concurre

Re: incremental document field update

2010-01-17 Thread Babak Farhang
> Got it. You'd presumably have to add a generation to this file, so
> that multiple sessions of updates + commit write to private files
> ("write once")? And then the reader merges all of them.

Actually, I hadn't considered concurrent update/commit semantics; I was thinking more along a single
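The generation scheme suggested in the quoted text (each update session commits to its own write-once file, and the reader merges all of them) might look roughly like the sketch below. The `updates_<gen>.json` naming, the JSON payload, and the function names are hypothetical choices for illustration, not Lucene's file format or API.

```python
import json
import os
import re

# Hypothetical naming scheme: one file per committed generation.
GEN_PATTERN = re.compile(r"^updates_(\d+)\.json$")


def existing_generations(directory):
    """Return the committed generation numbers in ascending order."""
    gens = []
    for name in os.listdir(directory):
        match = GEN_PATTERN.match(name)
        if match:
            gens.append(int(match.group(1)))
    return sorted(gens)


def commit_session(directory, updates):
    """Write one session's updates to a new generation file."""
    gens = existing_generations(directory)
    gen = gens[-1] + 1 if gens else 1
    path = os.path.join(directory, f"updates_{gen}.json")
    # Mode "x" fails if the file already exists, so a committed
    # generation is never overwritten ("write once").
    with open(path, "x") as f:
        json.dump(updates, f)
    return gen


def merged_view(directory):
    """Merge all generations; later generations win per document."""
    merged = {}
    for gen in existing_generations(directory):
        with open(os.path.join(directory, f"updates_{gen}.json")) as f:
            merged.update(json.load(f))
    return merged
```

Because earlier generation files are never touched, a reader that still has them open is unaffected by a later session. A real implementation would also need an atomic way to publish a new generation (for example, writing to a temporary name and renaming), which this sketch omits.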

Re: incremental document field update

2010-01-17 Thread Michael McCandless
On Sun, Jan 17, 2010 at 7:45 AM, Babak Farhang wrote:
>> So the idea is, I can change the field for only a few docs in a
>> massive index, and the amount of "work" done, and bytes written, is in
>> proportion only to how many docs were changed?
>
> Exactly. We append auxiliary data to the parallel

Re: incremental document field update

2010-01-17 Thread Babak Farhang
> So the idea is, I can change the field for only a few docs in a
> massive index, and the amount of "work" done, and bytes written, is in
> proportion only to how many docs were changed?

Exactly. We append auxiliary data to the parallel segment and delay rewriting the segment to when it'll be mer

Re: incremental document field update

2010-01-17 Thread Michael McCandless
On Sun, Jan 17, 2010 at 4:33 AM, Babak Farhang wrote:
> Thanks Mike!  This is pretty cool..
>
> So LUCENE-1879 takes care of aligning (syncing) doc-ids across
> parallel index / segment merges. Missing is the machinery for
> updating a field (or fields) in a parallel slave index: to do this the
>

Re: incremental document field update

2010-01-17 Thread Babak Farhang
Thanks Mike! This is pretty cool.. So LUCENE-1879 takes care of aligning (syncing) doc-ids across parallel index / segment merges. Missing is the machinery for updating a field (or fields) in a parallel slave index: to do this the appropriate segment in the slave index must somehow be rewritten.

Re: incremental document field update

2010-01-14 Thread Michael McCandless
Parallel incremental indexing (http://issues.apache.org/jira/browse/LUCENE-1879) is one way to solve this.

Mike

On Thu, Jan 14, 2010 at 4:27 AM, Babak Farhang wrote:
>> Reading that trail, I wish the original poster gave up on his idea (
>
> Err, that should have read..
>
> "Reading that trail,

Re: incremental document field update

2010-01-14 Thread Babak Farhang
> Reading that trail, I wish the original poster gave up on his idea (

Err, that should have read..

"Reading that trail, I wish the original poster hadn't given up on his idea"

On Thu, Jan 14, 2010 at 2:23 AM, Babak Farhang wrote:
> Hi,
>
> I've been thinking about how to update a single field