Thanks Mike! This is pretty cool.
So LUCENE-1879 takes care of aligning (syncing) doc-ids across
parallel index / segment merges. Missing is the machinery for
updating a field (or fields) in a parallel slave index: to do this, the
appropriate segment in the slave index must somehow be rewritten.
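For context, the read side here is the existing ParallelReader; a minimal
sketch of the combined view (directory paths made up), which is what makes
the doc-id alignment requirement bite:

    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.ParallelReader;
    import org.apache.lucene.store.FSDirectory;

    public class ParallelView {
      public static IndexReader open() throws IOException {
        // Doc N of the combined view draws its fields from doc N of
        // *each* sub-index -- hence merges must keep doc-ids aligned.
        ParallelReader pr = new ParallelReader();
        pr.add(IndexReader.open(FSDirectory.open(new File("/idx/master")), true));
        pr.add(IndexReader.open(FSDirectory.open(new File("/idx/slave")), true));
        return pr;
      }
    }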
Hi
I remember a discussion a while ago about TermDocs.seek and how it is
inefficient, and that it's better to call IndexReader.termDocs instead
(someone actually proposed removing seek from the interface entirely
because of that). I've looked at FieldCacheImpl's ByteCache.create.
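To make the distinction concrete, a minimal sketch against the current
(pre-flex, 3.x) API -- the field name and value are made up:

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;

    public class SeekVsTermDocs {
      public static void walk(IndexReader reader) throws IOException {
        // reader.termDocs(term) hands back an already-positioned
        // TermDocs, instead of reader.termDocs() + td.seek(term).
        TermDocs td = reader.termDocs(new Term("id", "42"));
        try {
          while (td.next()) {
            int doc = td.doc();    // matching doc id
            int freq = td.freq();  // term's frequency in that doc
            // ... use doc/freq ...
          }
        } finally {
          td.close();
        }
      }
    }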
Oh right, I confused TermEnum.skipTo with TermDocs.seek. Thanks for the
reminder.
BTW, the flex implementation looks really useful. I like that I won't
need to compare the field anymore. Looking forward to it.
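For anyone following along, this is the field comparison I mean -- the
usual pre-flex idiom for walking one field's terms (field name made up):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;

    public class WalkFieldTerms {
      public static void walk(IndexReader reader) throws IOException {
        // Position at the first term of "author" (or whatever follows it).
        TermEnum te = reader.terms(new Term("author", ""));
        try {
          do {
            Term t = te.term();
            // TermEnum runs across field boundaries, so every step must
            // re-check the field -- the comparison flex does away with.
            if (t == null || !"author".equals(t.field())) break;
            // ... use t.text() ...
          } while (te.next());
        } finally {
          te.close();
        }
      }
    }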
Thanks
Shai
On Sun, Jan 17, 2010 at 12:24 PM, Michael McCandless <luc...@mikemc> wrote:
> So the idea is, I can change the field for only a few docs in a
> massive index, and the amount of "work" done, and bytes written, is in
> proportion only to how many docs were changed?
Exactly. We append auxiliary data to the parallel segment and
delay rewriting the segment until it is merged.
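Roughly like this -- a hypothetical sketch only, none of these names
exist in Lucene:

    import java.io.DataOutputStream;
    import java.io.IOException;

    // Hypothetical: one append-only side file per slave segment; the
    // segment itself is never rewritten here.
    public class SegmentFieldUpdates {
      private final DataOutputStream out;

      public SegmentFieldUpdates(DataOutputStream out) { this.out = out; }

      // Work and bytes written scale with the number of changed docs,
      // not with segment size.
      public void update(int docId, long newValue) throws IOException {
        out.writeInt(docId);
        out.writeLong(newValue);
      }
    }
    // At merge time the segment gets rewritten anyway, so pending
    // updates can be folded in then and the side file discarded.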
On Sun, Jan 17, 2010 at 7:45 AM, Babak Farhang wrote:
>> So the idea is, I can change the field for only a few docs in a
>> massive index, and the amount of "work" done, and bytes written, is in
>> proportion only to how many docs were changed?
>
> Exactly. We append auxiliary data to the parallel segment and
> delay rewriting the segment until it is merged.
> Got it. You'd presumably have to add a generation to this file, so
> that multiple sessions of updates + commit write to private files
> ("write once")? And then the reader merges all of them.
Actually, I hadn't considered concurrent update/commit semantics;
I was thinking more along the lines of a single update session.
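That said, if we went the generation route, I'd imagine something like
the stamping already used for deletions files (e.g. _3_2.del); a
hypothetical sketch of the reader-side overlay, all names made up:

    import java.util.Map;
    import java.util.TreeMap;

    // Hypothetical, not a Lucene API: each commit writes one write-once
    // updates file for a segment; a reader overlays every generation.
    public class UpdateGenerations {
      // generation -> (docId -> new value), as loaded from e.g.
      // _seg_1.upd, _seg_2.upd ... (made-up file names).
      private final TreeMap<Long, Map<Integer, Long>> gens =
          new TreeMap<Long, Map<Integer, Long>>();

      public void addGeneration(long gen, Map<Integer, Long> updates) {
        gens.put(gen, updates);
      }

      // Newest generation wins; otherwise the base segment's value stands.
      public long valueFor(int docId, long baseValue) {
        long value = baseValue;
        for (Map<Integer, Long> g : gens.values()) { // ascending gen order
          Long u = g.get(docId);
          if (u != null) value = u;
        }
        return value;
      }
    }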