Re: Incremental Field Updates

Shai Erera Wed, 02 Jul 2014 07:00:30 -0700

Using BinaryDocValues is not recommended for all scenarios. It is a
"catchall" alternative to the other DocValues types. I would not use it
unless it makes sense for your application, even if it means that you need
to re-index a document in order to update a single field.


DocValues are not good for "search" - by search I assume you mean taking a
query such as "apache AND lucene" and finding all documents which contain
both terms under the same field. They are good for sorting and faceting
though.
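
To make the distinction concrete, here is a minimal toy sketch in plain Java.
It is not Lucene code: ToyIndex and all of its methods and fields are
hypothetical names made up for illustration. It only assumes that search walks
a term -> documents structure, while sorting and faceting read a
document -> value column, which is the role DocValues play.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only. ToyIndex is a hypothetical class, not part of Lucene.
class ToyIndex {
    // Inverted index: term -> sorted doc IDs. A search query walks this.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // Per-document columns: doc ID -> value. These stand in for DocValues.
    private final Map<Integer, String> category = new HashMap<>();
    private final Map<Integer, Long> views = new HashMap<>();

    void add(int docId, String text, String cat, long viewCount) {
        for (String term : text.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
        category.put(docId, cat);
        views.put(docId, viewCount);
    }

    // "a AND b": intersect two postings lists. No per-document column is read.
    List<Integer> searchAnd(String a, String b) {
        TreeSet<Integer> hits = new TreeSet<>(postings.getOrDefault(a, new TreeSet<>()));
        hits.retainAll(postings.getOrDefault(b, new TreeSet<>()));
        return new ArrayList<>(hits);
    }

    // Faceting: look up each hit's value by doc ID and count occurrences.
    Map<String, Integer> facet(List<Integer> hits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : hits) {
            counts.merge(category.get(doc), 1, Integer::sum);
        }
        return counts;
    }

    // Sorting: order the hits by a per-document numeric value, highest first.
    List<Integer> sortByViewsDesc(List<Integer> hits) {
        List<Integer> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingLong((Integer d) -> views.get(d)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(0, "apache lucene search", "library", 5L);
        index.add(1, "apache httpd", "server", 7L);
        index.add(2, "lucene apache solr", "server", 9L);

        List<Integer> hits = index.searchAnd("apache", "lucene");
        System.out.println(hits);                        // [0, 2]
        System.out.println(index.facet(hits));           // {library=1, server=1}
        System.out.println(index.sortByViewsDesc(hits)); // [2, 0]
    }
}
```

In this sketch, answering the query from the per-document columns alone would
mean scanning every document, which is why a column like this suits sorting
and faceting over an already-matched set rather than matching itself.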

So I guess the answer to your question is "it depends" (it always is!) - I
would use DocValues for sorting and faceting, but not for regular search
queries. And I would use BinaryDocValues only when the other DocValues
types don't match.

Also, note that the current field-level update of DocValues is not always
better than re-indexing the document. You can read more details here:
http://shaierera.blogspot.com/2014/04/benchmarking-updatable-docvalues.html

Shai


On Tue, Jul 1, 2014 at 9:17 PM, Sandeep Khanzode <
[email protected]> wrote:

> Hi Shai,
>
> So one follow-up question.
>
> Assume that my use case is to have approx. ~50M documents indexed with
> each document having about ~10-15 indexed but not stored fields. These
> fields will never change, but there are another ~5-6 fields that will
> change and will continue to change after the index is written. These ~5-6
> fields may also be multivalued. The size of this index turns out to be
> ~120GB.
>
> In this case, I would like to sort or facet or search on these ~5-6
> fields. Which approach do you suggest? Should I use BinaryDocValues and
> update using IW or use either a ParallelReader/Join query.
>
> -----------------------
> Thanks n Regards,
> Sandeep Ramesh Khanzode
>
>
> On Tuesday, July 1, 2014 9:53 PM, Shai Erera <[email protected]> wrote:
>
>
>
> Except that Lucene now offers efficient numeric and binary DocValues
> updates. See IndexWriter.updateNumeric/Binary...
>
> On Jul 1, 2014 5:51 PM, "Erick Erickson" <[email protected]> wrote:
>
> > This JIRA is "complicated", don't really expect it in 4.9 as it's
> > been hanging around for quite a while. Everyone would like this,
> > but it's not easy.
> >
> > Atomic updates will work, but you have to set stored="true" for all
> > source fields. Under the covers this actually reads the document
> > out of the stored fields, deletes the old one and adds it
> > over again.
> >
> > FWIW,
> > Erick
> >
> > On Tue, Jul 1, 2014 at 5:32 AM, Sandeep Khanzode
> > <[email protected]> wrote:
> > > Hi,
> > >
> > > I wanted to know of the best approach to follow if a few fields in my
> > indexed documents are changing at run time (after index and before or
> > during search), but a majority of them are created at index time.
> > >
> > > I could see the JIRA given below but it is scheduled for Lucene 4.9, I
> > believe.
> > >
> > > There are a few other approaches, like maintaining a separate index for
> > changing fields and use either a parallelreader or use a Join.
> > >
> > > Can everyone share their experience for this scenario on how it is
> > handled in your systems? Thanks,
> > >
> > > [LUCENE-4258] Incremental Field Updates through Stacked Segments - ASF
> > JIRA
> > >
> > > Shai and I would like to start working on the proposal to Incremental
> > Field Updates outlined here (
> http://markmail.org/message/zhrdxxpfk6qvdaex
> > ).
> > >
> > >
> > > -----------------------
> > > Thanks n Regards,
> > > Sandeep Ramesh Khanzode
> >
