Re: Performance issues with the default field compression

Alex Parvulescu Thu, 10 Apr 2014 02:20:08 -0700

Hi Adrien,

Thanks for clarifying!
We're going to go the custom codec & custom visitor route.


best,
alex



On Wed, Apr 9, 2014 at 10:38 PM, Adrien Grand <jpou...@gmail.com> wrote:

> Hi Alex,
>
> Indeed, in order to read a single field of a single document, one or
> several documents (how many depends on the size of your documents)
> need to be fully decompressed.
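
A toy model of this effect, assuming a simplified layout in which several documents are concatenated and compressed together as one chunk. Deflate from `java.util.zip` stands in for the codec's real compression algorithm. `ChunkedStore`, `read` and `lastBytesInflated` are hypothetical names, not Lucene API:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/**
 * Hypothetical toy model (not Lucene code): several documents are concatenated
 * and compressed as a single chunk. Deflate stands in for the codec's real algorithm.
 */
final class ChunkedStore {
    private final byte[] compressedChunk;
    private final int[] offsets; // doc i spans [offsets[i], offsets[i + 1]) once inflated
    private int lastBytesInflated;

    ChunkedStore(List<String> docs) throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        offsets = new int[docs.size() + 1];
        for (int i = 0; i < docs.size(); i++) {
            offsets[i] = raw.size();
            raw.write(docs.get(i).getBytes(StandardCharsets.UTF_8));
        }
        offsets[docs.size()] = raw.size();

        // Compress all documents together as one chunk.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(compressed)) {
            out.write(raw.toByteArray());
        }
        compressedChunk = compressed.toByteArray();
    }

    /** Reads one document; the whole chunk must be inflated to get at it. */
    String read(int docId) throws IOException {
        byte[] chunk;
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressedChunk))) {
            chunk = in.readAllBytes();
        }
        lastBytesInflated = chunk.length;
        int start = offsets[docId];
        return new String(chunk, start, offsets[docId + 1] - start, StandardCharsets.UTF_8);
    }

    /** Bytes decompressed by the most recent read, i.e. the size of the whole chunk. */
    int lastBytesInflated() {
        return lastBytesInflated;
    }
}
```

Reading document 1 inflates all three documents. A larger chunk tends to compress better but makes each single-document read more expensive, which is the trade-off described below.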
>
> Regarding the stored fields visitor, the default one doesn't return
> STOP when the field has been found because other fields with the same
> name might be stored further in the stream of stored fields (in case
> of a multivalued field). If you know that you have a single field
> value, you can write your own field visitor that will return STOP
> after the first value has been read. As you noted, this probably has
> less impact on performance than the first point that you raised.
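
A sketch of such a visitor, written against a small hypothetical model rather than Lucene itself so that it runs without the Lucene jar. In Lucene the equivalent would be a subclass of `StoredFieldVisitor` whose `needsField` returns `Status.STOP`, passed to `IndexReader.document(int, StoredFieldVisitor)`. `VisitorDemo`, `FieldVisitor`, `visitDocument`, `FilteringVisitor` and `SingleValueVisitor` below are illustrative names only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of a YES / NO / STOP stored-fields visitor protocol. Not the Lucene API. */
final class VisitorDemo {
    enum Status { YES, NO, STOP }

    interface FieldVisitor {
        Status needsField(String name);
        void stringField(String name, String value);
    }

    /** Walks a document's stored fields in order; returns how many were offered to the visitor. */
    static int visitDocument(List<Map.Entry<String, String>> storedFields, FieldVisitor visitor) {
        int offered = 0;
        for (Map.Entry<String, String> field : storedFields) {
            offered++;
            Status status = visitor.needsField(field.getKey());
            if (status == Status.STOP) {
                break; // the visitor is done: skip the rest of the document
            }
            if (status == Status.YES) {
                visitor.stringField(field.getKey(), field.getValue());
            }
            // Status.NO: this value is skipped, but the walk continues.
        }
        return offered;
    }

    /** Filtering visitor: collects every value of the wanted fields and never stops early. */
    static final class FilteringVisitor implements FieldVisitor {
        private final Set<String> wanted;
        final List<String> values = new ArrayList<>();

        FilteringVisitor(Set<String> wanted) { this.wanted = wanted; }

        public Status needsField(String name) {
            return wanted.contains(name) ? Status.YES : Status.NO;
        }

        public void stringField(String name, String value) { values.add(value); }
    }

    /** Assumes the field is single-valued: keeps the first value, then returns STOP. */
    static final class SingleValueVisitor implements FieldVisitor {
        private final String field;
        String value;

        SingleValueVisitor(String field) { this.field = field; }

        public Status needsField(String name) {
            if (value != null) {
                return Status.STOP;
            }
            return name.equals(field) ? Status.YES : Status.NO;
        }

        public void stringField(String name, String value) { this.value = value; }
    }
}
```

If `:path` were stored twice in a document, the filtering visitor would collect both values, while the STOP visitor would return after the first one. That is only correct when the field is known to be single-valued, which is why the default visitor cannot stop early.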
>
> The default stored fields format is rather targeted at large indices
> where compression helps save disk space and can also make stored
> fields retrieval faster since a larger portion of the stored fields
> can fit in the filesystem cache. However, if your index is small and
> fully fits in the filesystem cache, this stored fields format might
> indeed have non-negligible overhead.
>
>
> On Wed, Apr 9, 2014 at 9:17 PM, Alex Parvulescu
> <alexparvule...@apache.org> wrote:
> > Hi,
> >
> > I was investigating some performance issues, and during profiling I
> > noticed that a significant amount of time is spent decompressing fields
> > which are unrelated to the actual field I'm trying to load from the
> > Lucene documents. In our benchmark, which mostly runs a simple full-text
> > search, 40% of the time was lost in these parts.
> >
> > My code does the following: reader.document(id, Set(":path")).get(":path"),
> > and this is where the fun begins :)
> > I noticed 2 things; please excuse my ignorance if some of what I write
> > here is not 100% correct:
> >
> >  - All the fields in the document are decompressed before the field
> > filter is applied. We noticed this because we have a lot of content
> > stored in the index, so a lot of time is lost decompressing junk. At one
> > point I tried adding the field first, thinking this would save some work,
> > but it doesn't seem to help much. In the reference code [0], the visitor
> > is only used at the very end.
> >
> >  - Second, and probably with a smaller impact: the
> > DocumentStoredFieldVisitor could return STOP once there are no more
> > fields left for it to visit. I only have one, and it looks like it will
> > #skip through a bunch of other stuff before finishing a document. [1]
> >
> > thanks in advance,
> > alex
> >
> >
> > [0] https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressingStoredFieldsReader.java?view=markup#l364
> >
> > [1] https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/document/DocumentStoredFieldVisitor.java?view=markup#l100
>
>
>
> --
> Adrien
