Hi Adrien, Thanks for clarifying! We're going to go the custom codec & custom visitor route.
best,
alex

On Wed, Apr 9, 2014 at 10:38 PM, Adrien Grand <jpou...@gmail.com> wrote:

> Hi Alex,
>
> Indeed, one or several (the number depends on the size of your
> documents) documents need to be fully decompressed in order to read a
> single field of a single document.
>
> Regarding the stored fields visitor, the default one doesn't return
> STOP when the field has been found because other fields with the same
> name might be stored further in the stream of stored fields (in case
> of a multivalued field). If you know that you have a single field
> value, you can write your own field visitor that will return STOP
> after the first value has been read. As you noted, this probably has
> less impact on performance than the first point that you raised.
>
> The default stored fields format is rather targeted at large indices
> where compression helps save disk space and can also make stored
> fields retrieval faster since a larger portion of the stored fields
> can fit in the filesystem cache. However, if your index is small and
> fully fits in the filesystem cache, this stored fields format might
> indeed have non-negligible overhead.
>
> On Wed, Apr 9, 2014 at 9:17 PM, Alex Parvulescu
> <alexparvule...@apache.org> wrote:
> > Hi,
> >
> > I was investigating some performance issues and during profiling I
> > noticed that there is a significant amount of time being spent
> > decompressing fields which are unrelated to the actual field I'm
> > trying to load from the Lucene documents. In our benchmark doing
> > mostly a simple full-text search, 40% of the time was lost in these
> > parts.
> >
> > My code does the following:
> > reader.document(id, Set(":path")).get(":path"),
> > and this is where the fun begins :)
> > I noticed 2 things, please excuse the ignorance if some of the
> > things I write here are not 100% correct:
> >
> > - all the fields in the document are being decompressed prior to
> > applying the field filter. We've noticed this because we have a lot
> > of content stored in the index, so a significant amount of time is
> > lost decompressing junk. At one point I tried adding the field
> > first, thinking this would save some work, but it doesn't look like
> > it's doing much. Reference code, the visitor is only used at the
> > very end. [0]
> >
> > - second, and probably of a smaller impact would be to have the
> > DocumentStoredFieldVisitor return STOP when there are no more
> > fields in the visitor to visit. I only have one, and it looks like
> > it will #skip through a bunch of other stuff before finishing a
> > document. [1]
> >
> > thanks in advance,
> > alex
> >
> > [0]
> > https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressingStoredFieldsReader.java?view=markup#l364
> >
> > [1]
> > https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/document/DocumentStoredFieldVisitor.java?view=markup#l100
>
> --
> Adrien
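
The single-value visitor idea from the thread (return STOP once the first value of the wanted field has been read) can be modeled roughly as below. This is only a sketch under stated assumptions: it does not use Lucene at all, and the names FirstValueVisitorSketch, StoredField, FirstValueVisitor and visitDocument are hypothetical stand-ins for the real reader and visitor classes. The Status enum just mirrors the YES / NO / STOP decisions mentioned above. A real implementation would extend Lucene's own visitor class, whose method signatures differ between versions.

```java
import java.util.ArrayList;
import java.util.List;

/** Self-contained model of an early-terminating stored-field visitor (not Lucene code). */
public class FirstValueVisitorSketch {

    /** Mirrors the YES / NO / STOP decisions a stored-fields visitor can return. */
    enum Status { YES, NO, STOP }

    /** Hypothetical stand-in for one stored field of a document. */
    static final class StoredField {
        final String name;
        final String value;

        StoredField(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Accepts the first value of one field, then asks the reader to stop. */
    static final class FirstValueVisitor {
        private final String wanted;
        private String value;      // stays null until the wanted field is read
        int inspected = 0;         // number of fields the visitor was asked about

        FirstValueVisitor(String wanted) {
            this.wanted = wanted;
        }

        Status needsField(String fieldName) {
            if (value != null) {
                return Status.STOP; // single-valued assumption: nothing more to read
            }
            inspected++;
            return wanted.equals(fieldName) ? Status.YES : Status.NO;
        }

        void stringField(String fieldName, String fieldValue) {
            value = fieldValue;
        }

        String getValue() {
            return value;
        }
    }

    /** Hypothetical reader loop: walks fields in stored order and honors STOP. */
    static void visitDocument(List<StoredField> doc, FirstValueVisitor visitor) {
        for (StoredField field : doc) {
            Status status = visitor.needsField(field.name);
            if (status == Status.STOP) {
                return;             // skip every remaining field
            }
            if (status == Status.YES) {
                visitor.stringField(field.name, field.value);
            }
        }
    }

    public static void main(String[] args) {
        List<StoredField> doc = new ArrayList<>();
        doc.add(new StoredField(":path", "/content/a"));
        doc.add(new StoredField("body", "large text ..."));
        doc.add(new StoredField("title", "A"));

        FirstValueVisitor visitor = new FirstValueVisitor(":path");
        visitDocument(doc, visitor);
        System.out.println(visitor.getValue() + " (fields inspected: " + visitor.inspected + ")");
    }
}
```

Note that this only models the second point of the thread. As Adrien explains, the documents are decompressed before the visitor is consulted, so stopping early saves the walk over the remaining fields but not the decompression itself; that part is what the custom codec is meant to address.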