Joel, I tried a straightforward hack, but found no easy gain there. The only thing I can suggest is to try reusing the bytes in
https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
Right now it allocates bytes on every call, which, besides adding GC pressure, can also hurt memory access locality. Could you try fixing that memory waste and repeat the performance test?
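To make the idea concrete, here is a minimal sketch of the difference between allocating per lookup and reusing a caller-owned buffer. It is not the Lucene code: `BytesReuseSketch`, `Scratch`, `getCopy`, `get`, and `totalLength` are hypothetical names over a plain in-memory array, and the only assumption is that each lookup copies a value of known length out of a shared data region.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical stand-in for a binary doc-values store; NOT the Lucene API.
class BytesReuseSketch {
    private final byte[] data;    // all values packed end to end
    private final int[] offsets;  // offsets[i] = start of doc i, offsets[n] = end of data

    BytesReuseSketch(String[] values) {
        offsets = new int[values.length + 1];
        byte[][] encoded = new byte[values.length][];
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            offsets[i] = total;
            total += encoded[i].length;
        }
        offsets[values.length] = total;
        data = new byte[total];
        for (int i = 0; i < values.length; i++) {
            System.arraycopy(encoded[i], 0, data, offsets[i], encoded[i].length);
        }
    }

    // Allocating variant: a fresh byte[] for every lookup.
    byte[] getCopy(int docId) {
        int start = offsets[docId];
        int len = offsets[docId + 1] - start;
        byte[] out = new byte[len];
        System.arraycopy(data, start, out, 0, len);
        return out;
    }

    // Caller-owned buffer that survives across lookups.
    static final class Scratch {
        byte[] bytes = new byte[0];
        int length;
    }

    // Reusing variant: the scratch array grows only when it is too small.
    void get(int docId, Scratch scratch) {
        int start = offsets[docId];
        int len = offsets[docId + 1] - start;
        if (scratch.bytes.length < len) {
            scratch.bytes = new byte[Math.max(len, scratch.bytes.length * 2)];
        }
        System.arraycopy(data, start, scratch.bytes, 0, len);
        scratch.length = len;
    }

    // Visits only the docs set in the BitSet, in increasing docId order,
    // reusing one scratch buffer for the whole pass.
    static long totalLength(BytesReuseSketch store, BitSet matches) {
        Scratch scratch = new Scratch();
        long total = 0;
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            store.get(doc, scratch);
            total += scratch.length;
        }
        return total;
    }
}
```

The catch with the reusing variant is that the returned bytes are only valid until the next lookup, so a caller that wants to keep a value has to copy it.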
Have a good hack!

On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein <[email protected]> wrote:

> Hi,
>
> I'm looking for a faster way to perform large scale docId -> bytesRef
> lookups for BinaryDocValues.
>
> I'm finding that I can't get the performance that I need from the random
> access seek in the BinaryDocValues interface.
>
> I'm wondering if sequentially scanning the docValues would be a faster
> approach. I have a BitSet of matching docs, so if I sequentially moved
> through the docValues I could test each one against that bitset.
>
> Wondering if that approach would be faster for bulk extracts and how
> tricky it would be to add an iterator to the BinaryDocValues interface?
>
> Thanks,
> Joel

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<[email protected]>
