Going sequentially should help if the pages are not hot (i.e. not already in the OS's IO cache).
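Here is a small, self-contained Java model of the sequential idea. It is not Lucene code. `PackedBinaryValues` and `extractSequential` are hypothetical names, and an in-memory byte[] stands in for the doc values file, so the sketch shows only the access order, not real IO behavior. Visiting the matching docs in ascending docId order (via `BitSet.nextSetBit`) means the value bytes are read front to back, which is the kind of access pattern that tends to benefit from OS readahead when the pages are cold.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical stand-in for a binary doc values column (NOT Lucene's API):
 * all values are stored back to back in one byte[], and
 * offsets[doc]..offsets[doc + 1] delimit the bytes for each doc.
 */
final class PackedBinaryValues {
    private final byte[] data;
    private final int[] offsets; // length == maxDoc + 1

    PackedBinaryValues(byte[] data, int[] offsets) {
        this.data = data;
        this.offsets = offsets;
    }

    /** Builds a column from per-doc strings, encoded as UTF-8. */
    static PackedBinaryValues of(String... values) {
        int[] offsets = new int[values.length + 1];
        byte[][] encoded = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            offsets[i + 1] = offsets[i] + encoded[i].length;
        }
        byte[] data = new byte[offsets[values.length]];
        for (int i = 0; i < values.length; i++) {
            System.arraycopy(encoded[i], 0, data, offsets[i], encoded[i].length);
        }
        return new PackedBinaryValues(data, offsets);
    }

    int maxDoc() {
        return offsets.length - 1;
    }

    /** Random access: decodes one doc's value from the shared buffer. */
    String get(int docId) {
        int start = offsets[docId];
        return new String(data, start, offsets[docId + 1] - start, StandardCharsets.UTF_8);
    }

    /**
     * Bulk extract: visits matching docs in ascending docId order, so the
     * underlying bytes are read front to back. Bits set at or beyond maxDoc
     * are ignored.
     */
    List<String> extractSequential(BitSet matches) {
        List<String> out = new ArrayList<>();
        for (int doc = matches.nextSetBit(0); doc >= 0 && doc < maxDoc();
             doc = matches.nextSetBit(doc + 1)) {
            out.add(get(doc));
        }
        return out;
    }
}
```

Iterating with `nextSetBit` is equivalent to scanning every doc and testing it against the BitSet, as proposed below, but it skips runs of non-matching docs without touching them.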
You can also use a different DVFormat, e.g. Direct, but it holds all the bytes in RAM.

Mike McCandless

http://blog.mikemccandless.com


On Tue, Jan 7, 2014 at 1:09 PM, Mikhail Khludnev
<[email protected]> wrote:
> Joel,
>
> I tried to hack it straightforwardly, but found no free gain there. The only
> thing I can suggest trying is to reuse the bytes in
> https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
> Right now it allocates new bytes every time, which, besides the GC cost, can
> also hurt memory access locality. Could you try fixing the memory waste and
> repeating the performance test?
>
> Have a good hack!
>
>
> On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein <[email protected]> wrote:
>>
>> Hi,
>>
>> I'm looking for a faster way to perform large-scale docId -> BytesRef
>> lookups for BinaryDocValues.
>>
>> I'm finding that I can't get the performance I need from the random-access
>> seek in the BinaryDocValues interface.
>>
>> I'm wondering if sequentially scanning the docValues would be a faster
>> approach. I have a BitSet of matching docs, so if I moved sequentially
>> through the docValues I could test each one against that BitSet.
>>
>> Would that approach be faster for bulk extracts, and how tricky would it
>> be to add an iterator to the BinaryDocValues interface?
>>
>> Thanks,
>> Joel
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
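Mikhail's suggestion of reusing bytes instead of allocating a new array on every lookup can be sketched the same way. This is a simplified model under stated assumptions, not the Lucene45DocValuesProducer code: `Scratch` (loosely inspired by Lucene's BytesRef, but not its API) and `ReusingBinaryReader` are hypothetical classes. The caller passes one scratch holder into every `get` call, and the backing array is reallocated only when a value is larger than its current capacity.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical reusable value holder (loosely inspired by Lucene's BytesRef;
 * not its API). Only bytes[0..length) is valid after each copy.
 */
final class Scratch {
    byte[] bytes = new byte[0];
    int length;
    int allocations; // number of times the backing array had to grow

    void copyFrom(byte[] src, int start, int len) {
        if (bytes.length < len) {
            // Grow to at least double the old capacity so a run of slightly
            // larger values does not allocate on every call.
            bytes = new byte[Math.max(len, bytes.length * 2)];
            allocations++;
        }
        System.arraycopy(src, start, bytes, 0, len);
        length = len;
    }

    String utf8() {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }
}

/** Hypothetical reader over values packed back to back in one byte[]. */
final class ReusingBinaryReader {
    private final byte[] data;
    private final int[] offsets; // offsets[doc]..offsets[doc + 1] delimit doc's bytes

    ReusingBinaryReader(byte[] data, int[] offsets) {
        this.data = data;
        this.offsets = offsets;
    }

    /** Fills the caller's scratch instead of allocating a new byte[] per call. */
    void get(int docId, Scratch result) {
        int start = offsets[docId];
        result.copyFrom(data, start, offsets[docId + 1] - start);
    }
}
```

Because each call overwrites the scratch contents, a caller that needs to keep a value past the next lookup must copy it first; that is the usual trade-off of this reuse pattern.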
