Re: Iterating BinaryDocValues

Mikhail Khludnev Tue, 07 Jan 2014 10:18:17 -0800

Joel,

I tried hacking on it directly, but found no easy win there. The only
thing I can suggest is to try reusing the bytes in
https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
Right now it allocates bytes on every call, which, besides adding GC
pressure, can also hurt memory access locality. Could you try fixing that
memory waste and rerun the performance test?
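
To make the suggestion concrete, here is a minimal, self-contained sketch of the difference between allocating per lookup and reusing one buffer. It is not the Lucene code: the class BufferReuseSketch and all of its methods are hypothetical stand-ins, and the in-memory layout (one concatenated byte array plus an offsets array) is an assumption chosen only to keep the example runnable without Lucene on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical stand-in for a variable-length binary doc-values store.
// Not a Lucene class; it only illustrates allocate-per-lookup vs. buffer reuse.
final class BufferReuseSketch {
    private final byte[] data;     // all values concatenated
    private final int[] offsets;   // value of doc i is data[offsets[i] .. offsets[i + 1])
    private byte[] scratch = new byte[16]; // shared buffer, reused across lookups

    BufferReuseSketch(String... values) {
        byte[][] encoded = new byte[values.length][];
        offsets = new int[values.length + 1];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            offsets[i + 1] = offsets[i] + encoded[i].length;
        }
        data = new byte[offsets[values.length]];
        for (int i = 0; i < values.length; i++) {
            System.arraycopy(encoded[i], 0, data, offsets[i], encoded[i].length);
        }
    }

    int length(int docId) {
        return offsets[docId + 1] - offsets[docId];
    }

    // Allocating variant: a fresh array per lookup, owned by the caller.
    byte[] getCopy(int docId) {
        return Arrays.copyOfRange(data, offsets[docId], offsets[docId + 1]);
    }

    // Reusing variant: copies into the shared buffer and grows it only when needed.
    // Only the first length(docId) bytes are valid, and the next call overwrites them.
    byte[] getReused(int docId) {
        int len = length(docId);
        if (scratch.length < len) {
            scratch = new byte[Math.max(len, scratch.length * 2)];
        }
        System.arraycopy(data, offsets[docId], scratch, 0, len);
        return scratch;
    }

    // Visits the matching docs in ascending docId order and sums the value lengths,
    // touching each value through the reused buffer.
    static long totalBytes(BufferReuseSketch store, BitSet matches) {
        long total = 0;
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            store.getReused(doc);
            total += store.length(doc);
        }
        return total;
    }
}
```

The trade-off in the reusing variant is ownership: the caller has to consume or copy the bytes before the next lookup, so it is only safe when no consumer holds on to the returned array.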


Have a good hack!


On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein <[email protected]> wrote:

>
> Hi,
>
> I'm looking for a faster way to perform large scale docId -> bytesRef
> lookups for BinaryDocValues.
>
> I'm finding that I can't get the performance that I need from the random
> access seek in the BinaryDocValues interface.
>
> I'm wondering if sequentially scanning the docValues would be a faster
> approach. I have a BitSet of matching docs, so if I sequentially moved
> through the docValues I could test each one against that bitset.
>
> Wondering if that approach would be faster for bulk extracts and how
> tricky it would be to add an iterator to the BinaryDocValues interface?
>
> Thanks,
> Joel
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <[email protected]>
