Re: Iterating BinaryDocValues

Michael McCandless Tue, 07 Jan 2014 12:55:02 -0800

Going sequentially should help, if the pages are not hot (in the OS's IO cache).


You can also use a different DocValuesFormat, e.g. Direct, but it holds
all the bytes in RAM.
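
To make "going sequentially" concrete, here is a rough sketch of walking
the matching docs in increasing docID order, so the lookups move forward
through the data instead of jumping around. It is only an illustration
under assumptions: BinaryLookup is a hypothetical stand-in for whatever
per-segment docID -> bytes lookup you have, not a Lucene type, and
maxValueLength is an assumed upper bound on the value size. It also reuses
one scratch buffer rather than allocating per document.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class SequentialDocValuesScan {

    /** Hypothetical docID -> bytes lookup; a stand-in, not a Lucene interface. */
    interface BinaryLookup {
        /** Copies the value for docID into dest and returns its length in bytes. */
        int get(int docID, byte[] dest);
    }

    /** Visits the set bits in ascending docID order, reusing one scratch buffer. */
    static List<String> collect(BitSet matches, BinaryLookup values, int maxValueLength) {
        List<String> out = new ArrayList<>();
        byte[] scratch = new byte[maxValueLength]; // allocated once for the whole scan
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            int len = values.get(doc, scratch);
            out.add(new String(scratch, 0, len, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] stored = {"alpha", "beta", "gamma", "delta"};
        BitSet matches = new BitSet();
        matches.set(3);
        matches.set(0);

        // Toy in-memory lookup used only for this demonstration.
        BinaryLookup lookup = (doc, dest) -> {
            byte[] v = stored[doc].getBytes(StandardCharsets.UTF_8);
            System.arraycopy(v, 0, dest, 0, v.length);
            return v.length;
        };

        System.out.println(collect(matches, lookup, 16));
    }
}
```

BitSet.nextSetBit always returns bits in ascending order, so the order in
which the bits were set does not matter. Whether this beats random access
in practice still depends on whether the pages are already cached.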

Mike McCandless

http://blog.mikemccandless.com


On Tue, Jan 7, 2014 at 1:09 PM, Mikhail Khludnev wrote:
> Joel,
>
> I tried a straightforward hack, but found no easy win there. The only
> thing I can suggest is to try reusing the byte array in
> https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
> Right now it allocates bytes on every call, which, besides the GC cost,
> can also hurt memory access locality. Could you try fixing the memory
> waste and rerunning the performance test?
>
> Have a good hack!
>
>
> On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein wrote:
>>
>>
>> Hi,
>>
>> I'm looking for a faster way to perform large scale docId -> bytesRef
>> lookups for BinaryDocValues.
>>
>> I'm finding that I can't get the performance that I need from the random
>> access seek in the BinaryDocValues interface.
>>
>> I'm wondering if sequentially scanning the docValues would be a faster
>> approach. I have a BitSet of matching docs, so if I sequentially moved
>> through the docValues I could test each one against that bitset.
>>
>> Wondering if that approach would be faster for bulk extracts and how
>> tricky it would be to add an iterator to the BinaryDocValues interface?
>>
>> Thanks,
>> Joel
>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
>
