Re: Why Two Levels of Indirection in BytesRefHash class ?

Toke Eskildsen Sun, 11 Dec 2016 04:05:33 -0800

Adrien Grand <jpou...@gmail.com> wrote:
> That would work if you are only interested in using BytesRefHash as a hash
> set for byte[]. However these incremental ids are useful if you want to
> associate data with each byte[]: you can create parallel arrays and use the
> ids returned by the BytesRefHash as indices in these arrays.


That could be solved by prepending the counter value to the stored BytesRef 
and then applying a fixed +4 delta to the offset to reach the BytesRef itself. 
The space requirements are the same as now, but there is one fewer level of 
indirection, which means fewer CPU cache misses.
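
To make the layout concrete, here is a minimal sketch of the idea. It is not 
Lucene code: the class InlineIdPool and its methods are hypothetical, the pool 
is a single growable byte[] rather than a ByteBlockPool, and the length prefix 
is simplified to one byte. The hash table slots would hold the offsets returned 
by append() directly.

```java
import java.util.Arrays;

/** Hypothetical sketch: each pool entry is [4-byte id][1-byte length][bytes]. */
public class InlineIdPool {
    private byte[] pool = new byte[64];
    private int used = 0;   // next free position in the pool
    private int count = 0;  // next id to hand out

    /** Appends a value and returns the pool offset of its entry. */
    public int append(byte[] value) {
        if (value.length > 127) {
            throw new IllegalArgumentException("sketch supports at most 127 bytes");
        }
        int needed = used + 4 + 1 + value.length;
        if (needed > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(needed, pool.length * 2));
        }
        int offset = used;
        int id = count++;
        // Prepend the counter value as a big-endian int.
        pool[used++] = (byte) (id >>> 24);
        pool[used++] = (byte) (id >>> 16);
        pool[used++] = (byte) (id >>> 8);
        pool[used++] = (byte) id;
        // Simplified one-byte length prefix, then the bytes themselves.
        pool[used++] = (byte) value.length;
        System.arraycopy(value, 0, pool, used, value.length);
        used += value.length;
        return offset;
    }

    /** Reads the id stored at the start of the entry. */
    public int idAt(int offset) {
        return ((pool[offset] & 0xFF) << 24)
             | ((pool[offset + 1] & 0xFF) << 16)
             | ((pool[offset + 2] & 0xFF) << 8)
             |  (pool[offset + 3] & 0xFF);
    }

    /** Skips the id with the fixed +4 delta and returns a copy of the bytes. */
    public byte[] bytesAt(int offset) {
        int start = offset + 4;
        int length = pool[start];
        return Arrays.copyOfRange(pool, start + 1, start + 1 + length);
    }
}
```

A lookup then touches only the hash slot and the pool entry; the id and the 
bytes sit next to each other.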

However, this removes the nice property that the DocValues in the structure 
can be iterated in insertion order, so it would be quite a change to the 
current code.


One optimization, while we are on the subject, is to exploit the indirection. 
As the bytesStarts are monotonically increasing offsets into the ByteBlockPool, 
there is no need to store the length of each BytesRef. It can be calculated as 
bytesStarts[id+1] - bytesStarts[id]. This saves 1-2 bytes per entry and 
preserves memory locality, so it should perform the same as now (this needs to 
be tested, of course).
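
A minimal sketch of the length calculation, under some assumptions: the class 
ImplicitLengthPool is hypothetical, the pool is one contiguous byte[] with no 
gaps between entries, and the starts array carries one extra slot holding the 
end of the last entry so that starts[id+1] is defined for every id. With a pool 
split into blocks, any padding at block boundaries would need separate handling.

```java
import java.util.Arrays;

/** Hypothetical sketch: lengths are derived from neighbouring start offsets. */
public class ImplicitLengthPool {
    private byte[] pool = new byte[64];
    // starts[id] is the offset of entry id; starts[count] is the end of the last entry.
    private int[] starts = new int[8];
    private int count = 0;

    /** Appends the raw bytes with no length prefix and returns the new id. */
    public int append(byte[] value) {
        int begin = starts[count];
        int end = begin + value.length;
        if (end > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(end, pool.length * 2));
        }
        if (count + 2 > starts.length) {
            starts = Arrays.copyOf(starts, starts.length * 2);
        }
        System.arraycopy(value, 0, pool, begin, value.length);
        starts[count + 1] = end;  // also the start of the next entry
        return count++;
    }

    /** Length is the distance to the next start offset. */
    public int length(int id) {
        return starts[id + 1] - starts[id];
    }

    /** Returns a copy of the bytes stored for the given id. */
    public byte[] get(int id) {
        return Arrays.copyOfRange(pool, starts[id], starts[id + 1]);
    }
}
```

The cost is one extra int for the whole structure instead of a length prefix 
per entry.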

- Toke Eskildsen

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
