Robert Muir created LUCENE-5750:
-----------------------------------
Summary: Speed up monotonic address access in BINARY/SORTED_SET
Key: LUCENE-5750
URL: https://issues.apache.org/jira/browse/LUCENE-5750
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Attachments: LUCENE-5750.patch
I found this while exploring LUCENE-5748, but it currently applies to both
variable-length BINARY and SORTED_SET, so I think it's worth doing here first.
I think it's just a holdover from before MonotonicBlockPackedWriter that, to
access element N, we currently do:
{code}
// ordIndex stores only the END offset of each document, so doc 0 is special-cased
long startOffset = (docID == 0 ? 0 : ordIndex.get(docID-1));
long endOffset = ordIndex.get(docID);
{code}
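A minimal sketch of the current layout follows, assuming a plain `long[]` in place of Lucene's monotonic packed `ordIndex` (the class name `EndOffsetsOnly` is hypothetical, not Lucene API). Only the end offset of each document is stored, which is why document 0 needs its own branch.

```java
// Hypothetical sketch: a plain long[] stands in for Lucene's monotonic
// packed ordIndex. Only the END offset of each document's value is stored.
public final class EndOffsetsOnly {
    private final long[] ordIndex; // ordIndex[d] = end offset (exclusive) of doc d

    public EndOffsetsOnly(long[] ordIndex) {
        this.ordIndex = ordIndex;
    }

    /** Doc 0 has no predecessor entry, so its start must be special-cased. */
    public long start(int docID) {
        return docID == 0 ? 0 : ordIndex[docID - 1];
    }

    /** End offset (exclusive) of doc docID. */
    public long end(int docID) {
        return ordIndex[docID];
    }

    public static void main(String[] args) {
        // Three values with lengths 3, 5 and 4.
        EndOffsetsOnly idx = new EndOffsetsOnly(new long[] {3, 8, 12});
        for (int doc = 0; doc < 3; doc++) {
            System.out.println(doc + ": [" + idx.start(doc) + ", " + idx.end(doc) + ")");
        }
    }
}
```

Running `main` prints `0: [0, 3)`, `1: [3, 8)` and `2: [8, 12)`.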
That's because previously we didn't have packed ints that supported more than
Integer.MAX_VALUE elements. But that's been fixed for a long time. If we just
write a 0 first and do this:
{code}
// with a leading 0, entry docID is the start and entry docID+1 is the end
long startOffset = ordIndex.get(docID);
long endOffset = ordIndex.get(docID+1);
{code}
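The proposed layout can be sketched the same way, again assuming a plain `long[]` instead of the real packed structure (`LeadingZeroOffsets` and `fromLengths` are hypothetical names). Writing the 0 first gives numDocs + 1 entries, and every document reads two adjacent entries with no `docID == 0` check.

```java
// Hypothetical sketch: a plain long[] stands in for the monotonic packed
// ordIndex. A 0 is written first, so the array holds numDocs + 1 entries.
public final class LeadingZeroOffsets {
    private final long[] ordIndex; // ordIndex[d] = start of doc d, ordIndex[d + 1] = its end

    private LeadingZeroOffsets(long[] ordIndex) {
        this.ordIndex = ordIndex;
    }

    /** Builds numDocs + 1 offsets from per-document value lengths. */
    public static LeadingZeroOffsets fromLengths(int[] lengths) {
        long[] offsets = new long[lengths.length + 1]; // offsets[0] stays 0
        for (int i = 0; i < lengths.length; i++) {
            offsets[i + 1] = offsets[i] + lengths[i];
        }
        return new LeadingZeroOffsets(offsets);
    }

    /** No special case for doc 0: the leading 0 is its start. */
    public long start(int docID) {
        return ordIndex[docID];
    }

    /** End offset (exclusive) of doc docID. */
    public long end(int docID) {
        return ordIndex[docID + 1];
    }

    public static void main(String[] args) {
        LeadingZeroOffsets idx = fromLengths(new int[] {3, 5, 4});
        for (int doc = 0; doc < 3; doc++) {
            System.out.println(doc + ": [" + idx.start(doc) + ", " + idx.end(doc) + ")");
        }
    }
}
```

For value lengths 3, 5 and 4 this prints `0: [0, 3)`, `1: [3, 8)` and `2: [8, 12)`. The cost is one extra stored entry for the whole field.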
Access is then much faster: for sorting I see around a 20% improvement. We
don't lose any compression, because we can assume the delta from 0 .. 1 is
similar to any other gap N .. N+1.
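To illustrate the compression argument, the sketch below compares the consecutive gaps of both layouts for value lengths 3, 5 and 4. It assumes plain `long[]` arrays and a hypothetical `OffsetGaps.gaps` helper; it does not model how MonotonicBlockPackedWriter actually encodes each block. The leading 0 adds exactly one gap, and that gap is simply the length of document 0's value, the same kind of quantity as every other gap.

```java
import java.util.Arrays;

// Hypothetical sketch: compares consecutive differences of the two layouts.
public final class OffsetGaps {
    /** Returns offsets[i] - offsets[i - 1] for every i >= 1. */
    public static long[] gaps(long[] offsets) {
        long[] g = new long[offsets.length - 1];
        for (int i = 1; i < offsets.length; i++) {
            g[i - 1] = offsets[i] - offsets[i - 1];
        }
        return g;
    }

    public static void main(String[] args) {
        long[] endsOnly = {3, 8, 12};    // current layout: end offsets only
        long[] withZero = {0, 3, 8, 12}; // proposed layout: 0 written first
        System.out.println(Arrays.toString(gaps(endsOnly))); // [5, 4]
        System.out.println(Arrays.toString(gaps(withZero))); // [3, 5, 4]
    }
}
```

With the leading 0, the gaps are exactly the per-document value lengths.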
--
This message was sent by Atlassian JIRA
(v6.2#6252)