[ https://issues.apache.org/jira/browse/LUCENE-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026708#comment-14026708 ]
Michael McCandless commented on LUCENE-5750:
--------------------------------------------
+1
> Speed up monotonic address access in BINARY/SORTED_SET
> ------------------------------------------------------
>
> Key: LUCENE-5750
> URL: https://issues.apache.org/jira/browse/LUCENE-5750
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Attachments: LUCENE-5750.patch
>
>
> I found this while exploring LUCENE-5748, but it currently applies to both
> variable-length BINARY and SORTED_SET, so I think it's worth doing here
> first.
> I think it's just a holdover from before MonotonicBlockPackedWriter that, to
> access element N, we currently do:
> {code}
> // current layout: entry N is the end offset of doc N, so doc 0 needs a special case
> long startOffset = (docID == 0 ? 0 : ordIndex.get(docID-1));
> long endOffset = ordIndex.get(docID);
> {code}
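A minimal sketch of the current access pattern, assuming a plain `long[]` as a hypothetical stand-in for the `ordIndex` reader (the real code reads from monotonic packed ints, which this does not model). Each entry holds the end offset of one document, so document 0 has no predecessor entry and needs the `docID == 0` branch; `OldLayout`, `buildEndOffsets` and `range` are illustrative names only.

```java
import java.util.Arrays;

public class OldLayout {
    // Hypothetical stand-in for ordIndex: ends[i] is the END offset of
    // document i's bytes. There is no leading 0 entry.
    static long[] buildEndOffsets(int[] lengths) {
        long[] ends = new long[lengths.length];
        long running = 0;
        for (int i = 0; i < lengths.length; i++) {
            running += lengths[i];
            ends[i] = running;
        }
        return ends;
    }

    // Current access: doc 0 has no previous entry, so it needs a branch.
    static long[] range(long[] ordIndex, int docID) {
        long startOffset = (docID == 0 ? 0 : ordIndex[docID - 1]);
        long endOffset = ordIndex[docID];
        return new long[] {startOffset, endOffset};
    }

    public static void main(String[] args) {
        long[] ordIndex = buildEndOffsets(new int[] {3, 5, 2});
        System.out.println(Arrays.toString(ordIndex));           // [3, 8, 10]
        System.out.println(Arrays.toString(range(ordIndex, 0))); // [0, 3]
        System.out.println(Arrays.toString(range(ordIndex, 2))); // [8, 10]
    }
}
```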
> That's because previously we didn't have packed ints that supported more than
> Integer.MAX_VALUE elements. But that's been fixed for a long time. If we just
> write a 0 first and do this:
> {code}
> // proposed layout: a leading 0 is written, so entry N is the start offset of doc N
> long startOffset = ordIndex.get(docID);
> long endOffset = ordIndex.get(docID+1);
> {code}
> The access is then much faster. For sorting I see around a 20% improvement. We
> don't lose any compression, because we should be able to assume the delta from
> 0 .. 1 is similar to any other gap N .. N+1.
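A matching sketch of the proposed layout, again assuming a hypothetical `long[]` in place of `ordIndex` and illustrative names (`NewLayout`, `buildOffsets`, `range`, `gaps`). Writing a leading 0 gives numDocs + 1 entries, so every lookup is two plain reads with no branch. The `gaps` helper illustrates the compression argument: with the leading 0, the gaps between consecutive entries are exactly the per-document lengths, so the extra 0 .. 1 gap is just one more document length. It does not model how MonotonicBlockPackedWriter actually packs the values.

```java
import java.util.Arrays;

public class NewLayout {
    // Hypothetical stand-in for ordIndex: offsets[0] = 0 and offsets[i + 1]
    // is the end offset of document i, giving numDocs + 1 entries.
    static long[] buildOffsets(int[] lengths) {
        long[] offsets = new long[lengths.length + 1];
        for (int i = 0; i < lengths.length; i++) {
            offsets[i + 1] = offsets[i] + lengths[i];
        }
        return offsets;
    }

    // Proposed access: two lookups, no special case for docID == 0.
    static long[] range(long[] ordIndex, int docID) {
        long startOffset = ordIndex[docID];
        long endOffset = ordIndex[docID + 1];
        return new long[] {startOffset, endOffset};
    }

    // Differences between consecutive entries. With the leading 0 these
    // are exactly the document lengths, including the 0 .. 1 gap.
    static long[] gaps(long[] ordIndex) {
        long[] out = new long[ordIndex.length - 1];
        for (int i = 0; i < out.length; i++) {
            out[i] = ordIndex[i + 1] - ordIndex[i];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] ordIndex = buildOffsets(new int[] {3, 5, 2});
        System.out.println(Arrays.toString(ordIndex));           // [0, 3, 8, 10]
        System.out.println(Arrays.toString(range(ordIndex, 0))); // [0, 3]
        System.out.println(Arrays.toString(gaps(ordIndex)));     // [3, 5, 2]
    }
}
```

Both layouts return the same (start, end) pair for every document; the proposed one trades one extra stored entry for removing the branch from every access.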
--
This message was sent by Atlassian JIRA
(v6.2#6252)