Martijn van Groningen created LUCENE-6496:
---------------------------------------------
Summary: Updatable OrdinalMap
Key: LUCENE-6496
URL: https://issues.apache.org/jira/browse/LUCENE-6496
Project: Lucene - Core
Issue Type: Bug
Reporter: Martijn van Groningen
Priority: Minor
The MultiDocValues.OrdinalMap that we have today requires a rebuild on each
reopen. Once the OrdinalMap has been built, lookups are fast and the logic is
simple. Many times rebuilding the OrdinalMap isn't even an issue, because
for low to medium cardinality fields the rebuild doesn't take that much
time. The time required to build the OrdinalMap depends on the number of unique
terms in a field.
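
To make the cost model concrete, here is a minimal toy sketch of what a
segment-ordinal to global-ordinal map does. It is not Lucene's implementation:
the class ToyOrdinalMap and all of its methods are hypothetical names, and it
assumes each segment exposes its unique terms as a sorted list. The build step
visits every term of every segment, which is why its cost grows with the number
of unique terms, while a lookup afterwards is a plain array access.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of a segment-to-global ordinal map (hypothetical, not Lucene code). */
public class ToyOrdinalMap {
    private final List<String> globalTerms; // sorted union of all segment terms
    private final int[][] segToGlobal;      // segToGlobal[segment][segmentOrd] = global ordinal

    private ToyOrdinalMap(List<String> globalTerms, int[][] segToGlobal) {
        this.globalTerms = globalTerms;
        this.segToGlobal = segToGlobal;
    }

    /** Assumption: each inner list holds one segment's unique terms in sorted order. */
    public static ToyOrdinalMap build(List<List<String>> segments) {
        // Build the global term dictionary: the sorted union of all segment terms.
        TreeSet<String> union = new TreeSet<>();
        for (List<String> segment : segments) {
            union.addAll(segment);
        }
        List<String> global = new ArrayList<>(union);

        int[][] map = new int[segments.size()][];
        for (int s = 0; s < segments.size(); s++) {
            List<String> segment = segments.get(s);
            map[s] = new int[segment.size()];
            int g = 0;
            for (int ord = 0; ord < segment.size(); ord++) {
                // Both lists are sorted, so the global cursor only moves forward.
                while (!global.get(g).equals(segment.get(ord))) {
                    g++;
                }
                map[s][ord] = g;
            }
        }
        return new ToyOrdinalMap(global, map);
    }

    /** Fast path after the build: a single array lookup. */
    public int globalOrd(int segment, int segmentOrd) {
        return segToGlobal[segment][segmentOrd];
    }

    public int valueCount() {
        return globalTerms.size();
    }

    public String lookupTerm(int globalOrd) {
        return globalTerms.get(globalOrd);
    }

    public static void main(String[] args) {
        ToyOrdinalMap map = ToyOrdinalMap.build(List.of(
                List.of("apple", "cherry"),
                List.of("banana", "cherry", "date")));
        // "cherry" has a different ordinal in each segment but one global ordinal.
        System.out.println(map.globalOrd(0, 1) + " " + map.globalOrd(1, 1));
    }
}
```

In this toy model any change to the set of segments invalidates the whole
structure, which mirrors the rebuild-on-reopen behaviour described above.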
For high cardinality fields (let's say >= 1M terms) rebuilding the OrdinalMap
can take some time to complete. This can then impact the NRT aspect of many
applications (facets may rely on ordinal maps being rebuilt before a new search
can happen after the reopen).
I'd like to explore a different OrdinalMap implementation that doesn't need to
be rebuilt on each reopen. There are simple improvements that can be made:
* If docs have only been marked as deleted, then we can basically reuse the
OrdinalMap that has already been built.
* If no new terms have been introduced, we can just add the new segment's
segment ordinal to global ordinal lookups to the OrdinalMap that has already
been built.
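
The two cases above could look roughly like the following sketch. It is only an
illustration under stated assumptions, not a proposed patch: the class
AppendableOrdinalMapSketch and its methods are hypothetical, the global term
dictionary is modeled as a sorted list of strings, and segments that disappear
because of a merge are not handled. In the deletes-only case the caller simply
keeps using the existing instance; when a new segment brings no unseen terms,
only one extra lookup row is computed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of an ordinal map that can be extended without a rebuild. */
public class AppendableOrdinalMapSketch {
    private final List<String> globalTerms; // assumption: sorted and unique
    private final List<int[]> segToGlobal;  // one row of global ordinals per segment

    public AppendableOrdinalMapSketch(List<String> sortedGlobalTerms, List<int[]> segToGlobal) {
        this.globalTerms = sortedGlobalTerms;
        this.segToGlobal = segToGlobal;
    }

    /**
     * Tries to add a segment without rebuilding. Returns empty when the segment
     * contains a term that is not in the global dictionary, because inserting it
     * would shift existing global ordinals and force a full rebuild.
     */
    public Optional<AppendableOrdinalMapSketch> tryAppend(List<String> newSegmentTerms) {
        int[] row = new int[newSegmentTerms.size()];
        for (int ord = 0; ord < row.length; ord++) {
            int g = Collections.binarySearch(globalTerms, newSegmentTerms.get(ord));
            if (g < 0) {
                return Optional.empty(); // unseen term: caller must rebuild
            }
            row[ord] = g;
        }
        // Existing rows and the global dictionary are shared, not recomputed.
        List<int[]> rows = new ArrayList<>(segToGlobal);
        rows.add(row);
        return Optional.of(new AppendableOrdinalMapSketch(globalTerms, rows));
    }

    public int globalOrd(int segment, int segmentOrd) {
        return segToGlobal.get(segment)[segmentOrd];
    }

    public int segmentCount() {
        return segToGlobal.size();
    }

    public static void main(String[] args) {
        AppendableOrdinalMapSketch base = new AppendableOrdinalMapSketch(
                List.of("apple", "banana", "cherry"),
                List.of(new int[] {0, 2}));
        // Deletes only: nothing to do, keep using 'base' as-is.
        // New segment without new terms: extend instead of rebuilding.
        System.out.println(base.tryAppend(List.of("banana", "cherry")).isPresent());
        // New segment with an unseen term: fall back to a full rebuild.
        System.out.println(base.tryAppend(List.of("date")).isPresent());
    }
}
```

The append path costs one binary search per term of the new segment, so it
depends on the size of that segment rather than on the total number of unique
terms in the field.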
I think a complete OrdinalMap rebuild is inevitable, but it would be great if
we could rebuild on a flush / merge instead of on each reopen.