Martijn van Groningen created LUCENE-6496:
---------------------------------------------

             Summary: Updatable OrdinalMap 
                 Key: LUCENE-6496
                 URL: https://issues.apache.org/jira/browse/LUCENE-6496
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Martijn van Groningen
            Priority: Minor


The MultiDocValues.OrdinalMap that we have today requires a rebuild on each 
reopen. Once the OrdinalMap has been built, lookups are fast and the logic is 
simple. Often, rebuilding the OrdinalMap isn't even an issue, because 
for low to medium cardinality fields the rebuild doesn't take much 
time. The time required to build the OrdinalMap depends on the number of unique 
terms in a field.
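
To make the cost concrete, here is a minimal toy model of what a global ordinal map holds. This is not the actual MultiDocValues.OrdinalMap code; the class name ToyOrdinalMap and its fields are made up for illustration, and terms are plain Strings in natural order rather than BytesRefs. Each segment is assumed to expose its unique terms in sorted order, so a term's position is its segment ordinal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of a global ordinal map (hypothetical, not Lucene's implementation). */
final class ToyOrdinalMap {
    /** Sorted union of all segment terms; the list index is the global ordinal. */
    final List<String> globalTerms;
    /** segToGlobal[s][segOrd] is the global ordinal of term segOrd in segment s. */
    final int[][] segToGlobal;

    private ToyOrdinalMap(List<String> globalTerms, int[][] segToGlobal) {
        this.globalTerms = globalTerms;
        this.segToGlobal = segToGlobal;
    }

    /**
     * Builds the map from scratch. Every unique term of every segment is
     * visited, which is why the build time grows with field cardinality.
     * Assumes each inner list is sorted and free of duplicates.
     */
    static ToyOrdinalMap build(List<List<String>> segments) {
        TreeSet<String> union = new TreeSet<>();
        for (List<String> segment : segments) {
            union.addAll(segment);
        }
        List<String> global = new ArrayList<>(union);

        int[][] map = new int[segments.size()][];
        for (int s = 0; s < segments.size(); s++) {
            List<String> segment = segments.get(s);
            map[s] = new int[segment.size()];
            int g = 0; // both lists are sorted, so one forward walk is enough
            for (int ord = 0; ord < segment.size(); ord++) {
                while (!global.get(g).equals(segment.get(ord))) {
                    g++;
                }
                map[s][ord] = g;
            }
        }
        return new ToyOrdinalMap(global, map);
    }
}
```

Once built, a lookup is a single array read (segToGlobal[s][segOrd]), which is the "fast and simple" part; the expensive part is that build() touches every unique term again on every reopen.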

For high cardinality fields (let's say >= 1M terms) rebuilding the OrdinalMap 
can take some time to complete. This can then impact the NRT aspect of many 
applications (facets may rely on ordinal maps being rebuilt before a new search 
can happen after the reopen).

I'd like to explore a different OrdinalMap implementation that doesn't need to be 
rebuilt on each reopen. There are simple improvements that can be made:
* If docs have only been marked as deleted, we can simply reuse the 
OrdinalMap that has already been built.
* If no new terms have been introduced, we can just add the new segment's 
segment-ordinal-to-global-ordinal lookups to the OrdinalMap that has already 
been built.
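
A rough sketch of the second shortcut, under the same toy assumptions as before (Strings in natural order instead of BytesRefs, each term list sorted and without duplicates). The class and method names are hypothetical and nothing here is an existing Lucene API. The first shortcut needs no code in this model: deletes leave the per-segment term lists untouched, so the existing mappings are simply kept.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of extending an existing global ordinal map. */
final class IncrementalOrdinalsSketch {
    /**
     * Tries to map a new segment onto the existing global term list.
     *
     * @param globalTerms  sorted global terms; the index is the global ordinal
     * @param segmentTerms sorted unique terms of the new segment
     * @return segment ordinal to global ordinal lookups, or null if the
     *         segment introduces a term that has no global ordinal yet
     *         (in which case a full rebuild would still be needed)
     */
    static int[] mapNewSegment(List<String> globalTerms, List<String> segmentTerms) {
        int[] segToGlobal = new int[segmentTerms.size()];
        for (int ord = 0; ord < segmentTerms.size(); ord++) {
            int global = Collections.binarySearch(globalTerms, segmentTerms.get(ord));
            if (global < 0) {
                return null; // unseen term: existing global ordinals would shift
            }
            segToGlobal[ord] = global;
        }
        return segToGlobal;
    }
}
```

The cost here depends only on the number of terms in the new segment, not on the total cardinality of the field, which is the point of the shortcut.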

I think an occasional complete OrdinalMap rebuild is inevitable, but it would be 
great if we could rebuild on a flush / merge instead of on each reopen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
