Robert Muir created LUCENE-5675:
-----------------------------------

             Summary: "ID postings format"
                 Key: LUCENE-5675
                 URL: https://issues.apache.org/jira/browse/LUCENE-5675
             Project: Lucene - Core
          Issue Type: New Feature
            Reporter: Robert Muir


Today, primary-key lookup in Lucene is not that great for systems like Solr 
and Elasticsearch that have versioning in front of IndexWriter.

To some extent, BlockTree can "sometimes" help avoid seeks by telling you the 
term does not exist for a segment. But this technique (based on FST prefix) is 
fragile. The only other choice today is Bloom filters, which use up huge 
amounts of memory.

I don't think we are using everything we know: particularly the version 
semantics.

Instead, if the FST for the terms index used an algebra that represents the max 
version for any subtree, we might be able to answer that there is no term T 
with version < V in that segment very efficiently.
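
To make the subtree idea concrete, here is a minimal sketch in Java. It is 
not Lucene code: a plain in-memory trie stands in for the terms-index FST, and 
the names MaxVersionTrie, put and hasTermWithVersionAtLeast are hypothetical. 
Each node records the max version of any ID beneath it, so a subtree whose max 
is below V cannot hold any ID at version V or higher, and a lookup asking for 
"ID at version >= V" can stop there without descending further.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a terms index: a trie whose nodes carry the max version of their subtree. */
class MaxVersionTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        long maxVersion = Long.MIN_VALUE; // max version of any ID at or below this node
        long version = Long.MIN_VALUE;    // version of the ID ending exactly here, if any
        boolean isTerm;                   // true if an ID ends at this node
    }

    private final Node root = new Node();

    /** Adds an ID with its version, raising the subtree max on every node along the path. */
    void put(String id, long version) {
        Node node = root;
        node.maxVersion = Math.max(node.maxVersion, version);
        for (int i = 0; i < id.length(); i++) {
            node = node.children.computeIfAbsent(id.charAt(i), c -> new Node());
            node.maxVersion = Math.max(node.maxVersion, version);
        }
        node.isTerm = true;
        node.version = Math.max(node.version, version);
    }

    /** Returns true only if the ID exists with a version of at least minVersion. */
    boolean hasTermWithVersionAtLeast(String id, long minVersion) {
        Node node = root;
        for (int i = 0; i < id.length(); i++) {
            // Early exit: nothing under this node reaches minVersion.
            if (node.maxVersion < minVersion) {
                return false;
            }
            node = node.children.get(id.charAt(i));
            if (node == null) {
                return false; // the ID is not in this "segment" at all
            }
        }
        return node.isTerm && node.version >= minVersion;
    }
}
```

With one such structure per segment, a versioned update for ID T at version V 
would only need to touch segments where this check returns true. In a real FST 
the max would presumably be carried in the arc outputs rather than in node 
fields; that part is an assumption of this sketch.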

Also, ID fields don't need postings lists, and they don't need stats like 
docfreq/totaltermfreq, etc.; this stuff is all implicit.

As far as API, I think for users to provide "IDs with versions" to such a PF, a 
start would be to set a payload or whatever on the term field to get it through 
IndexWriter to the codec. And a "consumer" of the codec can just cast the Terms 
to a subclass that exposes the FST to do this version check efficiently.


