[jira] [Commented] (LUCENE-5938) New DocIdSet implementation with random write access

Adrien Grand (JIRA) Thu, 11 Sep 2014 15:43:47 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130824#comment-14130824 ]


Adrien Grand commented on LUCENE-5938:
--------------------------------------

The results of the benchmark are a bit disappointing:

{noformat}
                 Prefix3       44.93     (17.0%)       22.59      (2.4%)  -49.7% ( -59% -  -36%)
                  IntNRQ       16.25     (17.1%)        9.13      (2.2%)  -43.8% ( -53% -  -29%)
                Wildcard       68.48     (14.8%)       38.63      (4.6%)  -43.6% ( -54% -  -28%)
{noformat}

I looked at the queries, and the explanation is that quite a number of them 
match a significant portion of the index (more than 1%), which makes 
FixedBitSet faster. I also ran some queries individually: those that match 
fewer docs are faster with this set than with the fixed bitset. The cutover 
seems to be around 1‰ of documents matched.
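
To make the cutover concrete, here is a minimal Java sketch of a density check that picks between a sparse set and a bitset. The class and method names are hypothetical and not part of the patch, and the 1‰ threshold is only the rough figure observed above, not a tuned constant.

```java
// Hypothetical helper, not part of the LUCENE-5938 patch.
final class CutoverSketch {
    // Assumption: the rough cutover observed in the benchmark (about 1 per mille).
    static final int CUTOVER_PER_MILLE = 1;

    /**
     * Returns true when the expected number of matches is below the cutover,
     * i.e. when a sparse set is expected to beat a fixed bitset.
     */
    static boolean preferSparseSet(long expectedMatches, int maxDoc) {
        // Compares expectedMatches / maxDoc with CUTOVER_PER_MILLE / 1000
        // using integer arithmetic only.
        return expectedMatches * 1000L < (long) maxDoc * CUTOVER_PER_MILLE;
    }
}
```

With a segment of 1,000,000 docs, a query expected to match 500 docs falls under the threshold, while one matching 20,000 docs (2%) does not.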

> New DocIdSet implementation with random write access
> ----------------------------------------------------
>
>                 Key: LUCENE-5938
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5938
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-5938.patch, LUCENE-5938.patch
>
>
> We have a great cost API that is supposed to help make decisions about how to 
> best execute queries. However, because several of our filter 
> implementations (e.g. TermsFilter and BooleanFilter) return FixedBitSets, 
> we either use the cost API and make bad decisions, or need to fall back to 
> heuristics which are not as good, such as 
> RandomAccessFilterStrategy.useRandomAccess, which decides that random access 
> should be used if the first doc in the set is less than 100.
> On the other hand, we also have some nice compressed and cacheable DocIdSet 
> implementations, but we cannot make use of them because TermsFilter requires a 
> DocIdSet that has random write access, and FixedBitSet is the only DocIdSet 
> that we have that supports random write access.
> I think it would be nice to replace FixedBitSet in those filters with another 
> DocIdSet that would also support random write access but would have a better 
> cost.
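
For illustration, the first-doc heuristic mentioned in the description can be contrasted with a cost-based decision in a few lines of Java. This is a sketch written from the description above, not the Lucene source; `costBased` and its `minDensity` parameter are hypothetical.

```java
// Sketch only: names and the density threshold are illustrative assumptions.
final class RandomAccessDecisionSketch {
    /** Heuristic as described above: random access if the first doc in the set is below 100. */
    static boolean firstDocHeuristic(int firstFilterDoc) {
        return firstFilterDoc < 100;
    }

    /** Hypothetical alternative: random access only if the set is dense enough. */
    static boolean costBased(long cost, int maxDoc, double minDensity) {
        return cost >= minDensity * maxDoc;
    }
}
```

A set holding only docs 3 and 900,000 in a 1,000,000-doc segment passes the first-doc check even though it is very sparse; a density check fed with an accurate cost would reject it.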


