[jira] [Commented] (LUCENE-7462) Faster search APIs for doc values

Michael McCandless (JIRA) Wed, 19 Oct 2016 06:50:33 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588818#comment-15588818 ]


Michael McCandless commented on LUCENE-7462:
--------------------------------------------

I also see good speedups to the otherwise "lightweight" queries:

{noformat}
Report after iter 19:
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                 Prefix3       43.40      (5.2%)       42.48      (8.8%)   -2.1% ( -15% -   12%)
                  IntNRQ       10.05      (8.8%)        9.87     (10.5%)   -1.8% ( -19% -   19%)
            HighSpanNear       19.38      (5.2%)       19.14      (6.6%)   -1.2% ( -12% -   11%)
               LowPhrase       19.34      (1.9%)       19.21      (3.6%)   -0.7% (  -6% -    4%)
                PKLookup      350.45      (1.3%)      348.51      (2.8%)   -0.6% (  -4% -    3%)
             MedSpanNear       41.12      (4.5%)       40.98      (4.7%)   -0.4% (  -9% -    9%)
                  Fuzzy1      115.35      (2.3%)      115.06      (2.8%)   -0.2% (  -5% -    5%)
             LowSpanNear       85.93      (2.1%)       85.78      (2.3%)   -0.2% (  -4% -    4%)
               MedPhrase       77.08      (2.7%)       77.03      (2.9%)   -0.1% (  -5% -    5%)
                 Respell       62.22      (2.2%)       62.26      (1.4%)    0.1% (  -3% -    3%)
                Wildcard       37.39      (4.4%)       37.43      (5.8%)    0.1% (  -9% -   10%)
                  Fuzzy2      100.18      (2.0%)      100.31      (1.6%)    0.1% (  -3% -    3%)
         LowSloppyPhrase       14.75      (4.9%)       14.79      (4.2%)    0.2% (  -8% -    9%)
              HighPhrase        3.81      (5.2%)        3.82      (6.2%)    0.4% ( -10% -   12%)
              AndHighLow      912.50      (2.5%)      916.11      (3.8%)    0.4% (  -5% -    6%)
            OrNotHighLow      957.24      (2.5%)      963.91      (2.7%)    0.7% (  -4% -    6%)
         MedSloppyPhrase       48.46      (4.8%)       48.80      (4.3%)    0.7% (  -8% -   10%)
              AndHighMed       46.40      (1.7%)       46.87      (1.6%)    1.0% (  -2% -    4%)
             AndHighHigh       43.36      (1.9%)       43.80      (1.9%)    1.0% (  -2% -    4%)
                 LowTerm      449.83      (2.5%)      454.76      (5.1%)    1.1% (  -6% -    8%)
        HighSloppyPhrase       16.13      (6.8%)       16.34      (6.3%)    1.3% ( -11% -   15%)
            OrNotHighMed       98.19      (3.2%)       99.56      (3.1%)    1.4% (  -4% -    7%)
           OrNotHighHigh       21.69      (4.5%)       22.16      (4.8%)    2.2% (  -6% -   12%)
           OrHighNotHigh       18.16      (7.7%)       18.75      (8.0%)    3.2% ( -11% -   20%)
            OrHighNotMed       61.81      (9.4%)       64.27      (9.5%)    4.0% ( -13% -   25%)
                 MedTerm      123.87      (4.5%)      129.22      (3.3%)    4.3% (  -3% -   12%)
            OrHighNotLow       25.19     (11.2%)       26.28     (11.5%)    4.4% ( -16% -   30%)
              OrHighHigh       12.29      (7.4%)       12.96      (8.7%)    5.5% (  -9% -   23%)
               OrHighMed       12.36      (7.4%)       13.09      (8.5%)    5.9% (  -9% -   23%)
                HighTerm       38.51      (5.7%)       40.80      (4.4%)    5.9% (  -3% -   17%)
               OrHighLow       19.42      (8.6%)       20.66      (9.7%)    6.4% ( -10% -   26%)
{noformat}

> Faster search APIs for doc values
> ---------------------------------
>
>                 Key: LUCENE-7462
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7462
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: master (7.0)
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7462-advanceExact.patch
>
>
> While the iterator API helps deal with sparse doc values more efficiently, it 
> also makes search-time operations more costly. For instance, the old 
> random-access API allowed computing facets on a given segment without any 
> conditionals, by just incrementing the counter at index {{ordinal+1}}, while 
> the new API requires advancing the iterator if necessary and then checking 
> whether it is exactly on the right document or not.
> Since it is very common for fields to exist across most documents, I suspect 
> codecs will keep an internal structure that is similar to the current codec 
> in the dense case, by having a dense representation of the data and just 
> making the iterator skip over the minority of documents that do not have a 
> value.
> I suggest that we add APIs that make things cheaper at search time. For 
> instance in the case of SORTED doc values, it could look like 
> {{LegacySortedDocValues}} with the additional restriction that documents can 
> only be consumed in order. Codecs that can implement this API efficiently 
> would hide it behind a {{SortedDocValues}} adapter, and then at search time 
> facets and comparators (which liked the {{LegacySortedDocValues}} API better) 
> would either unwrap or hide the SortedDocValues they got behind a more 
> random-access API (which would only happen in the truly sparse case if the 
> codec optimizes the dense case).
> One challenge is that we already use the same idea for hiding single-valued 
> impls behind multi-valued impls, so we would need to enforce the order in 
> which the wrapping needs to happen. At first sight, it seems that it would be 
> best to do the single-value-behind-multi-value-API wrapping above the 
> random-access-behind-iterator-API wrapping. The complexity of 
> wrapping/unwrapping in the right order could be contained in the 
> {{DocValues}} helper class.
> I think this change would also simplify search-time consumption of doc 
> values, which currently needs to spend several lines of code positioning the 
> iterator every time it needs to do something interesting with doc values.
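
For readers following the issue description, the cost difference in its first paragraph can be illustrated with a minimal, self-contained sketch. Nothing here is Lucene code: {{RandomAccessOrds}}, {{OrdIterator}}, {{ArrayOrdIterator}} and {{FacetCountSketch}} are hypothetical stand-ins, and the {{advanceExact}} method is only assumed from the name of the attached patch (here it returns true when the target document has a value). Missing values are modelled as ordinal -1, so the {{ordinal+1}} trick puts them in slot 0.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the old random-access API: ordinal per doc, -1 if missing. */
interface RandomAccessOrds {
    int getOrd(int docID);
}

/** Hypothetical stand-in for the iterator-style API. */
interface OrdIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int docID();
    int advance(int target);          // first doc >= target that has a value
    boolean advanceExact(int target); // assumed semantics: true iff target has a value
    int ordValue();
}

/** Toy iterator backed by an array where -1 means "no value for this doc". */
final class ArrayOrdIterator implements OrdIterator {
    private final int[] ords;
    private int doc = -1;
    ArrayOrdIterator(int[] ords) { this.ords = ords; }
    public int docID() { return doc; }
    public int advance(int target) {
        for (doc = target; doc < ords.length; doc++) {
            if (ords[doc] != -1) return doc;
        }
        return doc = NO_MORE_DOCS;
    }
    public boolean advanceExact(int target) {
        doc = target;
        return ords[target] != -1;
    }
    public int ordValue() { return ords[doc]; }
}

final class FacetCountSketch {
    /** Old style: no conditionals, missing docs (-1) are counted in slot 0. */
    static int[] countRandomAccess(RandomAccessOrds dv, int[] hits, int numOrds) {
        int[] counts = new int[numOrds + 1];
        for (int doc : hits) counts[dv.getOrd(doc) + 1]++;
        return counts;
    }

    /** Iterator style: advance if behind, then check we landed on the hit. */
    static int[] countIterator(OrdIterator dv, int[] hits, int numOrds) {
        int[] counts = new int[numOrds + 1];
        for (int doc : hits) {
            if (dv.docID() < doc) dv.advance(doc);
            if (dv.docID() == doc) counts[dv.ordValue() + 1]++;
            else counts[0]++;
        }
        return counts;
    }

    /** advanceExact style: a single positioning call per hit. */
    static int[] countAdvanceExact(OrdIterator dv, int[] hits, int numOrds) {
        int[] counts = new int[numOrds + 1];
        for (int doc : hits) {
            if (dv.advanceExact(doc)) counts[dv.ordValue() + 1]++;
            else counts[0]++;
        }
        return counts;
    }
}
```

All three methods produce the same counts for hits given in increasing doc order; the sketch only shows how much positioning logic each style pushes onto the caller and says nothing about real-world performance.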

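The wrap/unwrap idea from the description (a codec hides a dense, forward-only random-access implementation behind an iterator adapter, and search-time code either unwraps it or hides a truly sparse iterator behind the random-access view) might look roughly like the sketch below. All type names ({{InOrderOrds}}, {{SortedIter}}, {{DenseAdapter}}, {{SparseIter}}, {{UnwrapSketch}}) are hypothetical and chosen for illustration only; the real {{DocValues}} helper may look quite different.

```java
/** Hypothetical forward-only random-access view: callers must pass docs in increasing order. */
interface InOrderOrds {
    int ordAt(int docID); // -1 if the doc has no value
}

/** Hypothetical iterator-style API. */
interface SortedIter {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int docID();
    int nextDoc();
    int ordValue();
}

/** What a codec could return in the dense case: an iterator facade over the in-order view. */
final class DenseAdapter implements SortedIter {
    final InOrderOrds in;
    private final int maxDoc;
    private int doc = -1;
    DenseAdapter(InOrderOrds in, int maxDoc) { this.in = in; this.maxDoc = maxDoc; }
    public int docID() { return doc; }
    public int nextDoc() {
        if (doc == NO_MORE_DOCS) return doc;
        // Skip over the minority of docs that have no value.
        for (doc++; doc < maxDoc; doc++) {
            if (in.ordAt(doc) != -1) return doc;
        }
        return doc = NO_MORE_DOCS;
    }
    public int ordValue() { return in.ordAt(doc); }
}

/** Toy sparse iterator: parallel arrays of doc IDs (increasing) and their ordinals. */
final class SparseIter implements SortedIter {
    private final int[] docs, ords;
    private int idx = -1;
    SparseIter(int[] docs, int[] ords) { this.docs = docs; this.ords = ords; }
    public int docID() {
        return idx < 0 ? -1 : idx >= docs.length ? NO_MORE_DOCS : docs[idx];
    }
    public int nextDoc() {
        if (idx < docs.length) idx++;
        return docID();
    }
    public int ordValue() { return ords[idx]; }
}

final class UnwrapSketch {
    /** Unwrap the dense adapter if present; otherwise hide the iterator behind the in-order view. */
    static InOrderOrds unwrap(SortedIter it) {
        if (it instanceof DenseAdapter) {
            return ((DenseAdapter) it).in; // dense case: no per-doc positioning logic
        }
        return docID -> {                  // truly sparse case: position, then check
            while (it.docID() < docID) it.nextDoc();
            return it.docID() == docID ? it.ordValue() : -1;
        };
    }
}
```

With this shape, a facet counter or comparator would call {{UnwrapSketch.unwrap}} once per segment and then use {{ordAt}} for each hit, paying the positioning cost only when the codec really handed back a sparse iterator.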


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

