[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

Michael McCandless (JIRA) Tue, 19 Jun 2012 14:36:46 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397082#comment-13397082 ]


Michael McCandless commented on LUCENE-4069:
--------------------------------------------

Results from last patch:
{noformat}
                Task     QPS base  StdDev base    QPS bloom StdDev bloom       Pct diff
              IntNRQ        11.35         1.27        10.14         0.62    -24% -   6%
              Fuzzy1       108.52         3.34       101.82         2.90    -11% -   0%
             Prefix3        64.87         2.17        61.55         1.61    -10% -   0%
            Wildcard        43.18         1.74        41.33         1.17    -10% -   2%
              Fuzzy2        41.76         1.40        40.05         1.00     -9% -   1%
                Term       151.71         4.38       147.24         4.42     -8% -   2%
            SpanNear         5.23         0.09         5.11         0.12     -6% -   1%
           OrHighMed        12.60         0.88        12.34         0.48    -11% -   9%
        SloppyPhrase         8.25         0.20         8.09         0.07     -5% -   1%
        TermBGroup1M        69.98         0.68        68.80         1.13     -4% -   0%
          OrHighHigh        10.06         0.66         9.93         0.39    -11% -   9%
              Phrase        12.73         0.30        12.57         0.35     -6% -   3%
         TermGroup1M        35.44         0.42        35.08         0.67     -4% -   2%
          AndHighMed        63.40         2.27        62.90         1.11     -5% -   4%
             Respell        93.11         3.70        92.81         2.33     -6% -   6%
      TermBGroup1M1P        50.93         1.53        50.96         1.75     -6% -   6%
         AndHighHigh        15.86         0.71        15.93         0.27     -5% -   6%
            PKLookup       127.44         2.15       134.85         8.68     -2% -  14%
{noformat}

Looks like FuzzyN/Respell are good again, and PKLookup is a bit faster (though
its -2% to 14% range still includes zero); the rest is likely noise.
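As a rough aid to reading the Pct diff column, here is a small Java sketch that recomputes it from the QPS and StdDev columns. The formula is an assumption inferred from the table, not taken from the benchmark's code: it compares the pessimistic and optimistic ends of each mean +/- one standard deviation and truncates toward zero, which reproduces, e.g., the IntNRQ, Fuzzy1 and PKLookup rows. `PctDiff`, `range` and `overlapsZero` are hypothetical names.

```java
// Hypothetical sketch: rebuild the "Pct diff" range from QPS and StdDev.
// Assumption (inferred from the table, not from the benchmark code): the range
// pairs the pessimistic and optimistic ends of each mean +/- one stddev.
public final class PctDiff {

    /** Returns {low, high}: percent change of "bloom" vs. "base", truncated toward zero. */
    static int[] range(double qpsBase, double sdBase, double qpsBloom, double sdBloom) {
        // Worst case: slowest plausible bloom run vs. fastest plausible base run.
        double low = (qpsBloom - sdBloom) / (qpsBase + sdBase) - 1.0;
        // Best case: fastest plausible bloom run vs. slowest plausible base run.
        double high = (qpsBloom + sdBloom) / (qpsBase - sdBase) - 1.0;
        // The (int) cast truncates toward zero, so e.g. -0.4% is reported as 0%.
        return new int[] {(int) (low * 100), (int) (high * 100)};
    }

    /** A range that includes zero cannot be told apart from "no change". */
    static boolean overlapsZero(int[] r) {
        return r[0] <= 0 && r[1] >= 0;
    }

    public static void main(String[] args) {
        int[] intNrq = range(11.35, 1.27, 10.14, 0.62);
        int[] pkLookup = range(127.44, 2.15, 134.85, 8.68);
        System.out.printf("IntNRQ   %d%% - %d%%%n", intNrq[0], intNrq[1]);
        System.out.printf("PKLookup %d%% - %d%%%n", pkLookup[0], pkLookup[1]);
        System.out.println("PKLookup overlaps zero: " + overlapsZero(pkLookup));
    }
}
```

Under this reading, any row whose range straddles zero (which is nearly every row above) is within the run-to-run spread, which is why those rows are best treated as noise.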
                
> Segment-level Bloom filters for a 2 x speed up on rare term searches
> --------------------------------------------------------------------
>
>                 Key: LUCENE-4069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4069
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 3.6, 4.0
>            Reporter: Mark Harwood
>            Priority: Minor
>             Fix For: 4.0, 3.6.1
>
>         Attachments: BloomFilterPostingsBranch4x.patch, MHBloomFilterOn3.6Branch.patch, PrimaryKeyPerfTest40.java
>
>
> An addition to each segment that stores a Bloom filter for selected fields so
> that term searches can fail fast, avoiding wasted disk access.
> Best suited to low-frequency fields, e.g. primary keys on big indexes with
> many segments, but in my tests it also speeds up general searching.
> Overview slideshow here: 
> http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
> Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
> Patch based on 3.6 codebase attached.
> There are no 3.6 API changes currently; to try it out, just add a field whose
> name ends in "_blm" to invoke the special indexing/querying capability.
> Clearly a new Field or schema declaration(!) would need to be added to the
> APIs to configure the service properly.
> Also, a patch for the Lucene 4.0 codebase introducing a new PostingsFormat
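To make the fast-fail idea in the description concrete, here is a minimal, hypothetical Java sketch; it is not the attached patch and uses no Lucene API. Each simulated segment keeps a Bloom filter (a `java.util.BitSet` probed by double hashing over `Arrays.hashCode` and FNV-1a, both choices of this sketch) and only consults its term dictionary, simulated here by a `HashSet`, when the filter says the term might be present. The class names, the 4096-bit/3-hash sizing and `usesBloomFilter` are assumptions for illustration; the last one merely mirrors the "_blm" field-name convention described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: each segment keeps a Bloom filter over the terms of a
// selected field, and term lookups consult it before the term dictionary.
// All names are invented for illustration; this is not the patch or a Lucene API.
public final class SegmentBloomSketch {

    /** Fixed-size Bloom filter over UTF-8 term bytes, using double hashing. */
    static final class BloomFilter {
        private final BitSet bits;
        private final int numBits, numHashes;

        BloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // Second, independent hash: 32-bit FNV-1a.
        private static int fnv1a(byte[] data) {
            int h = 0x811C9DC5;
            for (byte b : data) {
                h ^= (b & 0xFF);
                h *= 0x01000193;
            }
            return h;
        }

        // The i-th probe position is h1 + i * h2, reduced into [0, numBits).
        private int position(int h1, int h2, int i) {
            return Math.floorMod(h1 + i * h2, numBits);
        }

        void add(String term) {
            byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
            int h1 = Arrays.hashCode(bytes), h2 = fnv1a(bytes);
            for (int i = 0; i < numHashes; i++) bits.set(position(h1, h2, i));
        }

        /** false means "definitely absent"; true only means "possibly present". */
        boolean mightContain(String term) {
            byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
            int h1 = Arrays.hashCode(bytes), h2 = fnv1a(bytes);
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(position(h1, h2, i))) return false;
            }
            return true;
        }
    }

    /** One index segment; a HashSet stands in for the on-disk term dictionary. */
    static final class Segment {
        private final Set<String> terms = new HashSet<>();
        private final BloomFilter filter = new BloomFilter(1 << 12, 3);
        int dictionarySeeks = 0;

        void index(String term) {
            terms.add(term);
            filter.add(term);
        }

        boolean contains(String term) {
            if (!filter.mightContain(term)) return false; // fast-fail: no dictionary access
            dictionarySeeks++; // only a possible hit pays for the real lookup
            return terms.contains(term);
        }
    }

    /** Mirrors the 3.6 patch's opt-in convention: fields whose name ends in "_blm". */
    static boolean usesBloomFilter(String fieldName) {
        return fieldName.endsWith("_blm");
    }

    public static void main(String[] args) {
        // 20 segments, each holding 100 distinct primary keys.
        Segment[] segments = new Segment[20];
        for (int s = 0; s < segments.length; s++) {
            segments[s] = new Segment();
            for (int n = 0; n < 100; n++) segments[s].index("id-" + s + "-" + n);
        }
        // A primary-key lookup must probe every segment, but most filters answer
        // "definitely absent" without a dictionary seek.
        String key = "id-7-42";
        int hits = 0, seeks = 0;
        for (Segment seg : segments) {
            if (seg.contains(key)) hits++;
            seeks += seg.dictionarySeeks;
        }
        System.out.println("hits=" + hits + ", dictionary seeks=" + seeks
                + " across " + segments.length + " segments");
    }
}
```

Because a Bloom filter never gives false negatives, skipping a segment on a "definitely absent" answer is always safe; the bit count and number of hashes only trade memory against how often a miss still costs a dictionary seek. That is why the benefit is largest for rare terms such as primary keys, which are absent from almost every segment.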
