[ 
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430439#comment-13430439
 ] 

Michael McCandless commented on LUCENE-3892:
--------------------------------------------

Hmm, also not great results in my env (base=Block, packed=BlockPacked), based on 
the current branch head:

{noformat}
                Task    QPS base  StdDev base  QPS packed  StdDev packed      Pct diff
          AndHighMed       59.23         3.07       34.24           0.69  -46% -  -37%
          AndHighLow      576.35        21.09      349.57           7.44  -42% -  -35%
         AndHighHigh       23.83         0.72       15.53           0.29  -37% -  -31%
           MedPhrase       12.56         0.20        8.87           0.31  -32% -  -25%
           LowPhrase       20.52         0.21       14.89           0.43  -30% -  -24%
     MedSloppyPhrase        7.46         0.20        5.41           0.13  -31% -  -23%
     LowSloppyPhrase        6.73         0.18        4.92           0.12  -30% -  -22%
         LowSpanNear        7.63         0.32        5.65           0.19  -31% -  -20%
    HighSloppyPhrase        1.90         0.08        1.52           0.05  -25% -  -14%
          HighPhrase        1.57         0.04        1.26           0.08  -26% -  -12%
         MedSpanNear        3.84         0.18        3.14           0.14  -25% -  -10%
             LowTerm      433.22        34.89      364.03          15.63  -25% -   -4%
        HighSpanNear        1.40         0.07        1.19           0.06  -23% -   -6%
              IntNRQ        9.50         0.43        8.09           0.92  -27% -    0%
            HighTerm       29.47         4.89       25.46           2.35  -32% -   13%
             MedTerm      148.76        21.53      129.17           9.59  -29% -    9%
             Prefix3       72.81         2.20       63.65           3.88  -20% -   -4%
            Wildcard       44.79         0.92       39.91           2.20  -17% -   -4%
           OrHighMed       16.81         0.48       15.28           0.21  -12% -   -5%
           OrHighLow       21.85         0.67       20.03           0.32  -12% -   -3%
          OrHighHigh        8.49         0.28        7.80           0.14  -12% -   -3%
              Fuzzy1       61.33         1.95       58.91           1.11   -8% -    1%
            PKLookup      156.87         1.14      154.08           2.13   -3% -    0%
             Respell       58.72         1.57       59.60           1.28   -3% -    6%
              Fuzzy2       60.98         2.34       62.03           1.89   -5% -    9%
{noformat}
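
For reading the Pct diff column: the ranges in the table look consistent with
comparing the two runs one standard deviation apart in each direction and
truncating to a whole percent. The sketch below reproduces that arithmetic. The
class and method names are made up for illustration, and the formula is an
assumption inferred from the numbers in the table, not a copy of luceneutil's code.

```java
// Hypothetical helper, not part of luceneutil: reproduces the "Pct diff"
// range under the assumption stated above.
class PctDiffRange {

  /** Pessimistic bound: packed one stddev slower, base one stddev faster. */
  static int low(double qpsBase, double sdBase, double qpsPacked, double sdPacked) {
    // The (int) cast truncates toward zero, which matches the table rows.
    return (int) (100.0 * ((qpsPacked - sdPacked) / (qpsBase + sdBase) - 1.0));
  }

  /** Optimistic bound: packed one stddev faster, base one stddev slower. */
  static int high(double qpsBase, double sdBase, double qpsPacked, double sdPacked) {
    return (int) (100.0 * ((qpsPacked + sdPacked) / (qpsBase - sdBase) - 1.0));
  }

  public static void main(String[] args) {
    // AndHighMed row: base 59.23 +/- 3.07, packed 34.24 +/- 0.69
    System.out.println(low(59.23, 3.07, 34.24, 0.69) + "% - "
        + high(59.23, 3.07, 34.24, 0.69) + "%"); // prints "-46% - -37%"
  }
}
```

Read this way, a range that straddles 0% (HighTerm, MedTerm, Fuzzy2, ...) is
within noise, while the And* and phrase tasks are clearly slower.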

I think optimizing the all-values-same case is actually quite important for 
payloads (but luceneutil doesn't test this today).
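
To make the all-values-same case concrete, here is a minimal sketch of one way
a block encoder could special-case it: a header byte of 0 bits per value marks
"all equal" and is followed by the single value, otherwise the values are
bit-packed. Everything here (class name, byte layout, the 0 marker) is a
hypothetical illustration, not the code in the attached patches.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch: block layout and names are assumptions for illustration.
class AllEqualBlockSketch {

  /** Encodes a non-empty block of non-negative ints. */
  static byte[] encode(int[] block) {
    if (block.length == 0) throw new IllegalArgumentException("empty block");
    int max = 0;
    boolean allEqual = true;
    for (int v : block) {
      if (v < 0) throw new IllegalArgumentException("values must be non-negative");
      max = Math.max(max, v);
      allEqual &= (v == block[0]);
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (allEqual) {
      out.write(0); // 0 bits per value marks the all-equal case
      for (int shift = 24; shift >= 0; shift -= 8) {
        out.write(block[0] >>> shift); // the single value, big-endian
      }
      return out.toByteArray();
    }
    // Not all equal and non-negative, so max >= 1 and bits >= 1.
    int bits = 32 - Integer.numberOfLeadingZeros(max);
    out.write(bits);
    long acc = 0;
    int accBits = 0;
    for (int v : block) {
      acc |= ((long) v) << accBits;
      accBits += bits;
      while (accBits >= 8) { // flush whole bytes, low bits first
        out.write((int) (acc & 0xFF));
        acc >>>= 8;
        accBits -= 8;
      }
    }
    if (accBits > 0) out.write((int) (acc & 0xFF));
    return out.toByteArray();
  }

  /** Decodes count values written by encode. */
  static int[] decode(byte[] data, int count) {
    int[] block = new int[count];
    int bits = data[0] & 0xFF;
    if (bits == 0) { // all-equal block: no per-value unpacking at all
      int v = ((data[1] & 0xFF) << 24) | ((data[2] & 0xFF) << 16)
          | ((data[3] & 0xFF) << 8) | (data[4] & 0xFF);
      Arrays.fill(block, v);
      return block;
    }
    long mask = (1L << bits) - 1;
    long acc = 0;
    int accBits = 0;
    int pos = 1;
    for (int i = 0; i < count; i++) {
      while (accBits < bits) {
        acc |= ((long) (data[pos++] & 0xFF)) << accBits;
        accBits += 8;
      }
      block[i] = (int) (acc & mask);
      acc >>>= bits;
      accBits -= bits;
    }
    return block;
  }
}
```

With this layout a 128-value block where every entry is 1 takes 5 bytes instead
of 1 + 16, and decoding it is a single fill rather than an unpack loop.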

But, curiously, my BlockPacked index is a bit smaller than my Block index (4643 
MB vs 4650 MB).

I do wonder about using long[] to hold the uncompressed results (they only need 
int[]); that's still one big difference between the two.  Also: I'd love to see how 
acceptableOverheadRatio > 0 does ... (and using PACKED_SINGLE_BLOCK ... though we'd 
have to put a bit in the header to record the format).
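
On the "bit in the header" point, a rough sketch of what that could look like:
bits-per-value for int data needs 6 bits (0..32), which leaves room in a
single header byte for one format bit. The enum below only borrows the names
from this comment and is not the Lucene type; the layout is an assumption.

```java
// Hypothetical sketch: one header byte carrying both the packing format and
// the bits-per-value. Names and layout are assumptions for illustration.
class BlockHeaderSketch {

  /** Stand-in for the two formats discussed; not the Lucene enum. */
  enum Format { PACKED, PACKED_SINGLE_BLOCK }

  private static final int BITS_MASK = 0x3F;  // low 6 bits: bits per value
  private static final int FORMAT_SHIFT = 6;  // bit 6: format flag

  static int encodeHeader(Format format, int bitsPerValue) {
    if (bitsPerValue < 0 || bitsPerValue > 32) {
      throw new IllegalArgumentException("bitsPerValue out of range: " + bitsPerValue);
    }
    int formatBit = (format == Format.PACKED_SINGLE_BLOCK) ? 1 : 0;
    return (formatBit << FORMAT_SHIFT) | bitsPerValue;
  }

  static Format formatOf(int header) {
    return ((header >>> FORMAT_SHIFT) & 1) == 1 ? Format.PACKED_SINGLE_BLOCK : Format.PACKED;
  }

  static int bitsPerValueOf(int header) {
    return header & BITS_MASK;
  }
}
```

The header still fits in one byte per block, so recording the format this way
would not by itself change the index size.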
                
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, 
> Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch, 
> LUCENE-3892-blockFor&hardcode(base).patch, 
> LUCENE-3892-blockFor&packedecoder(comp).patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints.patch, 
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch, 
> LUCENE-3892-handle_open_files.patch, 
> LUCENE-3892-pfor-compress-iterate-numbits.patch, 
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch, 
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, 
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, 
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
