[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431485#comment-13431485 ]
Michael McCandless commented on LUCENE-3892:
--------------------------------------------
I also see (smaller) gains with BlockPacked vs Block (this is a 10M doc index):
{noformat}
Task                QPS base StdDev base QPS packed StdDev packed      Pct diff
AndHighMed             69.19        0.53      66.43          0.63     -5% - -2%
Fuzzy2                 63.71        1.24      62.25          1.58      -6% - 2%
Respell                62.69        1.41      61.53          1.47      -6% - 2%
IntNRQ                 11.86        0.43      11.73          0.03      -4% - 2%
Fuzzy1                 75.48        1.21      75.05          1.52      -4% - 3%
Wildcard               53.23        0.63      52.96          0.25      -2% - 1%
MedSpanNear             4.88        0.16       4.88          0.11      -5% - 5%
PKLookup              191.48        2.84     191.62          3.98      -3% - 3%
HighTerm               35.71        0.63      35.91          0.06      -1% - 2%
Prefix3                83.14        1.34      83.83          0.49      -1% - 3%
LowTerm               513.35        0.77     517.92          1.50       0% - 1%
HighSpanNear            1.70        0.06       1.71          0.03      -4% - 6%
AndHighHigh            23.45        0.09      23.69          0.10       0% - 1%
OrHighLow              27.27        1.06      27.59          0.15      -3% - 5%
OrHighMed              23.61        0.92      23.89          0.17      -3% - 6%
OrHighHigh             11.42        0.44      11.59          0.12      -3% - 6%
MedSloppyPhrase         6.84        0.17       6.95          0.23      -4% - 7%
LowPhrase              22.02        0.39      22.43          0.15       0% - 4%
MedTerm               196.76        3.01     200.62          0.33       0% - 3%
LowSpanNear             9.60        0.24       9.82          0.31      -3% - 8%
MedPhrase              13.08        0.30      13.41          0.12       0% - 5%
LowSloppyPhrase         7.55        0.21       7.77          0.27      -3% - 9%
AndHighLow            649.84       18.26     669.08          6.63       0% - 6%
HighSloppyPhrase        1.98        0.08       2.04          0.09     -4% - 12%
HighPhrase              1.76        0.11       1.96          0.10      0% - 24%
{noformat}
The index is 4669 MB with Block and 4790 MB with BlockPacked, i.e. ~2.6%
larger ... seems worth it! Apps can always tune the 20% too.
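As a quick check of the size arithmetic above, a small Java sketch computing the relative growth from the two reported index sizes (the class and method names are illustrative only, not part of Lucene):

```java
import java.util.Locale;

public class IndexSizeOverhead {
    // Percentage by which newMb exceeds baseMb (hypothetical helper for illustration).
    static double pctLarger(double baseMb, double newMb) {
        return (newMb - baseMb) / baseMb * 100.0;
    }

    public static void main(String[] args) {
        double blockMb = 4669;       // 10M doc index with Block, as reported above
        double blockPackedMb = 4790; // same index with BlockPacked
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale.
        System.out.println(String.format(Locale.ROOT,
            "BlockPacked is %.2f%% larger than Block", pctLarger(blockMb, blockPackedMb)));
    }
}
```

With the reported sizes this prints "BlockPacked is 2.59% larger than Block", consistent with the ~2.6% quoted.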
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892-BlockTermScorer.patch,
> LUCENE-3892-blockFor&hardcode(base).patch,
> LUCENE-3892-blockFor&packedecoder(comp).patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints.patch, LUCENE-3892-bulkVInt.patch,
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
> LUCENE-3892-handle_open_files.patch,
> LUCENE-3892-pfor-compress-iterate-numbits.patch,
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch,
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch,
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch,
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of open issues with patches in different
> states.
> Initial results (based on a prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html).
> I think this would make a good GSoC project.
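For readers new to the block encodings named in the issue title, here is a minimal Java sketch of the fixed-bit-width block packing at the heart of FOR-style formats: a block of doc-ID deltas is stored using just enough bits for its largest value. The class and method names are hypothetical and this is not Lucene's Block or BlockPacked code; classic frame-of-reference also subtracts the block minimum before packing, and PFOR additionally stores outliers as patched exceptions, both omitted here.

```java
import java.util.Arrays;

// Hypothetical illustration of FOR-style fixed-width packing, not Lucene's implementation.
public class ForBlockSketch {
    // Gaps between sorted doc IDs; these are small and pack well.
    static int[] toDeltas(int[] sortedDocIds) {
        int[] deltas = new int[sortedDocIds.length];
        int prev = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            deltas[i] = sortedDocIds[i] - prev;
            prev = sortedDocIds[i];
        }
        return deltas;
    }

    // Bits needed for the largest value in the block (at least 1).
    static int bitsRequired(int[] values) {
        int or = 0;
        for (int v : values) {
            if (v < 0) throw new IllegalArgumentException("negative value: " + v);
            or |= v;
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    // Packs every value using exactly numBits bits, low bits first, into a long[].
    static long[] pack(int[] values, int numBits) {
        long[] packed = new long[(int) (((long) values.length * numBits + 63) / 64)];
        long bitPos = 0;
        for (int v : values) {
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            packed[word] |= ((long) v) << shift;
            if (shift + numBits > 64) {
                // Value straddles two longs: spill its high bits into the next word.
                packed[word + 1] |= ((long) v) >>> (64 - shift);
            }
            bitPos += numBits;
        }
        return packed;
    }

    // Reverses pack(): reads count values of numBits bits each.
    static int[] unpack(long[] packed, int count, int numBits) {
        long mask = (1L << numBits) - 1;
        int[] out = new int[count];
        long bitPos = 0;
        for (int i = 0; i < count; i++) {
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long bits = packed[word] >>> shift;
            if (shift + numBits > 64) {
                bits |= packed[word + 1] << (64 - shift);
            }
            out[i] = (int) (bits & mask);
            bitPos += numBits;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] deltas = toDeltas(new int[] {3, 7, 8, 20, 21, 40});
        int numBits = bitsRequired(deltas);
        long[] packed = pack(deltas, numBits);
        System.out.println("deltas=" + Arrays.toString(deltas) + " numBits=" + numBits
            + " longs=" + packed.length);
        System.out.println(Arrays.equals(deltas, unpack(packed, deltas.length, numBits)));
    }
}
```

Running main prints "deltas=[3, 4, 1, 12, 1, 19] numBits=5 longs=1" (six 5-bit deltas fit in one 64-bit long) followed by "true" for the round trip. Real formats decode whole blocks with specialized, unrolled per-bit-width code rather than this generic loop.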