[
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431882#comment-13431882
]
Han Jiang commented on LUCENE-3892:
-----------------------------------
I revived the PFor code and tested it against BlockFor and BlockPacked:
BlockFor as base:
{noformat}
Task              QPS base StdDev base QPS pfor StdDev pfor     Pct diff
AndHighHigh         121.54        1.37   116.69        2.03   -6% -  -1%
AndHighLow         2286.36       14.19  2212.92       11.48   -4% -  -2%
AndHighMed          322.97        7.37   294.19        4.76  -12% -  -5%
Fuzzy1               85.56        1.46    87.97        3.27   -2% -   8%
Fuzzy2               30.94        0.56    32.16        1.34   -2% -  10%
HighPhrase            9.39        0.38     9.02        0.45  -12% -   5%
HighSloppyPhrase      5.38        0.08     5.24        0.12   -6% -   1%
HighSpanNear         10.38        0.39     9.92        0.08   -8% -   0%
HighTerm            180.30        6.87   172.83        6.26  -11% -   3%
IntNRQ               62.01        3.73    60.89        3.54  -12% -  10%
LowPhrase            42.44        0.67    38.73        0.89  -12% -  -5%
LowSloppyPhrase      62.82        0.79    56.79        0.43  -11% -  -7%
LowSpanNear          81.79        2.00    74.10        1.13  -12% -  -5%
LowTerm            1763.95       39.62  1721.30       34.22   -6% -   1%
MedPhrase            27.87        0.59    25.82        0.74  -11% -  -2%
MedSloppyPhrase      32.15        0.41    29.91        0.31   -9% -  -4%
MedSpanNear          23.48        0.71    22.00        0.05   -9% -  -3%
MedTerm             662.11       24.22   638.81       19.31   -9% -   3%
OrHighHigh           26.82        0.47    27.14        1.93   -7% -  10%
OrHighLow           152.40        3.54   156.58       11.11   -6% -  12%
OrHighMed           103.20        2.26   105.84        7.55   -6% -  12%
PKLookup            216.38        4.32   219.32        2.59   -1% -   4%
Prefix3             169.89        4.97   163.82        3.34   -8% -   1%
Respell              83.23        1.44    86.20        3.00   -1% -   9%
Wildcard            155.81        2.79   152.30        2.54   -5% -   1%
{noformat}
BlockPacked as base:
{noformat}
Task              QPS base StdDev base QPS pfor StdDev pfor     Pct diff
AndHighHigh         122.94        3.43   116.24        1.90   -9% -  -1%
AndHighLow         2294.32       58.32  2199.14       31.97   -7% -   0%
AndHighMed          325.55       12.44   290.20        3.80  -15% -  -6%
Fuzzy1               88.33        1.84    87.86        2.54   -5% -   4%
Fuzzy2               31.92        0.80    32.00        0.92   -5% -   5%
HighPhrase            9.73        0.47     9.04        0.29  -14% -   0%
HighSloppyPhrase      5.49        0.19     5.16        0.03   -9% -  -1%
HighSpanNear         10.93        0.23     9.90        0.09  -12% -  -6%
HighTerm            178.31        6.37   171.06        6.14  -10% -   3%
IntNRQ               60.87        4.71    62.38        5.49  -13% -  20%
LowPhrase            44.97        1.18    38.36        1.01  -19% - -10%
LowSloppyPhrase      69.61        1.19    55.90        1.39  -23% - -16%
LowSpanNear          88.50        0.66    72.80        2.23  -20% - -14%
LowTerm            1769.84       32.66  1717.02       39.75   -6% -   1%
MedPhrase            28.88        0.84    25.57        0.68  -16% -  -6%
MedSloppyPhrase      34.47        0.50    29.29        0.54  -17% - -12%
MedSpanNear          24.88        0.32    21.69        0.38  -15% - -10%
MedTerm             667.95       21.61   633.73       22.17  -11% -   1%
OrHighHigh           27.96        1.29    26.82        0.81  -11% -   3%
OrHighLow           158.62        5.82   155.08        5.05   -8% -   4%
OrHighMed           107.16        4.19   104.81        3.17   -8% -   4%
PKLookup            217.22        1.86   216.83        1.87   -1% -   1%
Prefix3             167.32        6.72   166.12        6.53   -8% -   7%
Respell              85.25        2.27    85.85        2.16   -4% -   6%
Wildcard            156.24        5.69   154.63        3.02   -6% -   4%
{noformat}
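A note on reading the "Pct diff" column: the comment does not say how the range is computed. The printed numbers are consistent with a worst case / best case comparison that shifts each QPS by one standard deviation and truncates the result to a whole percent. The sketch below reproduces that reading. The formula is an assumption inferred from the table values, and the class and method names are made up for illustration.

```java
// Hypothetical helper, not part of the benchmark tooling: reproduces the
// "Pct diff" range under the assumption that each bound compares QPS values
// shifted by one standard deviation and truncates to a whole percent.
final class PctDiffRange {

    // Pessimistic bound: slowest plausible pfor run vs fastest plausible base run.
    static int low(double qpsBase, double sdBase, double qpsPfor, double sdPfor) {
        double fastBase = qpsBase + sdBase;
        double slowPfor = qpsPfor - sdPfor;
        return (int) (100.0 * (slowPfor - fastBase) / fastBase);
    }

    // Optimistic bound: fastest plausible pfor run vs slowest plausible base run.
    static int high(double qpsBase, double sdBase, double qpsPfor, double sdPfor) {
        double slowBase = qpsBase - sdBase;
        double fastPfor = qpsPfor + sdPfor;
        return (int) (100.0 * (fastPfor - slowBase) / slowBase);
    }

    public static void main(String[] args) {
        // AndHighHigh row, BlockFor as base.
        System.out.println(low(121.54, 1.37, 116.69, 2.03) + "% - "
                + high(121.54, 1.37, 116.69, 2.03) + "%"); // prints "-6% - -1%"
    }
}
```

Under this reading, a task is a clear regression only when both bounds are negative (for example the And*, Low* and Med* phrase and span tasks). A range that straddles zero, as for the Fuzzy and OrHigh tasks, is within noise. Rows whose exact value sits very close to a whole percent may not reproduce from the rounded table inputs.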
The current PFor implementation saves only 1.8% compared with For, but takes a noticeable performance hit.
Let's use the Packed version!
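For readers unfamiliar with the trade-off, here is a rough sketch of why PFor can be smaller than For at all, and why the gain shrinks to nothing when a block has no outliers. It is a generic illustration and not the code from the attached patches: the class name, the exhaustive width search and the flat per-exception cost in bits are all assumptions chosen to keep the example short.

```java
// Generic illustration only (not the patch code): estimates the payload size
// of one block under FOR and under a simplified PFOR cost model.
final class ForVsPforSize {

    // Bits needed to store a non-negative int (at least 1 bit, even for 0).
    static int bitsRequired(int value) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(value));
    }

    // FOR: every value in the block is stored with the width of the largest value.
    static int forBits(int[] block) {
        int width = 1;
        for (int v : block) {
            width = Math.max(width, bitsRequired(v));
        }
        return width * block.length;
    }

    // Simplified PFOR: try every width, and charge a flat, assumed cost for
    // each value that does not fit (an "exception"). Returns the cheapest total.
    static int pforBits(int[] block, int exceptionCostBits) {
        int best = Integer.MAX_VALUE;
        for (int width = 1; width <= 32; width++) {
            int exceptions = 0;
            for (int v : block) {
                if (bitsRequired(v) > width) {
                    exceptions++;
                }
            }
            best = Math.min(best, width * block.length + exceptions * exceptionCostBits);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] block = new int[128];
        java.util.Arrays.fill(block, 3);
        block[17] = 1_000_000;                   // a single outlier
        System.out.println(forBits(block));      // prints 2560 (20 bits x 128)
        System.out.println(pforBits(block, 40)); // prints 296 (2 bits x 128 + 40)
    }
}
```

With no outlier in the block the two estimates are identical, so PFor only pays off on blocks with a few large values, while its decoder always has the extra exception handling to deal with.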
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta,
> Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892-BlockTermScorer.patch,
> LUCENE-3892-blockFor&hardcode(base).patch,
> LUCENE-3892-blockFor&packedecoder(comp).patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints.patch, LUCENE-3892-bulkVInt.patch,
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
> LUCENE-3892-handle_open_files.patch, LUCENE-3892-non-specialized.patch,
> LUCENE-3892-pfor-compress-iterate-numbits.patch,
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch,
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch,
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch,
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.