[
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430439#comment-13430439
]
Michael McCandless commented on LUCENE-3892:
--------------------------------------------
Hmm, I'm also seeing not-great results in my env (base=Block,
packed=BlockPacked), based on the current branch head:
{noformat}
Task              QPS base  StdDev base  QPS packed  StdDev packed     Pct diff
AndHighMed           59.23         3.07       34.24           0.69  -46% - -37%
AndHighLow          576.35        21.09      349.57           7.44  -42% - -35%
AndHighHigh          23.83         0.72       15.53           0.29  -37% - -31%
MedPhrase            12.56         0.20        8.87           0.31  -32% - -25%
LowPhrase            20.52         0.21       14.89           0.43  -30% - -24%
MedSloppyPhrase       7.46         0.20        5.41           0.13  -31% - -23%
LowSloppyPhrase       6.73         0.18        4.92           0.12  -30% - -22%
LowSpanNear           7.63         0.32        5.65           0.19  -31% - -20%
HighSloppyPhrase      1.90         0.08        1.52           0.05  -25% - -14%
HighPhrase            1.57         0.04        1.26           0.08  -26% - -12%
MedSpanNear           3.84         0.18        3.14           0.14  -25% - -10%
LowTerm             433.22        34.89      364.03          15.63  -25% -  -4%
HighSpanNear          1.40         0.07        1.19           0.06  -23% -  -6%
IntNRQ                9.50         0.43        8.09           0.92  -27% -   0%
HighTerm             29.47         4.89       25.46           2.35  -32% -  13%
MedTerm             148.76        21.53      129.17           9.59  -29% -   9%
Prefix3              72.81         2.20       63.65           3.88  -20% -  -4%
Wildcard             44.79         0.92       39.91           2.20  -17% -  -4%
OrHighMed            16.81         0.48       15.28           0.21  -12% -  -5%
OrHighLow            21.85         0.67       20.03           0.32  -12% -  -3%
OrHighHigh            8.49         0.28        7.80           0.14  -12% -  -3%
Fuzzy1               61.33         1.95       58.91           1.11   -8% -   1%
PKLookup            156.87         1.14      154.08           2.13   -3% -   0%
Respell              58.72         1.57       59.60           1.28   -3% -   6%
Fuzzy2               60.98         2.34       62.03           1.89   -5% -   9%
{noformat}
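For anyone reading the "Pct diff" column: the two numbers in each row look like a pessimistic and an optimistic bound derived from the QPS and StdDev columns. The sketch below is only a guess at that arithmetic, checked against a few rows of the table; the class and method names are hypothetical and not taken from luceneutil, and the truncation toward zero is an assumption that happens to reproduce the printed values.

```java
// Hypothetical helper; names are illustrative and NOT taken from luceneutil.
final class PctDiffRange {

    // Pessimistic bound: candidate one StdDev slower, baseline one StdDev faster.
    static int worstCase(double qpsBase, double stdBase, double qpsCand, double stdCand) {
        // The (int) cast truncates toward zero (assumption, matches the table).
        return (int) (100.0 * ((qpsCand - stdCand) / (qpsBase + stdBase) - 1.0));
    }

    // Optimistic bound: candidate one StdDev faster, baseline one StdDev slower.
    static int bestCase(double qpsBase, double stdBase, double qpsCand, double stdCand) {
        return (int) (100.0 * ((qpsCand + stdCand) / (qpsBase - stdBase) - 1.0));
    }

    public static void main(String[] args) {
        // AndHighMed row: base 59.23 +/- 3.07, packed 34.24 +/- 0.69
        System.out.println(worstCase(59.23, 3.07, 34.24, 0.69) + "% - "
            + bestCase(59.23, 3.07, 34.24, 0.69) + "%"); // prints "-46% - -37%"
    }
}
```

Read this way, a row such as HighTerm (-32% - 13%) straddles zero, so the difference there is within the noise, while the And* and phrase rows are slower even under the optimistic bound.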
I think optimizing the all-values-same case is actually quite important for
payloads (but luceneutil doesn't test this today).
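To make the all-values-same idea concrete, here is a minimal sketch of what such a special case could look like. It is not the code on the branch: the class name, the long[] layout, and the convention that a header of 0 bits per value means "one value repeated" are all assumptions made for illustration. It also assumes a non-empty block.

```java
import java.util.Arrays;

// Hypothetical sketch; NOT the branch's encoder. Layout (assumed):
// word 0 = bits per value (0 means "all values identical"), then the payload.
final class AllSameBlockSketch {

    static long[] encode(int[] block) {
        int first = block[0];
        int orAll = 0;
        boolean allSame = true;
        for (int v : block) {
            orAll |= v;
            allSame &= (v == first);
        }
        if (allSame) {
            // Special case: store the single value once instead of packing it N times.
            return new long[] {0L, first};
        }
        // Bits needed for the largest value (values are treated as unsigned).
        int bits = 32 - Integer.numberOfLeadingZeros(orAll);
        long[] out = new long[1 + (block.length * bits + 63) / 64];
        out[0] = bits;
        for (int i = 0; i < block.length; i++) {
            long v = block[i] & 0xFFFFFFFFL;
            int bitPos = i * bits;
            int word = 1 + (bitPos >>> 6);
            int shift = bitPos & 63;
            out[word] |= v << shift;
            if (shift + bits > 64) {
                // Value straddles two words; spill the high bits into the next one.
                out[word + 1] |= v >>> (64 - shift);
            }
        }
        return out;
    }

    static int[] decode(long[] encoded, int count) {
        int bits = (int) encoded[0];
        int[] block = new int[count];
        if (bits == 0) {
            // All-values-same: no unpacking loop, just a fill.
            Arrays.fill(block, (int) encoded[1]);
            return block;
        }
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            int bitPos = i * bits;
            int word = 1 + (bitPos >>> 6);
            int shift = bitPos & 63;
            long v = encoded[word] >>> shift;
            if (shift + bits > 64) {
                v |= encoded[word + 1] << (64 - shift);
            }
            block[i] = (int) (v & mask);
        }
        return block;
    }

    public static void main(String[] args) {
        int[] same = new int[128];
        Arrays.fill(same, 7);
        System.out.println(encode(same).length); // 2 words: header + the value
        int[] mixed = new int[128];
        for (int i = 0; i < mixed.length; i++) mixed[i] = i * 3;
        System.out.println(encode(mixed).length); // 19 words: header + 18 packed
    }
}
```

In this sketch a block where every entry is identical, which is what the comment expects to be common for payloads, costs two words and decodes with a single fill instead of the unpacking loop.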
But, curiously, my BlockPacked index is a bit smaller than my Block index (4643
MB vs 4650 MB).
I do wonder about using long[] to hold the uncompressed results (they only need
int[]); that's still one big difference between the two. Also: I'd love to see
how acceptableOverheadRatio > 0 does ... (and, using PACKED_SINGLE_BLOCK ...
we'd have to put a bit in the header to record the format).
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta,
> Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892-BlockTermScorer.patch,
> LUCENE-3892-blockFor&hardcode(base).patch,
> LUCENE-3892-blockFor&packedecoder(comp).patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints.patch,
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
> LUCENE-3892-handle_open_files.patch,
> LUCENE-3892-pfor-compress-iterate-numbits.patch,
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch,
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch,
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch,
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.