[
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959078#comment-13959078
]
Michael McCandless commented on LUCENE-5527:
--------------------------------------------
+1 for LeafCollector and the patch.
I tested whether this has any impact on search performance:
{noformat}
Report after iter 10:
Task              QPS base   StdDev  QPS comp   StdDev  Pct diff
Respell              49.44   (3.3%)     48.10   (3.7%)  -2.7% ( -9% -  4%)
Fuzzy2               46.74   (3.2%)     45.73   (3.1%)  -2.2% ( -8% -  4%)
Fuzzy1               59.25   (3.7%)     58.08   (3.5%)  -2.0% ( -8% -  5%)
IntNRQ                3.42   (3.8%)      3.40   (3.8%)  -0.7% ( -7% -  7%)
Prefix3              86.67   (2.6%)     86.17   (2.6%)  -0.6% ( -5% -  4%)
LowSloppyPhrase      44.44   (2.3%)     44.42   (2.5%)  -0.1% ( -4% -  4%)
Wildcard             19.08   (3.5%)     19.07   (3.0%)  -0.1% ( -6% -  6%)
AndHighMed           34.38   (1.0%)     34.38   (1.0%)  -0.0% ( -2% -  2%)
LowSpanNear          10.41   (3.1%)     10.41   (2.3%)   0.0% ( -5% -  5%)
HighSloppyPhrase      3.49   (7.9%)      3.49   (6.6%)   0.1% (-13% - 15%)
AndHighHigh          28.35   (1.1%)     28.39   (1.0%)   0.1% ( -1% -  2%)
MedSpanNear          31.06   (2.8%)     31.12   (2.7%)   0.2% ( -5% -  5%)
AndHighLow          391.44   (2.9%)    392.73   (2.6%)   0.3% ( -5% -  6%)
MedSloppyPhrase       3.54   (5.2%)      3.56   (4.6%)   0.4% ( -8% - 10%)
OrHighMed            26.51   (4.0%)     26.66   (5.7%)   0.6% ( -8% - 10%)
OrHighNotLow         24.84   (4.1%)     24.98   (5.8%)   0.6% ( -9% - 10%)
LowPhrase            13.19   (1.6%)     13.27   (2.3%)   0.6% ( -3% -  4%)
OrHighLow            18.78   (4.1%)     18.91   (5.8%)   0.7% ( -8% - 11%)
OrNotHighHigh         8.87   (4.5%)      8.93   (6.0%)   0.7% ( -9% - 11%)
OrHighNotMed         30.63   (4.1%)     30.85   (5.5%)   0.7% ( -8% - 10%)
OrHighHigh            8.21   (4.1%)      8.27   (5.8%)   0.7% ( -8% - 11%)
MedPhrase           203.10   (6.6%)    204.77   (6.3%)   0.8% (-11% - 14%)
OrHighNotHigh        11.09   (4.5%)     11.18   (5.9%)   0.8% ( -9% - 11%)
LowTerm             322.74   (5.6%)    325.67   (5.6%)   0.9% ( -9% - 12%)
HighTerm             63.88  (12.8%)     64.55  (12.2%)   1.1% (-21% - 29%)
MedTerm             100.19   (9.8%)    101.31   (9.5%)   1.1% (-16% - 22%)
HighSpanNear          8.09   (4.0%)      8.18   (4.9%)   1.1% ( -7% - 10%)
HighPhrase            4.27   (7.1%)      4.32   (6.5%)   1.2% (-11% - 15%)
OrNotHighMed         19.00   (7.0%)     19.30   (7.6%)   1.6% (-12% - 17%)
OrNotHighLow         19.63   (7.4%)     19.96   (8.0%)   1.7% (-12% - 18%)
{noformat}
Looks like just noise!
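For anyone who wants to double-check the "noise" reading, here is a minimal sketch of one way to interpret the ranges in the report. It assumes (this is my assumption, not something stated in the report) that each range compares the two runs at one standard deviation either side of their mean QPS. The class and method names are made up for illustration.

```java
// Hypothetical helper, not part of Lucene or the benchmark tool.
final class NoiseCheck {

    // Returns {worst, best} percentage change of comp relative to base,
    // ASSUMING the range is built from mean +/- one standard deviation,
    // where each standard deviation is given as a percentage of its mean.
    static double[] pctRange(double qpsBase, double sdBasePct,
                             double qpsComp, double sdCompPct) {
        double baseLo = qpsBase * (1 - sdBasePct / 100.0);
        double baseHi = qpsBase * (1 + sdBasePct / 100.0);
        double compLo = qpsComp * (1 - sdCompPct / 100.0);
        double compHi = qpsComp * (1 + sdCompPct / 100.0);
        // Worst case: slowest comp run against fastest base run.
        double worst = 100.0 * (compLo - baseHi) / baseHi;
        // Best case: fastest comp run against slowest base run.
        double best = 100.0 * (compHi - baseLo) / baseLo;
        return new double[] {worst, best};
    }

    // A range that contains zero gives no evidence of a real change.
    static boolean straddlesZero(double[] range) {
        return range[0] < 0 && range[1] > 0;
    }

    public static void main(String[] args) {
        // Respell row from the report: 49.44 (3.3%) vs 48.10 (3.7%).
        double[] r = pctRange(49.44, 3.3, 48.10, 3.7);
        System.out.printf("Respell: %.1f%% to %.1f%%, straddles zero: %b%n",
                r[0], r[1], straddlesZero(r));
    }
}
```

Under that assumption the Respell row comes out at roughly -9% to 4%, in line with the report. Because the printed standard deviations are rounded, other rows may differ by a point. Every row in the report has a range that includes zero.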
> Make the Collector API work per-segment
> ---------------------------------------
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5299 proposes several changes, some of them controversial, but
> there is one that I really like: refactoring
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws IOException;
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obvious that {{setScorer}} and
> {{acceptsDocsOutOfOrder}} need to be called after {{setNextReader}}, which is
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more
> easily decide to use different strategies on different segments. In
> particular, it makes the early-termination collector much cleaner since it
> can return different atomic collector implementations depending on whether
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it
> easier to migrate by adding a base class that implements both Collector
> and AtomicCollector and returns {{this}} from setNextReader, and by making
> current concrete Collector implementations extend this class instead of
> directly extending Collector.
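
To make the two ideas in the description concrete (a bridge class for migration, and a collector that picks a strategy per segment), here is a minimal, self-contained sketch. It is not the patch. {{SegmentContext}} is a hypothetical stand-in for {{AtomicReaderContext}}, {{BridgeCollector}} and the two example collectors are names invented here, and {{setScorer}}, {{acceptsDocsOutOfOrder}} and the {{IOException}} declarations are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for AtomicReaderContext: one segment of the index.
final class SegmentContext {
    final int docBase;     // offset of this segment's doc ids in the whole index
    final boolean sorted;  // whether this segment is sorted (assumed flag)
    SegmentContext(int docBase, boolean sorted) {
        this.docBase = docBase;
        this.sorted = sorted;
    }
}

// Per-segment collector, reduced to collect() for brevity.
interface AtomicCollector {
    void collect(int doc);
}

// Top-level collector: hands out one AtomicCollector per segment.
interface Collector {
    AtomicCollector setNextReader(SegmentContext context);
}

// Migration bridge: implements both interfaces and returns this, so an
// existing single-object collector only has to change its superclass.
abstract class BridgeCollector implements Collector, AtomicCollector {
    @Override
    public final AtomicCollector setNextReader(SegmentContext context) {
        doSetNextReader(context);
        return this;
    }
    protected abstract void doSetNextReader(SegmentContext context);
}

// An "old style" collector ported via the bridge: one object for all segments.
final class AllDocsCollector extends BridgeCollector {
    final List<Integer> hits = new ArrayList<>();
    private int docBase;
    @Override
    protected void doSetNextReader(SegmentContext context) {
        docBase = context.docBase;
    }
    @Override
    public void collect(int doc) {
        hits.add(docBase + doc);
    }
}

// A "new style" collector: chooses a different strategy per segment.
// On sorted segments it keeps only the first n docs (a simplified
// stand-in for early termination); elsewhere it keeps everything.
final class FirstNOnSortedCollector implements Collector {
    final List<Integer> hits = new ArrayList<>();
    private final int n;
    FirstNOnSortedCollector(int n) {
        this.n = n;
    }
    @Override
    public AtomicCollector setNextReader(SegmentContext context) {
        final int docBase = context.docBase;
        if (context.sorted) {
            return new AtomicCollector() {
                private int seen = 0;
                @Override
                public void collect(int doc) {
                    if (seen < n) {
                        hits.add(docBase + doc);
                        seen++;
                    }
                }
            };
        }
        return doc -> hits.add(docBase + doc);
    }
}

final class CollectorDemo {
    // Feeds docs 0..2 of a sorted segment, then docs 0..2 of an unsorted one.
    static void search(Collector collector) {
        SegmentContext[] segments = {
            new SegmentContext(0, true), new SegmentContext(10, false)
        };
        for (SegmentContext segment : segments) {
            AtomicCollector perSegment = collector.setNextReader(segment);
            for (int doc = 0; doc < 3; doc++) {
                perSegment.collect(doc);
            }
        }
    }

    public static void main(String[] args) {
        AllDocsCollector all = new AllDocsCollector();
        search(all);
        FirstNOnSortedCollector firstTwo = new FirstNOnSortedCollector(2);
        search(firstTwo);
        System.out.println(all.hits + " " + firstTwo.hits);
    }
}
```

The point of the split is visible in {{FirstNOnSortedCollector}}: the per-segment state ({{docBase}}, {{seen}}) lives in the object returned for that segment, so nothing needs to be reset between segments.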