[ https://issues.apache.org/jira/browse/LUCENE-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-5702:
---------------------------------
    Attachment: SortBench.java
                LUCENE-5702.patch

Updated patch to current trunk. I also did some benchmarking, and removing the 
one-comparator specialization hurt performance, so I added it back. We could 
discuss the over-specialization of top-field collectors in a separate issue.

Attached is the (dummy) benchmark, SortBench.java, that I used to measure the 
performance impact of this patch. Times are in milliseconds (lower is better).

|| sort || trunk (ms) || patch (ms) || difference ||
| long asc | 100 | 108 | +8% |
| long desc | 101 | 110 | +9% |
| double asc | 107 | 114 | +7% |
| double desc | 113 | 118 | +4% |
| string asc | 119 | 123 | +3% |
| string desc | 120 | 124 | +3% |
| long asc, double asc | 98 | 87 | -11% |
| long desc, double desc | 102 | 89 | -13% |

Single-field sorts are slightly slower with the patch (3% to 9%), while the 
two-field sorts are faster (11% to 13%). This benchmark only runs a sort to find 
the top 50 hits of a {{MatchAllDocsQuery}}, so the relative differences would be 
even smaller with an actual query and/or additional collectors (e.g. if you also 
want to compute facets).

This patch is *only* about the API. It just splits FieldComparator into
 * FieldComparator:
 ** int compare(int slot1, int slot2)
 ** void setTopValue(T value)
 ** T value(int slot)
 ** LeafFieldComparator getLeafComparator(LeafReaderContext context)
 * and LeafFieldComparator:
 ** int compareBottom(int doc)
 ** int compareTop(int doc)
 ** void copy(int slot, int doc)
 ** void setScorer(Scorer scorer)
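To make the split concrete, here is a minimal, self-contained model of the two halves for a single ascending long sort field. It is a sketch, not Lucene code: the class names, the Segment stand-in for LeafReaderContext and the setBottom method are assumptions made for illustration (the list above does not say where setBottom lives). setScorer is left out because this sketch sorts on per-document values rather than scores.

```java
// Hypothetical stand-in for LeafReaderContext: one segment's per-document values.
final class Segment {
    final long[] values; // indexed by segment-local doc id
    Segment(long... values) { this.values = values; }
}

// Top-level half (hypothetical): owns the slots of the shared priority queue.
abstract class FieldComparatorSketch<T> {
    abstract int compare(int slot1, int slot2);
    abstract void setTopValue(T value);
    abstract T value(int slot);
    abstract LeafFieldComparatorSketch getLeafComparator(Segment segment);
}

// Per-segment half (hypothetical): compares documents of one segment against slots.
interface LeafFieldComparatorSketch {
    void setBottom(int slot);   // assumption: placement not stated in the list above
    int compareBottom(int doc);
    int compareTop(int doc);
    void copy(int slot, int doc);
    // setScorer(Scorer) omitted: this sketch sorts on values, not scores.
}

final class LongComparatorSketch extends FieldComparatorSketch<Long> {
    private final long[] slotValues; // values copied into queue slots
    private long topValue;

    LongComparatorSketch(int numHits) { slotValues = new long[numHits]; }

    @Override int compare(int slot1, int slot2) {
        return Long.compare(slotValues[slot1], slotValues[slot2]);
    }
    @Override void setTopValue(Long value) { topValue = value; }
    @Override Long value(int slot) { return slotValues[slot]; }

    @Override LeafFieldComparatorSketch getLeafComparator(Segment segment) {
        final long[] docValues = segment.values; // resolved once per segment
        return new LeafFieldComparatorSketch() {
            private long bottom;
            @Override public void setBottom(int slot) { bottom = slotValues[slot]; }
            // > 0 means the doc sorts before the bottom entry, i.e. it is competitive.
            @Override public int compareBottom(int doc) { return Long.compare(bottom, docValues[doc]); }
            @Override public int compareTop(int doc) { return Long.compare(topValue, docValues[doc]); }
            @Override public void copy(int slot, int doc) { slotValues[slot] = docValues[doc]; }
        };
    }
}
```

Slot values live in the top-level comparator, so the leaf comparators of all segments read and write the same slots; only the per-document values are resolved in getLeafComparator. This mirrors the single top-level priority queue that all leaf collectors update.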

All the top-field collection logic is left unchanged, so there is still a 
single top-level priority queue that all leaf collectors update. I think 
changing the API is important for several reasons:
 * it makes the FieldComparator API aligned with the Collector API 
(LeafCollector <-> LeafFieldComparator)
 * it makes the workflow easier to understand: you need to get a 
LeafFieldComparator before you can call setScorer
 * even though the patch does not contain any optimizations, it would make 
per-segment optimizations easier. For instance, if all documents in a segment 
have the same value, we could ignore this sort field in comparisons. Or, if an 
index has a single segment, we could decide to only use ordinals for 
comparisons and avoid copying values on each competitive hit.
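As an illustration of the first optimization idea (not something the patch does), here is a hypothetical sketch of a getLeafComparator that returns a specialized leaf comparator when every document in a segment has the same value. All names are made up, and scanning the values for min/max is only a stand-in for whatever per-segment statistics a real implementation would consult.

```java
// Hypothetical leaf-comparator contract (simplified; setScorer omitted).
interface LeafCmp {
    void setBottom(int slot);
    int compareBottom(int doc);
    int compareTop(int doc);
    void copy(int slot, int doc);
}

// Hypothetical top-level comparator for an ascending long sort field.
final class LongSortField {
    final long[] slotValues; // values of the hits held in the shared queue
    final long topValue;     // fixed up front in this sketch

    LongSortField(int numHits, long topValue) {
        this.slotValues = new long[numHits];
        this.topValue = topValue;
    }

    // Chooses a leaf implementation per segment.
    LeafCmp getLeafComparator(long[] docValues) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (long v : docValues) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        // Every document in this segment has the same value: skip per-doc lookups.
        if (docValues.length > 0 && min == max) {
            return new ConstantLeaf(min);
        }
        return new GeneralLeaf(docValues);
    }

    // Specialized leaf: results are constants, recomputed only when the bottom changes.
    final class ConstantLeaf implements LeafCmp {
        private final long constant;
        private final int topCmp;
        private int bottomCmp;

        ConstantLeaf(long constant) {
            this.constant = constant;
            this.topCmp = Long.compare(topValue, constant);
        }
        @Override public void setBottom(int slot) { bottomCmp = Long.compare(slotValues[slot], constant); }
        @Override public int compareBottom(int doc) { return bottomCmp; }
        @Override public int compareTop(int doc) { return topCmp; }
        @Override public void copy(int slot, int doc) { slotValues[slot] = constant; }
    }

    // Generic leaf: reads the value of each document.
    final class GeneralLeaf implements LeafCmp {
        private final long[] docValues;
        private long bottom;

        GeneralLeaf(long[] docValues) { this.docValues = docValues; }
        @Override public void setBottom(int slot) { bottom = slotValues[slot]; }
        @Override public int compareBottom(int doc) { return Long.compare(bottom, docValues[doc]); }
        @Override public int compareTop(int doc) { return Long.compare(topValue, docValues[doc]); }
        @Override public void copy(int slot, int doc) { slotValues[slot] = docValues[doc]; }
    }
}
```

The choice is made once per segment, which is exactly the hook that a per-segment comparator API provides and that a single monolithic comparator lacks.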

> Per-segment comparator API
> --------------------------
>
>                 Key: LUCENE-5702
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5702
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: Trunk
>
>         Attachments: LUCENE-5702.patch, LUCENE-5702.patch, SortBench.java
>
>
> As a next step of LUCENE-5527, it would be nice to have per-segment 
> comparators, and maybe even change the default behavior of our top* 
> comparators so that they merge top hits in the end.
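The longer-term idea in the issue description, merging top hits at the end, could look like the following sketch: each segment keeps its own bounded queue of its best k hits, and the per-segment lists are combined once collection is done. This is not what the patch does (it keeps a single top-level queue); the names, the ascending long sort and the simple concatenate-and-sort merge are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: per-segment top-k collection followed by a final merge.
final class PerSegmentTopHits {
    // A hit: global doc id (docBase + segment-local doc) and its sort value.
    static final class Hit {
        final int doc;
        final long value;
        Hit(int doc, long value) { this.doc = doc; this.value = value; }
    }

    // Ascending by value, ties broken by doc id.
    static final Comparator<Hit> ORDER =
            Comparator.comparingLong((Hit h) -> h.value).thenComparingInt(h -> h.doc);

    // Best k hits of one segment; the heap's head is the worst hit kept so far.
    static List<Hit> topK(long[] values, int docBase, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        PriorityQueue<Hit> heap = new PriorityQueue<>(ORDER.reversed());
        for (int doc = 0; doc < values.length; doc++) {
            Hit hit = new Hit(docBase + doc, values[doc]);
            if (heap.size() < k) {
                heap.add(hit);
            } else if (ORDER.compare(hit, heap.peek()) < 0) {
                heap.poll(); // evict the current worst hit
                heap.add(hit);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(ORDER);
        return result;
    }

    // Combine the per-segment results and keep the global top k.
    static List<Hit> merge(List<List<Hit>> perSegment, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perSegment) {
            all.addAll(hits);
        }
        all.sort(ORDER);
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

Since each per-segment list is already sorted, a k-way merge would avoid the final full sort; concatenating and sorting just keeps the sketch short.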



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
