Hi.
I have opened an issue on Jira about improving the scale() function:
https://issues.apache.org/jira/browse/LUCENE-5637
I was able to improve the performance of the scale function quite a bit, but
this required me to refactor some code in IndexSearcher.search.
There is a loop where scorers are created for each AtomicReaderContext, and
then used to score documents. It looks like this in 4.8:
    for (AtomicReaderContext ctx : leaves) { // search each subreader
      try {
        collector.setNextReader(ctx);
        [...]
      BulkScorer scorer = weight.bulkScorer(ctx,
          !collector.acceptsDocsOutOfOrder(), ctx.reader().getLiveDocs());
      if (scorer != null) {
        try {
          scorer.score(collector);
          [...]
    }
I was able to break this up into two for-loops. This was necessary because the
scale function needs to see every AtomicReaderContext before it is asked to
score any documents, so that it can determine the scale constant without
grabbing the top-level reader and looking at every document in the index (the
previous behavior).
So the new loops look like this in 4.8:
    ArrayList<BulkScorer> scorers = new ArrayList<BulkScorer>();
    for (AtomicReaderContext ctx : leaves) { // create a scorer for each subreader
      BulkScorer scorer = weight.bulkScorer(ctx,
          !collector.acceptsDocsOutOfOrder(), ctx.reader().getLiveDocs());
      scorers.add(scorer);
    }
    for (int i = 0; i < leaves.size(); i++) { // search each subreader
      BulkScorer scorer = scorers.get(i);
      AtomicReaderContext ctx = leaves.get(i);
      try {
        collector.setNextReader(ctx);
        [...]
      if (scorer != null) {
        try {
          scorer.score(collector);
          [...]
    }
This seems to work fine and allows the function to gather the metadata it needs.
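To show why the order matters, here is a minimal, self-contained sketch of the two-pass idea. The types in it (Segment, SegmentScorer, ScalingWeight) are made-up stand-ins, not the Lucene classes, and dividing by a global maximum is only a simplified placeholder for the real scale constant:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these are Lucene types.
final class TwoPhaseSearchSketch {

    /** Stand-in for one index segment: just the raw values of its documents. */
    static final class Segment {
        final float[] values;
        Segment(float... values) { this.values = values; }
    }

    /** Stand-in for a per-segment scorer. */
    interface SegmentScorer {
        void score(List<Float> out);
    }

    /** Stand-in for a weight whose scale constant depends on every segment. */
    static final class ScalingWeight {
        private float globalMax = Float.NEGATIVE_INFINITY;

        SegmentScorer scorer(Segment segment) {
            // Gather metadata while the scorer is being created.
            for (float v : segment.values) {
                globalMax = Math.max(globalMax, v);
            }
            // globalMax is read at scoring time, so it reflects every segment
            // that had a scorer created before scoring started.
            return out -> {
                for (float v : segment.values) {
                    out.add(v / globalMax);
                }
            };
        }
    }

    static List<Float> search(List<Segment> segments) {
        ScalingWeight weight = new ScalingWeight();

        // Pass 1: create every scorer, so the weight sees all segments.
        List<SegmentScorer> scorers = new ArrayList<>();
        for (Segment segment : segments) {
            scorers.add(weight.scorer(segment));
        }

        // Pass 2: only now score any documents.
        List<Float> out = new ArrayList<>();
        for (SegmentScorer scorer : scorers) {
            scorer.score(out);
        }
        return out;
    }
}
```

With a single interleaved loop, documents in the first segment would be scaled by that segment's maximum only, which is why the scorers have to exist before any of them is used.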
When trying to bring my code to trunk, I ran into an issue with the recently
introduced LeafCollector interface.
It seems like setNextReader no longer exists, and scorer.score takes in a
LeafCollector now.
In trunk, when I try to break this for-loop into two for-loops, it breaks a ton
of unit tests.
I need the LeafCollectors in the first loop where I am making the scorers
because LeafCollector now has the acceptsDocsOutOfOrder method.
I need them in the second loop because that is what .score takes now.
So I tried keeping track of the LeafCollectors I created in the first loop and
using them in the second, which did not work.
I also tried asking the collector for new LeafCollectors in each of the two
loops, and that did not work.
I suspect this is all because setNextReader went away, and that I am hitting
some side effect of creating a LeafCollector and not immediately scoring with
it. Does asking the passed-in collector for a LeafCollector for another
context do something to the previous LeafCollector?
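To make the kind of side effect I have in mind concrete, here is a toy sketch. It uses made-up types (LeafCol, SharedStateCollector), not Lucene's, and it assumes a collector that hands itself back as the leaf collector and rebinds its doc base on every request. I have not confirmed that any collector in trunk actually behaves this way:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; not the Lucene API.
final class StatefulCollectorSketch {

    interface LeafCol {
        void collect(int doc);
    }

    /** A collector that is its own leaf collector and keeps per-leaf state. */
    static final class SharedStateCollector implements LeafCol {
        final List<Integer> globalDocs = new ArrayList<>();
        private int docBase;

        LeafCol leafCollector(int docBase) {
            this.docBase = docBase; // side effect: rebinds the shared state
            return this;            // every "leaf collector" is the same object
        }

        @Override
        public void collect(int doc) {
            globalDocs.add(docBase + doc);
        }
    }

    /** Old order: get a leaf collector, score with it, then move on. */
    static List<Integer> interleaved() {
        SharedStateCollector c = new SharedStateCollector();
        LeafCol first = c.leafCollector(0);
        first.collect(0);
        first.collect(1);
        LeafCol second = c.leafCollector(100);
        second.collect(0);
        return c.globalDocs;
    }

    /** New order: get all leaf collectors first, score afterwards. */
    static List<Integer> allLeavesFirst() {
        SharedStateCollector c = new SharedStateCollector();
        LeafCol first = c.leafCollector(0);
        LeafCol second = c.leafCollector(100); // also changes what "first" does
        first.collect(0);
        first.collect(1);
        second.collect(0);
        return c.globalDocs;
    }
}
```

In this toy, interleaved() records docs 0, 1, 100, while allLeavesFirst() records 100, 101, 100, because the first leaf collector silently picked up the second leaf's doc base.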
All I am trying to do is create all scorers before using them, which seems like
it should be possible logically. This is especially useful for functions that
require metadata.
Any assistance would be appreciated.
-Chris