Issue with functions that require metadata, and LeafCollectors

Chris Russell Thu, 01 May 2014 14:37:41 -0700

Hi.
I have opened an issue on Jira about improving the scale() function: 
https://issues.apache.org/jira/browse/LUCENE-5637


I was able to improve the performance of the scale function quite a bit, but 
this required me to refactor some code in IndexSearcher.search.
There is a loop where a scorer is created for each AtomicReaderContext and 
then immediately used to score documents. It looks like this in 4.8:
    for (AtomicReaderContext ctx : leaves) { // search each subreader
      try {
        collector.setNextReader(ctx);
      [...]
      BulkScorer scorer = weight.bulkScorer(ctx,
          !collector.acceptsDocsOutOfOrder(), ctx.reader().getLiveDocs());
      if (scorer != null) {
        try {
          scorer.score(collector);
        [...]
    }

I was able to break this up into two for-loops. This was necessary because 
the scale function needs to see every AtomicReaderContext before being asked to 
score any documents, so that it can determine the scale constant without doing 
something like grabbing the top-level reader and looking at every document in 
the index (the previous behavior).
So the new loops look like this in 4.8:

   ArrayList<BulkScorer> scorers = new ArrayList<BulkScorer>();

   for (AtomicReaderContext ctx : leaves) { // search each subreader
     BulkScorer scorer = weight.bulkScorer(ctx,
         !collector.acceptsDocsOutOfOrder(), ctx.reader().getLiveDocs());
     scorers.add(scorer);
   }

   for (int i = 0; i < leaves.size(); i++) {
     BulkScorer scorer = scorers.get(i);
     AtomicReaderContext ctx = leaves.get(i);
     try {
       collector.setNextReader(ctx);
     [...]
     if (scorer != null) {
       try {
         scorer.score(collector);
       [...]
   }

This seems to work fine and allows the function to gather the metadata it needs.
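
To show why the order matters, here is a minimal sketch of the two-pass idea that does not use Lucene at all. Segment, ScaleScorer and scoreAll are hypothetical stand-ins (they are not Lucene classes and not the code from the patch). The sketch assumes that creating a scorer is the point at which a segment's min/max gets folded into a shared range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TwoPassScaleSketch {

    /** Hypothetical stand-in for one index segment: just the raw values of its docs. */
    static final class Segment {
        final float[] values;
        Segment(float... values) { this.values = values; }
    }

    /** Hypothetical per-segment scorer that scales values into [0, 1] using a shared range. */
    static final class ScaleScorer {
        private final Segment segment;
        private final float[] minMax; // shared by all scorers: {min, max}

        ScaleScorer(Segment segment, float[] minMax) {
            this.segment = segment;
            this.minMax = minMax;
            // Assumption: creating the scorer is when this segment's metadata is gathered.
            for (float v : segment.values) {
                minMax[0] = Math.min(minMax[0], v);
                minMax[1] = Math.max(minMax[1], v);
            }
        }

        void score(List<Float> out) {
            float range = minMax[1] - minMax[0];
            for (float v : segment.values) {
                out.add(range == 0f ? 0f : (v - minMax[0]) / range);
            }
        }
    }

    static List<Float> scoreAll(List<Segment> segments) {
        float[] minMax = { Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY };

        // Pass 1: create every scorer, so the shared range covers all segments.
        List<ScaleScorer> scorers = new ArrayList<ScaleScorer>();
        for (Segment s : segments) {
            scorers.add(new ScaleScorer(s, minMax));
        }

        // Pass 2: only now score, with the complete range available.
        List<Float> out = new ArrayList<Float>();
        for (ScaleScorer scorer : scorers) {
            scorer.score(out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(new Segment(2f, 4f), new Segment(0f, 10f));
        System.out.println(scoreAll(segments));
    }
}
```

If scoring were interleaved with scorer creation in a single loop, the first segment would be scaled against its own range (2 to 4) instead of the range of the whole index (0 to 10).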

When trying to bring my code to trunk, I ran into an issue with the recently 
introduced LeafCollector interface.
It seems like setNextReader no longer exists, and scorer.score takes in a 
LeafCollector now.
In trunk, when I try to break this for-loop into two for-loops, it breaks a ton 
of unit tests.
I need the LeafCollectors in the first loop, where I am making the scorers, 
because LeafCollector now has the acceptsDocsOutOfOrder method.
I need them in the second loop because that is what scorer.score takes now.
So I tried keeping track of the LeafCollectors I created in the first loop and 
using them in the second, which did not work.
I also tried asking the collector for new LeafCollectors in each of the two 
loops, and that did not work either.

I think this is all because setNextReader went away, and I am running into 
some side effect of creating a LeafCollector and not immediately scoring with 
it. Does asking the passed-in collector for a LeafCollector for another 
context do something to the previous LeafCollector?
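
To make the question concrete, here is a toy model of the kind of side effect I suspect. It is not Lucene code, and I have not confirmed that any real collector behaves this way. ToyCollector and ToyLeafCollector are hypothetical names, and the assumption is a collector that stores per-segment state in itself and hands itself back as the leaf collector.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedLeafCollectorSketch {

    /** Hypothetical per-segment collector interface (not Lucene's LeafCollector). */
    interface ToyLeafCollector {
        void collect(int doc);
    }

    /** Hypothetical collector that keeps the current segment's docBase in itself. */
    static final class ToyCollector implements ToyLeafCollector {
        private int docBase;
        final List<Integer> globalDocs = new ArrayList<Integer>();

        ToyLeafCollector getLeafCollector(int segmentDocBase) {
            docBase = segmentDocBase; // side effect: overwrites the previous segment's state
            return this;              // every "leaf collector" is the same object
        }

        @Override
        public void collect(int doc) {
            globalDocs.add(docBase + doc);
        }
    }

    /** One loop: ask for a leaf collector and use it right away. */
    static List<Integer> interleaved(int[] docBases) {
        ToyCollector collector = new ToyCollector();
        for (int base : docBases) {
            collector.getLeafCollector(base).collect(0);
        }
        return collector.globalDocs;
    }

    /** Two loops: create all leaf collectors first, then use them. */
    static List<Integer> eager(int[] docBases) {
        ToyCollector collector = new ToyCollector();
        List<ToyLeafCollector> leaves = new ArrayList<ToyLeafCollector>();
        for (int base : docBases) {
            leaves.add(collector.getLeafCollector(base));
        }
        for (ToyLeafCollector leaf : leaves) {
            leaf.collect(0);
        }
        return collector.globalDocs;
    }

    public static void main(String[] args) {
        int[] docBases = { 0, 100 };
        System.out.println(interleaved(docBases)); // [0, 100]
        System.out.println(eager(docBases));       // [100, 100]
    }
}
```

In this toy model the two-loop version reports the first segment's document with the second segment's docBase. That is the sort of breakage I would expect to see if something similar is happening in trunk.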

All I am trying to do is create all scorers before using them, which seems like 
it should be possible logically.  This is especially useful for functions that 
require metadata.
Any assistance would be appreciated.

-Chris
