On 11/20/2012 6:49 AM, Michael McCandless wrote:
> On Tue, Nov 20, 2012 at 1:49 AM, Ravikumar Govindarajan
> <ravikumar.govindara...@gmail.com> wrote:
>
>> Also, for a TopN query sorted by doc-id will the query terminate early?
>
> Actually, it won't!  But it really should ... you could make a
> Collector that throws an exception once the N docs have been
> collected?

I've never much liked exception-throwing as a way to terminate early. IMO Lucene should really expose an Iterator-style API for pulling matches, so that callers can choose when to stop.

I've been writing an XQuery service that uses Lucene as its data storage and retrieval engine. XQuery is designed to be lazily evaluated: everything is iterators from top to bottom, and the whole language is meant to be streamed so that any expression can be terminated early. For this case I really needed early termination to be controlled *by the caller*, since the conditions for early termination can't be known ahead of time. So I wrote the attached class, which provides that by extending IndexSearcher.
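To make the contrast concrete, here is a minimal sketch in plain Java with no Lucene dependency. Everything in it is hypothetical and for illustration only: `pushMatches` stands in for a search that pushes hits to a collector callback, `StopCollecting` is an invented exception type, and the `int[]` of matches stands in for real query results.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntConsumer;

public class EarlyTermination {

    /** Hypothetical stand-in for a search that pushes every match to a callback. */
    static void pushMatches(int[] matches, IntConsumer collector) {
        for (int doc : matches) {
            collector.accept(doc);
        }
    }

    /** Invented unchecked exception, used only to unwind out of pushMatches. */
    static class StopCollecting extends RuntimeException {
        StopCollecting() {
            super(null, null, false, false); // no message, no stack trace
        }
    }

    /** Push style: the only way to stop early is to throw from the callback. */
    static List<Integer> firstNByException(int[] matches, int n) {
        List<Integer> out = new ArrayList<>();
        try {
            pushMatches(matches, doc -> {
                if (out.size() >= n) {
                    throw new StopCollecting();
                }
                out.add(doc);
            });
        } catch (StopCollecting e) {
            // expected: we already have the docs we wanted
        }
        return out;
    }

    /** Pull style: the caller owns the loop and simply stops asking. */
    static List<Integer> firstNByIterator(Iterator<Integer> matches, int n) {
        List<Integer> out = new ArrayList<>();
        while (out.size() < n && matches.hasNext()) {
            out.add(matches.next());
        }
        return out;
    }

    public static void main(String[] args) {
        int[] matches = {2, 3, 5, 8, 13, 21};
        System.out.println(firstNByException(matches, 3));

        List<Integer> boxed = new ArrayList<>();
        for (int doc : matches) {
            boxed.add(doc);
        }
        System.out.println(firstNByIterator(boxed.iterator(), 3));
    }
}
```

In the pull version the stopping condition lives entirely in the caller's loop, so it can depend on anything the caller learns along the way, not just a fixed N.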

Of course it would be nice if someone up to speed w/Lucene 4 would like to provide something similar built in to Lucene...

-Mike


package lux.search;

import java.io.IOException;

import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Weight;
import org.apache.lucene.store.Directory;

public class LuxSearcher extends IndexSearcher {

  public LuxSearcher (Directory dir) throws IOException {
    super (dir);
  }
  
  public LuxSearcher (IndexSearcher searcher) {
      super (searcher.getIndexReader());
  }


  /**
   * @param query the Lucene query
   * @return the unordered results of the query as a Lucene DocIdSetIterator.
   * Unordered means the order is not predictable and may change with
   * subsequent calls.
   * @throws IOException
   */
  public DocIdSetIterator search (Query query) throws IOException {
      return new DocIterator (query, false);
  }

  /**
   * @param query the Lucene query
   * @return the results of the query as a Lucene DocIdSetIterator in docID
   * order
   * @throws IOException
   */
  public DocIdSetIterator searchOrdered (Query query) throws IOException {
      return new DocIterator (query, true);
  }
  
  class DocIterator extends DocIdSetIterator {
      
      private final Weight weight;
      private final boolean ordered;
      private int nextReader;
      private int docID;
      private int docBase; // add to docID which is relative to each sub-reader
      private Scorer scorer;
      
      /**
       * @param query the lucene query whose results will be iterated
       * @param ordered whether the docs must be scored in order
       * @throws IOException
       */
      DocIterator (Query query, boolean ordered) throws IOException {
          weight = createNormalizedWeight(query);
          this.ordered = ordered;
          nextReader = 0;
          docID = -1;
          advanceScorer();
      }

      private void advanceScorer () throws IOException {
          while (nextReader < subReaders.length) {
              docBase = docStarts[nextReader];
              scorer = weight.scorer(subReaders[nextReader++], ordered, true);
              if (scorer != null) {
                  return;
              }
          }
          scorer = null;
      }
      
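      @Override
      public int docID() {
          // docID is stored relative to the current sub-reader; report the
          // index-wide ID so it agrees with what nextDoc() and advance() return
          if (docID == -1 || docID == NO_MORE_DOCS) {
              return docID;
          }
          return docID + docBase;
      }

      @Override
      public int nextDoc() throws IOException {
          while (scorer != null) {
              docID = scorer.nextDoc();
              if (docID != NO_MORE_DOCS) {
                  return docID + docBase;
              }
              // this sub-reader is exhausted; move on to the next one
              advanceScorer();
          }
          docID = NO_MORE_DOCS;
          return NO_MORE_DOCS;
      }

      @Override
      public int advance(int target) throws IOException {
          while (scorer != null) {
              // a scorer for a later sub-reader has not been positioned yet,
              // so clamp the relative target rather than pass a negative value
              docID = scorer.advance(Math.max(0, target - docBase));
              if (docID != NO_MORE_DOCS) {
                  return docID + docBase;
              }
              advanceScorer();
          }
          docID = NO_MORE_DOCS;
          return NO_MORE_DOCS;
      }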
      
  }
  

}

/* This Source Code Form is subject to the terms of the Mozilla Public
 * License, v. 2.0. If a copy of the MPL was not distributed with this file,
 * You can obtain one at http://mozilla.org/MPL/2.0/. */
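
The bookkeeping in DocIterator (walk the sub-readers in order and add each one's docBase to its segment-relative IDs) can be tried out without Lucene on the classpath. The sketch below is a plain-Java model of that idea, not the class above: `SegmentedDocIterator`, its `int[][] segments` and its `docStarts` array are hypothetical stand-ins for the per-segment scorers and for IndexSearcher's docStarts, and `advance` is a simple linear scan rather than a real skip.

```java
public class SegmentedDocIterator {
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[][] segments; // sorted, segment-relative matches per segment
    private final int[] docStarts;  // index-wide ID of each segment's first doc
    private int segment = -1;       // segment currently being read
    private int pos;                // next position within that segment
    private int doc = -1;           // current index-wide doc ID

    public SegmentedDocIterator(int[][] segments, int[] docStarts) {
        this.segments = segments;
        this.docStarts = docStarts;
    }

    public int docID() {
        return doc;
    }

    public int nextDoc() {
        if (doc == NO_MORE_DOCS) {
            return doc;
        }
        while (segment < segments.length) {
            if (segment >= 0 && pos < segments[segment].length) {
                // translate the segment-relative ID into an index-wide ID
                doc = docStarts[segment] + segments[segment][pos++];
                return doc;
            }
            // current segment exhausted (or not started yet): try the next one
            segment++;
            pos = 0;
        }
        doc = NO_MORE_DOCS;
        return doc;
    }

    /** Moves to the first match at or beyond target, by linear scan. */
    public int advance(int target) {
        int d;
        do {
            d = nextDoc();
        } while (d < target);
        return d;
    }

    public static void main(String[] args) {
        // three segments holding 3, 2 and 4 docs; the middle one has no matches
        int[][] segments = {{0, 2}, {}, {1, 3}};
        int[] docStarts = {0, 3, 5};
        SegmentedDocIterator it = new SegmentedDocIterator(segments, docStarts);

        // the caller decides when to stop: here, at the first doc ID >= 5
        int d;
        while ((d = it.nextDoc()) != NO_MORE_DOCS) {
            System.out.println("doc " + d);
            if (d >= 5) {
                break;
            }
        }
    }
}
```

Because the caller drives the loop, stopping is just a `break`; nothing has to be thrown and the remaining segments are never touched.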
