Hi Hilton, Hilton Campbell wrote: > Hello all, > > In my application I want to perform a search over all the documents > that are NOT in a certain subset, and I'm not sure how I should do > it. > > Specifically, the application performs a search and the top N results > are shown to the user. The user may then opt to see the next top N > results. By the time the user chooses to see the next N results, > however, there may be new, highly-relevant documents in the index (as > indexing is occurring concurrently). So instead of just skipping to > the next N, I need to perform a new search and get the top N that > haven't been seen yet. Is anyone aware of an efficient way to > implement this? > > I can think of at least one way: I can keep track of the documents > that have been seen and iterate through all the hits, skipping those > that have already been seen. I just want to see if there isn't a > better way that doesn't iterate through potentially hundreds of > already seen hits, or if anyone has any pointers on an efficient > implementation of this idea.
Conceptually (caveat: untested), you could: 1. Extend Filter[1] (call it DejaVuFilter) to hold a BitSet per IndexReader. The BitSet would hold one bit per doc[2], each initialized to true. 2. Unset a DejaVuFilter instance's bit for each of your top N docs by walking the TopDocs returned by Searcher.search(Query,Filter,int)[3]. Initially, you could pass in null for the Filter, and then for all following calls, an instance of DejaVuFilter. 3. Repeat step #2 as many times as necessary. Steve [1] <http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/search/Filter.html> [2] <http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/index/IndexReader.html#maxDoc()> [3] <http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/search/Searcher.html#search(org.apache.lucene.search.Query,%20org.apache.lucene.search.Filter,%20int)> -- Steve Rowe Center for Natural Language Processing http://www.cnlp.org/tech/lucene.asp --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]