Steve, Thanks for the great reply! That worked like a charm. I really appreciate it.
Thanks, Hilton Campbell -----Original Message----- From: Steven Rowe [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 05, 2007 2:50 PM To: java-user@lucene.apache.org Subject: Re: How can I search over all documents NOT in a certain subset? Hi Hilton, Hilton Campbell wrote: > Hello all, > > In my application I want to perform a search over all the documents > that are NOT in a certain subset, and I'm not sure how I should do > it. > > Specifically, the application performs a search and the top N results > are shown to the user. The user may then opt to see the next top N > results. By the time the user chooses to see the next N results, > however, there may be new, highly-relevant documents in the index (as > indexing is occurring concurrently). So instead of just skipping to > the next N, I need to perform a new search and get the top N that > haven't been seen yet. Is anyone aware of an efficient way to > implement this? > > I can think of at least one way: I can keep track of the documents > that have been seen and iterate through all the hits, skipping those > that have already been seen. I just want to see if there isn't a > better way that doesn't iterate through potentially hundreds of > already seen hits, or if anyone has any pointers on an efficient > implementation of this idea. Conceptually (caveat: untested), you could: 1. Extend Filter[1] (call it DejaVuFilter) to hold a BitSet per IndexReader. The BitSet would hold one bit per doc[2], each initialized to true. 2. Unset a DejaVuFilter instance's bit for each of your top N docs by walking the TopDocs returned by Searcher.search(Query,Filter,int)[3]. Initially, you could pass in null for the Filter, and then for all following calls, an instance of DejaVuFilter. 3. Repeat step #2 as many times as necessary. Steve [1] <http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/search/Filter.htm l> [2] <http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/index/IndexReader .html#maxDoc()> [3] <http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/search/Searcher.h tml#search(org.apache.lucene.search.Query,%20org.apache.lucene.search.Filter ,%20int)> -- Steve Rowe Center for Natural Language Processing http://www.cnlp.org/tech/lucene.asp --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]