If your query explicitly excludes certain terms, then surely you can be confident that the matched docs will not contain those terms. And if your random docs are a subset of those matched docs, they won't contain them either, so there is nothing extra to check.
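For example, here is a minimal plain-Java sketch. It assumes you have already collected the doc ids returned by the search run with your MUST_NOT filter, for instance from TopDocs.scoreDocs[i].doc. It then picks the random docs only from that array, so none of them can contain the excluded term. The helper name sampleMatchedDocs and the sample ids are made up for illustration. The sketch also swaps Plan A's repeated Random.nextInt() loop for a partial Fisher-Yates shuffle, so the same doc can't be picked twice.

```java
import java.util.Arrays;
import java.util.Random;

public class RandomSubsetSketch {

    /**
     * Hypothetical helper: returns up to n distinct doc ids, chosen uniformly
     * at random from the ids a filtered search already matched. Every
     * candidate passed the MUST_NOT filter, so no per-document term check
     * is done here. Uses a partial Fisher-Yates shuffle.
     */
    public static int[] sampleMatchedDocs(int[] matchedDocIds, int n, Random rnd) {
        int[] ids = matchedDocIds.clone();   // leave the caller's array untouched
        int k = Math.min(n, ids.length);     // can't return more docs than matched
        for (int i = 0; i < k; i++) {
            // pick a random slot from the not-yet-sampled tail [i, ids.length)
            int j = i + rnd.nextInt(ids.length - i);
            int tmp = ids[i];
            ids[i] = ids[j];
            ids[j] = tmp;
        }
        return Arrays.copyOf(ids, k);        // the first k slots hold the sample
    }

    public static void main(String[] args) {
        // Stand-in for doc ids collected from the "no pizza" filtered search.
        int[] matched = {3, 8, 15, 21, 42, 57, 60, 77, 81, 99, 104, 120};
        int[] sample = sampleMatchedDocs(matched, 10, new Random());
        System.out.println(Arrays.toString(sample));
    }
}
```

If the search matched fewer than n docs, you simply get all of them back. Keep in mind that the sample is only as random as the candidate set. With a MatchAllDocsQuery plus the filter, all scores tie and the top hits tend to be the lowest doc ids, so collecting every match (or going the Plan B route) gives a more even spread across the collection.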
--
Ian.

On Tue, Mar 29, 2011 at 11:01 PM, Patrick Diviacco <patrick.divia...@gmail.com> wrote:
> One last thing, how do I check if the random document does not contain the
> term ?
>
> In other words, I cannot just pass the TermsFilter but I need to check if
> the retrieved random document is valid or not to know if I have enough.
>
> Any code example is appreciated.. so far I have this one, to retrieve docs
> without that specific term.
>
> BooleanFilter termsNOTFilter = new BooleanFilter();
> FilterClause notTermClause = new FilterClause(termsFilter,
> org.apache.lucene.search.BooleanClause.Occur.MUST_NOT);
> termsNOTFilter.add(notTermClause);
>
> thanks
>
> On 29 March 2011 22:12, Ian Lea <ian....@gmail.com> wrote:
>
>> > Plan A sounds better because I don't want to consider the entire
>> > collection and then remove results from it.
>>
>> Fine, your choice.
>>
>> > However, the same code has to work with 2 different collections. The
>> > first one has 30.000 docs the other one 90.000.
>>
>> No problem. The number of docs is irrelevant.
>>
>> > How can I get the total amount of docs from a collection ?
>>
>> IndexReader.numDocs(). See also maxDoc() and numDeletedDocs().
>>
>> --
>> Ian.
>>
>> > On 29 March 2011 21:48, Ian Lea <ian....@gmail.com> wrote:
>> >
>> >> Here are a couple of ideas.
>> >>
>> >> Plan A.
>> >>
>> >> Think of a number, say 10, retrieve n * 10 docids in your search and
>> >> then loop round java.util.Random.nextInt(n * 10) until you've got
>> >> enough.
>> >>
>> >> Plan B.
>> >>
>> >> Reverse your MUST NOT search to get a list of docids that you don't
>> >> want, then loop round Random.nextInt(indexreader.numDocs()), selecting
>> >> those that are not deleted (!indexreader.isDeleted(docid)) and are not
>> >> in your exclusion list.
>> >>
>> >> I'm sure there are other ways, probably better.
>> >>
>> >> --
>> >> Ian.
>> >>
>> >> On Tue, Mar 29, 2011 at 8:00 PM, Patrick Diviacco <patrick.divia...@gmail.com> wrote:
>> >> > Ok I've solved the first part of the problem. I'm now selecting all
>> >> > documents that do not contain a given term with a BooleanFilter
>> >> > and FilterClause, MUST NOT.
>> >> >
>> >> > I still have to understand how to retrieve random documents and limit the
>> >> > number of retrieved docs to N.
>> >> >
>> >> > thanks
>> >> >
>> >> > On 29 March 2011 20:40, Patrick Diviacco <patrick.divia...@gmail.com> wrote:
>> >> >
>> >> >> Is there a Filter to get a limited number of random collection docs from
>> >> >> the index which DO NOT contain a specific term ?
>> >> >>
>> >> >> i.e. term="pizza"
>> >> >>
>> >> >> I want to run the query against 10 random documents of the collection that
>> >> >> do not contain the term "pizza".
>> >> >>
>> >> >> thanks
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: java-user-h...@lucene.apache.org