One last thing: how do I check that a random document does not contain the term?
In other words, I cannot just pass the TermsFilter; I need to check whether each retrieved random document is valid, so that I know when I have enough. Any code example is appreciated. So far I have this, which retrieves the docs that do not contain the specific term:

    BooleanFilter termsNOTFilter = new BooleanFilter();
    FilterClause notTermClause = new FilterClause(termsFilter, org.apache.lucene.search.BooleanClause.Occur.MUST_NOT);
    termsNOTFilter.add(notTermClause);

Thanks

On 29 March 2011 22:12, Ian Lea <ian....@gmail.com> wrote:

> > Plan A sounds better because I don't want to consider the entire collection
> > and then remove results from it.
>
> Fine, your choice.
>
> > However, the same code has to work with 2 different collections. The first
> > one has 30.000 docs the other one 90.000.
>
> No problem. The number of docs is irrelevant.
>
> > How can I get the total amount of docs from a collection ?
>
> IndexReader.numDocs(). See also maxDoc() and numDeletedDocs().
>
>
> --
> Ian.
>
>
> On 29 March 2011 21:48, Ian Lea <ian....@gmail.com> wrote:
> >
> >> Here are a couple of ideas.
> >>
> >> Plan A.
> >>
> >> Think of a number, say 10, retrieve n * 10 docids in your search and
> >> then loop round java.util.Random.nextInt(n * 10) until you've got
> >> enough.
> >>
> >> Plan B.
> >>
> >> Reverse your MUST NOT search to get a list of docids that you don't
> >> want, then loop round Random.nextInt(indexreader.numDocs()), selecting
> >> those that are not deleted (!indexreader.isDeleted(docid)) and are not
> >> in your exclusion list.
> >>
> >>
> >> I'm sure there are other ways, probably better.
> >>
> >>
> >> --
> >> Ian.
> >>
> >>
> >> On Tue, Mar 29, 2011 at 8:00 PM, Patrick Diviacco
> >> <patrick.divia...@gmail.com> wrote:
> >> > Ok I've solved the first part of the problem. I'm now selecting all
> >> > documents that do not contain a given term with a BooleanFilter
> >> > and FilterClause, MUST NOT.
> >> >
> >> > I still have to understand how to retrieve random documents and limit the
> >> > number of retrieved docs to N.
> >> >
> >> > thanks
> >> >
> >> > On 29 March 2011 20:40, Patrick Diviacco <patrick.divia...@gmail.com> wrote:
> >> >
> >> >> Is there a Filter to get a limited number of random collection docs from
> >> >> the index which DO NOT contain a specific term ?
> >> >>
> >> >> i.e. term="pizza"
> >> >>
> >> >> I want to run the query against 10 random documents of the collection that
> >> >> do not contain the term "pizza".
> >> >>
> >> >> thanks
> >> >>
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
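
For what it's worth, the validity check asked about at the top of this message can be sketched along the lines of Plan B quoted above: build a set of doc ids to exclude, then keep drawing random ids and accept only those that are not excluded, not deleted, and not already chosen. This is only a rough sketch in plain Java with no Lucene dependency. RandomDocSampler and sample() are hypothetical names, and IntPredicate assumes Java 8 or later. In Lucene 3.x the exclusion set could presumably be filled by walking IndexReader.termDocs(new Term(field, "pizza")), and the deleted test could wrap IndexReader.isDeleted(docid). Note that doc ids run from 0 to maxDoc() - 1, so the sketch draws from maxDoc rather than numDocs().

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Random;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of Lucene: picks distinct random doc ids
// that are neither deleted nor present in an exclusion set.
class RandomDocSampler {

    /**
     * @param maxDoc    one greater than the largest doc id (assumed to come from IndexReader.maxDoc())
     * @param excluded  doc ids that contain the unwanted term
     * @param isDeleted deleted-document test (assumed to wrap IndexReader.isDeleted(int))
     * @param wanted    how many doc ids the caller would like
     * @param random    source of randomness
     * @return up to {@code wanted} distinct valid doc ids, fewer if not enough exist
     */
    static List<Integer> sample(int maxDoc, BitSet excluded, IntPredicate isDeleted,
                                int wanted, Random random) {
        // Count the valid docs first so the loop below always terminates,
        // even when fewer than `wanted` valid docs exist.
        int valid = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!excluded.get(doc) && !isDeleted.test(doc)) {
                valid++;
            }
        }
        int target = Math.min(wanted, valid);

        BitSet chosen = new BitSet();
        List<Integer> result = new ArrayList<>();
        while (result.size() < target) {
            int doc = random.nextInt(maxDoc);
            // This is the "is the random document valid?" check.
            if (excluded.get(doc) || isDeleted.test(doc) || chosen.get(doc)) {
                continue;
            }
            chosen.set(doc);
            result.add(doc);
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet excluded = new BitSet();
        excluded.set(2);   // pretend docs 2 and 5 contain "pizza"
        excluded.set(5);
        // Pretend doc 7 is deleted; ask for 3 random valid docs out of 10.
        List<Integer> picked = sample(10, excluded, d -> d == 7, 3, new Random());
        System.out.println(picked);
    }
}
```

The counting pass touches every doc id once, which should be cheap for collections of 30,000 or 90,000 docs. Rejection sampling like this gets slow only when nearly all documents are excluded; in that case, collecting the valid ids into a list and shuffling it may be the better choice.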