One last thing: how do I check whether a random document does not contain
the term?

In other words, I cannot just pass the TermsFilter; I need to check whether
each retrieved random document is valid, so that I know when I have enough.

Any code example is appreciated. So far I have this, which retrieves docs
without that specific term:

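// termsFilter is the existing TermsFilter for the unwanted term.
// Wrapping it in a MUST_NOT clause keeps only docs that lack the term.
BooleanFilter termsNOTFilter = new BooleanFilter();
FilterClause notTermClause = new FilterClause(termsFilter,
        org.apache.lucene.search.BooleanClause.Occur.MUST_NOT);
termsNOTFilter.add(notTermClause);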
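
For what it's worth, here is a rough sketch of the validity check along the
lines of Plan B from the quoted reply below. It is plain Java with no Lucene
calls, so it only illustrates the loop. RandomDocSampler and its parameters
are hypothetical names, and I am assuming the deleted and excluded doc ids
have already been collected into BitSets (in Lucene 3.x the excluded ids
could probably be gathered by walking IndexReader.termDocs(term), but I have
not shown that here). It draws ids below maxDoc rather than numDocs, since
doc ids run from 0 to maxDoc() - 1 when there are deletions.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Lucene: picks random doc ids that are
// neither deleted nor in the exclusion set.
class RandomDocSampler {

    /**
     * @param maxDoc   one past the largest doc id (assumed to come from IndexReader.maxDoc())
     * @param deleted  bit set for every deleted doc id (assumption: built beforehand)
     * @param excluded bit set for every doc id containing the unwanted term (assumption)
     * @param n        how many documents are wanted
     * @param rng      source of randomness
     * @return up to n distinct valid doc ids, fewer if not enough valid docs exist
     */
    static List<Integer> sample(int maxDoc, BitSet deleted, BitSet excluded,
                                int n, Random rng) {
        // A doc is invalid if it is deleted or contains the term.
        BitSet invalid = (BitSet) deleted.clone();
        invalid.or(excluded);

        // Count the valid docs so the loop cannot run forever when n is too large.
        int validCount = maxDoc - invalid.get(0, maxDoc).cardinality();
        int target = Math.min(n, validCount);

        BitSet picked = new BitSet(maxDoc);
        List<Integer> result = new ArrayList<Integer>();
        while (result.size() < target) {
            int docId = rng.nextInt(maxDoc);
            // Reject invalid docs and docs that were already chosen.
            if (invalid.get(docId) || picked.get(docId)) {
                continue;
            }
            picked.set(docId);
            result.add(docId);
        }
        return result;
    }
}
```

Calling RandomDocSampler.sample(maxDoc, deleted, excluded, 10, new Random())
would then give at most 10 doc ids to run the query against. The loop simply
retries on a bad draw, so it may get slow if nearly every document contains
the term.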

thanks




On 29 March 2011 22:12, Ian Lea <ian....@gmail.com> wrote:

> > Plan A sounds better because I don't want to consider the entire collection
> > and then remove results from it.
>
> Fine, your choice.
>
> > However, the same code has to work with 2 different collections. The first
> > one has 30.000 docs the other one 90.000.
>
> No problem.  The number of docs is irrelevant.
>
> > How can I get the total amount of docs from a collection ?
>
> IndexReader.numDocs().  See also maxDoc() and numDeletedDocs().
>
>
> --
> Ian.
>
> > On 29 March 2011 21:48, Ian Lea <ian....@gmail.com> wrote:
> >
> >> Here are a couple of ideas.
> >>
> >> Plan A.
> >>
> >> Think of a number, say 10, retrieve n * 10 docids in your search and
> >> then loop round java.util.Random.nextInt(n * 10) until you've got
> >> enough.
> >>
> >> Plan B.
> >>
> >> Reverse your MUST NOT search to get a list of docids that you don't
> >> want, then loop round Random.nextInt(indexreader.numDocs()), selecting
> >> those that are not deleted (!indexreader.isDeleted(docid)) and are not
> >> in your exclusion list.
> >>
> >>
> >> I'm sure there are other ways, probably better.
> >>
> >>
> >> --
> >> Ian.
> >>
> >>
> >> On Tue, Mar 29, 2011 at 8:00 PM, Patrick Diviacco
> >> <patrick.divia...@gmail.com> wrote:
> >> > Ok I've solved the first part of the problem. I'm now selecting all
> >> > documents that do not contain a given term with a BooleanFilter
> >> > and FilterClause, MUST NOT.
> >> >
> >> > I still have to understand how to retrieve random documents and limit the
> >> > number of retrieved docs to N.
> >> >
> >> > thanks
> >> >
> >> > On 29 March 2011 20:40, Patrick Diviacco <patrick.divia...@gmail.com> wrote:
> >> >
> >> >> Is there a Filter to get a limited number of random collection docs from
> >> >> the index which DO NOT contain a specific term ?
> >> >>
> >> >> i.e. term="pizza"
> >> >>
> >> >> I want to run the query against 10 random documents of the collection that
> >> >> do not contain the term "pizza".
> >> >>
> >> >> thanks
> >> >>
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
> >>
> >
>