I like this approach. This may be what I'm looking for.
Thanks JP!
-Jay
On 6/15/05, Robichaud, Jean-Philippe
<[EMAIL PROTECTED]> wrote:
>
> It may be simpler and more effective to use the Hits object, keep a count of
> how many times each host has actually been "returned" to the user, and skip
> a hit once its host has reached the limit. This way, if your users only look
> at the 10-20 highest hits, you will save a lot of processing time, especially
> if your index is huge...
>
> Here is some pseudo code stripped from a class I once wrote
>
>
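> // Needs java.util.HashMap and java.util.Map in addition to the Lucene imports.
> Hits hits = iSearcher.search(myQuery);
> // Per-host count of hits seen so far (a plain Map stands in for IntHash).
> Map<String, Integer> hostFreqCount = new HashMap<String, Integer>();
>
> int i = 0;
>
> while (i < hits.length()) {
>     // Fill one page of up to 10 displayed hits; j counts only shown hits.
>     for (int j = 0; i < hits.length() && j < 10; i++) {
>
>         Document doc = hits.doc(i);
>         String host_id = doc.get("host_id");
>         Integer seen = hostFreqCount.get(host_id);
>         int count = (seen == null) ? 1 : seen + 1;
>         hostFreqCount.put(host_id, count);
>
>         // Skip this hit if its host has already reached the limit of 3.
>         if (count > 3) continue;
>
>         // show the hit to the user...
>         j++;
>     }
> }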
>
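For anyone who wants to try the idea without a Lucene index at hand, here is a minimal standalone sketch of the same per-host cap. HostLimitedPager, nextPage and inspected are made-up names, and a plain list of host ids in rank order stands in for the Hits object, so treat it as one possible shape rather than JP's actual class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Lucene): pages through ranked hits, capping hits per host. */
class HostLimitedPager {
    private final List<String> rankedHosts;   // host id of each hit, best hit first
    private final int maxPerHost;             // e.g. 3
    private final Map<String, Integer> shownPerHost = new HashMap<>();
    private int next = 0;                     // next rank to inspect; persists across pages

    public HostLimitedPager(List<String> rankedHosts, int maxPerHost) {
        this.rankedHosts = rankedHosts;
        this.maxPerHost = maxPerHost;
    }

    /** Returns the ranks to display on the next page, scanning only as far as needed. */
    public List<Integer> nextPage(int pageSize) {
        List<Integer> page = new ArrayList<>();
        while (next < rankedHosts.size() && page.size() < pageSize) {
            String host = rankedHosts.get(next);
            Integer shown = shownPerHost.get(host);
            int count = (shown == null) ? 0 : shown;
            if (count < maxPerHost) {         // host still under its limit: show this hit
                shownPerHost.put(host, count + 1);
                page.add(next);
            }
            next++;                           // skipped hits do not use up a page slot
        }
        return page;
    }

    /** Number of hits inspected so far, to see how little of the list was touched. */
    public int inspected() {
        return next;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("a", "a", "a", "a", "b", "a", "c", "b");
        HostLimitedPager pager = new HostLimitedPager(hosts, 3);
        System.out.println(pager.nextPage(4) + " after inspecting " + pager.inspected());
        System.out.println(pager.nextPage(4) + " after inspecting " + pager.inspected());
    }
}
```

The point of keeping the scan position between pages is that a user who never goes past page one only costs you the handful of hits needed to fill that page.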
>
> Hope it helps!
>
> Jp
>
>
> -----Original Message-----
> From: Jay Hill [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, June 15, 2005 2:01 PM
> To: [email protected]
> Subject: Re: Need a way to set a result limit on a particular field
>
> Thanks Tony and Erik for the replies. The trick is we don't know the
> hosts that will be returned in advance, we just don't want more than 3
> from any one host. It's not unlike searching on Google where you might
> see a link that says "More results from foo.com". We essentially want
> to discard any results beyond the third for any one host. In some of our searches
> we might get high scores on 20 or 30 documents, but we don't want to
> show page after page from the same host, we'd rather limit it to 3
> from each for more diversity.
>
> I may have to use a brute force approach using HitCollector as Tony
> suggests. I was hoping to avoid the HitCollector, but there may be no
> other way right now.
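For comparison, the brute-force route can be sketched along these lines. Lucene's HitCollector hands every matching document to a collect(int doc, float score) callback in no particular score order, so the collector has to keep the best few per host itself. The PerHostTopN class below is a made-up, Lucene-free stand-in: it takes the host id as an extra argument, which is an assumption, since a real collector would have to look the host up for each document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical collector (not a Lucene class): keeps the top-scoring hits per host. */
class PerHostTopN {
    /** Minimal stand-in for one collected hit. */
    static final class Hit {
        final int docId;
        final String hostId;
        final float score;

        Hit(int docId, String hostId, float score) {
            this.docId = docId;
            this.hostId = hostId;
            this.score = score;
        }
    }

    private static final Comparator<Hit> BY_SCORE_DESC =
            (x, y) -> Float.compare(y.score, x.score);

    private final int maxPerHost;
    private final Map<String, List<Hit>> bestByHost = new HashMap<>();

    public PerHostTopN(int maxPerHost) {
        this.maxPerHost = maxPerHost;
    }

    /** Called once per matching document, in arbitrary order. */
    public void collect(int docId, String hostId, float score) {
        List<Hit> best = bestByHost.computeIfAbsent(hostId, k -> new ArrayList<>());
        best.add(new Hit(docId, hostId, score));
        best.sort(BY_SCORE_DESC);
        if (best.size() > maxPerHost) {
            best.remove(best.size() - 1);     // drop the weakest hit for this host
        }
    }

    /** All surviving hits across hosts, highest score first. */
    public List<Hit> results() {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> best : bestByHost.values()) {
            all.addAll(best);
        }
        all.sort(BY_SCORE_DESC);
        return all;
    }
}
```

Every match goes through collect, which is why this costs more than stopping after the first page or two when the index is large.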
>
> Many thanks,
> -Jay
>
>
> On 6/14/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> >
> > On Jun 14, 2005, at 7:23 PM, Jay Hill wrote:
> > > I have a need to limit my Hits returned based on one of the indexed
> > > fields. This is a web application and we want to limit the number of
> > > hits from any one host. We have a field named "host_id" and I'd like
> > > to be able to limit my results to no more than three results for any
> > > one host_id.
> >
> > I may not be fully understanding your question, but I'll go with my
> > assumptions... wrap the user's query into a BooleanQuery as a required
> > clause and then add another clause with a TermQuery for the specific
> > host_id. Then simply constrain the number of Hits shown to the first
> > 3. Does that do what you're after?
> >
> > Erik
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >