Re: Limiting search result for web search engine

2010-02-04 Thread mpolzin
> I think that I'd move your deduping logic to after the search and set
> a limit on the number of hits that you check. That way you'd also get
> the best hit first.
>
> --
> Ian.
>
> On Thu, Feb 4, 2010 at 5:23 AM, mpolzin wrote:
>>
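The suggestion above (dedupe after the search, with a cap on how many hits are examined) could look roughly like the following. This is only a sketch in plain Java with no Lucene dependency: `Hit`, `PostSearchDedup`, `dedupe`, `wanted` and `maxChecked` are hypothetical names made up for illustration, and the input list is assumed to be already sorted best-first, as ranked search results normally are.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one search hit; this is not a Lucene class.
final class Hit {
    final String baseUrl;
    final float score;

    Hit(String baseUrl, float score) {
        this.baseUrl = baseUrl;
        this.score = score;
    }
}

final class PostSearchDedup {
    // Walks the ranked hits best-first and keeps the first hit per base URL.
    // Stops once `wanted` hits are kept or `maxChecked` hits have been examined.
    static List<Hit> dedupe(List<Hit> rankedHits, int wanted, int maxChecked) {
        List<Hit> kept = new ArrayList<>();
        Set<String> seenBaseUrls = new HashSet<>();
        int limit = Math.min(maxChecked, rankedHits.size());
        for (int i = 0; i < limit && kept.size() < wanted; i++) {
            Hit hit = rankedHits.get(i);
            // Set.add returns false when this base URL has already been kept.
            if (seenBaseUrls.add(hit.baseUrl)) {
                kept.add(hit);
            }
        }
        return kept;
    }
}
```

Because the list is walked in rank order, the hit kept for each base URL is the highest-scoring one among those examined, and `maxChecked` bounds the work when many hits share a base URL.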

Re: Limiting search result for web search engine

2010-02-03 Thread mpolzin
I changed one line below. I realized I had missed the ! (NOT); it is corrected in the original reply.

    if ((hq.Size() < numHits || score >= minScore) && !collectedBaseURLArray.Contains(doc.BaseURL)) {

mpolzin wrote:
>
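For readers following along, here is a rough, self-contained sketch of what a collector built around that condition might do. It is plain Java and does not extend any Lucene type: `BaseUrlFilteringCollector`, `collect` and `size` are hypothetical names, a `PriorityQueue` of scores stands in for the hit queue `hq`, and `numHits` is assumed to be positive.

```java
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical collector-style class for illustration only.
final class BaseUrlFilteringCollector {
    private final int numHits; // assumed to be > 0
    // Min-heap of kept scores: the head is the lowest score currently kept.
    private final PriorityQueue<Float> scores = new PriorityQueue<>();
    private final Set<String> collectedBaseUrls = new HashSet<>();

    BaseUrlFilteringCollector(int numHits) {
        this.numHits = numHits;
    }

    // Returns true if the hit was kept, false if it was filtered out.
    boolean collect(String baseUrl, float score) {
        boolean hasRoom = scores.size() < numHits;
        float minScore = scores.isEmpty() ? 0f : scores.peek();
        // Same shape as the corrected condition: room or a competitive score,
        // AND the base URL has not been collected before.
        if ((hasRoom || score >= minScore) && !collectedBaseUrls.contains(baseUrl)) {
            if (!hasRoom) {
                scores.poll(); // evict the lowest-scoring kept hit
            }
            scores.add(score);
            collectedBaseUrls.add(baseUrl);
            return true;
        }
        return false;
    }

    int size() {
        return scores.size();
    }
}
```

One limitation this sketch makes visible: whichever hit for a base URL arrives first is the one kept, even if a later hit for the same base URL scores higher, and an evicted hit's base URL stays marked as collected. Deduping after the search, as Ian suggests, avoids both problems.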

Re: Limiting search result for web search engine

2010-02-03 Thread mpolzin
Hi, thanks for the suggestion. I am relatively new to Lucene, so I have a few more questions on this implementation. I looked at the source code for Lucene and found the TopDocCollector class. It appears this class derives from the HitCollector class, so I should be able to simply extend TopDocColl
