I changed one line below. I realized I had missed the ! (NOT); it is now corrected in the original reply.
if ((hq.Size() < numHits || score >= minScore) && !collectedBaseURLArray.Contains(doc.BaseURL))
{

mpolzin wrote:
>
> if (score > 0.0f)
> {
>     // Do something here to get the document base URL (doc.BaseURL)
>
>     if ((hq.Size() < numHits || score >= minScore) &&
>         !collectedBaseURLArray.Contains(doc.BaseURL))
>     {
>         collectedBaseURLArray.Add(doc.BaseURL);
>         totalHits++;
>         hq.Insert(new ScoreDoc(doc, score));
>         minScore = ((ScoreDoc) hq.Top()).score; // maintain minScore
>     }
> }
>
> Does this make sense?
>
> How could I tell the search to use my extended version of the
> TopDocCollector class? Also, how would I pull the URL from the document
> inside of the loop above? I didn't see any good documentation anywhere on
> how to do that. There seems to be little information out there on how to
> build your own custom collector.
>
> Thanks again,
> Mike
>
>
> Anshum-2 wrote:
>>
>> Hi Mike,
>> Not really through queries, but you may do this by writing a custom
>> collector. You'd need some supporting data structure to mark/hash the
>> occurrence of a domain in your result set.
>>
>> --
>> Anshum Gupta
>> Naukri Labs!
>> http://ai-cafe.blogspot.com
>>
>> The facts expressed here belong to everybody, the opinions to me. The
>> distinction is yours to draw............
>>
>>
>> On Wed, Feb 3, 2010 at 6:56 AM, Mike Polzin <mikepol...@yahoo.com> wrote:
>>
>>> I am working on building a web search engine and I would like to build a
>>> results page similar to what Google does. The functionality I am looking
>>> to include is what I refer to as "rolling up" sites, meaning that even if
>>> a particular site (defined by its base URL) has many relevant hits on
>>> various pages for the search's keywords, that site is only shown once in
>>> the results listing, with a link to the most relevant hit on that site.
>>> What I do not want is to have one site dominate a search results page.
>>>
>>> Does it make sense to just do the search, get the hits list, and then
>>> programmatically remove the results which, although they meet the search
>>> criteria, are not as relevant? Is there a way to do this through queries?
>>>
>>> Thanks in advance!
>>>
>>> Mike
>>>
>>
>

--
View this message in context: http://old.nabble.com/Limiting-search-result-for-web-search-engine-tp27430155p27447903.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
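
On the "rolling up" idea in the quoted messages: here is a minimal stand-alone sketch of the roll-up logic itself, written without any Lucene classes so it can be run as-is. Everything in it (SiteRollup, Hit, rollUp, and the baseUrl/pageUrl fields) is a hypothetical name made up for illustration, and it assumes the base URL has already been extracted for each hit. It does not show how to register a collector with a searcher or how to read a field from a document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Lucene.
final class SiteRollup {

    /** A scored hit. baseUrl is assumed to have been extracted already. */
    static final class Hit {
        final String baseUrl;
        final String pageUrl;
        final float score;

        Hit(String baseUrl, String pageUrl, float score) {
            this.baseUrl = baseUrl;
            this.pageUrl = pageUrl;
            this.score = score;
        }
    }

    /** Keeps the best-scoring hit per base URL, then returns at most numHits of them, best first. */
    static List<Hit> rollUp(List<Hit> hits, int numHits) {
        // Map from base URL to the best hit seen so far for that site.
        Map<String, Hit> bestPerSite = new LinkedHashMap<>();
        for (Hit hit : hits) {
            if (hit.score <= 0.0f) {
                continue; // same guard as the quoted snippet: ignore non-positive scores
            }
            Hit best = bestPerSite.get(hit.baseUrl);
            if (best == null || hit.score > best.score) {
                bestPerSite.put(hit.baseUrl, hit);
            }
        }

        // Sort the per-site winners by descending score and cut to numHits.
        List<Hit> result = new ArrayList<>(bestPerSite.values());
        result.sort((a, b) -> Float.compare(b.score, a.score));
        if (result.size() > numHits) {
            return new ArrayList<>(result.subList(0, numHits));
        }
        return result;
    }
}
```

One design difference from the quoted snippet is worth noting: the snippet records a base URL the first time a hit from that site is accepted and rejects every later hit from the same site, even a higher-scoring one. The sketch above replaces the stored hit whenever a better one arrives, which is closer to the stated goal of linking to the most relevant hit on each site. The same map-based bookkeeping could presumably stand in for collectedBaseURLArray inside a custom collector, but that part is an assumption and is not shown here.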