Re: Limiting search result for web search engine

2010-02-04 Thread Ian Lea
Mike, documents do not get passed to Collectors in order of highest score. It is the job of the collector to gather the top-scoring docs, as is typically required; this is implemented by TopScoreDocCollector for the most commonly used search method calls (according to the javadocs - read the javadocs!
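
To illustrate the point that hits arrive in arbitrary order and the collector itself has to keep the best ones, here is a minimal, self-contained sketch of a bounded min-heap. It does not use Lucene; the class TopNSketch, its Hit type and the collect(docId, score) signature are hypothetical stand-ins, not the TopScoreDocCollector implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a top-N collector; not Lucene's API.
final class TopNSketch {
    static final class Hit {
        final int docId;
        final float score;
        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    private final int numHits;
    // Min-heap: the lowest-scoring retained hit sits at the head.
    private final PriorityQueue<Hit> heap;

    TopNSketch(int numHits) {
        // numHits must be at least 1 (PriorityQueue rejects a smaller capacity).
        this.numHits = numHits;
        this.heap = new PriorityQueue<Hit>(numHits, (a, b) -> Float.compare(a.score, b.score));
    }

    // Called once per matching document, in no particular score order.
    void collect(int docId, float score) {
        if (heap.size() < numHits) {
            heap.add(new Hit(docId, score));
        } else if (score > heap.peek().score) {
            heap.poll();                      // evict the current weakest hit
            heap.add(new Hit(docId, score));
        }
    }

    // Returns the retained hits, highest score first.
    List<Hit> topDocs() {
        List<Hit> out = new ArrayList<>(heap);
        out.sort((a, b) -> Float.compare(b.score, a.score));
        return out;
    }

    public static void main(String[] args) {
        TopNSketch collector = new TopNSketch(2);
        collector.collect(0, 0.2f);
        collector.collect(1, 0.9f);
        collector.collect(2, 0.5f);
        collector.collect(3, 0.7f);
        for (Hit h : collector.topDocs()) {
            System.out.println(h.docId + " " + h.score);
        }
    }
}
```

With the four hits fed in above and numHits set to 2, the sketch retains documents 1 and 3, regardless of the order in which they were collected.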

Re: Limiting search result for web search engine

2010-02-04 Thread mpolzin
Ian, yes, this makes sense. My guess is that creating a custom collector and, in my overridden Collect method, looking up each document by docid to get the base URL is going to create a fairly significant performance hit. And from the sounds of your response there is no guarantee that the d

Re: Limiting search result for web search engine

2010-02-04 Thread Ian Lea
Writing a custom collector is pretty straightforward. There is an example in the javadocs for Collector. Use it via Searcher.search(query, collector) or search(query, filter, collector). The docid is passed to the collect() method and you can use that to get at the document and thus the URL, via
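
As a rough sketch of the idea of using the docid inside collect() to reach the URL, the following self-contained Java class resolves each docid through an array built once up front, so no stored document has to be loaded per hit. It does not use Lucene: PerSiteCounter, the baseUrlByDocId array and the example URLs are hypothetical stand-ins (in Lucene a field cache is one possible source for such an array, but that is an assumption outside this sketch).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a custom collector; not Lucene's Collector class.
final class PerSiteCounter {
    // Assumed to be precomputed: index = docid, value = that document's base URL.
    private final String[] baseUrlByDocId;
    private final Map<String, Integer> hitsPerBaseUrl = new LinkedHashMap<>();

    PerSiteCounter(String[] baseUrlByDocId) {
        this.baseUrlByDocId = baseUrlByDocId;
    }

    // Called once per matching docid; an array lookup replaces a document load.
    void collect(int docId) {
        String base = baseUrlByDocId[docId];
        hitsPerBaseUrl.merge(base, 1, Integer::sum);
    }

    Map<String, Integer> counts() {
        return hitsPerBaseUrl;
    }

    public static void main(String[] args) {
        String[] baseUrls = {"http://a.example", "http://b.example", "http://a.example"};
        PerSiteCounter counter = new PerSiteCounter(baseUrls);
        counter.collect(2);
        counter.collect(0);
        counter.collect(1);
        System.out.println(counter.counts()); // {http://a.example=2, http://b.example=1}
    }
}
```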

Re: Limiting search result for web search engine

2010-02-03 Thread mpolzin
I changed one line below; I realized I had missed the ! (NOT). It is corrected in the original reply.

if ((hq.Size() < numHits || score >= minScore) && !collectedBaseURLArray.Contains(doc.BaseURL)) {

mpolzin wrote:
> > > if (score > 0.0f) > { > >
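
One thing to watch with a "have I already collected this base URL?" test is that hits are not delivered in score order, so the first hit seen for a site is not necessarily its best one. A minimal sketch of an alternative, assuming it is acceptable to hold one entry per site in memory: keep the best-scoring hit per base URL and pick the top N at the end. This is plain Java, not Lucene or Lucene.NET code; BestPerSiteSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one retained hit per base URL, chosen by score.
final class BestPerSiteSketch {
    static final class Hit {
        final int docId;
        final float score;
        final String baseUrl;
        Hit(int docId, float score, String baseUrl) {
            this.docId = docId;
            this.score = score;
            this.baseUrl = baseUrl;
        }
    }

    private final Map<String, Hit> bestByBaseUrl = new HashMap<>();

    // Called per hit in arbitrary order; the base URL is assumed to be known here.
    void collect(int docId, float score, String baseUrl) {
        if (score <= 0.0f) {
            return; // mirrors the score > 0.0f guard in the quoted code
        }
        Hit current = bestByBaseUrl.get(baseUrl);
        if (current == null || score > current.score) {
            bestByBaseUrl.put(baseUrl, new Hit(docId, score, baseUrl));
        }
    }

    // Returns up to numHits hits, one per base URL, highest score first.
    List<Hit> topDocs(int numHits) {
        List<Hit> all = new ArrayList<>(bestByBaseUrl.values());
        all.sort((a, b) -> Float.compare(b.score, a.score));
        return new ArrayList<>(all.subList(0, Math.min(numHits, all.size())));
    }

    public static void main(String[] args) {
        BestPerSiteSketch collector = new BestPerSiteSketch();
        collector.collect(0, 0.4f, "http://a.example");
        collector.collect(1, 0.9f, "http://a.example"); // replaces doc 0 for this site
        collector.collect(2, 0.6f, "http://b.example");
        collector.collect(3, 0.0f, "http://c.example"); // skipped: score not above zero
        for (Hit h : collector.topDocs(2)) {
            System.out.println(h.docId + " " + h.baseUrl);
        }
    }
}
```

The trade-off is memory proportional to the number of distinct sites matched, rather than to numHits.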

Re: Limiting search result for web search engine

2010-02-03 Thread mpolzin
Hi, thanks for the suggestion. I am relatively new to Lucene, so I have a few more questions on this implementation. I looked at the source code for Lucene and found the TopDocCollector class. It appears this class derives from the HitCollector class, so I should be able to simply extend TopDocColl

Re: Limiting search result for web search engine

2010-02-03 Thread Hayri
Mike Polzin wrote: I am working on building a web search engine and I would like to build a results page similar to what Google does. The functionality I am looking to include is what I refer to as "rolling up" sites, meaning that even if a particular site (defined by its base URL) has many relevant

Re: Limiting search result for web search engine

2010-02-02 Thread Anshum
Hi Mike, Not really through queries, but you may do this by writing a custom collector. You'd need some supporting data structure to mark/hash the occurrence of a domain in your result set. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the
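
The "supporting data structure to mark/hash the occurrence of a domain" could be as simple as a hash set of base URLs. The sketch below shows only that data structure, on a list of URLs assumed to be already in rank order; it is plain Java with no Lucene calls, and DomainRollup, baseUrl() and the example URLs are hypothetical. Treating scheme plus host as the base URL is also an assumption.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; illustrates the hash-set idea only.
final class DomainRollup {
    // Assumption: the "base URL" of a page is its scheme plus host.
    static String baseUrl(String url) {
        URI u = URI.create(url);
        return u.getScheme() + "://" + u.getHost();
    }

    // Keeps the first URL seen for each base URL, preserving input order.
    static List<String> rollUp(List<String> rankedUrls) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String url : rankedUrls) {
            // Set.add returns false when the base URL was already marked.
            if (seen.add(baseUrl(url))) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> ranked = Arrays.asList(
            "http://a.example/page1",
            "http://a.example/page2",
            "http://b.example/index");
        System.out.println(rollUp(ranked)); // [http://a.example/page1, http://b.example/index]
    }
}
```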