Mike
Documents are not passed to Collectors in order of highest score.
It is the job of the collector to gather the top-scoring docs, as is
typically required; TopScoreDocCollector implements this for the
most commonly used search method calls (according to the javadocs -
read the javadocs!)
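
To illustrate the point about ordering, here is a minimal standalone sketch in Java of how a collector can keep the top N scores when hits arrive in arbitrary order. It is not Lucene code and does not use the Collector API; the class name TopScoreSketch and its methods are hypothetical. It uses a bounded min-heap, so the lowest kept score is always the one to beat.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Standalone sketch, not Lucene code: all names here are hypothetical.
final class TopScoreSketch {
    private final int numHits;
    // Min-heap: the lowest of the kept scores sits at the head.
    private final PriorityQueue<Float> heap = new PriorityQueue<>();

    TopScoreSketch(int numHits) {
        if (numHits <= 0) {
            throw new IllegalArgumentException("numHits must be positive");
        }
        this.numHits = numHits;
    }

    /** Called once per matching document, in arbitrary (not score) order. */
    void collect(float score) {
        if (heap.size() < numHits) {
            heap.add(score);              // still room: keep everything
        } else if (score > heap.peek()) {
            heap.poll();                  // evict the current lowest score
            heap.add(score);
        }
    }

    /** Returns the kept scores, highest first. */
    List<Float> topScores() {
        List<Float> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder());
        return out;
    }
}
```

Sorting happens only once, at the end; during collection each hit costs at most one heap insertion and one eviction.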
Ian,
Yes, this makes sense. My guess is that creating a custom collector and,
in my overridden Collect method, looking up each document by its docid to get
the base URL is going to create a fairly significant performance hit. And
from the sounds of your response there is no guarantee that the d
Writing a custom collector is pretty straightforward. There is an
example in the javadocs for Collector. Use it via
Searcher.search(query, collector) or search(query, filter, collector).
The docid is passed to the collect() method and you can use that to
get at the document and thus the URL, via
I changed one line below... realized I missed the ! (NOT).. corrected in
original reply.
if ((hq.Size() < numHits || score >= minScore) &&
!collectedBaseURLArray.Contains(doc.BaseURL))
{
mpolzin wrote:
>
>
> if (score > 0.0f)
> {
>
>
Hi, thanks for the suggestion. I am relatively new to Lucene, so I have a few
more questions on this implementation. I looked at the source code for
Lucene and found the TopDocCollector class. It appears this class derives
from the HitCollector class, so I should be able to simply extend
TopDocColl
Mike Polzin wrote:
I am working on building a web search engine and I would like to build a results page similar to what Google does. The functionality I am looking to include is what I refer to as "rolling up" sites, meaning that even if a particular site (defined by its base URL) has many relevant
Hi Mike,
Not really through queries, but you may do this by writing a custom
collector. You'd need some supporting data structure to mark/hash the
occurrence of a domain in your result set.
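
As a rough illustration of such a supporting data structure, here is a minimal standalone sketch in Java. It does not use Lucene's Collector API; the class BaseUrlRollup, the Hit type, and the baseUrlByDocId array (a stand-in for a base-URL lookup loaded once up front, rather than fetching each document inside collect) are all hypothetical. Because hits arrive in arbitrary order, it keeps the best-scoring hit per base URL in a HashMap instead of keeping whichever hit for that URL happened to arrive first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Standalone sketch, not Lucene code: all names here are hypothetical.
final class BaseUrlRollup {

    /** One collected hit: document id, its base URL, and its score. */
    static final class Hit {
        final int docId;
        final String baseUrl;
        final float score;

        Hit(int docId, String baseUrl, float score) {
            this.docId = docId;
            this.baseUrl = baseUrl;
            this.score = score;
        }
    }

    // Assumed to be loaded once before searching; index = docId.
    private final String[] baseUrlByDocId;
    // Best hit seen so far for each base URL.
    private final Map<String, Hit> bestByBaseUrl = new HashMap<>();

    BaseUrlRollup(String[] baseUrlByDocId) {
        this.baseUrlByDocId = baseUrlByDocId;
    }

    /** Called once per matching document, in arbitrary (not score) order. */
    void collect(int docId, float score) {
        if (score <= 0.0f) {
            return; // ignore non-positive scores
        }
        String baseUrl = baseUrlByDocId[docId];
        Hit current = bestByBaseUrl.get(baseUrl);
        if (current == null || score > current.score) {
            bestByBaseUrl.put(baseUrl, new Hit(docId, baseUrl, score));
        }
    }

    /** Returns at most numHits hits, one per base URL, highest score first. */
    List<Hit> topHits(int numHits) {
        List<Hit> hits = new ArrayList<>(bestByBaseUrl.values());
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(hits.subList(0, Math.min(numHits, hits.size())));
    }
}
```

The map grows with the number of distinct base URLs among the matches, not with numHits, so this trades some memory for a correct "best hit per site" result.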
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the