OK, I understand. I will use the HitCollector.
Thanks a lot for all the explanations!
Best,
Liat
2009/5/18 Erick Erickson
As best I understand it, you DO NOT WANT A FILTER. Filters do not contribute
to scoring, and therefore do not rank your documents. If you use
a filter, the most irrelevant document could be first. You want to use
a HitCollector, see the link in my last e-mail. That link includes an
example of using a bi
Sorry I didn't explain myself well.
The problem I am trying to address is the following:
Think about the case where you have 100,000 documents indexed. Take the word
'a': if it appears in 80,000 documents, you want the score to take that into
account. You only want to see how 20,000 documents are close to a
I'm still unclear what you want the statistics *for*. "statistics"
are pretty meaningless as far as I understand. The whole point
of scoring is to use various "statistics" to *rank* documents *for
a specific query*. You cannot, for instance, compare scores
between different queries in any meaningfu
Yes, this is what I need - I don't need to get the scores for the documents
that were filtered.
The statistics I meant are idf(t), for example.
I want these to cover the whole index, of course.
They will include this information for the whole index, right?
if I have a list of ids that the query should look at,
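
For reference, here is a minimal sketch of why idf(t) depends only on index-wide counts. It assumes the formula used by Lucene's DefaultSimilarity, idf = 1 + ln(numDocs / (docFreq + 1)). The function name is made up for illustration, and the numbers are borrowed from the 100,000-document example in this thread.

```python
import math

def idf(doc_freq, num_docs):
    """Assumed formula: 1 + ln(num_docs / (doc_freq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Whole index: 100,000 documents; the term occurs in 80,000 of them.
common_term_idf = idf(80_000, 100_000)

# A hypothetical rarer term, occurring in only 100 documents, weighs far more.
rare_term_idf = idf(100, 100_000)

print(round(common_term_idf, 3), round(rare_term_idf, 3))
```

Both inputs are properties of the index as a whole, so restricting which documents you look at afterwards does not change the value.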
Hmmm, come to think of it, if you pass the Filter to the search I *think* you
don't get scores for that clause, but you may want to
check it out...
So I think you should think about implementing a HitCollector
and collect only the documents you care about.
This is really very little extra work sin
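
The idea of collecting only the documents you care about can be sketched roughly as follows. This is a language-neutral illustration in Python, not Lucene code: the class is a hypothetical stand-in for a HitCollector subclass, modelled on its collect(doc, score) callback, and the ids and scores are made up. In Lucene the callback receives an internal document number, so the mapping to your own ids is left out here.

```python
class AllowedDocsCollector:
    """Hypothetical stand-in for a HitCollector subclass."""

    def __init__(self, allowed_ids):
        self.allowed_ids = set(allowed_ids)
        self.hits = []

    def collect(self, doc, score):
        # Called once per matching document; keep it only if it is on the list.
        if doc in self.allowed_ids:
            self.hits.append((doc, score))

    def ranked(self):
        # Highest score first.
        return sorted(self.hits, key=lambda hit: hit[1], reverse=True)


# Made-up (doc, score) pairs standing in for what the searcher would report.
all_hits = [(1, 0.42), (2, 1.35), (3, 0.07), (4, 0.88)]

collector = AllowedDocsCollector(allowed_ids=[1, 3, 4])
for doc, score in all_hits:
    collector.collect(doc, score)

print(collector.ranked())
```

The scores are computed before the collector sees them, so the kept documents are still ranked with whatever statistics the search itself used.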
Yes, I have a pre-defined list of documents that I care about.
Then I can do the search on these, but it will use the statistics of the
whole index, right?
2009/5/14 Erick Erickson
I don't know if I'm understanding what you want, but if you have a
pre-defined list of documents, couldn't you form a Filter? Then
your results would only be the documents you care about.
If this is irrelevant, perhaps you could explain a bit more about
the problem you're trying to solve.
Best
Eri