On 27 Nov 2008, at 10:15, Toke Eskildsen wrote:

On Thu, 2008-11-27 at 07:30 +0100, Karl Wettin wrote:
The scariest part is that you will have to score each and every
document that has a source, probably all of the documents in your
corpus.

I now see my query-logic was flawed. In order to avoid matching all
documents every time, the query would have to be
"foo AND (
 groupboost_A:dummy^10 OR
 groupboost_B:dummy OR
 groupboost_C:dummy^0.1 OR
 ...
 groupboost_Z:dummy
)"

With that query, it seems that only documents matching foo will result
in a hit and be scored?

Someone other than me needs to answer this. I know there is no optimization of boolean clauses, which is why I'm saying this: it is possible that the boolean query weight will actually visit all the inner clauses even when "foo" was not matched, i.e. all documents in the index are visited, though they might not all be scored.

A cosmetic remark: I would personally choose a single field for the boosts, with one token per source (groupboost:A^10 groupboost:B^1 groupboost:C^0.1).
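
To make the single-field variant concrete, here is a minimal sketch in Python (not Lucene code) that only assembles the query string. The helper name build_boost_query and the boost table are made up for illustration; the groupboost field and the A/B/C sources are the ones from this thread.

```python
def build_boost_query(term, boosts, field="groupboost"):
    """Build 'term AND (field:SRC^boost OR ...)' as a query string.

    `boosts` maps a source name to its boost factor. Hypothetical helper,
    only meant to show the shape of the single-field query.
    """
    clauses = " OR ".join(
        f"{field}:{source}^{boost:g}" for source, boost in boosts.items()
    )
    return f"{term} AND ({clauses})"


if __name__ == "__main__":
    print(build_boost_query("foo", {"A": 10, "B": 1, "C": 0.1}))
```

Note that, just as with the one-field-per-source query, a document whose source is missing from the boost table would not match at all.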

I think you are looking for CustomScoreQuery.

Possibly, but my understanding is too weak to see how I can avoid a
substantial performance-hit for the check for source?

If I'm not mistaken, CustomScoreQuery is a non-matching query, that is, it only touches the score of something that has already been matched by a subquery. If these statements are true, then it feels like there are clock ticks to save here.
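
A toy model of that idea, as I understand it, in Python rather than against the real Lucene API: the sub-query alone decides which documents match, and the source boost is only applied to those. The names rescore, source_of and the sample data are hypothetical; this is not how CustomScoreQuery is implemented, only an illustration of why unmatched documents would never need a source check.

```python
def rescore(matched, source_of, boosts, default=1.0):
    """Multiply the base score of already-matched documents by a source boost.

    matched:   doc id -> score produced by the sub-query (e.g. "foo")
    source_of: doc id -> source name, known for every document in the index
    boosts:    source name -> boost factor; unknown sources get `default`
    Only the keys of `matched` are ever looked at.
    """
    return {
        doc: score * boosts.get(source_of[doc], default)
        for doc, score in matched.items()
    }


if __name__ == "__main__":
    source_of = {1: "A", 2: "B", 3: "C", 4: "C"}   # four documents in the index
    matched = {1: 2.0, 4: 0.5}                      # only docs 1 and 4 match "foo"
    boosts = {"A": 10.0, "B": 1.0, "C": 0.1}
    print(rescore(matched, source_of, boosts))
```

Documents 2 and 3 never enter the loop, which is the saving I am hoping for.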

But maybe all of these things are a bit premature. Just use what you have if it seems to work well enough.



   karl
