27 nov 2008 kl. 10.15 skrev Toke Eskildsen:
On Thu, 2008-11-27 at 07:30 +0100, Karl Wettin wrote:
The most scary part is that that you will have to score each and
every
document that has a source, probably all of the documents in your
corpus.
I now see my query-logic was flawed. In order to avoid matching all
documents every time, the query would have to be
"foo AND (
groupboost_A:dummy^10 OR
groupboost_B:dummy OR
groupboost_C:dummy^0.1 OR
...
groupboost_Z:dummy
)"
With that query, it seems that only documents matching foo will result
in a hit and be scored?
Someone else than me needs to answer this. I know there is no
optimization of boolean clauses, that is why I'm saying this: it is
possible that the boolean query weight actually will be visiting all
the inner clauses even though "foo" was not matched, i.e. all
documents in the index are visited but might not all be scored.
A cosmetic remark, I would personally choose a single field for the
boosts and then one token per source. (groupboost:A^10 groupboost:B^1
groupboost:C^0.1).
I think you are looking for CustomScoreQuery.
Possibly, but my understanding is too weak to see how I can avoid a
substantial performance-hit for the check for source?
If I'm not misstaken CustomScoreQuery is a non matching query, that it
only touch the score of something that is already matched by a
subquery. If these statments are true then it feels like there are
clock ticks to save here.
But maybe all of these things are a bit too preemptive. Just use what
you have if it seems to work well enough.
karl
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]