Re: Query time document group boosting

Karl Wettin Thu, 27 Nov 2008 11:56:29 -0800


On 27 Nov 2008, at 10:15, Toke Eskildsen wrote:

> On Thu, 2008-11-27 at 07:30 +0100, Karl Wettin wrote:
>
>> The most scary part is that you will have to score each and every
>> document that has a source, probably all of the documents in your
>> corpus.
>
>
> I now see my query-logic was flawed. In order to avoid matching all
> documents every time, the query would have to be
> "foo AND (
>  groupboost_A:dummy^10 OR
>  groupboost_B:dummy OR
>  groupboost_C:dummy^0.1 OR
>  ...
>  groupboost_Z:dummy
> )"
>
> With that query, it seems that only documents matching foo will result
> in a hit and be scored?

Someone other than me needs to answer this. I know there is no optimization of boolean clauses, which is why I'm saying this: it is possible that the boolean query weight will actually visit all the inner clauses even though "foo" was not matched, i.e. all documents in the index are visited but might not all be scored.

A cosmetic remark: I would personally choose a single field for the boosts and then one token per source (groupboost:A^10 groupboost:B^1 groupboost:C^0.1).
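A minimal sketch of that single-field layout, assuming the query is assembled as a Lucene query-parser string. The class and method (`GroupBoostQueryString.build`) are hypothetical; the main query is wrapped in parentheses so that any operators inside it keep their meaning. As with the query quoted above, a document with no `groupboost` token will not match at all.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupBoostQueryString {

    /**
     * Hypothetical helper: requires the main query and adds one optional,
     * boosted clause per source token in a single "groupboost" field.
     */
    static String build(String mainQuery, Map<String, Float> boosts) {
        if (boosts.isEmpty()) {
            return mainQuery; // nothing to boost; avoid an empty "()" group
        }
        StringBuilder sb = new StringBuilder();
        // Parenthesize the main query so operators inside it keep their meaning
        sb.append('(').append(mainQuery).append(") AND (");
        boolean first = true;
        for (Map.Entry<String, Float> e : boosts.entrySet()) {
            if (!first) {
                sb.append(" OR ");
            }
            first = false;
            sb.append("groupboost:").append(e.getKey());
            // A boost of 1.0 is the default, so no "^" suffix is needed
            if (e.getValue() != 1.0f) {
                sb.append('^').append(e.getValue());
            }
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("A", 10f);
        boosts.put("B", 1f);
        boosts.put("C", 0.1f);
        // prints: (foo) AND (groupboost:A^10.0 OR groupboost:B OR groupboost:C^0.1)
        System.out.println(build("foo", boosts));
    }
}
```

Compared with one field per source, adding a new source only means indexing a new token value; the field list does not grow.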

>> I think you are looking for CustomScoreQuery.
>
>
> Possibly, but my understanding is too weak to see how I can avoid a
> substantial performance hit for the check for source?

If I'm not mistaken, CustomScoreQuery is a non-matching query: it only touches the score of documents that are already matched by a subquery. If these statements are true, then it feels like there are clock ticks to save here.
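To make the "only touch the score of what is already matched" idea concrete, here is a plain-Java simulation; it is not the Lucene API. `subQueryHits` stands in for the documents and scores produced by the subquery (e.g. `foo`), and `sourceOfDoc` is a hypothetical per-document source lookup (in Lucene this might be backed by something like the FieldCache, which is an assumption here). Multiplying the subquery score by the group boost is assumed as the combination rule; check how your Lucene version combines the scores.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class MatchThenBoost {

    /**
     * Hypothetical rescoring step: visits only the documents the subquery
     * already matched and multiplies each score by its source's boost.
     */
    static Map<Integer, Float> rescore(Map<Integer, Float> subQueryHits,
                                       String[] sourceOfDoc,
                                       Map<String, Float> groupBoost) {
        Map<Integer, Float> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Float> hit : subQueryHits.entrySet()) {
            int doc = hit.getKey();
            String source = sourceOfDoc[doc];
            // Documents with no source, or an unlisted one, keep a neutral 1.0
            float boost = (source == null) ? 1.0f : groupBoost.getOrDefault(source, 1.0f);
            result.put(doc, hit.getValue() * boost);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] sourceOfDoc = {"A", "B", "C", "A", null};
        // Pretend "foo" matched only documents 1 and 3
        Map<Integer, Float> fooHits = new LinkedHashMap<>();
        fooHits.put(1, 0.5f);
        fooHits.put(3, 0.5f);
        Map<String, Float> boosts = new HashMap<>();
        boosts.put("A", 10f);
        boosts.put("C", 0.1f);
        // prints: {1=0.5, 3=5.0}
        System.out.println(rescore(fooHits, sourceOfDoc, boosts));
    }
}
```

The cost here is one lookup per matched document, regardless of how many sources exist, whereas a boolean clause per source may have to be visited even for documents that "foo" never matched.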

But maybe all of these things are a bit premature. Just use what you have if it seems to work well enough.




   karl

