What the heck is the JavaDoc for DisjunctionMaxQuery saying:

"A query that generates the union of documents produced by its subqueries, and 
that scores each document with the maximum score for that document as produced 
by any subquery, plus a tie breaking increment for any additional matching 
subqueries. This is useful when searching for a word in multiple fields with 
different boost factors (so that the fields cannot be combined equivalently 
into a single search field). We want the primary score to be the one associated 
with the highest boost, not the sum of the field scores (as BooleanQuery would 
give). If the query is "albino elephant" this ensures that "albino" matching 
one field and "elephant" matching another gets a higher score than "albino" 
matching both fields. To get this result, use both BooleanQuery and 
DisjunctionMaxQuery: for each term a DisjunctionMaxQuery searches for it in 
each field, while the set of these DisjunctionMaxQuery's is combined into a 
BooleanQuery. The tie breaker capability allows results that include the same 
term in multiple fields to be judged better than results that include this term 
in only the best of those multiple fields, without confusing this with the 
better case of two different terms in the multiple fields."
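To check my reading of the "albino elephant" paragraph, here is a toy model in plain Java. It is my own illustration, not Lucene code. Each document gets a made-up score per (term, field). Each term's DisjunctionMaxQuery takes the best field score, with the tie breaker left at zero. The enclosing BooleanQuery then adds up the per-term results. The field names "title" and "body" and all the numbers are hypothetical, and the model ignores whatever normalization Lucene really applies.

```java
import java.util.List;
import java.util.Map;

// Toy model, not Lucene code: scores per (term, field) are made up, and a
// missing map entry means the term does not occur in that field.
public class AlbinoElephant {

    // One DisjunctionMaxQuery per term, tie breaker left at zero:
    // only the best-scoring field counts for that term.
    static double dmqScore(Map<String, Double> fieldScores) {
        double max = 0.0; // scores assumed non-negative; 0.0 means "no match"
        for (double s : fieldScores.values()) {
            max = Math.max(max, s);
        }
        return max;
    }

    // The enclosing BooleanQuery (all SHOULD clauses) adds up the per-term scores.
    static double booleanOfDmqs(List<Map<String, Double>> perTermFieldScores) {
        double total = 0.0;
        for (Map<String, Double> fieldScores : perTermFieldScores) {
            total += dmqScore(fieldScores);
        }
        return total;
    }

    public static void main(String[] args) {
        // Doc A: "albino" in title, "elephant" in body.
        double docA = booleanOfDmqs(List.of(
                Map.of("title", 1.0),              // albino
                Map.of("body", 0.5)));             // elephant
        // Doc B: "albino" in both title and body, no "elephant" anywhere.
        double docB = booleanOfDmqs(List.of(
                Map.of("title", 1.0, "body", 0.5), // albino
                Map.of()));                        // elephant
        System.out.println("doc A = " + docA + ", doc B = " + docB); // prints doc A = 1.5, doc B = 1.0
    }
}
```

If you instead just added every (term, field) score, as a plain BooleanQuery over all the pairs would, doc B would also get 1.0 + 0.5 = 1.5 and tie doc A. Taking the per-term maximum first is what puts doc A ahead.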

"Maximum ... as produced by any subquery": OK, that makes sense. We pick the 
score that is the highest. If you have

DMQ(Q1, Q2, Q3)

and the subquery scores are (0.1, 0.2, 0.1), then Q2 wins and the overall 
score is 0.2, right?
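The plain "maximum" part, as I understand it, is just this. This is my own sketch, with no tie breaker at all, and it assumes scores are non-negative:

```java
public class DmqMax {
    // Plain "maximum as produced by any subquery": the best subquery score wins.
    // Assumes non-negative scores; with no subqueries it returns 0.0.
    static double maxScore(double... subScores) {
        double max = 0.0;
        for (double s : subScores) {
            max = Math.max(max, s);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxScore(0.1, 0.2, 0.1)); // prints 0.2
    }
}
```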
But then what is the meaning of "any additional matching subqueries"?
Is the description then one of the following?

(1)    Running with the idea that something has to tie to involve a 
tie-breaker, I might say: "If two subqueries are both the maximum of all the 
subqueries, the score will be the maximum score increased by the tie breaker 
increment."
Example: a DMQ with an increment of 0.15 and three subqueries (Q1, Q2, Q3) 
which score (0.1, 0.2, 0.2). Because there are two 0.2 scores, the score for 
this query will be 0.2 + 0.15, or 0.35. If the scores are (0.1, 0.1, 0.2), the 
overall score is 0.2, because we had only one maximum.
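To make reading (1) concrete, here it is as a small Java sketch. It is my own illustration of that interpretation, not Lucene's code. The flat "increment" parameter is just the 0.15 from my example, and I've added a guard so a document that matches nothing doesn't get the bonus.

```java
import java.util.Locale;

public class TieReading1 {
    // Reading (1): add the increment only when two or more subqueries
    // share the maximum score exactly.
    static double score(double increment, double... subScores) {
        double max = 0.0; // scores assumed non-negative
        for (double s : subScores) {
            max = Math.max(max, s);
        }
        int atMax = 0;
        for (double s : subScores) {
            if (s == max) {
                atMax++;
            }
        }
        // Guard: if nothing matched (max == 0.0), no bonus.
        return (atMax > 1 && max > 0.0) ? max + increment : max;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "%.2f%n", score(0.15, 0.1, 0.2, 0.2)); // prints 0.35
        System.out.printf(Locale.ROOT, "%.2f%n", score(0.15, 0.1, 0.1, 0.2)); // prints 0.20
    }
}
```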

OR, alternately, forgetting the idea that anything is tied within the set of 
subqueries:

(2)    "If, in addition to the maximum subquery score, there are any other 
subqueries with nonzero scores, the overall score is increased by the 
tie breaker increment."

Example: using the same increment of 0.15, if the scores are (0.0, 0.0, 0.2), 
the result is a score of 0.2, but (0.0, 0.1, 0.2) scores 0.35.
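And here is reading (2) the same way. Again, this is only my sketch of the interpretation as I stated it, not what Lucene necessarily does. It treats a score of 0.0 as "did not match" and uses the same flat increment.

```java
import java.util.Locale;

public class TieReading2 {
    // Reading (2): add the increment once if any subquery other than the
    // best one also matched. A score of 0.0 is treated as "did not match".
    static double score(double increment, double... subScores) {
        double max = 0.0; // scores assumed non-negative
        int matching = 0;
        for (double s : subScores) {
            if (s > 0.0) {
                matching++;
            }
            max = Math.max(max, s);
        }
        return matching > 1 ? max + increment : max;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "%.2f%n", score(0.15, 0.0, 0.0, 0.2)); // prints 0.20
        System.out.printf(Locale.ROOT, "%.2f%n", score(0.15, 0.0, 0.1, 0.2)); // prints 0.35
    }
}
```

Under this reading the bonus is the same flat 0.15 whether the runner-up scored 0.01 or 0.19, and whether one or two other subqueries matched.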

I'm leaning toward interpretation #2, but "tie breaking for ... additional 
matching..." does not say that to me, because I don't see any tie.
Once I understand that, I'll ask about how to "use both BooleanQuery and 
DisjunctionMaxQuery".

-Paul
