Re: alternative scoring algorithm for PhraseQuery

Chris Hostetter Wed, 07 Mar 2007 10:31:51 -0800

: Query: a b c d
: Doc A: a b c d => sloppyFreq(0) * coord(4, 4) = 1
: Doc B: a b c => sloppyFreq(0) * coord(3, 4) = 0,75
:
: Doc A would score higher. I guess that might be a valid solution.
: There is a drawback though, i.e. sloppyFreq(1) * coord(4, 4) = 0,5
: So a perfect match with one insertion would score less than a 3 of 4
: match with no slop.


but now you've put the control in the hands of the client: they can choose
a Similarity based on what is more important to them. if matching more
clauses is important, they can have a strict coord function; if matching
with less slop is more important, they can have a strict sloppyFreq method.
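
a rough sketch of that trade-off, in plain Java with no Lucene dependency.
the class, interface, and constant names are made up for illustration; the
DEFAULT formulas are the ones implied by the numbers quoted above
(sloppyFreq(d) = 1/(d+1), coord(overlap, max) = overlap/max), and the two
"strict" variants are just assumptions about what a client might plug in:

```java
public class ScoringSketch {
    /** Hypothetical stand-in for the two Similarity hooks discussed above. */
    interface Sim {
        float sloppyFreq(int distance);
        float coord(int overlap, int maxOverlap);
    }

    /** Formulas implied by the quoted numbers: 1/(d+1) and overlap/max. */
    static final Sim DEFAULT = new Sim() {
        public float sloppyFreq(int distance) { return 1.0f / (distance + 1); }
        public float coord(int overlap, int maxOverlap) { return overlap / (float) maxOverlap; }
    };

    /** Assumed "strict coord": cubing the ratio punishes missing clauses harder. */
    static final Sim STRICT_COORD = new Sim() {
        public float sloppyFreq(int distance) { return 1.0f / (distance + 1); }
        public float coord(int overlap, int maxOverlap) {
            float c = overlap / (float) maxOverlap;
            return c * c * c;
        }
    };

    /** Assumed "strict sloppyFreq": squaring the factor punishes slop harder. */
    static final Sim STRICT_SLOP = new Sim() {
        public float sloppyFreq(int distance) {
            float f = 1.0f / (distance + 1);
            return f * f;
        }
        public float coord(int overlap, int maxOverlap) { return overlap / (float) maxOverlap; }
    };

    /** Score as in the quoted example: sloppyFreq(distance) * coord(matched, total). */
    static float score(Sim sim, int distance, int matched, int total) {
        return sim.sloppyFreq(distance) * sim.coord(matched, total);
    }

    public static void main(String[] args) {
        Sim[] sims = {DEFAULT, STRICT_COORD, STRICT_SLOP};
        String[] names = {"default", "strict coord", "strict sloppyFreq"};
        for (int i = 0; i < sims.length; i++) {
            // Compare "all 4 terms, one insertion" against "3 of 4 terms, no slop".
            System.out.println(names[i]
                + ": 4/4 with slop 1 = " + score(sims[i], 1, 4, 4)
                + ", 3/4 with slop 0 = " + score(sims[i], 0, 3, 4));
        }
    }
}
```

with DEFAULT the 3-of-4 exact match wins (0.75 vs 0.5), which is the
drawback described above; with STRICT_COORD the full match with one
insertion comes out ahead, and STRICT_SLOP pushes it further the other way.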

: don't know the inner workings of SpanQueries, but what you describe
: sounds a lot like what the PhraseQuery does as well (i.e. calculate max
: distance between last and first term, and use that with sloppyFreq()).

correct, the big advantage of Span queries is that while a SpanNearQuery
is roughly equivalent to a PhraseQuery, a PhraseQuery can only contain
Terms, whereas a SpanNearQuery can contain other spans ... so a
SpanNearQuery for "a b c d" can function even if "a" is a complicated sub
query (like "x OR y OR (p near q but not with z between them)")
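
to make the nesting idea concrete, here is a toy model of spans in plain
Java. it is NOT the Lucene API: SpanMatcher, term, or, not and near are
hypothetical helpers that only mimic the idea, and the matching rules
(ordered clauses, slop counted as total gap positions, "not" meaning "no
overlap") are simplifying assumptions. in Lucene itself the corresponding
classes are SpanTermQuery, SpanOrQuery, SpanNotQuery and SpanNearQuery:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpanSketch {
    /** A match covering token positions [start, end). */
    static final class Span {
        final int start, end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** Hypothetical stand-in for a span query: returns every match in a token list. */
    interface SpanMatcher { List<Span> spans(List<String> tokens); }

    /** Matches each single occurrence of a term. */
    static SpanMatcher term(String t) {
        return tokens -> {
            List<Span> out = new ArrayList<>();
            for (int i = 0; i < tokens.size(); i++) {
                if (tokens.get(i).equals(t)) out.add(new Span(i, i + 1));
            }
            return out;
        };
    }

    /** Union of the spans of all clauses. */
    static SpanMatcher or(SpanMatcher... clauses) {
        return tokens -> {
            List<Span> out = new ArrayList<>();
            for (SpanMatcher c : clauses) out.addAll(c.spans(tokens));
            return out;
        };
    }

    /** Keeps spans of include that do not overlap any span of exclude. */
    static SpanMatcher not(SpanMatcher include, SpanMatcher exclude) {
        return tokens -> {
            List<Span> out = new ArrayList<>();
            List<Span> banned = exclude.spans(tokens);
            for (Span s : include.spans(tokens)) {
                boolean overlaps = false;
                for (Span b : banned) {
                    if (b.start < s.end && b.end > s.start) overlaps = true;
                }
                if (!overlaps) out.add(s);
            }
            return out;
        };
    }

    /** Ordered "near": clauses match left to right with at most slop gap positions in total. */
    static SpanMatcher near(int slop, SpanMatcher... clauses) {
        return tokens -> {
            // Each partial match is {start, end, gapUsedSoFar}.
            List<int[]> partial = new ArrayList<>();
            for (Span s : clauses[0].spans(tokens)) partial.add(new int[] {s.start, s.end, 0});
            for (int i = 1; i < clauses.length; i++) {
                List<int[]> next = new ArrayList<>();
                List<Span> candidates = clauses[i].spans(tokens);
                for (int[] p : partial) {
                    for (Span s : candidates) {
                        int gap = p[2] + (s.start - p[1]);
                        if (s.start >= p[1] && gap <= slop) next.add(new int[] {p[0], s.end, gap});
                    }
                }
                partial = next;
            }
            List<Span> out = new ArrayList<>();
            for (int[] p : partial) out.add(new Span(p[0], p[1]));
            return out;
        };
    }

    /** True if the matcher finds at least one span in the whitespace-split text. */
    static boolean matches(SpanMatcher m, String text) {
        return !m.spans(Arrays.asList(text.split(" "))).isEmpty();
    }

    public static void main(String[] args) {
        // "a" is the sub query: x OR y OR (p near q, but not with z between them)
        SpanMatcher a = or(term("x"), term("y"),
                not(near(1, term("p"), term("q")), term("z")));
        SpanMatcher query = near(0, a, term("b"), term("c"), term("d"));
        for (String doc : new String[] {"y b c d", "p q b c d", "p z q b c d"}) {
            System.out.println(doc + " -> " + matches(query, doc));
        }
    }
}
```

the point of the sketch is only that near() takes other matchers as
clauses, so the first slot can be an arbitrary sub query rather than a
single Term.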



-Hoss

