Thanks for your answers. Your input is really appreciated :-)
@Paul Elschot:
Thanks for the hint. I guess I could use coord() to penalize missing terms like this:

Query: a b c d
Doc A: a b c d => sloppyFreq(0) * coord(4, 4) = 1
Doc B: a b c   => sloppyFreq(0) * coord(3, 4) = 0.75

Doc A would score higher, so that might be a valid solution. There is a drawback, though:

sloppyFreq(1) * coord(4, 4) = 0.5

So a perfect match with one insertion would score lower than a 3-of-4 match with no slop.

As for SpanQueries: my implementation is based on the default PhraseQuery with slop > 0. I don't know the inner workings of SpanQueries, but what you describe sounds a lot like what PhraseQuery does as well (i.e. calculate the maximum distance between the last and first term, and pass that to sloppyFreq()). I chose PhraseQuery as the base of my work because I felt it would perform better than firing off a plethora of SpanQueries to express the same query.

Long story short: my problem would generalize to SpanQueries if SpanQueries also faced the problem of deleted terms. But I guess they don't?!

@Chris Hostetter:
You are absolutely right. But it shows the direction this could go in. Perhaps I could add +1 (or some other amount) as an additional penalty on top of the maximum error for missing terms, to distinguish these cases further. But this could still lead to a case where

Doc A: a b c x1 x2 [more x...] xn d

is scored lower than

Doc B: a b c

(because the distance of A can exceed the penalty for the missing term; it's only a matter of choosing a large enough n), which is questionable as well.

2007/3/6, Chris Hostetter <[EMAIL PROTECTED]>:
: My initial idea was to penalize a missing term position with its maximum error.
:
: Consider this:
: Query: a b c d
: Document A: b c d
:
: Term a is missing, score it as if it was at the worst position possible
:
: result: b c d a
: pos. diffs: -1 -1 -1 +3

side comment: this doesn't sound very useful, a document containing "b c d" matches equally to a doc containing "b c d a" ? ... shouldn't a doc containing "b c d a" be considered a much better match since it at least contains all of the terms close together?

-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
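The scoring arithmetic discussed in this thread can be sketched as follows. This is only an illustration, assuming the formulas used by Lucene's DefaultSimilarity at the time: sloppyFreq(distance) = 1 / (distance + 1) and coord(overlap, maxOverlap) = overlap / maxOverlap. The class ScoreSketch and its score() helper are hypothetical names made up for this example; they are not part of Lucene, and a real score includes further factors (tf, idf, norms) that are left out here.

```java
// Hypothetical sketch, not Lucene code: reproduces only the two factors
// discussed in the thread, under the assumed DefaultSimilarity formulas.
class ScoreSketch {

    // Assumed formula: an exact phrase match (distance 0) yields 1.0,
    // and every extra unit of edit distance lowers the value.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Assumed formula: fraction of query terms found in the document.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical helper: combines both factors as proposed in the thread.
    static float score(int distance, int overlap, int maxOverlap) {
        return sloppyFreq(distance) * coord(overlap, maxOverlap);
    }

    public static void main(String[] args) {
        // Query: a b c d
        System.out.println("Doc A (a b c d):         " + score(0, 4, 4));
        System.out.println("Doc B (a b c):           " + score(0, 3, 4));
        // All four terms present, but with one insertion (distance 1).
        System.out.println("Doc C (4 terms, slop 1): " + score(1, 4, 4));
    }
}
```

With these assumed formulas, Doc C (all four terms, one insertion) ends up below Doc B (one term missing, no slop), which is the drawback described above. The same shape explains the second concern: if a missing term is charged a fixed distance penalty p, a document with all terms but n > p insertions gets sloppyFreq(n) < sloppyFreq(p) and again ranks below the document with the missing term.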