On Wednesday 07 March 2007 18:12, Philipp Nanz wrote:
> Thanks for your answers. Your input is really appreciated :-)
>
> @Paul Elschot:
> Thanks for the hint. I guess I could use coord() to penalize missing
> terms like this:
>
> Query: a b c d
> Doc A: a b c d => sloppyFreq(0) * coord(4, 4) = 1
> Doc B: a b c => sloppyFreq(0) * coord(3, 4) = 0,75
>
> Doc A would score higher. I guess that might be a valid solution.
>
> There is a drawback though, i.e. sloppyFreq(1) * coord(4, 4) = 0,5
>
> So a perfect match with one insertion would score less than a 3 of 4
> match with no slop.

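The numbers quoted above can be reproduced with a small stand-alone sketch.
It does not use Lucene; it only copies the two DefaultSimilarity formulas,
sloppyFreq(distance) = 1 / (distance + 1) and coord(overlap, maxOverlap) =
overlap / maxOverlap. The Factors interface, the LENIENT_SLOP variant and
the score() helper are hypothetical names made up for illustration, and
multiplying only these two factors ignores the rest of the scoring formula
(tf, idf, norms).

```java
// Stand-alone sketch; it does not depend on Lucene.
class SlopCoordDemo {

    /** Hypothetical stand-in for the two Similarity factors discussed here. */
    interface Factors {
        float sloppyFreq(int distance);
        float coord(int overlap, int maxOverlap);
    }

    /** Same formulas as DefaultSimilarity uses for these two factors. */
    static final Factors DEFAULT = new Factors() {
        public float sloppyFreq(int distance) {
            return 1.0f / (distance + 1);
        }
        public float coord(int overlap, int maxOverlap) {
            return overlap / (float) maxOverlap;
        }
    };

    /** Hypothetical alternative: milder slop penalty, harsher missing-term penalty. */
    static final Factors LENIENT_SLOP = new Factors() {
        public float sloppyFreq(int distance) {
            return 1.0f / (1.0f + 0.1f * distance);
        }
        public float coord(int overlap, int maxOverlap) {
            float c = overlap / (float) maxOverlap;
            return c * c;
        }
    };

    /** Product of the two factors only; all other scoring factors are left out. */
    static float score(Factors f, int distance, int overlap, int maxOverlap) {
        return f.sloppyFreq(distance) * f.coord(overlap, maxOverlap);
    }

    public static void main(String[] args) {
        Factors[] all = { DEFAULT, LENIENT_SLOP };
        String[] names = { "default", "lenient slop" };
        for (int i = 0; i < all.length; i++) {
            System.out.println(names[i]
                + ": exact 4 of 4 = " + score(all[i], 0, 4, 4)
                + ", one insertion = " + score(all[i], 1, 4, 4)
                + ", 3 of 4 no slop = " + score(all[i], 0, 3, 4));
        }
    }
}
```

With the default formulas the one-insertion match gets 0.5 and the 3 of 4
match gets 0.75, as in the quoted example. With the hypothetical variant the
order flips (about 0.91 against about 0.56), so the ranking is a property of
the chosen formulas and not of the query itself.
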
Your examples are based on DefaultSimilarity. If your Scorer takes these
factors from a Similarity, you can leave the tradeoff between them to the
users of your query by letting them provide the Similarity at query time.

> As for spanqueries:
> My implementation is based on the default PhraseQuery with slop > 0. I
> don't know the inner workings of SpanQueries, but what you describe
> sounds a lot like what the PhraseQuery does as well (i.e. calculate max
> distance between last and first term, and use that with sloppyFreq()).
>
> I chose PhraseQuery as base of my work, because I felt that it would
> offer better performance than firing off a plethora of spanqueries to
> express the same query.
>
> Long story short: My problem would generalize to spanqueries if
> spanqueries would face the problem of deleted terms. But I guess they
> don't?!

Correct, they don't.

Regards,
Paul Elschot