Hello folks,

Maybe one of you can help me with this (sorry, long read).

I have implemented a FuzzyPhraseQuery that works similarly to Lucene's native PhraseQuery. I.e., it can retrieve phrases for a query while allowing for insertions and out-of-order terms. In addition, it can also find matches with terms missing (deletions).

Scoring is implemented as described here:
http://www.gossamer-threads.com/lists/lucene/java-user/33558#33558

So the scorer uses the total error rather than the maximum error for insertions and out-of-order terms. That part works fine (even though the total errors I'm observing quickly lead to very low frequencies returned by sloppyFreq()).

Now my problem is with scoring the deletion cases. My initial idea was to penalize a missing term position with its maximum error. Consider this:

Query:       a b c d
Document A:  b c d

Term a is missing, so score it as if it were at the worst position possible:

result:       b  c  d  a
pos. diffs:  -1 -1 -1 +3

In this example the total error is 1 + 1 + 1 + 3 = 6. It can be observed that the max error for the nth missing term is 2n - 2.

If you have a query with 100 terms and, say, 10 of them are not found, I would have a penalty of 190 + 192 + 194 etc. For extreme cases, this is rather simple to calculate. In the middle of a phrase, things get tricky though. Also, the penalty becomes higher as the number of terms increases. So I think this is not a viable solution for my problem.

Does anyone know a better solution for scoring deletion cases?

Thanks for your input,
Philipp
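
P.S. To make the arithmetic above concrete, here is a minimal sketch of the penalty for the "a b c d" example. It is not my actual scorer and it does not call Lucene: the class and method names (DeletionPenaltySketch, appendMissing, totalError) are made up for illustration, it assumes every term occurs at most once, and it only covers the simple case where the missing terms are appended after the matched ones. The sloppyFreq() here just mirrors the 1 / (distance + 1) formula that, as far as I know, DefaultSimilarity uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; all names are hypothetical, not Lucene API.
final class DeletionPenaltySketch {

    // Keep the matched terms in document order and append every query
    // term that is missing from the document at the end.
    static List<String> appendMissing(List<String> query, List<String> document) {
        List<String> arranged = new ArrayList<>(document);
        for (String term : query) {
            if (!arranged.contains(term)) {
                arranged.add(term);
            }
        }
        return arranged;
    }

    // Total error: sum over all query terms of
    // |position in arranged sequence - position in query|.
    // Assumes each query term occurs exactly once in 'arranged'.
    static int totalError(List<String> query, List<String> arranged) {
        int total = 0;
        for (int i = 0; i < query.size(); i++) {
            int j = arranged.indexOf(query.get(i));
            total += Math.abs(j - i);
        }
        return total;
    }

    // Same shape as the default sloppy frequency: 1 / (distance + 1).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("a", "b", "c", "d");
        List<String> document = Arrays.asList("b", "c", "d");

        List<String> arranged = appendMissing(query, document); // b c d a
        int error = totalError(query, arranged);                // 1 + 1 + 1 + 3 = 6

        System.out.println("arranged    = " + arranged);
        System.out.println("total error = " + error);
        System.out.println("sloppyFreq  = " + sloppyFreq(error));
    }
}
```

With a total error of 6 the frequency is already down to 1/7, which is the kind of quick drop I mentioned above.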