Thank you both for the assistance. I ended up going the tf(float) override route rather than sloppyFreq. I want to keep the ability to specify how far apart the terms are allowed to be, but from what I understood of Doron's response, I might lose that if I overrode sloppyFreq.
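For anyone following along, here is a rough standalone sketch of the idea. It does not use the Lucene classes. It only models the arithmetic, assuming the default formulas are tf(freq) = sqrt(freq) and sloppyFreq(distance) = 1 / (distance + 1), which is consistent with the 1/3 and .577 figures from my original question. The class name FlatTfSketch and the method flatTf are hypothetical, and flattening every positive frequency to 1 is only one possible shape for the override (rounding up, as Chris suggested, is another).

```java
public class FlatTfSketch {

    // Assumed default: a sloppy match at edit distance d contributes 1 / (d + 1).
    static float defaultSloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Assumed default: tf is the square root of the (summed) frequency.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Hypothetical override: any positive frequency scores like one exact match.
    static float flatTf(float freq) {
        return freq > 0.0f ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        float exact = defaultSloppyFreq(0);   // terms in query order, distance 0
        float swapped = defaultSloppyFreq(2); // two terms swapped, distance 2

        System.out.println("default tf: exact=" + defaultTf(exact)
                + " swapped=" + defaultTf(swapped));
        System.out.println("flat tf:    exact=" + flatTf(exact)
                + " swapped=" + flatTf(swapped));
    }
}
```

In a real setup the same logic would go into a tf(float) override on a Similarity subclass. The slop in the query still decides which documents match, and only the score is flattened.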
Because my application is a matching tool rather than a searching tool, it is okay for a term or phrase that matches multiple times to have the same score as a single match. Multiple matches don't mean anything good in my application.

On 5/30/07, Doron Cohen <[EMAIL PROTECTED]> wrote:
Chris Hostetter <[EMAIL PROTECTED]> wrote on 29/05/2007 12:51:38:

>
> : I've found that trying to specify a near query using something like:
> :   actor_name_mv:"Foster, Jody"~2
> : matches "Foster, Jody" with a tf score of 1, but it matches "Jody
> : Foster" with a tf score of .577. The phraseFreq in the first case is 1
> : and the phraseFreq in the second is 1/3.
>
> as i recall, phraseFreq is passed to the tf(float) function of your
> similarity, you can get different behavior between the tf() for phrase
> queries and simple term queries by having different tf impls
> for tf(float) - used for phrases; and tf(int) - used for terms.
> by default, tf(int) calls tf(float)
>
> so you could make tf(float) round up to the nearest int, and then
> fractional phraseFreqs should score the same as exact phraseFreqs. do
> some testing of cases where the phrase matches more than once on the same
> field though ... it may not be what you expect ( i believe the
> phraseFreqs are summed before calling tf() so there is no way to tell the
> difference between 1 exact match with a phraseFreq of "1" and 2 sloppy
> matches each with phraseFreqs of 0.5).

Yes, they are summed before calling tf().

It would perhaps be better to override Similarity.sloppyFreq(int) to return 1 (when searching those queries) - this would actually mean that the sloppiness degree is ignored.

It would not be symmetric though, in the sense that e.g. for the query "A B"~3, while it would score these docs the same: "A B"; "B A"; "A X B"; "B X A", it would match "A X Y Z B" but not "B Z Y X A". In other words, this would not be equivalent to having SpanQuery's inOrder = false.

Doron

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]