Thank you both for the assistance.
I ended up going the tf(float) override route rather than sloppyFreq.
I want to keep the ability to specify how far apart terms are
allowed to be, but from what I understood of Doron's response, I might
lose that if I overrode sloppyFreq.

Because my application is a matching tool rather than a searching
tool, it is okay for a term or phrase that matches multiple times to
have the same score as a single match.  Multiple matches don't mean
anything good in my application.
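
For anyone reading this later, here is a rough, standalone sketch of the arithmetic involved. It does not use Lucene itself: the class name FlatTfSketch and its method names are made up for illustration. The two "default" methods assume the stock DefaultSimilarity formulas, tf(freq) = sqrt(freq) and sloppyFreq(distance) = 1 / (distance + 1), which agree with the 1 and .577 scores quoted below. The flattened method is one possible way to write the tf(float) override; it is not necessarily the exact code I ended up with.

```java
// Standalone sketch; it mirrors the assumed default Lucene formulas
// rather than calling Lucene, so it runs without any dependencies.
class FlatTfSketch {

    // Assumed default sloppyFreq: an exact match (distance 0) gives 1,
    // a match at edit distance 2 gives 1/3.
    static float defaultSloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Assumed default tf: square root of the (possibly fractional) frequency.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Flattened tf: any positive frequency scores like a single exact match,
    // so sloppy matches and repeated matches all get the same tf.
    static float flatTf(float freq) {
        return freq > 0.0f ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        float exact = defaultSloppyFreq(0);   // "Foster, Jody"
        float sloppy = defaultSloppyFreq(2);  // "Jody Foster"

        System.out.println("default tf: " + defaultTf(exact) + " vs " + defaultTf(sloppy));
        System.out.println("flat tf:    " + flatTf(exact) + " vs " + flatTf(sloppy));

        // Frequencies are summed before tf() is called, so several matches
        // in one field still collapse to the same flat score.
        System.out.println("flat tf, two matches: " + flatTf(exact + sloppy));
    }
}
```

In a real Similarity subclass the body of flatTf would go into the tf(float) override. Because sloppyFreq is left alone, the slop in the query still limits how far apart the terms may be.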

On 5/30/07, Doron Cohen <[EMAIL PROTECTED]> wrote:
Chris Hostetter <[EMAIL PROTECTED]> wrote on 29/05/2007 12:51:38:
>
> : I've found that trying to specify a near query using something like:
> : actor_name_mv:"Foster, Jody"~2
> : matches "Foster, Jody" with a tf score of 1, but it matches "Jody
> : Foster" with a tf score of .577.  The phraseFreq in the first case is 1
> : and the phraseFreq in the second is 1/3.
>
> as i recall, phraseFreq is passed to the tf(float) function of your
> similarity, you can get different behavior between the tf() for phrase
> queries and simple term queries by having different tf impls
> for tf(float)
> - used for phrases; and tf(int) - used for terms.  by default, tf(int)
> calls tf(float)
>
> so you could make tf(float) round up to the nearest int, and then
> fractional phraseFreqs should score the same as exact phraseFreqs.  do
> some testing of cases where the phrase matches more than once on the same
> field though ... it may not be what you expect ( i believe the
> phraseFreqs are summed before calling tf() so there is no way to tell the
> difference between 1 exact match with a phraseFreq of "1" and 2 sloppy
> matches each with phraseFreqs of 0.5 ).

Yes they are summed before calling tf(). Would perhaps be
better to override Similarity.sloppyFreq(int) to return 1
(when searching those queries) - this would actually mean
that the sloppiness degree is ignored. It would not be symmetric
though, in the sense that, e.g., the query "A B"~3, while it would
give the same score to these docs: "A B"; "B A"; "A X B"; "B X A",
would match "A X Y Z B" but not "B Z Y X A". In
other words, this would not be equivalent to having
SpanQuery's inOrder = false.

Doron


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


