Re: sloppyFreq question

2009-04-03 Thread Chris Hostetter
: Sorry, here's the example I meant to show. Doc 1 and doc 2 both contain the
: terms "hey look, the quick brown fox jumped very high", but in Doc 1 all the
: terms are indexed at the same position. In doc 2, the terms are indexed in
: adjacent positions (normal way). For the query "the quick brow

Re: sloppyFreq question

2009-03-20 Thread Peter Keegan
Sorry, here's the example I meant to show. Doc 1 and doc 2 both contain the terms "hey look, the quick brown fox jumped very high", but in Doc 1 all the terms are indexed at the same position. In doc 2, the terms are indexed in adjacent positions (normal way). For the query "the quick brown fox", d

Re: sloppyFreq question

2009-03-17 Thread Chris Hostetter
: > I suppose SpanTermQuery could override the weight/scorer methods so that
: > it behaved more like a TermQuery if it was executed directly ... but
: > that's really not what it's intended for.
:
: This is currently the only way to boost a term via payloads.
: BoostingTermQuery extends SpanTerm

Re: sloppyFreq question

2009-03-11 Thread Peter Keegan
> I suppose SpanTermQuery could override the weight/scorer methods so that
> it behaved more like a TermQuery if it was executed directly ... but
> that's really not what it's intended for.

This is currently the only way to boost a term via payloads. BoostingTermQuery extends SpanTermQuery.

> if

Re: sloppyFreq question

2009-03-11 Thread Chris Hostetter
: For a SpanNearQuery that contains SpanTermQueries, the score for a match on
: "the quick brown fox" would be lower than a match on "brown fox" because of
: the edit distance (4 vs 2). This seems counterintuitive, too.

You have to clarify what you mean ... if you're talking about a SpanNearQu

Re: sloppyFreq question

2009-03-11 Thread Chris Hostetter
: For a 'SpanNearQuery', this reduces the effect of the term frequency on the
: score as the number of terms in the span increases. So, for a simple phrase
: query (using spans), the longer the phrase, the lower the TF. For a simple
: SpanTermQuery, the TF is cut in half (1.0f / (1 + 1)).
:
: I

Re: sloppyFreq question

2009-03-09 Thread Peter Keegan
The reason I asked about Span scoring is that the behavior changed when I switched from TermQuery to BoostingTermQuery to take advantage of payloads. It seems to me that a SpanTermQuery and BoostingTermQuery should behave the same as TermQuery with respect to term frequency. The 'edit distance' is

sloppyFreq question

2009-03-03 Thread Peter Keegan
The DefaultSimilarity class defines sloppyFreq as:

public float sloppyFreq(int distance) {
  return 1.0f / (distance + 1);
}

For a 'SpanNearQuery', this reduces the effect of the term frequency on the score as the number of terms in the span increases. So, for a simple phrase query (using spans),
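
To make the effect concrete, here is a minimal standalone sketch of the formula quoted above. It does not use Lucene; the class name SloppyFreqSketch is hypothetical, and the span lengths (1 for a single SpanTermQuery, 2 for "brown fox", 4 for "the quick brown fox") are the figures mentioned in this thread, passed in as the distance argument.

```java
// Hypothetical standalone class for illustration only; not part of Lucene.
public class SloppyFreqSketch {

    // Same arithmetic as the DefaultSimilarity.sloppyFreq method quoted in this thread.
    public static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        // Assumed span lengths, taken from the examples discussed in the thread:
        // 1 = single SpanTermQuery, 2 = "brown fox", 4 = "the quick brown fox".
        int[] spanLengths = {1, 2, 4};
        for (int length : spanLengths) {
            System.out.println("span length " + length
                    + " -> sloppyFreq " + sloppyFreq(length));
        }
    }
}
```

The longer the matched span, the smaller the value that feeds into the term-frequency part of the score, which is why a match on the four-term phrase contributes less than a match on the two-term phrase.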