Thanks for the discussion, I really appreciate you pointing this out:

> Code here ignores PhraseQuery (PQ) 's positions:

And by "here" you mean my original code, not your suggestion.

> To accommodate for this, the overall extra gap can be added to the slope:
>   int gap = (pp[pp.length] - pp[0]) - (pp.length - 1); // (+/- boundary cases)
>   slope += gap;

At first I was thinking my refinement of this would be to consider the
original slop provided by the user and only extend it when necessary.
For example:

  "The Importance of Being Earnest"~2

already has enough slop to take the stop words 'the' and 'of' into
account, so there is no need to add more to the slop. But a slop of 2
really means the user would accept

  [The Importance of Really Truly Being Earnest]

but I see that requires a slop of 3 to skip [of] [Really] [Truly].

But I'm not sure I understand the 'edit distance' for a phrase with
more than 2 words. Does it apply to _all_the_edits_combined needed to
bring the quoted phrase to match the index phrase, as suggested by your
calculation? Also, do any "boundary cases" (as mentioned in your
comment) come to mind?

> Also, this code suggestion simplifies in the case that the analyzer in effect
> may emit more than one term at the same position - for example when expanding
> the query with synonyms, or when keeping originals and stemmed forms - in that
> case just comparing pp[0] and pp[pp.length-1] is insufficient, and the
> positions should be examined while looping the phrase terms, something like this:

I don't understand what you mean by "it simplifies", since you already
listed the simplification in your first example, which I think would
work in cases with or without synonyms, so there is no need to walk
through each distance as shown in your later code.

> int dpos = pp[i+1] - p[i]; // (i>0)
> if (dpos > 1)
>   slope += (dpos -1);
>
> Haven't tested this - just to give you an idea what to try next.

Thanks for your input. I will experiment with some code that considers
the original PQ positions when computing the slop value of any
generated SpanNearQuery.
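As a starting point, here is a minimal sketch of both gap calculations
in plain Java, with no Lucene dependency. The class and method names
(SlopGap, endpointGap, pairwiseGap) are made up for illustration. The
sample positions are an assumption too: I am guessing that with the
stop words removed but their position increments kept, "The Importance
of Being Earnest" yields positions 1, 3, 4. The endpoint version reads
pp[pp.length - 1], since pp[pp.length] would be out of bounds.

```java
// Hypothetical helper for illustration only; not part of Lucene.
final class SlopGap {

    // Extra slop computed from the first and last positions only.
    // Assumes pp holds the phrase term positions in query order.
    static int endpointGap(int[] pp) {
        if (pp.length < 2) {
            return 0; // a single term (or none) has no gaps
        }
        return (pp[pp.length - 1] - pp[0]) - (pp.length - 1);
    }

    // Extra slop summed over each adjacent pair of positions.
    // Pairs that are adjacent (or share a position) contribute nothing.
    static int pairwiseGap(int[] pp) {
        int gap = 0;
        for (int i = 0; i + 1 < pp.length; i++) {
            int dpos = pp[i + 1] - pp[i];
            if (dpos > 1) {
                gap += dpos - 1;
            }
        }
        return gap;
    }

    public static void main(String[] args) {
        // Assumed positions for: importance(1) being(3) earnest(4)
        int[] earnest = {1, 3, 4};
        System.out.println(endpointGap(earnest)); // prints 1
        System.out.println(pairwiseGap(earnest)); // prints 1

        // Assumed positions when two terms share position 0 (e.g. a synonym)
        int[] withSynonym = {0, 0, 1};
        System.out.println(endpointGap(withSynonym)); // prints -1
        System.out.println(pairwiseGap(withSynonym)); // prints 0
    }
}
```

On strictly increasing positions the two versions agree. With two
terms at the same position the endpoint version goes negative while
the pairwise one stays at zero, which may be what the remark about
synonyms was getting at.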
-Paul