Thanks for the discussion. I really appreciate you pointing out that:

> Code here ignores PhraseQuery (PQ)'s positions:

I take it that by "here" you mean my original code, not your suggestion.

> To accommodate for this, the overall extra gap can be added to the slop:
>     int gap = (pp[pp.length - 1] - pp[0]) - (pp.length - 1);  // (+/- boundary cases)
>     slop += gap;

At first I was thinking my refinement of this would be to consider the original
slop provided by the user and extend it only when necessary.
For example:
"The Importance of Being Earnest"~2
seems to already have enough slop to account for the stop words 'the' and
'of', so there is no need to add more to the slop.
But a slop of 2 really means the user would accept
[The Importance of Really Truly Being Earnest], and I see that matching that
with a SpanNearQuery requires a slop of 3 to skip [of] [Really] [Truly].
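To check the arithmetic in this example, here is a small sketch of the quoted gap formula (with the last index written as `positions.length - 1`). It assumes the PhraseQuery positions are available as an `int[]` and that the stop filter kept position increments, so "the" and "of" leave holes at 0 and 2. `spanNearSlop` is a hypothetical helper, not a Lucene API, and it ignores stacked positions.

```java
class SpanSlopExample {

    /**
     * Hypothetical helper: user slop plus the positions left empty by removed
     * stop words, using only the first and last phrase positions.
     */
    static int spanNearSlop(int[] positions, int userSlop) {
        if (positions.length < 2) {
            return userSlop; // a single term has no gaps to cover
        }
        int span = positions[positions.length - 1] - positions[0];
        int gap = span - (positions.length - 1); // holes left by stop words
        return userSlop + gap;
    }

    public static void main(String[] args) {
        // Assumed positions after stop-word removal: importance=1, being=3, earnest=4
        int[] positions = {1, 3, 4};
        System.out.println(spanNearSlop(positions, 2)); // prints 3
    }
}
```

The result of 3 is the slop needed to skip [of] [Really] [Truly], i.e. the user's 2 plus the one hole left by 'of'.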

But I'm not sure I understand the 'edit distance' for a phrase with more
than two words. Does it apply to _all the edits combined_ that are needed to
bring the quoted phrase into line with the indexed phrase, as your calculation
suggests?
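As far as I understand Lucene's sloppy phrase matching, the slop is counted over the whole phrase rather than per word pair: for each term, take its document position minus its offset in the query, and the match needs a slop of at least the spread between the largest and smallest of those values. The sketch below is a simplified model of that idea under the assumption that each term occurs exactly once in the document; `requiredPhraseSlop` is a hypothetical name, not Lucene's actual scorer code.

```java
class PhraseSlopExample {

    /**
     * Simplified model (hypothetical helper): smallest slop at which a sloppy
     * PhraseQuery would match, assuming one occurrence per term and arrays of
     * equal length.
     */
    static int requiredPhraseSlop(int[] queryOffsets, int[] docPositions) {
        if (queryOffsets.length == 0) {
            return 0;
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < queryOffsets.length; i++) {
            // how far this term was moved relative to its place in the query
            int shift = docPositions[i] - queryOffsets[i];
            min = Math.min(min, shift);
            max = Math.max(max, shift);
        }
        // all moves are counted together, as the spread of the shifts
        return max - min;
    }

    public static void main(String[] args) {
        // query terms: importance=1, being=3, earnest=4 (stop-word holes kept)
        // doc "The Importance of Really Truly Being Earnest": importance=1, being=5, earnest=6
        System.out.println(requiredPhraseSlop(new int[]{1, 3, 4}, new int[]{1, 5, 6})); // 2
        // swapping two adjacent words: query "a b", doc "b a"
        System.out.println(requiredPhraseSlop(new int[]{0, 1}, new int[]{1, 0})); // 2
    }
}
```

Under this model the PhraseQuery with slop 2 accepts the longer title, while a SpanNearQuery over the same three terms needs 3, which is where the extra gap comes from.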

Also, do any "boundary cases" (as mentioned in your comment) come to mind?

> Also, this code suggestion simplifies in the case that the analyzer in effect
> may emit more than one term at the same position - for example when expanding
> the query with synonyms, or when keeping originals and stemmed forms - in that
> case just comparing pp[0] and pp[pp.length-1] is insufficient, and the
> positions should be examined while looping the phrase terms, something like
> this:

I don't understand what you mean by "simplifies". Your first example already
gives the simplified calculation, which I think would work with or without
synonyms, so there should be no need to walk through each distance as your
later code does.

>    for (int i = 0; i < pp.length - 1; i++) {
>        int dpos = pp[i+1] - pp[i];
>        if (dpos > 1)
>            slop += (dpos - 1);
>    }
> 
> Haven't tested this - just to give you an idea what to try next.
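One way to compare the two suggestions is to compute the gap both ways on the same positions: once from the first and last positions only, and once by summing each gap between consecutive positions, as in the quoted loop. The positions below are assumed for illustration (the second array pretends a synonym was stacked on "importance"), and `overallGap` / `summedGap` are hypothetical helper names.

```java
class GapComparison {

    /** Gap from the first and last positions only (the first suggestion). */
    static int overallGap(int[] pp) {
        if (pp.length < 2) {
            return 0;
        }
        return (pp[pp.length - 1] - pp[0]) - (pp.length - 1);
    }

    /** Gap summed over consecutive positions (the loop suggestion). */
    static int summedGap(int[] pp) {
        int gap = 0;
        for (int i = 0; i < pp.length - 1; i++) {
            int dpos = pp[i + 1] - pp[i];
            if (dpos > 1) {
                gap += dpos - 1; // count only positions that were actually skipped
            }
        }
        return gap;
    }

    public static void main(String[] args) {
        // stop words only: importance=1, being=3, earnest=4
        int[] plain = {1, 3, 4};
        System.out.println(overallGap(plain) + " " + summedGap(plain)); // 1 1

        // assumed: a synonym stacked on "importance" at position 1
        int[] stacked = {1, 1, 3, 4};
        System.out.println(overallGap(stacked) + " " + summedGap(stacked)); // 0 1
    }
}
```

Without stacking the two agree. With a stacked term, the stacked term uses up one of the `pp.length - 1` expected steps without advancing the position, so the first-and-last formula reports 0 while the loop still finds the hole left by 'of'. Whether that matters seems to depend on whether the analyzer in use ever emits terms at the same position.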

Thanks for your input. I will experiment with some code that takes the
original PQ positions into account when setting the slop of any generated
SpanNearQuery.

-Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
