In Lucene 3.4, I recently implemented "Translating PhraseQuery to 
SpanNearQuery" (see Lucene in Action, page 220) because I wanted _order_ to 
matter.

Here is my exact code, called from getFieldsQuery once I know I'm looking at a 
PhraseQuery; I believe it matches the book exactly.

    static Query buildSpanNearQuery(PhraseQuery phraseQ, int slop) {
        Term[] terms = phraseQ.getTerms();
        SpanTermQuery[] clauses = new SpanTermQuery[terms.length];
        for (int i = 0; i < terms.length; i++) {
            clauses[i] = new SpanTermQuery(terms[i]);
        }
        SpanNearQuery query = new SpanNearQuery(clauses, slop,
                PHRASE_ORDER_MATTERS);
        return query;
    }

I put this in my own QueryParser, and things looked good until I tried a 
phrase with stop words.
With the old PhraseQuery, a phrase containing stop words matched without any 
extra slop, but with SpanNearQuery nothing is found unless the query includes 
some slop.
This conflicts with the typical use case of a user taking a phrase, pasting it 
into the search bar in quotes, and expecting to find his document.
I can't just add a fixed amount of slop, because the amount needed depends on 
how many stop words occur in each run within the phrase.
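
To make that dependence concrete, here is a rough sketch of the arithmetic as I 
understand it, in plain Java with no Lucene calls. It assumes the analyzer 
leaves a position gap for each removed stop word (position increments enabled) 
and that an ordered SpanNearQuery counts every skipped position between 
consecutive terms against the slop. The names StopWordSlop and extraSlop are 
made up for illustration.

```java
public class StopWordSlop {

    /**
     * Hypothetical helper: given the positions of the terms that survive
     * stop-word removal (assumed strictly increasing), return how many
     * positions are skipped in total between consecutive terms.
     */
    static int extraSlop(int[] positions) {
        int gaps = 0;
        for (int i = 1; i < positions.length; i++) {
            // Adjacent terms differ by 1; anything beyond that is a gap.
            gaps += positions[i] - positions[i - 1] - 1;
        }
        return gaps;
    }

    public static void main(String[] args) {
        // "king of the hill" -> king@0, hill@3 (two stop words removed)
        System.out.println(extraSlop(new int[] {0, 3}));    // prints 2
        // "quick brown fox" -> no stop words, no gaps
        System.out.println(extraSlop(new int[] {0, 1, 2})); // prints 0
    }
}
```

In the parser, the positions would presumably come from 
PhraseQuery.getPositions(), with the result added to the slop passed to 
SpanNearQuery. I have not verified this against 3.4, and it would be looser 
than PhraseQuery, since the extra slop could be used anywhere in the phrase 
rather than only where the stop words were.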

Any suggestions on how to combine the idea of SpanNear (so that words matching 
in phrase order score better) with text that has had stop words removed, so 
that I can support the simple use of quotes for exact quoted-text matching?

Any ideas?

-Paul
