Hi all.

I am writing a custom query parser which strongly resembles StandardQueryParser (I use a lot of the same processors and builders, with a slightly customised config handler and a completely new syntax parser written as an ANTLR grammar). My parser has additional syntax for span queries. The SyntaxParser is pretty much done, and I'm now at the stage where I have to process its output into a valid Query object.

Of course, span queries cannot accept any other kind of query inside them (at least not yet - I realise work is already being done to unify the two kinds of query), so any query the user might put inside there needs to be transformed into an equivalent span query.

For some of these, this is straightforward:

    TermQuery -> convert to SpanTermQuery
    WildcardQuery, PrefixQuery, FuzzyQuery, RegexQuery -> wrap in SpanMultiTermQueryWrapper

For PhraseQuery and MultiPhraseQuery, as long as the slop is 0, it seems like you can rewrite as follows:

    phrase-query(
        term-query('this'),
        term-query('is'),
        term-query('my'),
        term-query('cat')
    )
    ->
    span-near-query({slop=0, forwards-only=true}
        span-term-query('this'),
        span-term-query('is'),
        span-term-query('my'),
        span-term-query('cat')
    )

(For MultiPhraseQuery the inner queries would be rewritten to SpanMultiTermQueryWrapper but aside from that, it's the same.)

When the slop is non-zero, I'm not sure what to do. Does it still translate directly? I suspect not, because PhraseQuery slop is asymmetrical (centred around the term *after* the previous match) whereas SpanNearQuery slop is symmetrical (centred around the previous match, although the term to either side is numbered 0 instead of 1 as one might expect).

Q1: Is there some way to (precisely) simulate phrase query behaviour in spans?

For boolean queries, it depends... If it's a pure OR query, you can rewrite like this:

    within(2, 'my', or('cat', 'dog'))
    ->
    or( within(2, 'my', 'cat'), within(2, 'my', 'dog') )

This doesn't appear to change the semantics of the query. I notice there is a SpanOrQuery as well, which I could probably use instead... but it doesn't seem to make a difference.

For AND queries (and for any "default boolean" queries which aren't equivalent to OR), I have problems.

For instance, you can't do this:

    within(5, 'my', and('cat', 'dog'))
    ->
    and( within(5, 'my', 'cat'), within(5, 'my', 'dog') )

The problem is that this changes the semantics - the original query implies that the same "my" span is used when matching the other two, whereas the rewritten form allows it to be any "my" in the document. This problem doesn't exist with OR queries because it doesn't have to match both terms.

Q2: Is there some way to "pin this down" such that the "my" matched by each is the same position?

TX
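
P.S. To make the Q2 problem concrete, here is a minimal sketch in plain Java. It does not use any Lucene classes; it only models a document as a list of tokens. The class name PinnedSpanDemo, the method names pinned and unpinned, and the reading of within(n, ...) as "at most n positions apart, in either direction" are all assumptions made for illustration, not Lucene's actual span semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Lucene code: compares the intended AND semantics
// (one shared "my") with the naive rewrite (any "my" per clause).
class PinnedSpanDemo {

    // Positions at which `term` occurs in the tokenised document.
    static List<Integer> positions(List<String> doc, String term) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < doc.size(); i++) {
            if (doc.get(i).equals(term)) {
                out.add(i);
            }
        }
        return out;
    }

    // True if some occurrence of `term` is at most `distance` positions
    // away from `anchor` (assumed meaning of within(), either direction).
    static boolean near(List<String> doc, int anchor, String term, int distance) {
        for (int p : positions(doc, term)) {
            if (Math.abs(p - anchor) <= distance) {
                return true;
            }
        }
        return false;
    }

    // Intended semantics of within(n, anchor, and(others...)):
    // a single anchor occurrence must be near every other term.
    static boolean pinned(List<String> doc, int distance, String anchorTerm, String... others) {
        for (int a : positions(doc, anchorTerm)) {
            boolean all = true;
            for (String t : others) {
                if (!near(doc, a, t, distance)) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    // Naive rewrite and(within(n, anchor, t1), within(n, anchor, t2), ...):
    // each clause is free to pick a different anchor occurrence.
    static boolean unpinned(List<String> doc, int distance, String anchorTerm, String... others) {
        for (String t : others) {
            boolean any = false;
            for (int a : positions(doc, anchorTerm)) {
                if (near(doc, a, t, distance)) {
                    any = true;
                    break;
                }
            }
            if (!any) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "my" at 0 is next to "cat"; a different "my" at 12 is next to "dog".
        List<String> doc = Arrays.asList("my cat a b c d e f g h i j my dog".split(" "));
        System.out.println("pinned:   " + pinned(doc, 5, "my", "cat", "dog"));
        System.out.println("unpinned: " + unpinned(doc, 5, "my", "cat", "dog"));
    }
}
```

In this sample document no single "my" is within 5 positions of both "cat" and "dog", so the pinned check rejects it while the naive rewrite accepts it. That gap is what I would like to close with real span queries.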