Re: New query parser?

Roman Chyla Wed, 15 May 2013 04:35:32 -0700

Hi Jan,

Thanks for the thumbs up.



On Tue, May 14, 2013 at 11:14 AM, Jan Høydahl <[email protected]> wrote:

> Hello :)
>
> I think it has been the intention of the dev community for a long time to
> start using the flex parser framework, and in this regard this contribution
> is much welcome as a kickstarter for that.
> I have not looked much at the code, but I hope it could be a starting
> point for writing future parsers in a less "spaghetti" way.
>
> One question. Say we want to add a new operator such as NEAR/N. Ideally
> this should be added in Lucene, then all the Solr QParsers extending the
> lucene flex parser would benefit from the same new operator. Would this be
> easily achieved with your code you think? We also have a ton of
>


Adding a new operator is very simple at the syntax level, i.e. when I want
the NEAR/x operator, I just change the ANTLR grammar, which then produces the
appropriate abstract syntax tree. The flex parser consumes that tree.

Yet, imagine the following query

dog NEAR/5 cat

If you are using synonyms, an analyzer could have expanded dog with its
synonyms, so the query becomes something like

(dog | canin) NEAR/5 cat

and since Lucene cannot handle these queries, the flex builder must rewrite
them, effectively producing

SpanNear(SpanOr(dog | canin), SpanTerm(cat), 5)

But you could also argue that a better way to handle this query is:

SpanNear(dog, cat, 5) OR SpanNear(canin, cat, 5)

If that is the case, then a different builder will have to be used.
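
To make the difference between the two builders concrete, here is a toy
sketch in Python. It does not use Lucene or the flex parser API: Term, Or,
Near, expand, any_of, build_nested, build_distributed and render are all
hypothetical names made up for illustration, and the synonym table is
hard-coded rather than coming from an analyzer.

```python
from dataclasses import dataclass


# Toy query nodes. These are NOT Lucene classes, only stand-ins.
@dataclass(frozen=True)
class Term:
    text: str


@dataclass(frozen=True)
class Or:
    clauses: tuple


@dataclass(frozen=True)
class Near:
    clauses: tuple
    slop: int


def expand(term, synonyms):
    """Return the term followed by its synonyms (assumed analyzer output)."""
    return [term] + list(synonyms.get(term, []))


def any_of(nodes):
    """Wrap several nodes in an Or; a single node is returned unchanged."""
    return nodes[0] if len(nodes) == 1 else Or(tuple(nodes))


def build_nested(left, right, slop, synonyms):
    """Strategy 1: a single Near whose operands are Ors over the synonyms."""
    operands = [any_of([Term(t) for t in expand(side, synonyms)])
                for side in (left, right)]
    return Near(tuple(operands), slop)


def build_distributed(left, right, slop, synonyms):
    """Strategy 2: one Near per synonym combination, joined by an Or."""
    pairs = [Near((Term(l), Term(r)), slop)
             for l in expand(left, synonyms)
             for r in expand(right, synonyms)]
    return any_of(pairs)


def render(node):
    """Print a node in a compact notation for comparison."""
    if isinstance(node, Term):
        return node.text
    if isinstance(node, Or):
        return "(" + " | ".join(render(c) for c in node.clauses) + ")"
    return "NEAR/%d(%s)" % (node.slop, ", ".join(render(c) for c in node.clauses))


synonyms = {"dog": ["canin"]}
print(render(build_nested("dog", "cat", 5, synonyms)))
# NEAR/5((dog | canin), cat)
print(render(build_distributed("dog", "cat", 5, synonyms)))
# (NEAR/5(dog, cat) | NEAR/5(canin, cat))
```

Both functions start from the same parsed input (dog NEAR/5 cat); only the
building step differs, which is the part that would live in a separate
builder.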

This is just an example where the syntax is relatively simple, but the
semantics are the hard part. But I believe the flex parser gives all the
necessary tools to deal with that and avoid the spaghetti problem.


--roman



> feature requests on the eDisMax parser for new kinds of query syntax
> support. Before we start implementing that on top of the
> already-hard-to-maintain eDismax code, we should think about
> re-implementing eDismax on top of flex, perhaps on top of Roman's contrib
> here?
>

btw: I am using edismax in one of my grammars, i.e. users can type: query
AND edismax(foo OR (dog AND cat)), and the "edismax(....)" part will be
parsed by edismax. But I hit problems there as well: it does not do such a
nice job with operators, and of course it doesn't know how to handle
multi-token synonym expansion. I think it could be nicely extracted into a
flex processor and effectively become a plugin for a Solr parser (right now
it is a parser of its own, which makes it hard to extend).
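
As a rough illustration of the delegation idea only (in my parser this is
done by the ANTLR grammar, not by string scanning), the sketch below pulls
the body of an "edismax(...)" call out of a query string by matching
balanced parentheses. The function name split_function_call and its return
shape are hypothetical, and it is plain Python, not Solr code.

```python
def split_function_call(query, name="edismax"):
    """Return (before, inner, after) for the first `name(...)` call.

    Returns None when the query contains no such call. Nested parentheses
    inside the call are kept as part of `inner`.
    """
    start = query.find(name + "(")
    if start == -1:
        return None
    open_idx = start + len(name)  # index of the opening parenthesis
    depth = 0
    for i in range(open_idx, len(query)):
        ch = query[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # `inner` is what would be handed to the delegate parser.
                return query[:start], query[open_idx + 1:i], query[i + 1:]
    raise ValueError("unbalanced parentheses in %r" % query)


print(split_function_call("query AND edismax(foo OR (dog AND cat))"))
# ('query AND ', 'foo OR (dog AND cat)', '')
```

The inner string is the piece that the outer parser does not interpret
itself and passes on as-is.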





>
>  --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> 14. mai 2013 kl. 17:07 skrev Roman Chyla <[email protected]>:
>
> Hello World!
>
> Following the recommended practice I'd like to let you know that I am
> about to start porting our existing query parser into JIRA with the aim of
> making it available to Lucene/SOLR community.
>
> The query parser is built on top of the flexible query parser, but it
> separates the parsing (ANTLR) and the query building - it allows for a very
> sophisticated custom logic and has self-retrospecting methods, so one can
> actually 'see' what is going on - I have had lots of FUN working with it
> (which I consider to be a feature, not a shameless plug ;)).
>
> Some write up is here:
> http://29min.wordpress.com/category/antlrqueryparser/
>
> You can see the source code at:
>
> https://github.com/romanchyla/montysolr/tree/master/contrib/antlrqueryparser
>
>
> If you think this project is duplicating something or even being useless
> (I hope not!) please let me know, stop me, say something...
>
> Thank you!
>
>   roman
>
>
>
