[
https://issues.apache.org/jira/browse/LUCENE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698985#comment-13698985
]
Roman Chyla commented on LUCENE-5014:
-------------------------------------
Will it be OK to include the Solr parts in this ticket? Besides the JIRA name,
that seems like the best option to me.
> ANTLR Lucene query parser
> -------------------------
>
> Key: LUCENE-5014
> URL: https://issues.apache.org/jira/browse/LUCENE-5014
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/queryparser, modules/queryparser
> Affects Versions: 4.3
> Environment: all
> Reporter: Roman Chyla
> Labels: antlr, query, queryparser
> Attachments: LUCENE-5014.txt, LUCENE-5014.txt, LUCENE-5014.txt,
> LUCENE-5014.txt
>
>
> I would like to propose a new way of building query parsers for Lucene.
> Currently, most Lucene parsers are hard to extend because they are either
> written in Java (i.e. the Solr query parser, or edismax) or the parsing logic
> is 'married' to the query building logic (i.e. the standard Lucene parser,
> generated by JavaCC), which makes any extension really hard.
> A few years back, Lucene got the contrib/modern query parser (later renamed to
> 'flexible'), yet that parser didn't become a star (it must be very confusing
> for many users). However, that parsing framework is very powerful! And it is
> a real pity that there aren't more parsers already using it - because it
> allows us to add/extend/change almost any aspect of the query parsing.
> So, if we combine ANTLR + queryparser.flexible, we can get a very powerful
> framework for building almost any query language one can think of. And I hope
> this extension can become useful.
> The details:
> - every new query syntax is written in EBNF, it lives in separate files (and
> can be tested/developed independently - using 'gunit')
> - ANTLR parser generates parsing code (and it can generate parsers in
> several languages, the main target is Java, but it can also do Python - which
> may be interesting for pylucene)
> - the parser generates an AST (abstract syntax tree) which is consumed by a
> 'pipeline' of processors; users can easily modify this pipeline to add
> desired functionality
> - the new parser contains a few (very important) debugging functions; it can
> print results of every stage of the build, generate ASTs as graphical
> charts; ant targets help to build/test/debug grammars
> - I've tried to reuse the existing queryparser.flexible components as much
> as possible, only adding new processors when necessary
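
To make the 'pipeline of processors' idea above a bit more concrete, here is a
minimal sketch in Python (one of the ANTLR targets mentioned above). Everything
in it is an assumption made for illustration: the Node class, the processor
names, the 'text' default field and the AST labels are hypothetical and are not
the actual queryparser.flexible classes or the grammar's real tokens. It only
shows the general shape: each processor takes an AST and returns a new one, and
the pipeline can record the tree after every stage for debugging.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


# Hypothetical minimal AST node; NOT the real queryparser.flexible QueryNode.
@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

    def to_string(self) -> str:
        """Render the tree in a LISP-like form, e.g. (AND (TERM foo))."""
        if not self.children:
            return self.label
        inner = " ".join(child.to_string() for child in self.children)
        return f"({self.label} {inner})"


Processor = Callable[[Node], Node]


def map_tree(node: Node, fn: Callable[[Node], Node]) -> Node:
    """Rebuild the tree bottom-up, applying fn to every rebuilt node."""
    return fn(Node(node.label, [map_tree(c, fn) for c in node.children]))


def lowercase_terms(ast: Node) -> Node:
    """Hypothetical processor: lowercase the values under TERM nodes."""
    def visit(n: Node) -> Node:
        if n.label == "TERM":
            return Node("TERM", [Node(c.label.lower()) for c in n.children])
        return n
    return map_tree(ast, visit)


def add_default_field(ast: Node) -> Node:
    """Hypothetical processor: wrap every TERM in a FIELD named 'text'."""
    def visit(n: Node) -> Node:
        if n.label == "TERM":
            return Node("FIELD", [Node("text"), n])
        return n
    return map_tree(ast, visit)


def run_pipeline(
    ast: Node,
    processors: List[Processor],
    trace: Optional[List[Tuple[str, str]]] = None,
) -> Node:
    """Run processors in order; optionally record the AST after each stage."""
    for processor in processors:
        ast = processor(ast)
        if trace is not None:
            trace.append((processor.__name__, ast.to_string()))
    return ast


if __name__ == "__main__":
    tree = Node("AND", [Node("TERM", [Node("Foo")]), Node("TERM", [Node("BAR")])])
    stages: List[Tuple[str, str]] = []
    result = run_pipeline(tree, [lowercase_terms, add_default_field], stages)
    for name, dump in stages:
        print(name, "->", dump)
    print("final:", result.to_string())
```

Changing the behaviour then amounts to adding, removing or reordering entries
in the list passed to run_pipeline, which is the kind of extension point the
description has in mind.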
> Assumptions about the grammar:
> - every grammar must have one top parse rule called 'mainQ'
> - parsers must generate an AST (Abstract Syntax Tree)
> The structure of the AST is left open. There are components which make
> assumptions about the shape of the AST (i.e. that MODIFIER is the parent of a
> FIELD); however, users are free to choose/write different processors with
> different assumptions about the AST shape.
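
As a rough illustration of what "assumptions about the shape of the AST" can
mean in practice, the sketch below shows two interchangeable readers in Python.
The Node class, the function names and the exact child layout (an operator leaf
followed by a FIELD with a name and a value) are hypothetical and chosen only
for the example; the real grammar may arrange its tokens differently. The first
reader expects MODIFIER to be the parent of FIELD and rejects anything else,
the second expects the opposite nesting.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical minimal AST node, used only for this illustration.
@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def read_modifier_over_field(node: Node) -> Tuple[str, str, str]:
    """Assumes the shape (MODIFIER <op> (FIELD <name> <value>)).

    Returns (op, field name, value); raises ValueError on any other shape.
    """
    if node.label != "MODIFIER" or len(node.children) != 2:
        raise ValueError(f"expected MODIFIER with two children, got {node.label}")
    op, fld = node.children
    if fld.label != "FIELD" or len(fld.children) != 2:
        raise ValueError("expected FIELD as the second child of MODIFIER")
    name, value = fld.children
    return op.label, name.label, value.label


def read_field_over_modifier(node: Node) -> Tuple[str, str, str]:
    """Alternative reader assuming (FIELD <name> (MODIFIER <op> <value>))."""
    if node.label != "FIELD" or len(node.children) != 2:
        raise ValueError(f"expected FIELD with two children, got {node.label}")
    name, mod = node.children
    if mod.label != "MODIFIER" or len(mod.children) != 2:
        raise ValueError("expected MODIFIER as the second child of FIELD")
    op, value = mod.children
    return op.label, name.label, value.label


if __name__ == "__main__":
    tree = Node("MODIFIER", [
        Node("+"),
        Node("FIELD", [Node("title"), Node("lucene")]),
    ])
    print(read_modifier_over_field(tree))
```

Both readers return the same triple for their own tree shape, so code further
down a pipeline would not need to care which nesting a given grammar produces.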
> More documentation on how to use the parser can be seen here:
> http://29min.wordpress.com/category/antlrqueryparser/
> The parser was created more than a year ago and is used in production
> (http://labs.adsabs.harvard.edu/adsabs/). Different dialects of query
> languages (with proximity operators, functions, special logic, etc.) can be
> seen here:
> https://github.com/romanchyla/montysolr/tree/master/contrib/adsabs
> https://github.com/romanchyla/montysolr/tree/master/contrib/invenio