Hey everyone,

My organization uses its own homebrew QueryParser class, unrelated to Lucene's JavaCC-based QueryParser, to parse our queries; we don't currently use anything from Solr. That class has grown quite cumbersome, and I'm looking into alternatives. Grammar-based parsing seems like the way to go, but I have some questions:
- ANTLR seems to be very well supported and well liked, but I see that Lucene's QueryParser and StandardTokenizer use JavaCC. Does anyone have experience writing a Lucene or Solr parser using ANTLR? Given Lucene's default use of JavaCC, is there any advantage to sticking with JavaCC, or any problem with using ANTLR?
- Does anyone have experience using ANTLR for tokenization?
- I was told that Solr might be componentizing its query parsing in a way that would let us use that instead of a homebrew grammar-based solution, but I haven't found anything written about it. I don't know much about Solr's query parsing beyond what I saw in QParser.java and QParserPlugin.java: it looks like one can plug in any parser needed (a rough sketch of what I mean is at the end of this message). That alone doesn't help us, though, since our goal is to simplify the parsing logic itself. Is there a way to structure our query parsing without writing a grammar from scratch, whether it's a Solr component or something else?

In a nutshell, I'm trying to get a sense of the best practices for this situation (custom query parsing that's getting very complex) before I dive into implementing a solution.

Thanks!
Tavi
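P.S. To make the QParserPlugin point concrete, here's the kind of thing I was imagining. It's just an untested sketch: HomebrewQParserPlugin and HomebrewQueryParser are made-up names for our own code, and I haven't checked which Solr version's API this matches (older releases seem to declare ParseException rather than SyntaxError on parse(), and may also require overriding init(NamedList)).

import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

/**
 * Hypothetical plugin wrapping our in-house parser. HomebrewQueryParser is
 * our own class (not part of Lucene or Solr); all it would need to do is
 * turn the query string into a Lucene Query.
 */
public class HomebrewQParserPlugin extends QParserPlugin {

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() throws SyntaxError {
        try {
          // Delegate to the existing homebrew parser.
          return new HomebrewQueryParser(req.getSchema()).parse(qstr);
        } catch (Exception e) {
          // Older Solr versions would throw ParseException here instead.
          throw new SyntaxError(e.getMessage());
        }
      }
    };
  }
}

If I understand the docs correctly, something like this would then be registered in solrconfig.xml with a <queryParser> element and selected per request with defType or a {!homebrew} local param, but that only swaps in our parser; it doesn't simplify it, which is the real problem.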
