Kirill,

I was assuming Gandiva used the same expression code as the datasets. If that's not the case, then that issue isn't relevant. I don't have strong views on how the parsing is done.

Is there any interest in adding this to the C++ or Python dataset API? Is there any point in my submitting a PR for a Python-only implementation?
Josh

On Mon, Feb 8, 2021 at 4:45 PM Kirill Lykov <lykov.kir...@gmail.com> wrote:

> The jira issue is about gandiva. Do you mean filtering in this module only?
> If yes, I did some experiments with tatsu and gandiva instead of antlr. And
> it was pretty nice and clean to develop PoC for sql-like language. In c++,
> antlr model is a bit old school. Also I prefer PEG grammar over lalr. Don't
> know a nice CC in cpp for peg grammars. Probably, boost spirit?
>
> On Mon, Feb 8, 2021, 10:23 PM Josh Mayer <joshuaama...@gmail.com> wrote:
>
> > It would be useful to be able to create a filter expression from a string,
> > e.g. "date == '2020-01-01' and value >= 1" instead of (field("date") ==
> > '2020-01-01') & (field("value") >= 1).
> >
> > There are some existing libraries that make it pretty easy to do in Python
> > (see here <https://gist.github.com/josham/e5a13a16e9f18d7b9056127ac522cf23>)
> > though an old issue ARROW-3458
> > <https://issues.apache.org/jira/browse/ARROW-3458> suggests using Antlr and
> > C++. If a Python only solution is OK I'd be happy to work on adding the
> > feature. If Antlr/C++ is preferred I can help with the grammar and testing
> > but probably not the best person to do the C++ work.
> >
> > Josh
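
To make the string-to-expression idea from the original message concrete, here is a minimal sketch that uses only Python's standard `ast` module. It is not the approach from the linked gist or from ARROW-3458. The name `parse_filter` and its `field` parameter are hypothetical and chosen for illustration. By default it is assumed that `pyarrow.dataset.field` builds the leaf expressions, as in the `field("date")` example above.

```python
import ast
import operator

# Comparison operators accepted in the filter string.
_COMPARE = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
}


def parse_filter(text, field=None):
    """Turn e.g. "date == '2020-01-01' and value >= 1" into an expression.

    `field` maps a column name to an expression object (hypothetical hook).
    """
    if field is None:
        # Imported lazily so the parser itself needs only the stdlib.
        import pyarrow.dataset as ds
        field = ds.field

    def convert(node):
        if isinstance(node, ast.BoolOp):
            # "a and b and c" -> a & b & c, "a or b" -> a | b
            combine = operator.and_ if isinstance(node.op, ast.And) else operator.or_
            parts = [convert(value) for value in node.values]
            result = parts[0]
            for part in parts[1:]:
                result = combine(result, part)
            return result
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return ~convert(node.operand)
        if isinstance(node, ast.Compare):
            compare = _COMPARE.get(type(node.ops[0]))
            if len(node.ops) != 1 or compare is None:
                raise ValueError("only single ==, !=, <, <=, >, >= comparisons are supported")
            return compare(convert(node.left), convert(node.comparators[0]))
        if isinstance(node, ast.Name):
            # Bare identifiers are treated as column names.
            return field(node.id)
        # Anything else must be a plain literal; literal_eval raises
        # ValueError for calls, attribute access and other syntax.
        return ast.literal_eval(node)

    return convert(ast.parse(text, mode="eval").body)
```

With pyarrow installed, `parse_filter("date == '2020-01-01' and value >= 1")` should build the same expression as the hand-written `(field("date") == '2020-01-01') & (field("value") >= 1)`. Since the string is parsed but never evaluated, only the node types handled above can reach the expression builder.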