Kirill,

I was assuming Gandiva used the same expression code as the dataset API.
If that's not the case, then that issue isn't relevant. I don't have strong
views on how the parsing is done. Is there any interest in adding this to
the C++ or Python dataset API? Is there any point in my submitting a PR for
a Python-only implementation?
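
For reference, here is a minimal sketch of what a Python-only version might
look like. It is not the code from the gist linked below: it swaps in the
standard library's ast module so that no third-party parser is needed, and it
only handles and/or/not, simple comparisons, column names and literals. The
names parse_filter and Sym are hypothetical. Sym is a stand-in that records
the resulting expression as text so the sketch runs without pyarrow; the
assumption is that pyarrow.dataset.field could be passed as the `field`
argument instead.

```python
import ast
import operator

# Comparison nodes in the parsed string mapped to the Python operator
# that the expression objects are assumed to overload.
_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
}


def parse_filter(text, field):
    """Build a filter expression from a string (hypothetical helper).

    `field` is any callable that maps a column name to an object
    overloading ==, >=, &, | and ~ (assumption: pyarrow.dataset.field).
    """
    def convert(node):
        if isinstance(node, ast.BoolOp):
            # "a and b and c" becomes (a & b) & c; likewise for "or".
            combine = operator.and_ if isinstance(node.op, ast.And) else operator.or_
            result = convert(node.values[0])
            for value in node.values[1:]:
                result = combine(result, convert(value))
            return result
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return operator.invert(convert(node.operand))
        if isinstance(node, ast.Compare) and len(node.ops) == 1:
            op = _COMPARE.get(type(node.ops[0]))
            if op is None:
                raise ValueError("unsupported comparison operator")
            return op(convert(node.left), convert(node.comparators[0]))
        if isinstance(node, ast.Name):
            return field(node.id)      # bare identifiers are column names
        if isinstance(node, ast.Constant):
            return node.value          # string and numeric literals
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return convert(ast.parse(text, mode="eval").body)


class Sym:
    """Hypothetical stand-in for an expression; it only records text."""

    def __init__(self, text):
        self.text = text

    def _op(self, symbol, other):
        rhs = other.text if isinstance(other, Sym) else repr(other)
        return Sym(f"({self.text} {symbol} {rhs})")

    def __eq__(self, other):
        return self._op("==", other)

    def __ge__(self, other):
        return self._op(">=", other)

    def __and__(self, other):
        return self._op("&", other)


expr = parse_filter("date == '2020-01-01' and value >= 1", Sym)
print(expr.text)
```

Because the string is parsed rather than evaluated, anything outside the
small supported grammar (function calls, attribute access, and so on) is
rejected with a ValueError instead of being executed.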

Josh

On Mon, Feb 8, 2021 at 4:45 PM Kirill Lykov <lykov.kir...@gmail.com> wrote:

> The Jira issue is about Gandiva. Do you mean filtering in this module only?
> If yes, I did some experiments with TatSu and Gandiva instead of ANTLR, and
> it was pretty nice and clean to develop a PoC for a SQL-like language. In
> C++, the ANTLR model is a bit old school. Also, I prefer PEG grammars over
> LALR. I don't know of a nice compiler-compiler in C++ for PEG grammars.
> Probably Boost.Spirit?
>
> On Mon, Feb 8, 2021, 10:23 PM Josh Mayer <joshuaama...@gmail.com> wrote:
>
> > It would be useful to be able to create a filter expression from a string,
> > e.g. "date == '2020-01-01' and value >= 1" instead of (field("date") ==
> > '2020-01-01') & (field("value") >= 1).
> >
> > There are some existing libraries that make it pretty easy to do in Python
> > (see here <https://gist.github.com/josham/e5a13a16e9f18d7b9056127ac522cf23>),
> > though an old issue, ARROW-3458
> > <https://issues.apache.org/jira/browse/ARROW-3458>, suggests using ANTLR and
> > C++. If a Python-only solution is OK, I'd be happy to work on adding the
> > feature. If ANTLR/C++ is preferred, I can help with the grammar and testing,
> > but I'm probably not the best person to do the C++ work.
> >
> > Josh
> >
>
