I have a library that uses Python’s AST module to parse Python expressions and map them to Arrow Dataset expressions. I could extract the AST bits into a repo if you’re interested. It’s really simple but could serve as inspiration.
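For a rough idea of the approach, here is a minimal sketch of walking a Python AST and building a dataset expression from it. It is an illustration written under assumptions, not the code from the library: the name `to_expression` and its `field` parameter are hypothetical, and only names, literals, comparisons (including chained ones), `in` / `not in`, and `and` / `or` / `not` are handled. By default it builds on `pyarrow.dataset.field` and relies on the operator overloads and `isin` method of the resulting expressions.

```python
import ast
import functools
import operator

# Map AST comparison node types to functions that build the comparison
# through the expression objects' operator overloads.
_COMPARE = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
    ast.In: lambda left, right: left.isin(list(right)),
    ast.NotIn: lambda left, right: ~left.isin(list(right)),
}


def to_expression(source, field=None):
    """Translate a Python expression string into a filter expression.

    `field` is a hypothetical hook: any callable that turns a column name
    into an object supporting comparisons, `&`, `|`, `~` and `.isin()`.
    It defaults to `pyarrow.dataset.field`.
    """
    if field is None:
        import pyarrow.dataset as ds  # imported lazily, only when needed
        field = ds.field

    def convert(node):
        if isinstance(node, ast.Name):
            # Bare identifiers are treated as column references.
            return field(node.id)
        if isinstance(node, ast.BoolOp):
            # `a and b and c` becomes `a & b & c` (likewise `or` and `|`).
            combine = operator.and_ if isinstance(node.op, ast.And) else operator.or_
            return functools.reduce(combine, [convert(v) for v in node.values])
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return ~convert(node.operand)
        if isinstance(node, ast.Compare):
            # A chain such as `a < b <= c` becomes `(a < b) & (b <= c)`.
            operands = [convert(node.left)] + [convert(c) for c in node.comparators]
            parts = []
            for op, left, right in zip(node.ops, operands, operands[1:]):
                build = _COMPARE.get(type(op))
                if build is None:
                    raise ValueError(f"unsupported comparison: {type(op).__name__}")
                parts.append(build(left, right))
            return functools.reduce(operator.and_, parts)
        try:
            # Anything else must be a plain literal (number, string, list, ...).
            return ast.literal_eval(node)
        except (ValueError, TypeError):
            raise ValueError(f"unsupported syntax: {type(node).__name__}") from None

    return convert(ast.parse(source, mode="eval").body)
```

With pyarrow installed, the result of something like `to_expression("date == '2020-01-01' and value >= 1")` could be passed as the `filter=` argument of `Dataset.to_table`. Again, this only illustrates the approach and is not the code from the library mentioned above.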
It allows us to do things like:

    table = path.read_table("valid_from < date <= valid_to and security_id in [...]")

which is pretty handy when you’re in IPython or Jupyter.

> On Feb 8, 2021, at 15:23, Josh Mayer <joshuaama...@gmail.com> wrote:
>
> It would be useful to be able to create a filter expression from a string,
> e.g. "date == '2020-01-01' and value >= 1" instead of (field("date") ==
> '2020-01-01') & (field("value") >= 1).
>
> There are some existing libraries that make it pretty easy to do in Python
> (see here <https://gist.github.com/josham/e5a13a16e9f18d7b9056127ac522cf23>)
> though an old issue ARROW-3458
> <https://issues.apache.org/jira/browse/ARROW-3458> suggests using Antlr and
> C++. If a Python only solution is OK I'd be happy to work on adding the
> feature. If Antlr/C++ is preferred I can help with the grammar and testing
> but probably not the best person to do the C++ work.
>
> Josh