Hi everyone,

I’ve noticed a few people on the mailing list asking for a more convenient way to construct an Expression, namely from a string of some sort. I’ve found myself wishing for the same thing when constructing ExecPlans, so I’ve gone ahead and implemented a parser [0]. Does anyone have thoughts on the design of the language?

The current implementation parses a lisp-like language. This language has three types of expressions (mirroring the current Expression API):

- A call is a normal s-expression: it has the name of the kernel and the list of arguments. Its arguments can be any expression.
- A literal (i.e. scalar) starts with a $ and specifies a type and a value, separated by a colon. For example, `$decimal(12,2):10.01` specifies a literal of type decimal(12, 2) and a value of 10.01.
- A field_ref starts with a ! and is an identifier in the schema following the DotPath syntax we already have [1].

So for example, the expression `(add $int32:1 (multiply !.a !.b))` computes a*b+1 given a batch with columns named a and b.

The reason I chose a lisp-like language is that it very directly translates to the current Expression API and that it feels more natural to use a prefix notation for a language where all functions have a name (i.e. no +, -, *, etc.).

I’m currently working on a followup PR for specifying ExecPlans from a string (mainly for easier testing), and would like that language to be an extension of this one.

Looking forward to hearing everyone’s thoughts!

Thanks,
Sasha Krassovsky

[0] https://github.com/apache/arrow/pull/14287
[1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/type.h#L1726
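
To make the three expression forms described above concrete, here is a minimal Python sketch of a recursive-descent parser for them. It is not the code from the PR [0], which is C++; the class names (`Literal`, `FieldRef`, `Call`) and the `parse` function are hypothetical and exist only for illustration. It also assumes that literal values contain no whitespace or parentheses, and it keeps types, values, and DotPaths as raw text rather than interpreting them.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Literal:
    type_name: str  # e.g. "int32" or "decimal(12,2)"
    value: str      # raw text; no type-aware conversion in this sketch


@dataclass
class FieldRef:
    path: str       # DotPath text after the '!', e.g. ".a"


@dataclass
class Call:
    name: str       # kernel name, e.g. "add"
    args: List["Expr"]


Expr = Union[Literal, FieldRef, Call]


class _Parser:
    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def _peek(self) -> str:
        # Current character, or "" at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def _skip_ws(self) -> None:
        while self._peek().isspace():
            self.pos += 1

    def _atom(self) -> str:
        # Read up to the next whitespace or parenthesis.
        start = self.pos
        while (self._peek() and not self._peek().isspace()
               and self._peek() not in "()"):
            self.pos += 1
        if self.pos == start:
            raise ValueError(f"expected a token at offset {start}")
        return self.text[start:self.pos]

    def expr(self) -> Expr:
        self._skip_ws()
        c = self._peek()
        if c == "(":
            return self._call()
        if c == "$":
            return self._literal()
        if c == "!":
            self.pos += 1  # consume '!'
            return FieldRef(self._atom())
        raise ValueError(f"unexpected {c!r} at offset {self.pos}")

    def _call(self) -> Call:
        self.pos += 1  # consume '('
        self._skip_ws()
        name = self._atom()
        args: List[Expr] = []
        while True:
            self._skip_ws()
            if self._peek() == ")":
                self.pos += 1
                return Call(name, args)
            if not self._peek():
                raise ValueError("missing ')'")
            args.append(self.expr())

    def _literal(self) -> Literal:
        self.pos += 1  # consume '$'
        start, depth = self.pos, 0
        # The type may contain parentheses, as in decimal(12,2), so only a
        # ':' outside any parentheses ends the type.
        while self._peek():
            c = self._peek()
            if c == ":" and depth == 0:
                break
            depth += (c == "(") - (c == ")")
            self.pos += 1
        else:
            raise ValueError("literal is missing ':'")
        type_name = self.text[start:self.pos]
        self.pos += 1  # consume ':'
        return Literal(type_name, self._atom())


def parse(text: str) -> Expr:
    parser = _Parser(text)
    result = parser.expr()
    parser._skip_ws()
    if parser._peek():
        raise ValueError(f"trailing input at offset {parser.pos}")
    return result


tree = parse("(add $int32:1 (multiply !.a !.b))")
```

Here `tree` is a `Call` named `add` whose arguments are a `Literal` and a nested `Call`, which mirrors how the prefix form maps one-to-one onto nested Expression nodes.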