Currently Substrait only has a binary (protobuf) serialization (and a protobuf JSON one, but that's not really human-writable and is barely human-readable). Substrait does not have a text serialization. I believe there is some desire for one (maybe Sasha wants to give it a try?). A text format for Substrait would solve this problem because you could go "text expression" -> "substrait expression" -> "arrow expression".
Since no text format exists for Substrait, I think that Substrait does not currently solve this problem or overlap with your work. However, at some point (hopefully), it will.

There was also a fairly recent proposal for a parser for gandiva expressions [1]. Compared with [1], I think this proposal is simpler to parse but lacks some of the shortcut conveniences (e.g. implicit types for literals, support for common infix operators (+, -, /, ...)).

Both are lacking parameters (e.g. "(equals(!x, %threshold%))"), which I think would be useful to have, as one could then do something like `auto arrow_expr = Parse(my_expr, threshold)`.

Does a field_ref have to be a field name or can it be a field index? The latter is quite useful when the schema has duplicate field names.

I'm +0.5 on this change. I worry a bit about having (eventually) three different syntaxes. However, at the moment we have zero.

[1] https://lists.apache.org/thread/0oyns380hgzvl0y8kwgqoo4fp7ntt3bn

On Wed, Oct 5, 2022 at 1:55 PM Sasha Krassovsky <krassovskysa...@gmail.com> wrote:
>
> Hi David,
> Could you elaborate on which part of my proposal overlaps with Substrait? I
> don’t see anything in Substrait that allows me to do something along the
> lines of
>
> Expression e = Expression::FromString("(add !.a $int32:1)");
>
> in the code.
>
> Sasha
>
> > On Oct 5, 2022, at 1:35 PM, Lee, David <david....@blackrock.com.INVALID>
> > wrote:
> >
> > I believe this is what substrait.io is trying to accomplish.
> >
> > Here's some additional info:
> > https://substrait.io/
> >
> > https://www.youtube.com/watch?v=5JjaB7p3Sjk
> >
> > -----Original Message-----
> > From: Sasha Krassovsky <krassovskysa...@gmail.com>
> > Sent: Wednesday, October 5, 2022 11:29 AM
> > To: dev@arrow.apache.org
> > Subject: Parser for expressions
> >
> > Hi everyone,
> > I’ve noticed on the mailing list a few times people asking for a more
> > convenient way to construct an Expression, namely using a string of some
> > sort. I’ve found myself wishing for something like this too when
> > constructing ExecPlans, and so I’ve gone ahead and implemented a parser
> > [0]. I was wondering if anyone had any thoughts about the design of the
> > language?
> >
> > The current implementation parses a lisp-like language. This language has
> > three types of expressions (mirroring the current Expression API):
> >
> > - A call is a normal s-expression, it has the name of the kernel and the
> >   list of arguments. Its arguments can be any expression.
> > - A literal (i.e. scalar) starts with a $ and specifies a type and a value,
> >   separated by a colon. For example, `$decimal(12,2):10.01` specifies a
> >   literal of type decimal(12, 2) and a value of 10.01.
> > - A field_ref starts with a ! and is an identifier in the schema following
> >   the DotPath syntax we already have [1].
> >
> > So for example, the expression
> >
> > (add $int32:1 (multiply !.a !.b))
> >
> > computes a*b+1 given a batch with columns named a and b.
> >
> > The reason I chose a lisp-like language is that it very directly translates
> > to the current Expression API and that it feels more natural to use a
> > prefix notation for a language where all functions have a name (i.e. no +,
> > -, *, etc.).
> >
> > I’m currently working on a followup PR for specifying ExecPlans from a
> > string (mainly for easier testing), and would like that language to be an
> > extension of this one. Looking forward to hearing everyone’s thoughts!
> >
> > Thanks,
> > Sasha Krassovsky
> >
> > [0]
> > https://urldefense.com/v3/__https://github.com/apache/arrow/pull/14287__;!!KSjYCgUGsB4!enYRTooMrwyJKJzgTlQMdMhpfT7ys3Ol8a8HcHUvxRYRN-a-Up_axLfPGOpUtEDCDs0ee7lHPAzVdz-dooULG_6oZdDk$
> >
> > [1]
> > https://urldefense.com/v3/__https://github.com/apache/arrow/blob/master/cpp/src/arrow/type.h*L1726__;Iw!!KSjYCgUGsB4!enYRTooMrwyJKJzgTlQMdMhpfT7ys3Ol8a8HcHUvxRYRN-a-Up_axLfPGOpUtEDCDs0ee7lHPAzVdz-dooULG0GkL0Mn$
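
For anyone following along, here is a rough sketch of how the three expression kinds in the quoted proposal (call, literal, field_ref) might be parsed. It is not the parser from the PR and it does not use Arrow at all: `Node`, `Parser`, `ParseExpression` and `Describe` are names I made up for illustration, and the output is a plain tree rather than an Arrow Expression. I am also assuming that a literal's type may contain a parenthesized parameter list (as in `$decimal(12,2):10.01`) and that the type and value are split at the first colon.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree type for illustration only; not Arrow's Expression.
struct Node {
  enum Kind { kCall, kLiteral, kFieldRef } kind;
  std::string name;        // kernel name, literal type, or field path
  std::string value;       // literal value text (kLiteral only)
  std::vector<Node> args;  // call arguments (kCall only)
};

class Parser {
 public:
  explicit Parser(std::string text) : text_(std::move(text)) {}

  Node Parse() {
    Node root = ParseExpr();
    SkipSpace();
    if (pos_ != text_.size()) throw std::runtime_error("trailing input");
    return root;
  }

 private:
  void SkipSpace() {
    while (pos_ < text_.size() &&
           std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
  }

  // Reads up to whitespace or an unmatched ')'. If nested is true, balanced
  // parentheses (and any spaces inside them) are kept as part of the token,
  // so that a type such as decimal(12, 2) stays in one piece.
  std::string ReadToken(bool nested) {
    std::size_t start = pos_;
    int depth = 0;
    while (pos_ < text_.size()) {
      char c = text_[pos_];
      if (c == '(') {
        if (!nested) break;
        ++depth;
      } else if (c == ')') {
        if (depth == 0) break;
        --depth;
      } else if (depth == 0 && std::isspace(static_cast<unsigned char>(c))) {
        break;
      }
      ++pos_;
    }
    if (pos_ == start) throw std::runtime_error("expected a token");
    return text_.substr(start, pos_ - start);
  }

  Node ParseExpr() {
    SkipSpace();
    if (pos_ >= text_.size()) throw std::runtime_error("unexpected end");
    char c = text_[pos_];
    if (c == '(') {  // call: (kernel_name arg...)
      ++pos_;
      SkipSpace();
      Node call{Node::kCall, ReadToken(false), "", {}};
      for (;;) {
        SkipSpace();
        if (pos_ >= text_.size()) throw std::runtime_error("missing ')'");
        if (text_[pos_] == ')') { ++pos_; return call; }
        call.args.push_back(ParseExpr());
      }
    }
    if (c == '$') {  // literal: $type:value
      ++pos_;
      std::string token = ReadToken(true);
      std::size_t colon = token.find(':');
      if (colon == std::string::npos) throw std::runtime_error("missing ':'");
      return Node{Node::kLiteral, token.substr(0, colon),
                  token.substr(colon + 1), {}};
    }
    if (c == '!') {  // field_ref: !path
      ++pos_;
      return Node{Node::kFieldRef, ReadToken(false), "", {}};
    }
    throw std::runtime_error("unexpected character");
  }

  std::string text_;
  std::size_t pos_ = 0;
};

Node ParseExpression(const std::string& text) { return Parser(text).Parse(); }

// Renders the tree as text so the parse result is easy to inspect.
std::string Describe(const Node& n) {
  if (n.kind == Node::kLiteral) return "literal<" + n.name + ">(" + n.value + ")";
  if (n.kind == Node::kFieldRef) return "field_ref(" + n.name + ")";
  std::string out = "call " + n.name + "(";
  for (std::size_t i = 0; i < n.args.size(); ++i) {
    if (i > 0) out += ", ";
    out += Describe(n.args[i]);
  }
  return out + ")";
}
```

Calling `Describe(ParseExpression("(add $int32:1 (multiply !.a !.b))"))` walks the example from the proposal and yields a nested description of one call containing a literal and a second call with two field_refs. Parameters such as `%threshold%` and field indices are deliberately left out here; both would need a decision on syntax first.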