SQL is nearly universally understood, so unless there is a compelling reason not to, I tend to use it as my default.
I don't see any particular advantage to favoring "(add x y)" over "add(x, y)". I will acknowledge that there are downsides to supporting x + y; I think you listed these out already. So, for expressions, I think it'd be fine if Acero initially supported "add(x, y)" without supporting infix operators (and Gandiva supported both) as long as there is a clear error message (e.g. "please use add(x,y) instead of x+y"). This simplifies parsing and should avoid confusion between the two.

If you want to then provide support for nodes / relations I think we will need to deviate from SQL as it is simply not expressive enough.

On Mon, Oct 10, 2022 at 12:17 PM Antoine Pitrou <anto...@python.org> wrote:
>
>
> I don't see the point of having two different syntaxes.
>
> Also, IMHO lisp-style is harder for many people, so I would rather have a
> more "traditional" syntax (though Lisp is historically traditional, of
> course ;-)).
>
>
> On 10/10/2022 at 21:10, Sasha Krassovsky wrote:
> > Yes that makes a lot of sense! I’d agree that it would probably be fine to
> > have two different syntaxes, seeing as the use-cases are a bit different.
> >
> > Did anyone else have any thoughts? Either on the lisp-style syntax for
> > Arrow’s Expressions or on having two different syntaxes? (Weston or
> > Antoine?)
> >
> > Sasha
> >
> >> On Oct 9, 2022, at 5:38 AM, Jin Shang <shangjin1...@gmail.com> wrote:
> >>
> >> Hi Sasha,
> >>
> >> I agree with your points. However, Gandiva is kind of specialized in
> >> computing arithmetic expressions and it offers few to no
> >> non-arithmetic operations. So it is very helpful if its parser understands
> >> natural math expressions.
> >>
> >> Considering that Gandiva is a relatively independent component within the
> >> Arrow project, and that it’s only a math expression compiler rather than a
> >> fully functional compute engine, maybe it’s acceptable for Gandiva to have
> >> its own grammar different from compute/Acero/Substrait etc.
> >>
> >> Best,
> >> Jin
> >>
> >>> On Oct 8, 2022, at 03:01, Sasha Krassovsky <krassovskysa...@gmail.com> wrote:
> >>>
> >>> Hi Jin,
> >>> I agree it would be good to standardize on a syntax. To me, the
> >>> advantages of the lisp-style syntax are:
> >>> - don’t have to define/implement any kind of precedence rules
> >>> - has a uniform syntax (no distinction between prefix and infix operators)
> >>> - avoids having “special” functions that have an associated arithmetic
> >>>   symbol
> >>> - translates directly to the underlying Expression infrastructure.
> >>>
> >>> The advantage of the Python-style syntax is that it’s more natural to use
> >>> for arithmetic expressions. However, I think for non-arithmetic
> >>> expressions this syntax would be more cumbersome.
> >>>
> >>> Either would work of course, I guess it just depends on the goal. I was
> >>> thinking the string representation wouldn’t represent any significant
> >>> level of abstraction, it is just a convenience to save on clutter when
> >>> typing out expressions.
> >>>
> >>> Sasha
> >>>
> >>>> On Oct 6, 2022, at 22:20, Jin Shang <shangjin1...@gmail.com> wrote:
> >>>>
> >>>> Hi Sasha and Weston,
> >>>>
> >>>> I'm the author of the mentioned Gandiva parser. I agree that having one
> >>>> unified syntax is ideal. I think one critical divergence between Sasha's
> >>>> and my proposals is that mine is with C++/Python imperative style (foo(x,
> >>>> y, z), a+b…) and Sasha's is with Lisp functional style ((foo x y z), (+ a
> >>>> b)…). I feel like it'll be better for us to settle on one of the styles
> >>>> before we start implementing the parsers.
> >>>>
> >>>> Best,
> >>>> Jin
> >>>>
> >>>>> On Friday, October 7, 2022, Sasha Krassovsky <krassovskysa...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>> Hi Weston,
> >>>>> I’d be happy to donate something like this to Substrait if that’s
> >>>>> useful, I was thinking of proving out a design here before going there.
> >>>>> However we could also just go straight there :)
> >>>>>
> >>>>> Regarding infix operators and such, the edge case I was thinking of is
> >>>>> that a user could potentially add a kernel to the registry called e.g. “+”.
> >>>>> Would the parser implicitly convert any instances of “+” to “add” and
> >>>>> break that?
> >>>>>
> >>>>> Implicit typing for literals and parameters can probably also be added
> >>>>> without issues to the current scheme. Would the parameters be passed as
> >>>>> an std::unordered_map?
> >>>>>
> >>>>>> Does a field_ref have to be a field name or can it be a field index?
> >>>>>
> >>>>> It can be a field index or even a field path. The field ref is parsed
> >>>>> using FieldRef::FromDotPath ([1] in my original message), which can
> >>>>> express any FieldRef.
> >>>>>
> >>>>> Sasha
> >>>>>
> >>>>>>> On Oct 6, 2022, at 16:08, Weston Pace <weston.p...@gmail.com> wrote:
> >>>>>>
> >>>>>> Currently Substrait only has a binary (protobuf) serialization (and a
> >>>>>> protobuf JSON one but that's not really human writable and barely
> >>>>>> human readable). Substrait does not have a text serialization. I
> >>>>>> believe there is some desire for one (maybe Sasha wants to give it a
> >>>>>> try?). A text format for Substrait would solve this problem because
> >>>>>> you could go "text expression" -> "substrait expression" -> "arrow
> >>>>>> expression".
> >>>>>>
> >>>>>> Since no text format exists for Substrait I think that Substrait does
> >>>>>> not currently solve this problem or overlap with your work. However,
> >>>>>> at some point (hopefully), it will.
> >>>>>>
> >>>>>> There was also a fairly recent proposal for a parser for Gandiva
> >>>>>> expressions[1].
> >>>>>>
> >>>>>> Compared with [1] I think this proposal is simpler to parse but lacks
> >>>>>> some of the shortcut conveniences (e.g. implicit types for literals,
> >>>>>> support for common infix operators (+, -, /, ...)).
> >>>>>>
> >>>>>> Both are lacking parameters (e.g. "(equals(!x, %threshold%))"), which I
> >>>>>> think would be useful to have as one could then do something like `auto
> >>>>>> arrow_expr = Parse(my_expr, threshold)`.
> >>>>>>
> >>>>>> Does a field_ref have to be a field name or can it be a field index?
> >>>>>> The latter is quite useful when the schema has duplicate field names.
> >>>>>>
> >>>>>> I'm +0.5 on this change. I worry a bit about having (eventually)
> >>>>>> three different syntaxes. However, at the moment we have zero.
> >>>>>>
> >>>>>> [1] https://lists.apache.org/thread/0oyns380hgzvl0y8kwgqoo4fp7ntt3bn
> >>>>>>
> >>>>>>> On Wed, Oct 5, 2022 at 1:55 PM Sasha Krassovsky
> >>>>>>> <krassovskysa...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi David,
> >>>>>>> Could you elaborate on which part of my proposal overlaps with
> >>>>>>> Substrait? I don’t see anything in Substrait that allows me to do
> >>>>>>> something along the lines of
> >>>>>>>
> >>>>>>> Expression e = Expression::FromString("(add !.a $int32:1)");
> >>>>>>>
> >>>>>>> in the code.
> >>>>>>>
> >>>>>>> Sasha
> >>>>>>>
> >>>>>>>>> On Oct 5, 2022, at 1:35 PM, Lee, David
> >>>>>>>>> <david....@blackrock.com.INVALID> wrote:
> >>>>>>>>
> >>>>>>>> I believe this is what substrait.io is trying to accomplish.
> >>>>>>>>
> >>>>>>>> Here's some additional info:
> >>>>>>>> https://substrait.io/
> >>>>>>>>
> >>>>>>>> https://www.youtube.com/watch?v=5JjaB7p3Sjk
> >>>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Sasha Krassovsky <krassovskysa...@gmail.com>
> >>>>>>>> Sent: Wednesday, October 5, 2022 11:29 AM
> >>>>>>>> To: dev@arrow.apache.org
> >>>>>>>> Subject: Parser for expressions
> >>>>>>>>
> >>>>>>>> Hi everyone,
> >>>>>>>> I’ve noticed on the mailing list a few times people asking for a more
> >>>>>>>> convenient way to construct an Expression, namely using a string of some
> >>>>>>>> sort. I’ve found myself wishing for something like this too when
> >>>>>>>> constructing ExecPlans, and so I’ve gone ahead and implemented a parser
> >>>>>>>> [0]. I was wondering if anyone had any thoughts about the design of the
> >>>>>>>> language?
> >>>>>>>>
> >>>>>>>> The current implementation parses a lisp-like language. This language
> >>>>>>>> has three types of expressions (mirroring the current Expression API):
> >>>>>>>>
> >>>>>>>> - A call is a normal s-expression, it has the name of the kernel and
> >>>>>>>>   the list of arguments. Its arguments can be any expression.
> >>>>>>>> - A literal (i.e. scalar) starts with a $ and specifies a type and a
> >>>>>>>>   value, separated by a colon. For example, `$decimal(12,2):10.01`
> >>>>>>>>   specifies a literal of type decimal(12, 2) and a value of 10.01.
> >>>>>>>> - A field_ref starts with a ! and is an identifier in the schema
> >>>>>>>>   following the DotPath syntax we already have [1].
> >>>>>>>>
> >>>>>>>> So for example, the expression
> >>>>>>>>
> >>>>>>>> (add $int32:1 (multiply !.a !.b))
> >>>>>>>>
> >>>>>>>> computes a*b+1 given a batch with columns named a and b.
> >>>>>>>>
> >>>>>>>> The reason I chose a lisp-like language is that it very directly
> >>>>>>>> translates to the current Expression API and that it feels more natural
> >>>>>>>> to use a prefix notation for a language where all functions have a name
> >>>>>>>> (i.e. no +, -, *, etc.).
> >>>>>>>>
> >>>>>>>> I’m currently working on a followup PR for specifying ExecPlans from
> >>>>>>>> a string (mainly for easier testing), and would like that language to be
> >>>>>>>> an extension of this one. Looking forward to hearing everyone’s thoughts!
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Sasha Krassovsky
> >>>>>>>>
> >>>>>>>> [0] https://github.com/apache/arrow/pull/14287
> >>>>>>>> [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/type.h#L1726
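
For anyone who wants to see the grammar from the original proposal in concrete form, here is a minimal C++ sketch of a recursive-descent parser for its three expression types (calls, `$type:value` literals, and `!` field refs). This is not the parser from the PR and it does not use Arrow: `Node`, `Parser`, and `ToCallStyle` are hypothetical names made up for this illustration, and literal values are kept as raw text rather than converted to Arrow scalars. `ToCallStyle` prints the same tree in the "add(x, y)" style, which suggests the two notations differ only in surface syntax.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST for illustration only; this is not Arrow's Expression class.
struct Node {
  enum Kind { kCall, kLiteral, kFieldRef } kind;
  std::string name;        // kernel name (call), type (literal) or path (field_ref)
  std::string value;       // raw literal text; empty for calls and field refs
  std::vector<Node> args;  // call arguments; empty for literals and field refs
};

class Parser {
 public:
  explicit Parser(std::string text) : text_(std::move(text)) {}

  // Parses exactly one expression and rejects trailing input.
  Node Parse() {
    Node root = ParseExpr();
    SkipSpace();
    if (pos_ != text_.size()) Fail("unexpected trailing input");
    return root;
  }

 private:
  [[noreturn]] void Fail(const std::string& msg) const {
    throw std::invalid_argument(msg + " at offset " + std::to_string(pos_));
  }
  static bool IsSpace(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
  }
  void SkipSpace() {
    while (pos_ < text_.size() && IsSpace(text_[pos_])) ++pos_;
  }
  // Reads a non-empty run of characters up to whitespace or a parenthesis.
  std::string ReadAtom(const std::string& what) {
    std::size_t start = pos_;
    while (pos_ < text_.size() && !IsSpace(text_[pos_]) &&
           text_[pos_] != '(' && text_[pos_] != ')') ++pos_;
    if (pos_ == start) Fail("expected " + what);
    return text_.substr(start, pos_ - start);
  }
  // The first character selects the expression type: '(' call, '$' literal, '!' field_ref.
  Node ParseExpr() {
    SkipSpace();
    if (pos_ >= text_.size()) Fail("unexpected end of input");
    if (text_[pos_] == '(') return ParseCall();
    if (text_[pos_] == '$') return ParseLiteral();
    if (text_[pos_] == '!') {
      ++pos_;
      return Node{Node::kFieldRef, ReadAtom("field path"), "", {}};
    }
    Fail("expected '(', '$' or '!'");
  }
  Node ParseCall() {
    ++pos_;  // consume '('
    SkipSpace();
    Node call{Node::kCall, ReadAtom("kernel name"), "", {}};
    for (;;) {
      SkipSpace();
      if (pos_ >= text_.size()) Fail("missing ')'");
      if (text_[pos_] == ')') { ++pos_; return call; }
      call.args.push_back(ParseExpr());
    }
  }
  Node ParseLiteral() {
    ++pos_;  // consume '$'
    std::string type;
    int depth = 0;  // parentheses inside the type, as in decimal(12,2)
    while (pos_ < text_.size() && (depth > 0 || text_[pos_] != ':')) {
      char c = text_[pos_++];
      if (c == '(') ++depth;
      else if (c == ')' && --depth < 0) Fail("literal is missing ':'");
      type += c;
    }
    if (pos_ >= text_.size()) Fail("literal is missing ':'");
    ++pos_;  // consume ':'
    return Node{Node::kLiteral, type, ReadAtom("literal value"), {}};
  }

  std::string text_;
  std::size_t pos_ = 0;
};

// Renders the tree in call style, keeping the '$' and '!' sigils.
std::string ToCallStyle(const Node& n) {
  if (n.kind == Node::kLiteral) return "$" + n.name + ":" + n.value;
  if (n.kind == Node::kFieldRef) return "!" + n.name;
  std::string out = n.name + "(";
  for (std::size_t i = 0; i < n.args.size(); ++i) {
    if (i > 0) out += ", ";
    out += ToCallStyle(n.args[i]);
  }
  return out + ")";
}
```

With this sketch, `ToCallStyle(Parser("(add $int32:1 (multiply !.a !.b))").Parse())` returns `add($int32:1, multiply(!.a, !.b))`, and unbalanced input such as `(add !.a` throws `std::invalid_argument`.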