SQL is nearly universally understood, so unless there is a compelling
reason, I tend to use it as my default.

I don't see any particular advantage to favoring "(add x y)" over "add(x, y)".

I will acknowledge that there are downsides to supporting x + y; I
think you already listed these.

So, for expressions, I think it would be fine if Acero initially supported
"add(x, y)" without supporting infix operators (and Gandiva supported
both), as long as there is a clear error message (e.g. "please use
add(x,y) instead of x+y"). This simplifies parsing and should avoid
confusion between the two.

If you then want to support nodes/relations, I think we will need to
deviate from SQL, as it is simply not expressive enough.

On Mon, Oct 10, 2022 at 12:17 PM Antoine Pitrou <anto...@python.org> wrote:
>
>
> I don't see the point of having two different syntaxes.
>
> Also, IMHO lisp-style is harder for many people, so I would prefer a
> more "traditional" syntax (though Lisp is historically traditional, of
> course ;-)).
>
>
> On 10/10/2022 at 21:10, Sasha Krassovsky wrote:
> > Yes, that makes a lot of sense! I agree that it would probably be fine to
> > have two different syntaxes, since the use cases are a bit different.
> >
> > Does anyone else have any thoughts, either on the lisp-style syntax for
> > Arrow’s Expressions or on having two different syntaxes? (Weston or
> > Antoine?)
> >
> > Sasha
> >
> >> On Oct 9, 2022, at 5:38 AM, Jin Shang <shangjin1...@gmail.com> wrote:
> >>
> >> Hi Sasha,
> >>
> >> I agree with your points. However, Gandiva is fairly specialized in
> >> computing arithmetic expressions and offers few, if any,
> >> non-arithmetic operations. So it is very helpful if its parser understands
> >> natural math expressions.
> >>
> >> Considering that Gandiva is a relatively independent component within the
> >> Arrow project, and that it’s only a math expression compiler rather than a
> >> fully functional compute engine, maybe it’s acceptable for Gandiva to have
> >> its own grammar, different from that of compute/Acero/Substrait etc.
> >>
> >> Best,
> >> Jin
> >>
> >>> On Oct 8, 2022, at 03:01, Sasha Krassovsky <krassovskysa...@gmail.com> wrote:
> >>>
> >>> Hi Jin,
> >>> I agree it would be good to standardize on a syntax. To me, the 
> >>> advantages of the lisp-style syntax are:
> >>> - don’t have to define/implement any kind of precedence rules
> >>> - has a uniform syntax (no distinction between prefix and infix operators)
> >>> - avoids having “special” functions that have an associated arithmetic 
> >>> symbol
> >>> - translates directly to the underlying Expression infrastructure.
> >>>
> >>> The advantage of the Python-style syntax is that it’s more natural to use 
> >>> for arithmetic expressions. However, I think for non-arithmetic 
> >>> expressions this syntax would be more cumbersome.
> >>>
> >>> Either would work, of course; I guess it just depends on the goal. I was
> >>> thinking the string representation wouldn’t represent any significant
> >>> level of abstraction; it is just a convenience to save on clutter when
> >>> typing out expressions.
> >>>
> >>> Sasha
> >>>
> >>>> On Oct 6, 2022, at 22:20, Jin Shang <shangjin1...@gmail.com> wrote:
> >>>>
> >>>> Hi Sasha and Weston,
> >>>>
> >>>> I'm the author of the mentioned Gandiva parser. I agree that having one
> >>>> unified syntax is ideal. I think one critical divergence between Sasha's
> >>>> and my proposals is that mine uses a C++/Python imperative style (foo(x,
> >>>> y, z), a+b…) while Sasha's uses a Lisp functional style ((foo x y z), (+ a
> >>>> b)…). I feel like it'll be better for us to settle on one of the styles
> >>>> before we start implementing the parsers.
> >>>>
> >>>> Best,
> >>>> Jin
> >>>>
> >>>>> On Friday, October 7, 2022, Sasha Krassovsky <krassovskysa...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>> Hi Weston,
> >>>>> I’d be happy to donate something like this to Substrait if that’s
> >>>>> useful; I was thinking of proving out a design here before going there.
> >>>>> However, we could also just go straight there :)
> >>>>>
> >>>>> Regarding infix operators and such, the edge case I was thinking of is
> >>>>> that a user could potentially add a kernel to the registry called, e.g.,
> >>>>> “+”. Would the parser implicitly convert any instances of “+” to “add”
> >>>>> and break that?
> >>>>>
> >>>>> Implicit typing for literals and parameters can probably also be added
> >>>>> to the current scheme without issues. Would the parameters be passed as
> >>>>> an std::unordered_map?
> >>>>>
> >>>>>> Does a field_ref have to be a field name or can it be a field index?
> >>>>>
> >>>>> It can be a field index or even a field path. The field ref is parsed
> >>>>> using FieldRef::FromDotPath ([1] in my original message), which can
> >>>>> express any FieldRef.
> >>>>>
> >>>>> Sasha
> >>>>>
> >>>>>>> On Oct 6, 2022, at 16:08, Weston Pace <weston.p...@gmail.com>
> >>>>>>> wrote:
> >>>>>>
> >>>>>> Currently Substrait only has a binary (protobuf) serialization (and a
> >>>>>> protobuf JSON one but that's not really human writable and barely
> >>>>>> human readable).  Substrait does not have a text serialization.  I
> >>>>>> believe there is some desire for one (maybe Sasha wants to give it a
> >>>>>> try?).  A text format for Substrait would solve this problem because
> >>>>>> you could go "text expression" -> "substrait expression" -> "arrow
> >>>>>> expression".
> >>>>>>
> >>>>>> Since no text format exists for Substrait I think that Substrait does
> >>>>>> not currently solve this problem or overlap with your work.  However,
> >>>>>> at some point (hopefully), it will.
> >>>>>>
> >>>>>> There was also a fairly recent proposal for a parser for Gandiva
> >>>>>> expressions [1].
> >>>>>>
> >>>>>> Compared with [1] I think this proposal is simpler to parse but lacks
> >>>>>> some of the shortcut conveniences (e.g. implicit types for literals,
> >>>>>> support for common infix operators (+, -, /, ...)).
> >>>>>>
> >>>>>> Both are lacking parameters (e.g. "(equals(!x, %threshold%))"), which I
> >>>>>> think would be useful to have, as one could then do something like
> >>>>>> `auto arrow_expr = Parse(my_expr, threshold)`.
> >>>>>>
> >>>>>> Does a field_ref have to be a field name or can it be a field index?
> >>>>>> The latter is quite useful when the schema has duplicate field names.
> >>>>>>
> >>>>>> I'm +0.5 on this change.  I worry a bit about having (eventually)
> >>>>>> three different syntaxes.  However, at the moment we have zero.
> >>>>>>
> >>>>>> [1] https://lists.apache.org/thread/0oyns380hgzvl0y8kwgqoo4fp7ntt3bn
> >>>>>>
> >>>>>>> On Wed, Oct 5, 2022 at 1:55 PM Sasha Krassovsky
> >>>>>>> <krassovskysa...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi David,
> >>>>>>> Could you elaborate on which part of my proposal overlaps with
> >>>>>>> Substrait? I don’t see anything in Substrait that allows me to do
> >>>>>>> something along the lines of
> >>>>>>>
> >>>>>>> Expression e = Expression::FromString("(add !.a $int32:1)");
> >>>>>>>
> >>>>>>> in the code.
> >>>>>>>
> >>>>>>> Sasha
> >>>>>>>
> >>>>>>>>> On Oct 5, 2022, at 1:35 PM, Lee, David
> >>>>>>>>> <david....@blackrock.com.INVALID> wrote:
> >>>>>>>>
> >>>>>>>> I believe this is what substrait.io is trying to accomplish.
> >>>>>>>>
> >>>>>>>> Here's some additional info:
> >>>>>>>> https://substrait.io/
> >>>>>>>>
> >>>>>>>> https://www.youtube.com/watch?v=5JjaB7p3Sjk
> >>>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Sasha Krassovsky <krassovskysa...@gmail.com>
> >>>>>>>> Sent: Wednesday, October 5, 2022 11:29 AM
> >>>>>>>> To: dev@arrow.apache.org
> >>>>>>>> Subject: Parser for expressions
> >>>>>>>>
> >>>>>>>> Hi everyone,
> >>>>>>>> I’ve noticed people on the mailing list asking a few times for a more
> >>>>>>>> convenient way to construct an Expression, namely from a string of
> >>>>>>>> some sort. I’ve found myself wishing for something like this too when
> >>>>>>>> constructing ExecPlans, so I’ve gone ahead and implemented a parser
> >>>>>>>> [0]. Does anyone have any thoughts about the design of the language?
> >>>>>>>>
> >>>>>>>> The current implementation parses a lisp-like language. This language
> >>>>>>>> has three types of expressions (mirroring the current Expression API):
> >>>>>>>>
> >>>>>>>> - A call is a normal s-expression: the name of the kernel followed by
> >>>>>>>>   the list of arguments. Its arguments can be any expression.
> >>>>>>>> - A literal (i.e. scalar) starts with a $ and specifies a type and a
> >>>>>>>>   value, separated by a colon. For example, `$decimal(12,2):10.01`
> >>>>>>>>   specifies a literal of type decimal(12, 2) and a value of 10.01.
> >>>>>>>> - A field_ref starts with a ! and is an identifier in the schema
> >>>>>>>>   following the DotPath syntax we already have [1].
> >>>>>>>>
> >>>>>>>> So for example, the expression
> >>>>>>>>
> >>>>>>>> (add $int32:1 (multiply !.a !.b))
> >>>>>>>>
> >>>>>>>> computes a*b+1 given a batch with columns named a and b.
> >>>>>>>>
> >>>>>>>> I chose a lisp-like language because it translates very directly to
> >>>>>>>> the current Expression API, and because prefix notation feels more
> >>>>>>>> natural for a language where all functions have a name (i.e. no +, -,
> >>>>>>>> *, etc.).
> >>>>>>>>
> >>>>>>>> I’m currently working on a follow-up PR for specifying ExecPlans from
> >>>>>>>> a string (mainly for easier testing), and would like that language to
> >>>>>>>> be an extension of this one. Looking forward to hearing everyone’s
> >>>>>>>> thoughts!
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Sasha Krassovsky
> >>>>>>>>
> >>>>>>>> [0] https://github.com/apache/arrow/pull/14287
> >>>>>>>> [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/type.h#L1726
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>
> >
