Hi everyone,
I’d be fine with switching it to add(x, y). I’ll look into round-trip support; I imagine we can massage the ToString implementation a bit as well to make it easier to parse back.
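As a rough illustration of the ToString -> parse round trip discussed here, below is a minimal sketch of a recursive-descent parser for the "add(x, y)" call style. Everything in it is an assumption made for illustration: `Expr`, `ToString`, `Parser` and `Parse` are hypothetical stand-ins, not Arrow's Expression class or its real ToString, and arguments are limited to bare identifiers and nested calls.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature expression tree; NOT Arrow's Expression class.
struct Expr {
  std::string name;                         // function name or bare identifier
  bool is_call = false;                     // true for name(arg, ...)
  std::vector<std::shared_ptr<Expr>> args;  // call arguments, empty otherwise
};

// Print a tree as name(arg, arg, ...), or just the name for an identifier.
std::string ToString(const Expr& e) {
  if (!e.is_call) return e.name;
  std::string out = e.name + "(";
  for (std::size_t i = 0; i < e.args.size(); ++i) {
    if (i > 0) out += ", ";
    out += ToString(*e.args[i]);
  }
  return out + ")";
}

class Parser {
 public:
  explicit Parser(std::string text) : text_(std::move(text)) {}

  // Parse the whole input; anything left over (such as "+ y") is an error.
  std::shared_ptr<Expr> ParseAll() {
    auto e = ParseExpr();
    SkipSpaces();
    if (pos_ != text_.size()) {
      throw std::invalid_argument(
          "unexpected trailing input; infix operators are not supported, "
          "please use add(x, y) instead of x + y");
    }
    return e;
  }

 private:
  void SkipSpaces() {
    while (pos_ < text_.size() &&
           std::isspace(static_cast<unsigned char>(text_[pos_]))) {
      ++pos_;
    }
  }

  // expr := name | name '(' [expr (',' expr)*] ')'
  std::shared_ptr<Expr> ParseExpr() {
    SkipSpaces();
    std::size_t start = pos_;
    while (pos_ < text_.size() &&
           (std::isalnum(static_cast<unsigned char>(text_[pos_])) ||
            text_[pos_] == '_')) {
      ++pos_;
    }
    if (pos_ == start) throw std::invalid_argument("expected a name");
    auto e = std::make_shared<Expr>();
    e->name = text_.substr(start, pos_ - start);
    SkipSpaces();
    if (pos_ < text_.size() && text_[pos_] == '(') {
      e->is_call = true;
      ++pos_;
      SkipSpaces();
      if (pos_ < text_.size() && text_[pos_] == ')') {
        ++pos_;  // call with no arguments
        return e;
      }
      while (true) {
        e->args.push_back(ParseExpr());
        SkipSpaces();
        if (pos_ >= text_.size()) throw std::invalid_argument("missing ')'");
        if (text_[pos_] == ',') { ++pos_; continue; }
        if (text_[pos_] == ')') { ++pos_; break; }
        throw std::invalid_argument("expected ',' or ')'");
      }
    }
    return e;
  }

  std::string text_;
  std::size_t pos_ = 0;
};

std::shared_ptr<Expr> Parse(const std::string& text) {
  return Parser(text).ParseAll();
}
```

With this sketch, `ToString(*Parse(s))` reproduces `s` whenever `s` is already in the printer's canonical spacing, which is the round-trip property in question. Input such as `x + y` is rejected with a message pointing at the call form rather than being silently reinterpreted.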
Did anyone have opinions about the syntax for FieldRefs or Scalars? Scalars of the form $type:value are particularly easy to parse, and I think they are easy enough to read and write.

Sasha

> On Oct 12, 2022, at 09:03, Joris Van den Bossche <jorisvandenboss...@gmail.com> wrote:
>
> Another advantage of "add(x, y)" is that this matches our current string representation for expressions.
>
> Although that might give the impression that we support anything that we output as string, and so that raises the question if we want to make this explicit: if we add parsing capabilities, would it be a goal to be able to roundtrip (simple) expressions in ToString -> parse again?
>
> Joris
>
>> On Tue, 11 Oct 2022 at 18:59, Weston Pace <weston.p...@gmail.com> wrote:
>>
>> SQL is nearly universally understood so unless there is a compelling reason I tend to use that as my default.
>>
>> I don't see any particular advantage to favoring "(add x y)" over "add(x, y)".
>>
>> I will acknowledge that there are downsides to supporting x + y, I think you listed these out already.
>>
>> So, for expressions, I think it'd be fine if Acero initially supported "add(x, y)" without supporting infix operators (and gandiva supported both) as long as there is a clear error message (e.g. "please use add(x,y) instead of x+y"). This simplifies parsing and should avoid confusion between the two.
>>
>> If you want to then provide support for nodes / relations I think we will need to deviate from SQL as it is simply not expressive enough.
>>
>>> On Mon, Oct 10, 2022 at 12:17 PM Antoine Pitrou <anto...@python.org> wrote:
>>>
>>> I don't see the point of having two different syntaxes.
>>>
>>> Also, IMHO lisp-style is harder for many people, so I would rather have a more "traditional" syntax (though Lisp is historically traditional, of course ;-)).
>>>
>>> On 10/10/2022 at 21:10, Sasha Krassovsky wrote:
>>>> Yes that makes a lot of sense!
>>>> I’d agree that it would probably be fine to have two different syntaxes, seeing as the use-cases are a bit different.
>>>>
>>>> Did anyone else have any thoughts? Either on the lisp-style syntax for Arrow’s Expressions or on having two different syntaxes? (Weston or Antoine?)
>>>>
>>>> Sasha
>>>>
>>>>> On Oct 9, 2022, at 5:38 AM, Jin Shang <shangjin1...@gmail.com> wrote:
>>>>>
>>>>> Hi Sasha,
>>>>>
>>>>> I agree with your points. However, Gandiva is kind of specialized in computing arithmetic expressions and it offers few to no non-arithmetic operations. So it is very helpful if its parser understands natural math expressions.
>>>>>
>>>>> Considering that Gandiva is a relatively independent component within the Arrow project, and that it’s only a math expression compiler rather than a fully functional compute engine, maybe it’s acceptable for Gandiva to have its own grammar different from compute/Acero/Substrait etc.
>>>>>
>>>>> Best,
>>>>> Jin
>>>>>
>>>>>> On Oct 8, 2022, at 03:01, Sasha Krassovsky <krassovskysa...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Jin,
>>>>>> I agree it would be good to standardize on a syntax. To me, the advantages of the lisp-style syntax are:
>>>>>> - don’t have to define/implement any kind of precedence rules
>>>>>> - has a uniform syntax (no distinction between prefix and infix operators)
>>>>>> - avoids having “special” functions that have an associated arithmetic symbol
>>>>>> - translates directly to the underlying Expression infrastructure.
>>>>>>
>>>>>> The advantage of the Python-style syntax is that it’s more natural to use for arithmetic expressions. However, I think for non-arithmetic expressions this syntax would be more cumbersome.
>>>>>>
>>>>>> Either would work of course, I guess it just depends on the goal. I was thinking the string representation wouldn’t represent any significant level of abstraction, it is just a convenience to save on clutter when typing out expressions.
>>>>>>
>>>>>> Sasha
>>>>>>
>>>>>>> On Oct 6, 2022, at 22:20, Jin Shang <shangjin1...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Sasha and Weston,
>>>>>>>
>>>>>>> I'm the author of the mentioned Gandiva parser. I agree that having one unified syntax is ideal. I think one critical divergence between Sasha's and my proposals is that mine uses the C++/Python imperative style (foo(x, y, z), a+b…) and Sasha's uses the Lisp functional style ((foo x y z), (+ a b)…). I feel like it'll be better for us to settle on one of the styles before we start implementing the parsers.
>>>>>>>
>>>>>>> Best,
>>>>>>> Jin
>>>>>>>
>>>>>>>> On Friday, October 7, 2022, Sasha Krassovsky <krassovskysa...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Weston,
>>>>>>>> I’d be happy to donate something like this to Substrait if that’s useful, I was thinking of proving out a design here before going there. However we could also just go straight there :)
>>>>>>>>
>>>>>>>> Regarding infix operators and such, the edge case I was thinking of is that a user could potentially add a kernel to the registry called e.g. “+”. Would the parser implicitly convert any instances of “+” to “add” and break that?
>>>>>>>>
>>>>>>>> Implicit typing for literals and parameters can probably also be added without issues to the current scheme. Would the parameters be passed as an std::unordered_map?
>>>>>>>>
>>>>>>>>> Does a field_ref have to be a field name or can it be a field index?
>>>>>>>>
>>>>>>>> It can be a field index or even a field path. The field ref is parsed using FieldRef::FromDotPath ([1] in my original message), which can express any FieldRef.
>>>>>>>>
>>>>>>>> Sasha
>>>>>>>>
>>>>>>>>>> On Oct 6, 2022, at 16:08, Weston Pace <weston.p...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Currently Substrait only has a binary (protobuf) serialization (and a protobuf JSON one but that's not really human writable and barely human readable). Substrait does not have a text serialization. I believe there is some desire for one (maybe Sasha wants to give it a try?). A text format for Substrait would solve this problem because you could go "text expression" -> "substrait expression" -> "arrow expression".
>>>>>>>>>
>>>>>>>>> Since no text format exists for Substrait I think that Substrait does not currently solve this problem or overlap with your work. However, at some point (hopefully), it will.
>>>>>>>>>
>>>>>>>>> There was also a fairly recent proposal for a parser for gandiva expressions[1].
>>>>>>>>>
>>>>>>>>> Compared with [1] I think this proposal is simpler to parse but lacks some of the shortcut conveniences (e.g. implicit types for literals, support for common infix operators (+, -, /, ...)).
>>>>>>>>>
>>>>>>>>> Both are lacking parameters (e.g. "(equals(!x, %threshold%))") which I think would be useful to have as one could then do something like `auto arrow_expr = Parse(my_expr, threshold)`.
>>>>>>>>>
>>>>>>>>> Does a field_ref have to be a field name or can it be a field index? The latter is quite useful when the schema has duplicate field names.
>>>>>>>>>
>>>>>>>>> I'm +0.5 on this change. I worry a bit about having (eventually) three different syntaxes. However, at the moment we have zero.
>>>>>>>>>
>>>>>>>>> [1] https://lists.apache.org/thread/0oyns380hgzvl0y8kwgqoo4fp7ntt3bn
>>>>>>>>>
>>>>>>>>>> On Wed, Oct 5, 2022 at 1:55 PM Sasha Krassovsky <krassovskysa...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi David,
>>>>>>>>>> Could you elaborate on which part of my proposal overlaps with Substrait? I don’t see anything in Substrait that allows me to do something along the lines of
>>>>>>>>>>
>>>>>>>>>> Expression e = Expression::FromString("(add !.a $int32:1)");
>>>>>>>>>>
>>>>>>>>>> in the code.
>>>>>>>>>>
>>>>>>>>>> Sasha
>>>>>>>>>>
>>>>>>>>>>>> On Oct 5, 2022, at 1:35 PM, Lee, David <david....@blackrock.com.INVALID> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I believe this is what substrait.io is trying to accomplish.
>>>>>>>>>>>
>>>>>>>>>>> Here's some additional info:
>>>>>>>>>>> https://substrait.io/
>>>>>>>>>>>
>>>>>>>>>>> https://www.youtube.com/watch?v=5JjaB7p3Sjk
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Sasha Krassovsky <krassovskysa...@gmail.com>
>>>>>>>>>>> Sent: Wednesday, October 5, 2022 11:29 AM
>>>>>>>>>>> To: dev@arrow.apache.org
>>>>>>>>>>> Subject: Parser for expressions
>>>>>>>>>>>
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>> I’ve noticed on the mailing list a few times people asking for a more convenient way to construct an Expression, namely using a string of some sort. I’ve found myself wishing for something like this too when constructing ExecPlans, and so I’ve gone ahead and implemented a parser [0]. I was wondering if anyone had any thoughts about the design of the language?
>>>>>>>>>>>
>>>>>>>>>>> The current implementation parses a lisp-like language. This language has three types of expressions (mirroring the current Expression API):
>>>>>>>>>>>
>>>>>>>>>>> - A call is a normal s-expression, it has the name of the kernel and the list of arguments. Its arguments can be any expression.
>>>>>>>>>>> - A literal (i.e. scalar) starts with a $ and specifies a type and a value, separated by a colon. For example, `$decimal(12,2):10.01` specifies a literal of type decimal(12, 2) and a value of 10.01.
>>>>>>>>>>> - A field_ref starts with a ! and is an identifier in the schema following the DotPath syntax we already have [1].
>>>>>>>>>>>
>>>>>>>>>>> So for example, the expression
>>>>>>>>>>>
>>>>>>>>>>> (add $int32:1 (multiply !.a !.b))
>>>>>>>>>>>
>>>>>>>>>>> computes a*b+1 given a batch with columns named a and b.
>>>>>>>>>>>
>>>>>>>>>>> The reason I chose a lisp-like language is that it very directly translates to the current Expression API and that it feels more natural to use a prefix notation for a language where all functions have a name (i.e. no +, -, *, etc.).
>>>>>>>>>>>
>>>>>>>>>>> I’m currently working on a followup PR for specifying ExecPlans from a string (mainly for easier testing), and would like that language to be an extension of this one. Looking forward to hearing everyone’s thoughts!
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Sasha Krassovsky
>>>>>>>>>>>
>>>>>>>>>>> [0] https://github.com/apache/arrow/pull/14287
>>>>>>>>>>> [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/type.h#L1726
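
For readers following the thread, the lisp-like syntax from the original proposal (a call as an s-expression, a literal written `$type:value`, a field_ref starting with `!`) could be parsed roughly as sketched below. This is not the code from the pull request: `Node`, `SexprParser` and `ParseSexpr` are hypothetical names, the field path is kept as raw text instead of going through FieldRef::FromDotPath, and splitting a literal at its first colon is an assumption of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type for illustration; Arrow's real Expression differs.
struct Node {
  enum Kind { kCall, kLiteral, kFieldRef };
  Kind kind = kCall;
  std::string name;        // kernel name, literal type, or raw dot path
  std::string value;       // literal value text; empty for other kinds
  std::vector<Node> args;  // call arguments; empty for other kinds
};

class SexprParser {
 public:
  explicit SexprParser(std::string text) : text_(std::move(text)) {}

  Node ParseAll() {
    Node n = ParseExpr();
    SkipSpaces();
    if (pos_ != text_.size()) throw std::invalid_argument("trailing input");
    return n;
  }

 private:
  void SkipSpaces() {
    while (pos_ < text_.size() &&
           std::isspace(static_cast<unsigned char>(text_[pos_]))) {
      ++pos_;
    }
  }

  // Read up to whitespace or an unmatched ')'. Parentheses opened inside the
  // token, as in decimal(12,2), are balanced and stay part of the token.
  std::string ReadToken() {
    std::size_t start = pos_;
    int depth = 0;
    while (pos_ < text_.size()) {
      char c = text_[pos_];
      if (depth == 0 &&
          (std::isspace(static_cast<unsigned char>(c)) || c == ')')) {
        break;
      }
      if (c == '(') ++depth;
      if (c == ')') --depth;
      ++pos_;
    }
    if (pos_ == start) throw std::invalid_argument("expected a token");
    return text_.substr(start, pos_ - start);
  }

  Node ParseExpr() {
    SkipSpaces();
    if (pos_ >= text_.size()) throw std::invalid_argument("unexpected end");
    Node node;
    char c = text_[pos_];
    if (c == '(') {  // call: (name arg arg ...)
      ++pos_;
      SkipSpaces();
      node.kind = Node::kCall;
      node.name = ReadToken();
      while (true) {
        SkipSpaces();
        if (pos_ >= text_.size()) throw std::invalid_argument("missing ')'");
        if (text_[pos_] == ')') { ++pos_; break; }
        node.args.push_back(ParseExpr());
      }
    } else if (c == '$') {  // literal: $type:value
      ++pos_;
      std::string token = ReadToken();
      // Assumption of this sketch: the type never contains ':', so the
      // first colon separates type from value.
      std::size_t colon = token.find(':');
      if (colon == std::string::npos) {
        throw std::invalid_argument("literal must look like $type:value");
      }
      node.kind = Node::kLiteral;
      node.name = token.substr(0, colon);
      node.value = token.substr(colon + 1);
    } else if (c == '!') {  // field_ref: !<dot path>, kept as raw text
      ++pos_;
      node.kind = Node::kFieldRef;
      node.name = ReadToken();
    } else {
      throw std::invalid_argument("expected '(', '$' or '!'");
    }
    return node;
  }

  std::string text_;
  std::size_t pos_ = 0;
};

Node ParseSexpr(const std::string& text) { return SexprParser(text).ParseAll(); }
```

Calling `ParseSexpr("(add $int32:1 (multiply !.a !.b))")` yields a call node named `add` whose arguments are an `int32` literal and a nested `multiply` call on two field references. Because every form is recognized from its first character, no precedence rules are needed, which is the first advantage listed for the lisp-style syntax in the thread.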