Hi Sasha and Weston,

I'm the author of the Gandiva parser proposal that Weston mentioned. I agree
that having one unified syntax is ideal. I think the critical difference
between Sasha's proposal and mine is style: mine uses C++/Python-style calls
and infix operators (foo(x, y, z), a + b, ...), while Sasha's uses Lisp-style
prefix notation ((foo x y z), (+ a b), ...). I think it would be better for
us to settle on one of the styles before we start implementing the parsers.
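
To make the contrast concrete, here is a minimal Python sketch that reads the
Lisp-style example from Sasha's original message and prints the same expression
in call style. It is not taken from either parser: the names tokenize, parse,
and to_call_style are hypothetical, it only handles type names without
parentheses (so not decimal(12,2)), and dropping the literal's type is just
one way to picture implicit typing.

```python
import re

# A token is a single parenthesis or a run of characters that contains
# neither whitespace nor parentheses (assumption: no parenthesized types).
TOKEN_RE = re.compile(r"[()]|[^\s()]+")


def tokenize(text):
    """Split a Lisp-style expression string into a flat list of tokens."""
    return TOKEN_RE.findall(text)


def parse(tokens):
    """Parse one expression from the front of `tokens`, consuming them."""
    if not tokens:
        raise ValueError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        # A call: kernel name followed by zero or more argument expressions.
        if not tokens or tokens[0] in ("(", ")"):
            raise ValueError("expected a kernel name after '('")
        name = tokens.pop(0)
        args = []
        while tokens and tokens[0] != ")":
            args.append(parse(tokens))
        if not tokens:
            raise ValueError("missing closing parenthesis")
        tokens.pop(0)  # discard ")"
        return ("call", name, args)
    if tok.startswith("$"):
        # A literal: $<type>:<value>
        type_name, _, value = tok[1:].partition(":")
        return ("literal", type_name, value)
    if tok.startswith("!"):
        # A field reference: ! followed by a DotPath such as .a
        return ("field_ref", tok[1:])
    raise ValueError(f"unexpected token: {tok!r}")


def to_call_style(node):
    """Render a parsed tree as foo(x, y) text, dropping literal types."""
    kind = node[0]
    if kind == "call":
        _, name, args = node
        return f"{name}({', '.join(to_call_style(a) for a in args)})"
    if kind == "literal":
        return node[2]
    return node[1].lstrip(".")  # field_ref: ".a" becomes "a"


tree = parse(tokenize("(add $int32:1 (multiply !.a !.b))"))
print(to_call_style(tree))  # add(1, multiply(a, b))
```

Both spellings describe the same tree of calls, literals, and field
references, so the choice is mostly about what users type rather than what
the parser produces.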

Best,
Jin

On Friday, October 7, 2022, Sasha Krassovsky <krassovskysa...@gmail.com>
wrote:

> Hi Weston,
> I’d be happy to donate something like this to Substrait if that’s useful.
> I was thinking of proving out a design here before going there. However, we
> could also just go straight there :)
>
> Regarding infix operators and such, the edge case I was thinking of is that
> a user could potentially add a kernel to the registry called e.g. “+”.
> Would the parser implicitly convert any instances of “+” to “add” and break
> that?
>
> Implicit typing for literals and parameters can probably also be added
> without issues to the current scheme. Would the parameters be passed as an
> std::unordered_map?
>
> > Does a field_ref have to be a field name or can it be a field index?
>
> It can be a field index or even a field path. The field ref is parsed
> using FieldRef::FromDotPath ([1] in my original message), which can express
> any FieldRef.
>
> Sasha
>
> > On Oct 6, 2022, at 16:08, Weston Pace <weston.p...@gmail.com> wrote:
> >
> > Currently Substrait only has a binary (protobuf) serialization (and a
> > protobuf JSON one but that's not really human writable and barely
> > human readable).  Substrait does not have a text serialization.  I
> > believe there is some desire for one (maybe Sasha wants to give it a
> > try?).  A text format for Substrait would solve this problem because
> > you could go "text expression" -> "substrait expression" -> "arrow
> > expression".
> >
> > Since no text format exists for Substrait I think that Substrait does
> > not currently solve this problem or overlap with your work.  However,
> > at some point (hopefully), it will.
> >
> > There was also a fairly recent proposal for a parser for Gandiva
> > expressions [1].
> >
> > Compared with [1] I think this proposal is simpler to parse but lacks
> > some of the shortcut conveniences (e.g. implicit types for literals,
> > support for common infix operators (+, -, /, ...)).
> >
> > Both are lacking parameters (e.g. "(equals(!x, %threshold%))"), which I
> > think would be useful to have, as one could then do something like
> > `auto arrow_expr = Parse(my_expr, threshold)`.
> >
> > Does a field_ref have to be a field name or can it be a field index?
> > The latter is quite useful when the schema has duplicate field names.
> >
> > I'm +0.5 on this change.  I worry a bit about having (eventually)
> > three different syntaxes.  However, at the moment we have zero.
> >
> > [1] https://lists.apache.org/thread/0oyns380hgzvl0y8kwgqoo4fp7ntt3bn
> >
> >> On Wed, Oct 5, 2022 at 1:55 PM Sasha Krassovsky
> >> <krassovskysa...@gmail.com> wrote:
> >>
> >> Hi David,
> >> Could you elaborate on which part of my proposal overlaps with
> >> Substrait? I don’t see anything in Substrait that allows me to do
> >> something along the lines of
> >>
> >> Expression e = Expression::FromString("(add !.a $int32:1)");
> >>
> >> in the code.
> >>
> >> Sasha
> >>
> >>>> On Oct 5, 2022, at 1:35 PM, Lee, David <david....@blackrock.com.INVALID> wrote:
> >>>
> >>> I believe this is what substrait.io is trying to accomplish.
> >>>
> >>> Here's some additional info:
> >>> https://substrait.io/
> >>>
> >>> https://www.youtube.com/watch?v=5JjaB7p3Sjk
> >>>
> >>> -----Original Message-----
> >>> From: Sasha Krassovsky <krassovskysa...@gmail.com>
> >>> Sent: Wednesday, October 5, 2022 11:29 AM
> >>> To: dev@arrow.apache.org
> >>> Subject: Parser for expressions
> >>>
> >>> Hi everyone,
> >>> I’ve noticed on the mailing list a few times people asking for a more
> >>> convenient way to construct an Expression, namely using a string of some
> >>> sort. I’ve found myself wishing for something like this too when
> >>> constructing ExecPlans, and so I’ve gone ahead and implemented a parser
> >>> [0]. I was wondering if anyone had any thoughts about the design of the
> >>> language?
> >>>
> >>> The current implementation parses a Lisp-like language. This language
> >>> has three types of expressions (mirroring the current Expression API):
> >>>
> >>> - A call is a normal s-expression: it has the name of the kernel and
> >>>   the list of arguments. Its arguments can be any expression.
> >>> - A literal (i.e. scalar) starts with a $ and specifies a type and a
> >>>   value, separated by a colon. For example, `$decimal(12,2):10.01`
> >>>   specifies a literal of type decimal(12, 2) and a value of 10.01.
> >>> - A field_ref starts with a ! and is an identifier in the schema
> >>>   following the DotPath syntax we already have [1].
> >>>
> >>> So for example, the expression
> >>>
> >>> (add $int32:1 (multiply !.a !.b))
> >>>
> >>> computes a*b+1 given a batch with columns named a and b.
> >>>
> >>> The reason I chose a Lisp-like language is that it translates very
> >>> directly to the current Expression API and that it feels more natural to
> >>> use a prefix notation for a language where all functions have a name
> >>> (i.e. no +, -, *, etc.).
> >>>
> >>> I’m currently working on a followup PR for specifying ExecPlans from a
> >>> string (mainly for easier testing), and would like that language to be an
> >>> extension of this one. Looking forward to hearing everyone’s thoughts!
> >>>
> >>> Thanks,
> >>> Sasha Krassovsky
> >>>
> >>> [0] https://github.com/apache/arrow/pull/14287
> >>> [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/type.h#L1726
> >>
>
