dtenedor commented on PR #50284:
URL: https://github.com/apache/spark/pull/50284#issuecomment-2730207770

   Hi @dongjoon-hyun, this is a good question. We talked with Jeff Shute, the
author of the SQL pipe syntax paper from Google. He mentioned that they chose
`|>` because every implementing engine can support it without running into
parser issues. Specifically, Google uses an LALR parser [1], which makes it
impossible to use `|` as the pipe token because of ambiguity with bitwise
operations. They won't be able to add support for `|`.
   
   There seems to be a growing consensus in the industry that we should all
support `|>` as the primary token, while some engines may also choose to
support alternative tokens in addition. This blog post [2] describes the
situation and mentions how another engine decided to go in this direction.
Please let us know your thoughts on this. We can certainly hold off on merging
this PR until we figure out together what the plan should be.
   
   [1] https://github.com/google/zetasql/blob/master/bazel/bison.bzl
   
   [2] https://superdb.org/docs/language/pipe-ambiguity/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

