looks fine, except that processing all Unicode whitespace characters might
add overhead to parsing and could affect performance,
although I think this is a moot point.
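
To make the scope of the proposal concrete, here is a minimal sketch of how the token boundaries change when the separator set grows from space, tab, and linefeed to Unicode whitespace. It is not taken from the PR: the names `ASCII_WS`, `UNICODE_WS`, and `split_tokens` are hypothetical, and the Unicode set is assumed to be the code points carrying the Unicode White_Space property, which may differ from what the PR actually accepts.

```python
# Current definition as described in the thread: space, tab, linefeed.
ASCII_WS = frozenset(" \t\n")

# Assumption: code points with the Unicode White_Space property.
# The PR's exact set is not quoted in this thread and may differ.
UNICODE_WS = frozenset(
    "\u0009\u000A\u000B\u000C\u000D\u0020\u0085\u00A0\u1680"
    "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A"
    "\u2028\u2029\u202F\u205F\u3000"
)


def split_tokens(text, whitespace):
    """Split text into tokens on runs of the given whitespace characters."""
    tokens, current = [], []
    for ch in text:
        if ch in whitespace:
            # A separator ends the token collected so far, if any.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


# A query pasted from a rich-text source: no-break space, ideographic
# space, and em space instead of plain ASCII spaces.
query = "SELECT\u00A01\u3000AS\u2003x"

print(split_tokens(query, ASCII_WS))    # one unsplit token
print(split_tokens(query, UNICODE_WS))  # four tokens
```

With the ASCII-only set the whole string stays a single token, while the wider set yields `SELECT`, `1`, `AS`, and `x`. The membership test is a constant-time set lookup in either case, which is one way to read the "moot point" remark about overhead, though a real lexer may behave differently.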
+1
Mich Talebzadeh,
Technologist | Solutions Architect | Data Engineer | Generative AI
London
United Kingdom
+1, this is a reasonable change.
Gengliang
On Wed, Mar 27, 2024 at 9:54 AM serge rielau.com wrote:
> Going once, going twice … last call for objections.
> On Mar 23, 2024 at 5:29 PM -0700, serge rielau.com wrote:
>
> Hello,
>
> I have a PR https://github.com/apache/spark/pull/45620 ready
Going once, going twice … last call for objections.
On Mar 23, 2024 at 5:29 PM -0700, serge rielau.com wrote:
Hello,
I have a PR https://github.com/apache/spark/pull/45620 ready to go that will
extend the definition of whitespace (what separates tokens) from the small set
of ASCII characters
Yeah, I heard about that. This IMHO is a bit more worrying, and we do not have
the "excuse" that it is transparent.
Also, which of these would be STRING and which IDENTIFIER?
On Mar 25, 2024 at 1:06 PM -0700, Alex Cruise wrote:
While we're at it, maybe consider allowing "smart quotes" too :)
-0
While we're at it, maybe consider allowing "smart quotes" too :)
-0xe1a
On Sat, Mar 23, 2024 at 5:29 PM serge rielau.com wrote:
> Hello,
>
> I have a PR https://github.com/apache/spark/pull/45620 ready to go that
> will extend the definition of whitespace (what separates tokens) from the
> smal
Hello,
I have a PR https://github.com/apache/spark/pull/45620 ready to go that will
extend the definition of whitespace (what separates tokens) from the small set
of ASCII characters space, tab, linefeed to those defined in Unicode.
While this is a small and safe change, it is one where we would