Re: Allowing Unicode Whitespace in Lexer

2024-03-27 Thread Mich Talebzadeh
Looks fine, except that processing all Unicode whitespace characters might add overhead to the parsing process, potentially impacting performance. I think this is a moot point, though. +1
Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI, London, United Kingdom

Re: Allowing Unicode Whitespace in Lexer

2024-03-27 Thread Gengliang Wang
+1, this is a reasonable change.
Gengliang
On Wed, Mar 27, 2024 at 9:54 AM serge rielau.com wrote:
> Going once, going twice, …. last call for objections
> On Mar 23, 2024 at 5:29 PM -0700, serge rielau.com wrote:
> > Hello,
> > I have a PR https://github.com/apache/spark/pull/45620 ready

Re: Allowing Unicode Whitespace in Lexer

2024-03-27 Thread serge rielau.com
Going once, going twice, …. last call for objections
On Mar 23, 2024 at 5:29 PM -0700, serge rielau.com wrote:
> Hello,
> I have a PR https://github.com/apache/spark/pull/45620 ready to go that will extend the definition of whitespace (what separates tokens) from the small set of ASCII characters
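To make the proposal concrete, here is a minimal sketch of what widening the whitespace definition means for token separation. It is not the Spark lexer and not the code in the PR: the function name `split_tokens`, the ASCII set shown, and the choice of the Unicode White_Space property as the wider set are all assumptions made for illustration.

```python
# Hypothetical illustration only; this is not the Spark lexer or the PR's code.

# Assumed "small set of ASCII characters": space, tab, newline, carriage return.
ASCII_WS = frozenset(" \t\n\r")

# Code points carrying the Unicode White_Space property (assumed wider set).
UNICODE_WS = frozenset(
    [chr(c) for c in range(0x0009, 0x000E)]      # TAB, LF, VT, FF, CR
    + [chr(c) for c in range(0x2000, 0x200B)]    # EN QUAD .. HAIR SPACE
    + ["\u0020", "\u0085", "\u00A0", "\u1680",
       "\u2028", "\u2029", "\u202F", "\u205F", "\u3000"]
)


def split_tokens(text, whitespace):
    """Split text into tokens, treating every character in `whitespace` as a separator."""
    tokens, current = [], []
    for ch in text:
        if ch in whitespace:
            # A separator ends the token being built, if there is one.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


# A query pasted from a word processor, with a no-break space (U+00A0) after SELECT.
query = "SELECT\u00A01"
ascii_tokens = split_tokens(query, ASCII_WS)      # the no-break space stays inside the token
unicode_tokens = split_tokens(query, UNICODE_WS)  # the no-break space separates two tokens
```

With the ASCII-only set the no-break space is glued into a single unrecognizable token, while the wider set yields `SELECT` and `1` as separate tokens. The membership test is a constant-time set lookup in both cases.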

Re: Allowing Unicode Whitespace in Lexer

2024-03-27 Thread serge rielau.com
Yeah, I heard about that. This IMHO is a bit more worrying, and we do not have the "excuse" that it is transparent. Also, which of these would be STRING and which IDENTIFIER?
On Mar 25, 2024 at 1:06 PM -0700, Alex Cruise wrote:
> While we're at it, maybe consider allowing "smart quotes" too :)
> -0

Re: Allowing Unicode Whitespace in Lexer

2024-03-25 Thread Alex Cruise
While we're at it, maybe consider allowing "smart quotes" too :)
-0xe1a
On Sat, Mar 23, 2024 at 5:29 PM serge rielau.com wrote:
> Hello,
>
> I have a PR https://github.com/apache/spark/pull/45620 ready to go that
> will extend the definition of whitespace (what separates token) from the
> smal