Allowing Unicode Whitespace in Lexer

2024-03-23 Thread serge rielau . com
Hello, I have a PR https://github.com/apache/spark/pull/45620 ready to go that will extend the definition of whitespace (what separates tokens) from the small set of ASCII characters space, tab, linefeed to those defined in Unicode. While this is a small and safe change, it is one where we would

Re: Allowing Unicode Whitespace in Lexer

2024-03-27 Thread serge rielau . com
Yeah, I heard about that. This IMHO is a bit more worrying, and we do not have the "excuse" that it is transparent. Also, which of these would be STRING and which IDENTIFIER? On Mar 25, 2024 at 1:06 PM -0700, Alex Cruise wrote: While we're at it, maybe consider allowing "smart quotes" too :) -0

Re: Allowing Unicode Whitespace in Lexer

2024-03-27 Thread serge rielau . com
Going once, going twice, … last call for objections. On Mar 23, 2024 at 5:29 PM -0700, serge rielau.com wrote: Hello, I have a PR https://github.com/apache/spark/pull/45620 ready to go that will extend the definition of whitespace (what separates tokens) from the small set of ASCII characters

Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-12 Thread serge rielau . com
+1 it’s the wrapping on math overflows that does it for me. Sent from my iPhone On Apr 12, 2024, at 9:36 AM, huaxin gao wrote: +1 On Thu, Apr 11, 2024 at 11:18 PM L. C. Hsieh wrote: +1 I believe ANSI mode is well developed after many releases. No doubt it could be

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-11 Thread serge rielau . com
I think some of the issues raised here are not really common. Examples should follow best practice. It would be odd to have an example that exploits ansi.enabled=false to e.g. overflow an integer. Instead an example that works with ansi mode will typically work perfectly fine in an older version,

Re: [Spark SQL]: Are SQL User-Defined Functions on the Roadmap?

2025-01-31 Thread serge rielau . com
FYI: Allison On Jan 31, 2025, at 7:04 AM, Mich Talebzadeh wrote: Hi Frank, I think this would be for the Spark dev team. I have added to the email. HTH Dr Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

Re: [DISCUSS] Use plain text logs by default

2024-11-22 Thread serge rielau . com
It doesn’t have to be very easy. It just has to be easier than maintaining two infrastructures forever. If we can’t easily parse the JSON log to emit the existing text content, I’d say we have a bigger problem. On Nov 22, 2024 at 2:17 PM -0800, Jungtaek Lim wrote: I'm not sure it is very eas

Re: [DISCUSS] Use plain text logs by default

2024-11-22 Thread serge rielau . com
Shouldn’t we differentiate between the logging and the reading of the log? The problem appears to be in the presentation layer. We could provide a basic log reader, instead of supporting two different ways to log in the long term. On Nov 22, 2024, at 6:37 AM, Martin Grund wrote: I'm generally supporti

SQL Procedures and SQL Scripting

2025-01-10 Thread serge rielau . com
Hi all, Some of you may have noticed a lot of activity in the grammar around a new BEGIN … END statement. For those interested, please find here the complete specification for your amusement and review. TL;DR: The goal is to add SQL Procedures and "anonymous blocks" based on ANSI SQL/PSM. Che

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-22 Thread serge rielau . com
+1 (non-binding) On Mar 21, 2025, at 12:52 PM, Jules Damji wrote: +1 (non-binding) — Sent from my iPhone Pardon the dumb thumb typos :) On Mar 21, 2025, at 11:47 AM, Anton Okolnychyi wrote: Hi all, I would like to start a vote on adding support for constraints to DSv2. Discussion thread:

Re: [Discuss] SPIP: Support NanoSecond Timestamps

2025-03-17 Thread serge rielau . com
What are you comparing performance against? On Mar 17, 2025 at 11:54 AM -0700, Reynold Xin wrote: Any thoughts on how to deal with performance here? Initially we didn't do nano-level precision because of performance (would not be able to fit everything into a 64-bit int). On Mon, Mar 17, 2025

Re: [Discuss] SPIP: Support NanoSecond Timestamps

2025-03-17 Thread serge rielau . com
IMHO that’s not a good comparison. By that logic we shouldn’t have double because it’s slower than int. We should compare against the competition first. Maybe as part of this effort we’ll need to prototype two competing solutions. The vast majority of differences should be related to storage cos

Re: [VOTE] SPIP: Add the TIME data type

2025-02-23 Thread serge rielau . com
+1 it’s abt time. Sent from my iPhone On Feb 23, 2025, at 12:25 PM, L. C. Hsieh wrote: +1 On Sun, Feb 23, 2025 at 7:51 AM Max Gekk wrote: Hi Spark devs, Following the discussion [1], I'd like to start the vote for the SPIP [2]. The SPIP aims to add a new data typ