RE: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-23 Thread Balaji Sudharsanam V
Do we have a Java client for Spark Connect, something like PySpark? From: Mich Talebzadeh Sent: 22 January 2025 15:05 To: Hyukjin Kwon Cc: Martin Grund; Holden Karau; Dongjoon Hyun; dev Subject: [EXTERNAL] Re: FYI: A Hallucination about Spark Connect Stability in Spark 4 CI

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-22 Thread Mich Talebzadeh
A broken CI is really an operational matter, and in this case it was quite temporary. We should put that aside and move on, as 1) the product is sound and 2) Spark Connect is strategic for the future of Spark. HTH Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-22 Thread Hyukjin Kwon
While it might be a bit too much to talk about its stability, it is true that the CI dedicated for Spark Connect compat was broken there for a couple of weeks, and the errors from the tests look confusing. I agree that tests and builds could be one of the easiest measurements to tell the state of a

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-22 Thread Martin Grund
I'm very confused about how we use stability in CI as a measure to discuss the strategy of a particular feature, particularly because we call these "hallucinations." From real-world experience, I can say that we have thousands of clients using Spark Connect across many different versions in our i

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Jules Damji
Thanks for the update and for looking into it. Excuse the thumb typos On Tue, 21 Jan 2025 at 4:09 PM, Hyukjin Kwon wrote: > Just a quick note on that: the major reason is 1. OOM we should figure out > and fix the CI environment. 2. structured streaming test failure that is > still in development. > I

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Ángel
I'm passionate about and have lots of experience fixing OOMs. Contact me if you need some help. On Wed, 22 Jan 2025, 1:10, Hyukjin Kwon wrote: > Just a quick note on that: the major reason is 1. OOM we should figure out > and fix the CI environment. 2. structured streaming test failure that i

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Dongjoon Hyun
Thank you, Hyukjin! Dongjoon On Tue, Jan 21, 2025 at 16:10 Hyukjin Kwon wrote: > Just a quick note on that: the major reason is 1. OOM we should figure out > and fix the CI environment. 2. structured streaming test failure that is > still in development. > I made an umbrella JIRA (https://issue

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Hyukjin Kwon
Just a quick note on that: the major reasons are 1. OOM, which we should figure out by fixing the CI environment, and 2. a structured streaming test failure that is still in development. I made an umbrella JIRA (https://issues.apache.org/jira/browse/SPARK-50907), and I will work there. Should be easier to look at w

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Hyukjin Kwon
Let me take a look. It shouldn't be a major issue. On Wed, 22 Jan 2025 at 08:31, Mich Talebzadeh wrote: > As discussed on a thread over the weekend, we agreed among us including > Matei on a shift towards a more stable and version-independent APIs. > Spark Connect IMO is a key enabler of this shi

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Mich Talebzadeh
As discussed on a thread over the weekend, we agreed, Matei included, on a shift towards more stable and version-independent APIs. Spark Connect IMO is a key enabler of this shift, allowing users and developers to build applications and libraries that are more resilient to changes in Sp

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Dongjoon Hyun
To be clear, (1) is `PySpark 4.0 Client` + `Spark 4.0 Server`, which is more severe. And, your point matches with (2) exactly. Thank you for your reply, Holden. Dongjoon. On 2025/01/21 22:38:20 Holden Karau wrote: > Interesting. So given one of the features of Spark connect should be > simpler

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Holden Karau
Interesting. So, given that one of the features of Spark Connect should be simpler migrations, we should (in my mind) only declare it stable once we’ve gone through two releases where the previous client + its code can talk to the new server. Twitter: https://twitter.com/holdenkarau Fight Health Insuranc
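
[Illustration] The criterion in the message above (two releases in which the previous client can talk to the new server) could be checked roughly as follows. This is only a sketch under assumptions: the function name `can_declare_stable`, the `results` mapping, and the version strings are all hypothetical placeholders, not part of Spark or its CI, and "two releases" is read here as the two most recent release-to-release upgrades.

```python
from typing import Dict, List, Tuple

# Hypothetical record of cross-version CI outcomes (not a real Spark artifact):
# (client_version, server_version) -> True if the older client's test suite
# passed against the newer server.
CompatResults = Dict[Tuple[str, str], bool]


def can_declare_stable(releases: List[str],
                       results: CompatResults,
                       required: int = 2) -> bool:
    """Return True if the last `required` upgrades all passed.

    `releases` is ordered oldest to newest. An "upgrade" is a pair
    (previous release, next release): the previous client is run against
    the next server. A missing result counts as a failure.
    """
    upgrades = list(zip(releases, releases[1:]))
    if len(upgrades) < required:
        # Not enough release history yet to satisfy the criterion.
        return False
    return all(results.get(pair, False) for pair in upgrades[-required:])


# Placeholder version labels, for illustration only.
history = ["4.0", "4.1", "4.2"]
outcomes = {("4.0", "4.1"): True, ("4.1", "4.2"): True}
print(can_declare_stable(history, outcomes))
```

With only one upgrade on record, or with any failing or missing pair among the last two, the function returns False, which matches the "wait for two releases" reading of the proposal.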

FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Dongjoon Hyun
It seems that there is misinformation about the stability of Spark Connect in Spark 4. I would like to reduce the gap in our dev mailing list. Frequently, some people claim `Spark Connect` is stable because it uses Protobuf. Yes, we standardize the interface layer. However, may I ask if it implies