[FYI] Known `Spark Connect` Test Suite Flakiness

2025-01-18 Thread Dongjoon Hyun
Hi, All. This is a kind of heads-up as part of Apache Spark 4.0.0 preparation. https://issues.apache.org/jira/browse/SPARK-44111 (Prepare Apache Spark 4.0.0) It would be great if we are able to fix long-standing `Spark Connect` test flakiness together during the QA period (2025-02-01 ~) in orde

Re: Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Holden Karau
I would say the short answer is "mostly not" and the longer answer is that the connect APIs are explicitly not covering many, what we would call, "paved paths." Because we're more likely to have JAR conflicts with advanced users who are more likely to use some of the non-supported APIs. For example

Re: Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Ángel
What about introducing isolated class loaders, similar to the approach used by web servers? Perhaps OSGi bundles or something similar? On Sat, Jan 18, 2025, 22:43, Holden Karau wrote: > I would say the short answer is "mostly not" and the longer answer is that > the connect APIs are explicitly

Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Matei Zaharia
We definitely need to move the “advanced” users to stable APIs if we want Spark to have a good future, such as the Spark Connect plugin APIs. The RDD API was the wrong abstraction in my opinion — hopefully I can say that since I worked on it. It was too tightly bound to Java types and to interna

Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Matei Zaharia
Yup, it will definitely take a while, but I’d love to start tracing down the things that prevent people from moving (RDD API is one, but I’m worried there are also other internal hooks), and also start encouraging library and plugin developers to use more forward-compatible APIs. Hopefully we ca

Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Denny Lee
That's what I'm hoping for - that going forward we can have more non-JVM clients (Python, GoLang, Rust, etc.) and make it simpler for JVM-based clients. I appreciate your call out on 90%/10% Holden - completely fair. I guess I would just love to see more traction on this so that way we can minim

Re: Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Denny Lee
BTW, one of many reasons Spark Connect was developed was to potentially simplify this process around shading (i.e. not need to do it). I’m wondering if utilizing Spark Connect could be a potential solution here? On Fri, Jan 17, 2025 at 12:27 Holden Karau wrote: > +1 I think this is great. If

Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Mich Talebzadeh
I think your view highlights the need for a shift towards more stable and version-independent APIs. Spark Connect IMO is a key enabler of this shift, allowing users and developers to build applications and libraries that are more resilient to changes in Spark's internals as opposed to RDDs. As I s

RE: Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Jules Damji
On 2025/01/18 22:35:59 Mich Talebzadeh wrote: > I think your view highlights the need for a shift towards more stable and > version-independent APIs. Spark Connect IMO is a key enabler of this shift, > allowing users and developers to build applications and libraries that are > more resilient to ch