Re: Re: Increasing Shading & Relocating for 4.0

2025-01-19 Thread Ángel
The higher the level of abstraction, the less control and insight you typically have into its internal workings. If the goal is to create users rather than developers, Spark Connect is the right API to achieve that purpose. On Sun, 19 Jan 2025, 13:10, Mich Talebzadeh wrote: > I believe by act

Re: Re: Increasing Shading & Relocating for 4.0

2025-01-19 Thread Mich Talebzadeh
I believe by actively involving the user community, we can create a more user-centric and successful path for the future of Spark in this respect. At the moment, the discussion is confined to this dev group, but we ought to gather feedback from the trenches so we can gain a sense of exit barriers and ti

RE: Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Jules Damji
On 2025/01/18 22:35:59 Mich Talebzadeh wrote: > I think your view highlights the need for a shift towards more stable and > version-independent APIs. Spark Connect IMO is a key enabler of this shift, > allowing users and developers to build applications and libraries that are > more resilient to ch

Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Denny Lee
That's what I'm hoping for - that going forward we can have more non-JVM clients (Python, GoLang, Rust, etc.) and make it simpler for JVM-based clients. I appreciate your call out on 90%/10% Holden - completely fair. I guess I would just love to see more traction on this so that way we can minim

Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Matei Zaharia
Yup, it will definitely take a while, but I’d love to start tracing down the things that prevent people from moving (RDD API is one, but I’m worried there are also other internal hooks), and also start encouraging library and plugin developers to use more forward-compatible APIs. Hopefully we ca

Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Mich Talebzadeh
I think your view highlights the need for a shift towards more stable and version-independent APIs. Spark Connect IMO is a key enabler of this shift, allowing users and developers to build applications and libraries that are more resilient to changes in Spark's internals as opposed to RDDs. As I s

Re: Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Ángel
What about introducing isolated class loaders, similar to the approach used by web servers? Perhaps OSGi bundles or something similar? On Sat, 18 Jan 2025, 22:43, Holden Karau wrote: > I would say the short answer is "mostly not" and the longer answer is that > the connect APIs are explicitly

Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Matei Zaharia
We definitely need to move the “advanced” users to stable APIs if we want Spark to have a good future, such as the Spark Connect plugin APIs. The RDD API was the wrong abstraction in my opinion — hopefully I can say that since I worked on it. It was too tightly bound to Java types and to interna

Re: Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Holden Karau
I would say the short answer is "mostly not" and the longer answer is that the connect APIs are explicitly not covering many, what we would call, "paved paths." Because we're more likely to have JAR conflicts with advanced users who are more likely to use some of the non-supported APIs. For example

Re: Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Denny Lee
BTW, one of many reasons Spark Connect was developed was to potentially simplify this process around shading (i.e. not need to do it). I’m wondering if utilizing Spark Connect could be a potential solution here? On Fri, Jan 17, 2025 at 12:27 Holden Karau wrote: > +1 I think this is great. If

Re: Re: Increasing Shading & Relocating for 4.0

2025-01-17 Thread Holden Karau
+1 I think this is great. If you’ve got any shading you’d be open to upstreaming I’d be happy to review it.

Re: Re: Increasing Shading & Relocating for 4.0

2025-01-17 Thread John Zhuge
Thanks for sharing the insightful context! On Fri, Jan 17, 2025 at 11:47 AM Regina Lee wrote: > Hi, > > I’d like to share insights from our Spark team at LinkedIn. We recently > moved to a mostly shaded Spark 3 client internally. Our goal was to > minimize dependency conflicts that could hinder

RE: Re: Increasing Shading & Relocating for 4.0

2025-01-17 Thread Regina Lee
Hi, I’d like to share insights from our Spark team at LinkedIn. We recently moved to a mostly shaded Spark 3 client internally. Our goal was to minimize dependency conflicts that could hinder Spark upgrades, especially given our previous efforts to migrate our users from Spark 2 to Spark 3, an

Re: Increasing Shading & Relocating for 4.0

2024-12-07 Thread Mich Talebzadeh
General comment without specifics. I think shading should be used on a case-by-case basis when the benefits outweigh the drawbacks. How about exploring alternatives such as modularization, dependency management, or careful dependency selection, before resorting to shading? My point is that shadin