BTW, one of the many reasons Spark Connect was developed was to simplify the process around shading (ideally, to make it unnecessary). I’m wondering if Spark Connect could be a solution here?
On Fri, Jan 17, 2025 at 12:27 Holden Karau <holden.ka...@gmail.com> wrote:
> +1 I think this is great. If you’ve got any shading you’d be open to
> upstreaming I’d be happy to review it.
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> <https://www.fighthealthinsurance.com/?q=hk_email>
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
>
> On Fri, Jan 17, 2025 at 12:25 PM John Zhuge <jzh...@apache.org> wrote:
>
>> Thanks for sharing the insightful context!
>>
>> On Fri, Jan 17, 2025 at 11:47 AM Regina Lee <re...@linkedin.com.invalid>
>> wrote:
>>
>>> Hi,
>>>
>>> I’d like to share insights from our Spark team at LinkedIn. We recently
>>> moved to a mostly shaded Spark 3 client internally. Our goal was to
>>> minimize dependency conflicts that could hinder Spark upgrades, especially
>>> given our previous efforts to migrate our users from Spark 2 to Spark 3,
>>> and LinkedIn’s heavy Scala / Java use cases with complicated dependency
>>> trees. We shaded rather aggressively (100+ relocations) given our specific
>>> ecosystem needs – Hadoop 2.10 with no current/planned support for Spark
>>> streaming / connect modules.
>>>
>>> At a high level, some notable shaded prefixes included org.json,
>>> com.google.common / protobuf, org.apache.commons, and org.antlr. Key
>>> dependencies *not* shaded were avro, jackson, datanucleus, logging /
>>> JRE / scala dependencies (in general, any dependencies exposed in Spark’s /
>>> other dependencies’ public APIs).
>>>
>>> There is an expected one-time cost in onboarding our Spark users to the
>>> shaded client. Most issues require importing missing dependencies
>>> originally provided by Spark/Hadoop. We are generally in favor of shading
>>> more of Spark’s dependencies because it has helped reduce developer toil
>>> and troubleshooting efforts.
>>>
>>> Thanks,
>>>
>>> Regina
>>>
>>> On 2024/12/07 15:30:20 Mich Talebzadeh wrote:
>>> > General comment without specifics. I think shading should be used *on a
>>> > case by case basis* when the benefits outweigh the drawbacks. How about
>>> > exploring alternatives such as modularization, dependency management, or
>>> > careful dependency selection, before resorting to shading? My point is that
>>> > shading will introduce more debugging and testing as packages will be
>>> > renamed impacting flexibility. Case in point, things like unit and
>>> > integration tests may need adjustments to account for the renamed packages.
>>> >
>>> > HTH
>>> >
>>> > Mich Talebzadeh,
>>> >
>>> > Architect | Data Science | Financial Crime | GDPR & Compliance Specialist
>>> > PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College
>>> > London <https://en.wikipedia.org/wiki/Imperial_College_London>
>>> > London, United Kingdom
>>> >
>>> >
>>> > view my Linkedin profile
>>> > <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>> >
>>> >
>>> > https://en.everybodywiki.com/Mich_Talebzadeh
>>> >
>>> >
>>> >
>>> > *Disclaimer:* The information provided is correct to the best of my
>>> > knowledge but of course cannot be guaranteed . It is essential to note
>>> > that, as with any advice, quote "one test result is worth one-thousand
>>> > expert opinions (Werner <https://en.wikipedia.org/wiki/Wernher_von_Braun>Von
>>> > Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>>> >
>>> >
>>> > On Sat, 7 Dec 2024 at 06:21, Holden Karau <ho...@gmail.com> wrote:
>>> >
>>> > > Hi Y'all,
>>> > >
>>> > > As we're getting closer to 4.0 I was thinking now is a good time for us to
>>> > > try and reduce the class path we expose for JVM users. Are there any common
>>> > > classes/packages folks would like to see shaded?
>>> > >
>>> > > Cheers,
>>> > >
>>> > > Holden :)
>>> > >
>>> > > --
>>> > > Twitter: https://twitter.com/holdenkarau
>>> > > Fight Health Insurance: https://www.fighthealthinsurance.com/
>>> > > <https://www.fighthealthinsurance.com/?q=hk_email>
>>> > > Books (Learning Spark, High Performance Spark, etc.):
>>> > > https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>
>>> > > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>> > > Pronouns: she/her
>>> > >
>>> >
>>>
>>
>>
>> --
>> John Zhuge
>>
>
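
For anyone skimming the thread, the relocation Regina describes can be pictured as a prefix rewrite on class names. The following is only a toy model in Python, not a real build configuration and not LinkedIn's actual setup: the `SHADE_PREFIX` value and the `relocate` helper are hypothetical names made up for illustration, and reading "com.google.common / protobuf" as `com.google.common` plus `com.google.protobuf` is an assumption. Only the list of shaded prefixes comes from the message above.

```python
# Toy model of package relocation (shading). This is NOT a build-tool
# configuration; it only shows how class names change after relocation.

# Hypothetical target prefix, chosen purely for illustration.
SHADE_PREFIX = "shaded.example."

# Prefixes the thread lists as shaded. Splitting "com.google.common / protobuf"
# into two package prefixes is an assumption about the author's shorthand.
RELOCATED = (
    "org.json.",
    "com.google.common.",
    "com.google.protobuf.",
    "org.apache.commons.",
    "org.antlr.",
)


def relocate(class_name: str) -> str:
    """Return the class name as it would look after relocation."""
    for prefix in RELOCATED:
        if class_name.startswith(prefix):
            return SHADE_PREFIX + class_name
    # Anything else (avro, jackson, ...) keeps its original name.
    return class_name


if __name__ == "__main__":
    for name in (
        "com.google.common.collect.ImmutableList",
        "org.apache.avro.Schema",
        "com.fasterxml.jackson.databind.ObjectMapper",
    ):
        print(name, "->", relocate(name))
```

In this picture, user code that imports `com.google.common.collect.ImmutableList` no longer finds that class through the Spark client, because the client only carries the relocated name. That would be consistent with the one-time onboarding cost mentioned above, where users have to declare dependencies they previously picked up from Spark/Hadoop.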