Hi,
I’d like to share insights from our Spark team at LinkedIn. We recently moved to a mostly shaded Spark 3 client internally. Our goal was to minimize dependency conflicts that could hinder Spark upgrades, especially given our previous efforts to migrate our users from Spark 2 to Spark 3, and LinkedIn’s heavy Scala / Java use cases with complicated dependency trees.

We shaded rather aggressively (100+ relocations) given our specific ecosystem needs – Hadoop 2.10 with no current/planned support for the Spark Streaming / Connect modules. At a high level, some notable shaded prefixes included org.json, com.google.common / protobuf, org.apache.commons, and org.antlr. Key dependencies not shaded were avro, jackson, datanucleus, and logging / JRE / Scala dependencies (in general, any dependencies exposed in Spark’s / other dependencies’ public APIs).

There is an expected one-time cost in onboarding our Spark users to the shaded client. Most issues are resolved by users explicitly adding dependencies that were originally provided by Spark/Hadoop.

We are generally in favor of shading more of Spark’s dependencies because it has helped reduce developer toil and troubleshooting efforts.

Thanks,
Regina

On 2024/12/07 15:30:20 Mich Talebzadeh wrote:
> General comment without specifics. I think shading should be used *on a
> case by case basis* when the benefits outweigh the drawbacks. How about
> exploring alternatives such as modularization, dependency management, or
> careful dependency selection, before resorting to shading? My point is that
> shading will introduce more debugging and testing as packages will be
> renamed impacting flexibility. Case in point, things like unit and
> integration tests may need adjustments to account for the renamed packages.
>
> HTH
>
> Mich Talebzadeh,
> Architect | Data Science | Financial Crime | GDPR & Compliance Specialist
> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College
> London <https://en.wikipedia.org/wiki/Imperial_College_London>
> London, United Kingdom
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>
> On Sat, 7 Dec 2024 at 06:21, Holden Karau <ho...@gmail.com> wrote:
>
> > Hi Y'all,
> >
> > As we're getting closer to 4.0 I was thinking now is a good time for us to
> > try and reduce the class path we expose for JVM users. Are there any common
> > classes/packages folks would like to see shaded?
> >
> > Cheers,
> >
> > Holden :)
> >
> > --
> > Twitter: https://twitter.com/holdenkarau
> > Fight Health Insurance: https://www.fighthealthinsurance.com/
> > Books (Learning Spark, High Performance Spark, etc.):
> > https://amzn.to/2MaRAG9
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> > Pronouns: she/her
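
For readers less familiar with relocation, here is a minimal Python sketch of the class-name rewriting that the shading in Regina's message implies. It is illustrative only and is not LinkedIn's actual configuration: the `shaded.spark.` target prefix and the helper names are hypothetical, `com.google.protobuf` is one reading of "com.google.common / protobuf", and a real build would use a shading tool rather than code like this.

```python
# Hypothetical relocation target; the original message does not name one.
SHADED_PREFIX = "shaded.spark."

# Package prefixes the message lists as shaded
# ("com.google.common / protobuf" is read here as two packages).
RELOCATED = [
    "org.json",
    "com.google.common",
    "com.google.protobuf",
    "org.apache.commons",
    "org.antlr",
]


def in_package(class_name: str, package: str) -> bool:
    """True if class_name is the package itself or lives under it."""
    return class_name == package or class_name.startswith(package + ".")


def relocate(class_name: str) -> str:
    """Return the name a class would carry after relocation.

    Names outside every relocated package are returned unchanged.
    """
    for package in RELOCATED:
        if in_package(class_name, package):
            return SHADED_PREFIX + class_name
    return class_name


if __name__ == "__main__":
    for name in [
        "com.google.common.collect.ImmutableList",  # relocated
        "org.apache.avro.Schema",                   # left as-is
    ]:
        print(name, "->", relocate(name))
```

Under these assumptions, a user job that relied on Spark's copy of `com.google.common.collect.ImmutableList` would no longer find it under that name on the client classpath, which fits the onboarding cost described above: the user adds the dependency to their own build instead. Classes under unshaded packages such as `org.apache.avro` or `com.fasterxml.jackson` keep their names.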