Thanks for sharing the insightful context!

On Fri, Jan 17, 2025 at 11:47 AM Regina Lee <re...@linkedin.com.invalid> wrote:
> Hi,
>
> I’d like to share insights from our Spark team at LinkedIn. We recently
> moved to a mostly shaded Spark 3 client internally. Our goal was to
> minimize dependency conflicts that could hinder Spark upgrades, especially
> given our previous efforts to migrate our users from Spark 2 to Spark 3,
> and LinkedIn’s heavy Scala / Java use cases with complicated dependency
> trees. We shaded rather aggressively (100+ relocations) given our specific
> ecosystem needs – Hadoop 2.10 with no current/planned support for Spark
> streaming / connect modules.
>
> At a high level, some notable shaded prefixes included org.json,
> com.google.common / protobuf, org.apache.commons, and org.antlr. Key
> dependencies *not* shaded were avro, jackson, datanucleus, logging / JRE
> / scala dependencies (in general, any dependencies exposed in Spark’s /
> other dependencies’ public APIs).
>
> There is an expected one-time cost in onboarding our Spark users to the
> shaded client. Most issues require importing missing dependencies
> originally provided by Spark/Hadoop. We are generally in favor of shading
> more of Spark’s dependencies because it has helped reduce developer toil
> and troubleshooting efforts.
>
> Thanks,
>
> Regina
>
> On 2024/12/07 15:30:20 Mich Talebzadeh wrote:
> > General comment without specifics. I think shading should be used *on a
> > case by case basis* when the benefits outweigh the drawbacks. How about
> > exploring alternatives such as modularization, dependency management, or
> > careful dependency selection, before resorting to shading? My point is
> > that shading will introduce more debugging and testing as packages will
> > be renamed impacting flexibility. Case in point, things like unit and
> > integration tests may need adjustments to account for the renamed
> > packages.
> >
> > HTH
> >
> > Mich Talebzadeh,
> >
> > Architect | Data Science | Financial Crime | GDPR & Compliance Specialist
> > PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College
> > London <https://en.wikipedia.org/wiki/Imperial_College_London>
> > London, United Kingdom
> >
> > view my Linkedin profile
> > <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
> >
> > https://en.everybodywiki.com/Mich_Talebzadeh
> >
> > *Disclaimer:* The information provided is correct to the best of my
> > knowledge but of course cannot be guaranteed. It is essential to note
> > that, as with any advice, quote "one test result is worth one-thousand
> > expert opinions (Werner <https://en.wikipedia.org/wiki/Wernher_von_Braun>
> > Von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
> >
> > On Sat, 7 Dec 2024 at 06:21, Holden Karau <ho...@gmail.com> wrote:
> >
> > > Hi Y'all,
> > >
> > > As we're getting closer to 4.0 I was thinking now is a good time for
> > > us to try and reduce the class path we expose for JVM users. Are there
> > > any common classes/packages folks would like to see shaded?
> > >
> > > Cheers,
> > >
> > > Holden :)
> > >
> > > --
> > > Twitter: https://twitter.com/holdenkarau
> > > Fight Health Insurance: https://www.fighthealthinsurance.com/
> > > <https://www.fighthealthinsurance.com/?q=hk_email>
> > > Books (Learning Spark, High Performance Spark, etc.):
> > > https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>
> > > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> > > Pronouns: she/her

--
John Zhuge