Hi,

I’d like to share insights from our Spark team at LinkedIn. We recently moved 
to a mostly shaded Spark 3 client internally. Our goal was to minimize 
dependency conflicts that could hinder Spark upgrades, especially given our 
previous efforts to migrate our users from Spark 2 to Spark 3, and LinkedIn’s 
heavy Scala / Java use cases with complicated dependency trees. We shaded 
rather aggressively (100+ relocations) given our specific ecosystem needs: 
Hadoop 2.10, with no current or planned support for the Spark Streaming / 
Connect modules.


At a high level, some notable shaded prefixes included org.json, 
com.google.common / protobuf, org.apache.commons, and org.antlr. Key 
dependencies not shaded were avro, jackson, datanucleus, logging / JRE / scala 
dependencies (in general, any dependencies exposed in Spark’s / other 
dependencies’ public APIs).
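
To make the idea of a relocation concrete, here is a minimal sketch of how a 
shaded prefix rewrites class names. It is only a toy model of what a shading 
tool does at build time, not our actual build configuration. The 
"shaded.spark." target prefix and the relocate() helper are hypothetical, and 
reading "com.google.common / protobuf" as com.google.common plus 
com.google.protobuf is an assumption.

```python
# Toy model of package relocation (shading); not a real build config.
# SHADED_ROOT is a hypothetical target prefix chosen for illustration.
SHADED_ROOT = "shaded.spark."

# Prefixes named in this email as shaded. Splitting "com.google.common /
# protobuf" into two prefixes is an assumption.
RELOCATED_PREFIXES = [
    "org.json",
    "com.google.common",
    "com.google.protobuf",
    "org.apache.commons",
    "org.antlr",
]


def relocate(class_name: str) -> str:
    """Return the class name as it would appear after relocation.

    A class is relocated only when its package matches a listed prefix on a
    package boundary, so "org.jsonx.Foo" is not caught by "org.json".
    """
    for prefix in RELOCATED_PREFIXES:
        if class_name == prefix or class_name.startswith(prefix + "."):
            return SHADED_ROOT + class_name
    # Unlisted packages (e.g. avro, jackson) keep their original names.
    return class_name


if __name__ == "__main__":
    for name in (
        "com.google.common.collect.Lists",
        "org.apache.avro.Schema",
    ):
        print(name, "->", relocate(name))
```

In this model, a user whose code referenced com.google.common.collect.Lists 
would no longer get that class from the Spark client under its original name, 
while org.apache.avro.Schema would resolve exactly as before.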


Onboarding our Spark users to the shaded client carries an expected one-time 
cost. Most issues are resolved by explicitly adding dependencies that were 
originally provided by Spark/Hadoop. We are generally in favor of shading more 
of Spark’s dependencies because it has helped reduce developer toil and 
troubleshooting efforts.


Thanks,

Regina

On 2024/12/07 15:30:20 Mich Talebzadeh wrote:
> General comment without specifics. I think shading should be used *on a
> case by case basis* when the benefits outweigh the drawbacks. How about
> exploring alternatives such as modularization, dependency management, or
> careful dependency selection, before resorting to shading? My point is that
> shading will introduce more debugging and testing as packages will be
> renamed, impacting flexibility. Case in point, things like unit and
> integration tests may need adjustments to account for the renamed packages.
>
> HTH
>
> Mich Talebzadeh,
>
> Architect | Data Science | Financial Crime | GDPR & Compliance Specialist
> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College
> London <https://en.wikipedia.org/wiki/Imperial_College_London>
> London, United Kingdom
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>
>
> On Sat, 7 Dec 2024 at 06:21, Holden Karau 
> <ho...@gmail.com> wrote:
>
> > Hi Y'all,
> >
> > As we're getting closer to 4.0 I was thinking now is a good time for us to
> > try and reduce the class path we expose for JVM users. Are there any common
> > classes/packages folks would like to see shaded?
> >
> > Cheers,
> >
> > Holden :)
> >
> > --
> > Twitter: https://twitter.com/holdenkarau
> > Fight Health Insurance: https://www.fighthealthinsurance.com/
> > <https://www.fighthealthinsurance.com/?q=hk_email>
> > Books (Learning Spark, High Performance Spark, etc.):
> > https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> > Pronouns: she/her
> >
>
