Thanks for sharing the insightful context!

On Fri, Jan 17, 2025 at 11:47 AM Regina Lee <re...@linkedin.com.invalid>
wrote:

> Hi,
>
> I’d like to share insights from our Spark team at LinkedIn. We recently
> moved to a mostly shaded Spark 3 client internally. Our goal was to
> minimize dependency conflicts that could hinder Spark upgrades, especially
> given our previous efforts to migrate our users from Spark 2 to Spark 3,
> and LinkedIn’s heavy Scala / Java use cases with complicated dependency
> trees. We shaded rather aggressively (100+ relocations) given our specific
> ecosystem needs – Hadoop 2.10 with no current/planned support for the
> Spark Streaming / Connect modules.
>
> At a high level, some notable shaded prefixes included org.json,
> com.google.common / protobuf, org.apache.commons, and org.antlr. Key
> dependencies *not* shaded were Avro, Jackson, DataNucleus, and logging /
> JRE / Scala dependencies (in general, any dependencies exposed in the
> public APIs of Spark or of its other dependencies).
>
> There is an expected one-time cost in onboarding our Spark users to the
> shaded client. Most issues require importing missing dependencies
> originally provided by Spark/Hadoop. We are generally in favor of shading
> more of Spark’s dependencies because it has helped reduce developer toil
> and troubleshooting efforts.
>
> Thanks,
>
> Regina
>
> On 2024/12/07 15:30:20 Mich Talebzadeh wrote:
> > General comment without specifics. I think shading should be used *on a
> > case-by-case basis* when the benefits outweigh the drawbacks. How about
> > exploring alternatives such as modularization, dependency management, or
> > careful dependency selection, before resorting to shading? My point is
> > that shading will introduce more debugging and testing, as packages will
> > be renamed, impacting flexibility. Case in point, things like unit and
> > integration tests may need adjustments to account for the renamed
> > packages.
> >
> > HTH
> >
> > Mich Talebzadeh,
> >
> > Architect | Data Science | Financial Crime | GDPR & Compliance Specialist
> > PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College
> > London <https://en.wikipedia.org/wiki/Imperial_College_London>
> > London, United Kingdom
> >
> >
> >    view my Linkedin profile
> > <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
> >
> >
> >  https://en.everybodywiki.com/Mich_Talebzadeh
> >
> >
> >
> > *Disclaimer:* The information provided is correct to the best of my
> > knowledge but of course cannot be guaranteed. It is essential to note
> > that, as with any advice, quote "one test result is worth one-thousand
> > expert opinions (Wernher von Braun
> > <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
> >
> >
> > On Sat, 7 Dec 2024 at 06:21, Holden Karau <ho...@gmail.com> wrote:
> >
> > > Hi Y'all,
> > >
> > > As we're getting closer to 4.0 I was thinking now is a good time for us
> > > to try and reduce the class path we expose for JVM users. Are there any
> > > common classes/packages folks would like to see shaded?
> > >
> > > Cheers,
> > >
> > > Holden :)
> > >
> > > --
> > > Twitter: https://twitter.com/holdenkarau
> > > Fight Health Insurance: https://www.fighthealthinsurance.com/
> > > <https://www.fighthealthinsurance.com/?q=hk_email>
> > > Books (Learning Spark, High Performance Spark, etc.):
> > > https://amzn.to/2MaRAG9
> > > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> > > Pronouns: she/her
> > >
> >
>


-- 
John Zhuge
