BTW, one of the many reasons Spark Connect was developed was to simplify
this shading process, potentially removing the need for it altogether. I’m
wondering whether Spark Connect could be a solution here?


On Fri, Jan 17, 2025 at 12:27 Holden Karau <holden.ka...@gmail.com> wrote:

> +1 I think this is great. If you’ve got any shading you’d be open to
> upstreaming I’d be happy to review it.
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> <https://www.fighthealthinsurance.com/?q=hk_email>
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
>
> On Fri, Jan 17, 2025 at 12:25 PM John Zhuge <jzh...@apache.org> wrote:
>
>> Thanks for sharing the insightful context!
>>
>> On Fri, Jan 17, 2025 at 11:47 AM Regina Lee <re...@linkedin.com.invalid> wrote:
>>
>>> Hi,
>>>
>>> I’d like to share insights from our Spark team at LinkedIn. We recently
>>> moved to a mostly shaded Spark 3 client internally. Our goal was to
>>> minimize dependency conflicts that could hinder Spark upgrades, especially
>>> given our previous efforts to migrate our users from Spark 2 to Spark 3,
>>> and LinkedIn’s heavy Scala / Java use cases with complicated dependency
>>> trees. We shaded rather aggressively (100+ relocations) given our specific
>>> ecosystem needs – Hadoop 2.10 with no current/planned support for Spark
>>> streaming / connect modules.
>>>
>>> At a high level, some notable shaded prefixes included org.json,
>>> com.google.common / protobuf, org.apache.commons, and org.antlr. Key
>>> dependencies *not* shaded were avro, jackson, datanucleus, logging /
>>> JRE / scala dependencies (in general, any dependencies exposed in Spark’s /
>>> other dependencies’ public APIs).
>>>
>>> There is an expected one-time cost in onboarding our Spark users to the
>>> shaded client. Most issues require importing missing dependencies
>>> originally provided by Spark/Hadoop. We are generally in favor of shading
>>> more of Spark’s dependencies because it has helped reduce developer toil
>>> and troubleshooting efforts.
>>>
>>> Thanks,
>>>
>>> Regina
>>>
>>> On 2024/12/07 15:30:20 Mich Talebzadeh wrote:
>>> > General comment without specifics. I think shading should be used *on a
>>> > case by case basis* when the benefits outweigh the drawbacks. How about
>>> > exploring alternatives such as modularization, dependency management, or
>>> > careful dependency selection, before resorting to shading? My point is that
>>> > shading will introduce more debugging and testing as packages will be
>>> > renamed impacting flexibility. Case in point, things like unit and
>>> > integration tests may need adjustments to account for the renamed packages.
>>> >
>>> > HTH
>>> >
>>> > Mich Talebzadeh,
>>> >
>>> > Architect | Data Science | Financial Crime | GDPR & Compliance Specialist
>>> > PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College
>>> > London <https://en.wikipedia.org/wiki/Imperial_College_London>
>>> > London, United Kingdom
>>> >
>>> >
>>> >    view my Linkedin profile
>>> > <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>> >
>>> >
>>> >  https://en.everybodywiki.com/Mich_Talebzadeh
>>> >
>>> >
>>> >
>>> > *Disclaimer:* The information provided is correct to the best of my
>>> > knowledge but of course cannot be guaranteed. It is essential to note
>>> > that, as with any advice, "one test result is worth one-thousand
>>> > expert opinions" (Wernher von Braun
>>> > <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>> >
>>> >
>>> > On Sat, 7 Dec 2024 at 06:21, Holden Karau <ho...@gmail.com> wrote:
>>> >
>>> > > Hi Y'all,
>>> > >
>>> > > As we're getting closer to 4.0 I was thinking now is a good time for us to
>>> > > try and reduce the class path we expose for JVM users. Are there any common
>>> > > classes/packages folks would like to see shaded?
>>> > >
>>> > > Cheers,
>>> > >
>>> > > Holden :)
>>> > >
>>> > >
>>> >
>>>
>>
>>
>> --
>> John Zhuge
>>
>
