Thanks Sean. There are other dependencies that you need to align with Spark if you want to use them as well - like Guava, Jackson, etc. I find those even harder to manage, because you have to go to the Spark repo to check which version Spark actually uses, and if that version changes between Spark releases you have to notice that and upgrade as well. What do you think?
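
To give a concrete picture of what I mean, here is a rough build.sbt sketch of what I end up doing today versus what I'm suggesting. The version strings are placeholders, and the "spark-runtime-all" artifact at the bottom is made up for illustration - no such aggregated artifact exists today:

  // build.sbt - rough sketch, untested
  val sparkVersion = "4.0.0"                  // placeholder: whatever the cluster runs
  val jacksonVersionUsedBySpark = "x.y.z"     // placeholder: copied by hand from the Spark poms
  val guavaVersionUsedBySpark   = "x.y.z"     // placeholder: copied by hand from the Spark poms

  // Today: each Spark module and each runtime library pinned separately as Provided.
  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-sql"   % sparkVersion % Provided,
    "org.apache.spark" %% "spark-mllib" % sparkVersion % Provided,
    "com.fasterxml.jackson.core" % "jackson-databind" % jacksonVersionUsedBySpark % Provided,
    "com.google.guava"           % "guava"            % guavaVersionUsedBySpark   % Provided
  )

  // What I'm suggesting (hypothetical - this artifact does not exist): one
  // aggregated Provided dependency that pulls all of the above in transitively:
  // libraryDependencies += "org.apache.spark" %% "spark-runtime-all" % sparkVersion % Provided

With something like that, moving to a new Spark version would mean bumping one version number instead of re-checking every pinned library against the Spark repo.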
Thanks!
Nimrod

On Tue, Jun 3, 2025 at 8:51 PM Sean Owen <sro...@gmail.com> wrote:

> I think this is already how it works. Most apps would depend on just
> spark-sql (which depends on spark-core, IIRC). Maybe some optionally pull
> in streaming or mllib.
> I don't think it's intended that you pull in all submodules for any one
> app, although you could.
> I don't know if there's some common subset that is both large and
> commonly used.
>
> Maven/SBT already pull in all transitive dependencies.
>
> On Tue, Jun 3, 2025 at 12:41 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>
>> Hi all,
>>
>> Sorry for bumping this again - just trying to understand if it's worth
>> adding a small feature for this. I think it could help Spark users and
>> Spark libraries upgrade to and support Spark versions much more easily :)
>> If instead of adding many provided dependencies we had one that includes
>> them all, that would be a lot easier to maintain...
>>
>> Thanks!
>>
>> Nimrod
>>
>> On Sun, Jun 1, 2025 at 12:23 AM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>>
>>> No K8s deployment, nothing special.
>>> I just don't see why, when I'm developing and compiling, or say
>>> upgrading from Spark 3.5 to Spark 4.0, I need to upgrade all the
>>> dependencies I use but don't actually deploy - the ones that come from
>>> the regular Spark runtime...
>>>
>>> Thanks,
>>> Nimrod
>>>
>>> On Sat, May 31, 2025, 23:44 Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> Are you running in YARN mode and you want to put these jar files into
>>>> HDFS in a distributed cluster?
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh,
>>>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>>>
>>>> view my Linkedin profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>> On Sat, 31 May 2025 at 19:47, Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> Apologies if this is a basic question - I've searched around but
>>>>> haven't found a clear answer.
>>>>>
>>>>> I'm currently developing a Spark application in Scala, and I'm looking
>>>>> for a way to include all the JARs typically bundled in a standard
>>>>> Spark installation as a single provided dependency.
>>>>>
>>>>> From what I've seen, most examples add each Spark module individually
>>>>> (like spark-core, spark-sql, spark-mllib, etc.) as a separate provided
>>>>> dependency. However, since these are all included in the Spark runtime
>>>>> environment, I'm wondering why there isn't a more aggregated
>>>>> dependency - something like a parent project or BOM (Bill of
>>>>> Materials) that pulls in all the commonly included Spark libraries
>>>>> (along with compatible versions of Log4j, Guava, Jackson, and so on) -
>>>>> that projects could use.
>>>>>
>>>>> Is there a particular reason this approach isn't commonly used? Does
>>>>> it cause issues with transitive dependencies or version mismatches?
>>>>> If so - I'm sure those can be addressed as well...
>>>>>
>>>>> Thanks in advance for any insights!
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Nimrod