I think this is already how it works. Most apps would depend on just
spark-sql (which depends on spark-core, IIRC). Maybe some optionally pull
in streaming or mllib.
I don't think it's intended that you pull in all submodules for any one
app, although you could.
I'm not sure there's a common subset of modules that is both large and
widely used.

Maven/SBT already pull in all transitive dependencies.
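
For reference, a minimal sbt sketch of the usual setup (the Spark and Scala
versions below are assumptions; adjust to your build). spark-sql brings in
spark-core transitively, so an application only lists the modules it
compiles against, all marked Provided:

  // build.sbt sketch; the versions here are assumptions
  ThisBuild / scalaVersion := "2.13.14"

  libraryDependencies ++= Seq(
    // spark-sql pulls in spark-core transitively
    "org.apache.spark" %% "spark-sql"       % "4.0.0" % Provided,
    // only if the app actually uses them:
    "org.apache.spark" %% "spark-mllib"     % "4.0.0" % Provided,
    "org.apache.spark" %% "spark-streaming" % "4.0.0" % Provided
  )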

On Tue, Jun 3, 2025 at 12:41 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:

> Hi all,
>
> Sorry for bumping this again - just trying to understand if it's worth
> adding a small feature for this. I think it could make it a lot easier
> for Spark users and Spark libraries to upgrade to and support new Spark
> versions :)
> If, instead of adding many provided dependencies, we had a single one
> that includes them all, that would be a lot easier to maintain...
>
>
>
> Thanks!
>
> Nimrod
>
>
> On Sun, Jun 1, 2025 at 12:23 AM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>
>> No, a K8s deployment, nothing special.
>> I just don't see why, when I'm developing and compiling, or say
>> upgrading from Spark 3.5 to Spark 4.0, I need to bump all the
>> dependencies I compile against but don't actually deploy - the ones I
>> use from the regular Spark runtime...
>>
>> Thanks,
>> Nimrod
>>
>> On Sat, 31 May 2025, 23:44, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Are you running in YARN mode and looking to put these jar files into
>>> HDFS on a distributed cluster?
>>> HTH
>>>
>>> Dr Mich Talebzadeh,
>>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>>
>>>    view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>> On Sat, 31 May 2025 at 19:47, Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Apologies if this is a basic question—I’ve searched around but haven’t
>>>> found a clear answer.
>>>>
>>>> I'm currently developing a Spark application using Scala, and I’m
>>>> looking for a way to include all the JARs typically bundled in a standard
>>>> Spark installation as a single provided dependency.
>>>>
>>>> From what I’ve seen, most examples add each Spark module individually
>>>> (like spark-core, spark-sql, spark-mllib, etc.) as separate provided
>>>> dependencies. However, since these are all included in the Spark
>>>> runtime environment, I’m wondering why there isn’t a more aggregated
>>>> dependency for projects to use - something like a parent project or
>>>> BOM (Bill of Materials) that pulls in all the commonly bundled Spark
>>>> libraries (along with compatible versions of Log4j, Guava, Jackson,
>>>> and so on).
>>>>
>>>> Is there a particular reason this approach isn’t commonly used? Does it
>>>> cause issues with transitive dependencies or version mismatches? If so -
>>>> I'm sure those can be addressed as well...
>>>>
>>>>
>>>> Thanks in advance for any insights!
>>>>
>>>>
>>>> Best regards,
>>>>
>>>> Nimrod
>>>>
>>>>
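
For comparison, a purely hypothetical sbt sketch of the single aggregated
"provided" dependency Nimrod describes in the quoted message above. The
artifact name spark-runtime-all is made up for illustration; Spark does not
publish anything like it today, and the version is an assumption:

  // Hypothetical: one aggregate artifact, marked Provided, instead of
  // listing every Spark module the application compiles against.
  // "spark-runtime-all" does not exist; the name and version are made up.
  libraryDependencies +=
    "org.apache.spark" %% "spark-runtime-all" % "4.0.0" % Provided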
