I think this is already how it works. Most apps depend on just spark-sql (which depends on spark-core, IIRC). Some may also pull in streaming or mllib. It isn't intended that any one app pull in all submodules, although you could. I'm not sure there's a common subset that is both large and widely used enough to justify an aggregate artifact.
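For example, something along these lines in build.sbt is usually all an application needs (the version string is only illustrative); spark-core and the rest of the SQL stack come along transitively:

    // one provided dependency; spark-core etc. arrive as transitive dependencies
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "4.0.0" % Provided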
Maven/SBT already pull in all transitive dependencies.

On Tue, Jun 3, 2025 at 12:41 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:

> Hi all,
>
> Sorry for bumping this again - just trying to understand if it's worth
> adding a small feature for this. I think it could make it a lot easier for
> Spark users and Spark libraries to upgrade and support Spark versions :)
> If instead of adding many provided dependencies we had one that included
> them all, that would be much easier to maintain...
>
> Thanks!
>
> Nimrod
>
> On Sun, Jun 1, 2025 at 12:23 AM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>
>> No K8s deployment, nothing special.
>> I just don't see why, when I'm developing and compiling, or say upgrading
>> from Spark 3.5 to Spark 4.0, I need to upgrade all the dependencies I use
>> but don't actually deploy - the ones that come from the regular Spark
>> runtime...
>>
>> Thanks,
>> Nimrod
>>
>> On Sat, May 31, 2025, 23:44 Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Are you running in YARN mode, and do you want to put these jar files
>>> into HDFS in a distributed cluster?
>>> HTH
>>>
>>> Dr Mich Talebzadeh,
>>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>>
>>> view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>> On Sat, 31 May 2025 at 19:47, Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Apologies if this is a basic question - I've searched around but
>>>> haven't found a clear answer.
>>>>
>>>> I'm currently developing a Spark application using Scala, and I'm
>>>> looking for a way to include all the JARs typically bundled in a standard
>>>> Spark installation as a single provided dependency.
>>>>
>>>> From what I've seen, most examples add each Spark module individually
>>>> (like spark-core, spark-sql, spark-mllib, etc.) as a separate provided
>>>> dependency. However, since these are all included in the Spark runtime
>>>> environment, I'm wondering why there isn't a more aggregated dependency -
>>>> something like a parent project or BOM (Bill of Materials) that pulls in
>>>> all the commonly included Spark libraries (along with compatible versions
>>>> of Log4j, Guava, Jackson, and so on) - that projects could simply use.
>>>>
>>>> Is there a particular reason this approach isn't commonly used? Does it
>>>> cause issues with transitive dependencies or version mismatches? If so,
>>>> I'm sure those can be addressed as well...
>>>>
>>>> Thanks in advance for any insights!
>>>>
>>>> Best regards,
>>>>
>>>> Nimrod
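FWIW, a single version value in the build already gets most of the way toward the "one thing to bump" experience being asked for. A rough sketch in build.sbt - the module list is only an example (keep whichever modules the app actually compiles against) and the version is illustrative:

    // the one line to change when moving from e.g. 3.5.x to 4.0.x
    val sparkVersion = "4.0.0"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-sql"       % sparkVersion % Provided,
      "org.apache.spark" %% "spark-mllib"     % sparkVersion % Provided,
      "org.apache.spark" %% "spark-streaming" % sparkVersion % Provided
    )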