Thanks Sean. There are other dependencies that you need to align with Spark if you want to use them as well - like Guava, Jackson, etc. I find those even harder to manage, because you have to go to the Spark repo to check which version Spark actually uses, and if that version changes between Spark releases you have to notice that and upgrade as well. What do you think?
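
To give a concrete picture of what I mean, here is a rough build.sbt sketch of what I end up doing today versus what I'm suggesting. The version strings are placeholders, and the "spark-runtime-all" artifact at the bottom is made up for illustration - no such aggregated artifact exists today:

  // build.sbt - rough sketch, untested
  val sparkVersion = "4.0.0"                  // placeholder: whatever the cluster runs
  val jacksonVersionUsedBySpark = "x.y.z"     // placeholder: copied by hand from the Spark poms
  val guavaVersionUsedBySpark   = "x.y.z"     // placeholder: copied by hand from the Spark poms

  // Today: each Spark module and each runtime library pinned separately as Provided.
  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-sql"   % sparkVersion % Provided,
    "org.apache.spark" %% "spark-mllib" % sparkVersion % Provided,
    "com.fasterxml.jackson.core" % "jackson-databind" % jacksonVersionUsedBySpark % Provided,
    "com.google.guava"           % "guava"            % guavaVersionUsedBySpark   % Provided
  )

  // What I'm suggesting (hypothetical - this artifact does not exist): one
  // aggregated Provided dependency that pulls all of the above in transitively:
  // libraryDependencies += "org.apache.spark" %% "spark-runtime-all" % sparkVersion % Provided

With something like that, moving to a new Spark version would mean bumping one version number instead of re-checking every pinned library against the Spark repo.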
Thanks!
Nimrod

On Tue, Jun 3, 2025 at 8:51 PM Sean Owen <sro...@gmail.com> wrote:

> I think this is already how it works. Most apps would depend on just
> spark-sql (which depends on spark-core, IIRC). Maybe some optionally pull
> in streaming or mllib.
> I don't think it's intended that you pull in all submodules for any one
> app, although you could.
> I don't know if there's some common subset that is both large and
> commonly used.
>
> Maven/SBT already pull in all transitive dependencies.
>
> On Tue, Jun 3, 2025 at 12:41 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>
>> Hi all,
>>
>> Sorry for bumping this again - just trying to understand if it's worth
>> adding a small feature for this. I think it could help Spark users and
>> Spark libraries upgrade to and support Spark versions much more easily :)
>> If instead of adding many provided dependencies we had one that includes
>> them all, that would be a lot easier to maintain...
>>
>> Thanks!
>>
>> Nimrod
>>
>> On Sun, Jun 1, 2025 at 12:23 AM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>>
>>> No K8s deployment, nothing special.
>>> I just don't see why, when I'm developing and compiling, or say
>>> upgrading from Spark 3.5 to Spark 4.0, I need to upgrade all the
>>> dependencies I use but don't actually deploy - the ones that come from
>>> the regular Spark runtime...
>>>
>>> Thanks,
>>> Nimrod
>>>
>>> On Sat, May 31, 2025, 23:44 Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> Are you running in YARN mode and you want to put these jar files into
>>>> HDFS in a distributed cluster?
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh,
>>>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>>>
>>>> view my Linkedin profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>> On Sat, 31 May 2025 at 19:47, Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> Apologies if this is a basic question - I've searched around but
>>>>> haven't found a clear answer.
>>>>>
>>>>> I'm currently developing a Spark application in Scala, and I'm looking
>>>>> for a way to include all the JARs typically bundled in a standard
>>>>> Spark installation as a single provided dependency.
>>>>>
>>>>> From what I've seen, most examples add each Spark module individually
>>>>> (like spark-core, spark-sql, spark-mllib, etc.) as a separate provided
>>>>> dependency. However, since these are all included in the Spark runtime
>>>>> environment, I'm wondering why there isn't a more aggregated
>>>>> dependency - something like a parent project or BOM (Bill of
>>>>> Materials) that pulls in all the commonly included Spark libraries
>>>>> (along with compatible versions of Log4j, Guava, Jackson, and so on) -
>>>>> that projects could use.
>>>>>
>>>>> Is there a particular reason this approach isn't commonly used? Does
>>>>> it cause issues with transitive dependencies or version mismatches?
>>>>> If so - I'm sure those can be addressed as well...
>>>>>
>>>>> Thanks in advance for any insights!
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Nimrod