> I may not need anything from Spark, but if I declare a dependency on Jackson or Guava with a different version than the one Spark already uses and packages, I might break things...
In that case I would recommend using assembly / assemblyShadeRules for sbt-assembly, or the maven-shade-plugin for Maven, and shading dependencies like Jackson or Guava to avoid conflicts with Spark when you pack everything into the uber jar (a minimal sketch appears further down).

On Wed, 2025-06-04 at 11:52 +0300, Nimrod Ofek wrote:
> Yes, that was my point.
> Whether I'm directly using something or not, it is really there, so it
> would be beneficial for me to have a way of knowing the exact
> dependencies that I have, even ones I don't use directly, in case (a)
> or (b), because they are there.
> For instance, if I am creating a library for Delta that helps track
> the lag of Structured Streaming Delta-to-Delta table streams, I may
> not need anything from Spark, but if I declare a dependency on
> Jackson or Guava with a different version than the one Spark already
> uses and packages, I might break things... because I'll add Jackson
> or Guava to my uber jar, and that will cause issues with the
> out-of-the-box deployed jars...
>
> On Wed, Jun 4, 2025 at 1:38 AM Sean Owen <sro...@gmail.com> wrote:
> > Yes, you're just saying that if your app depends on Foo, and Spark
> > depends on Foo, then ideally you depend on the exact same version
> > Spark uses. Otherwise it's up to Maven/SBT to pick one or the other
> > version, which might or might not be suitable. Yes, dependency
> > conflicts are painful to deal with and a real thing everywhere, and
> > this gets into discussions like "why isn't everything shaded?", but
> > that's not the point here, I think.
> >
> > But if your app depends on Foo, then Foo is in your POM regardless
> > of what Spark does. It gets painful to figure out whether that
> > conflicts with Spark's dependencies, sure, but you can figure it
> > out with dependency:tree or similar. I also don't think adding a
> > POM-only module changes any of that? You still have the same
> > problem even if there is a spark-uber package depending on every
> > module.
> >
> > Knowing which submodule is of interest - that does take some work.
> > It's hopefully in the docs, and most apps just need spark-sql, but
> > I can see this as an issue.
> >
> > I could see an argument for declaring a single POM-only artifact
> > that depends on all Spark modules. Then you depend on that as
> > 'provided' and you have all of Spark in compile scope only. (This
> > is almost what spark-parent does, but I don't think it works that
> > way.) It feels inaccurate, and not helpful for most use cases, but
> > I don't see a major problem with it actually. Your dependency
> > graph gets a lot bigger with stuff you don't need, but it's all in
> > provided scope anyway.
> >
> > On Tue, Jun 3, 2025 at 5:23 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
> > > You don't add dependencies you don't use, but you do need to
> > > declare dependencies you do use, and if the platform you are
> > > running on uses a specific version, you need to use that
> > > version; you can't break compatibility.
> > > Since Spark uses a lot of dependencies, I don't expect the user
> > > to check whether Spark uses, for instance, Jackson, and in what
> > > version.
> > > I also don't expect the ordinary user to know whether Spark
> > > Structured Streaming uses spark-sql or not when they need both,
> > > especially when they are already packaged together in the Spark
> > > server.
> > > Having said that, I guess they will just try adding packages,
> > > and if something won't compile they will use Coursier to fix the
> > > dependencies...
> > > Thanks anyway!
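To make the shading recommendation at the top of this message concrete, a minimal sbt-assembly sketch could look like the following. The plugin version and the rename targets are placeholders, not recommendations; adjust both to your build.

// project/plugins.sbt -- plugin version is an assumption; check the latest release
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")

// build.sbt -- relocate Jackson and Guava inside the uber jar so the
// bundled copies cannot clash with the ones Spark already ships
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.fasterxml.jackson.**" -> "shaded.jackson.@1").inAll,
  ShadeRule.rename("com.google.common.**" -> "shaded.guava.@1").inAll
)

The maven-shade-plugin achieves the same effect with <relocation> entries in its configuration.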
> > >
> > > On Tue, Jun 3, 2025 at 10:09 PM Sean Owen <sro...@gmail.com> wrote:
> > > > Do you have an example of what you mean?
> > > >
> > > > Yes, a deployment of Spark has all the modules. You do not
> > > > need to (should not, in fact) deploy Spark code with your
> > > > Spark app for this reason.
> > > > You still need to express dependencies on the Spark code that
> > > > your app uses at compile time, however, in order to compile;
> > > > or else how can it compile?
> > > > You do not add dependencies that you do not directly use, no.
> > > > This is like any other multi-module project in the Maven/SBT
> > > > ecosystem.
> > > >
> > > > On Tue, Jun 3, 2025 at 1:59 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
> > > > > It does not compile if I don't add spark-sql.
> > > > > In usual projects I'd agree with you, but since Spark comes
> > > > > complete with all its dependencies, unlike other programs
> > > > > where you deploy only certain dependencies, I see no reason
> > > > > for users to select up front specific dependencies that are
> > > > > already bundled in the Spark server.
> > > > >
> > > > > On Tue, Jun 3, 2025 at 9:44 PM Sean Owen <sro...@gmail.com> wrote:
> > > > > > I think Spark, like any project, is large enough to
> > > > > > decompose into modules, and it has been. A single app
> > > > > > almost surely doesn't need all the modules. So yes, you
> > > > > > have to depend on the modules you actually need, and I
> > > > > > think that's normal. See Jackson for example.
> > > > > > (spark-sql is not necessary, as it's required by the
> > > > > > modules you depend on already.)
> > > > > >
> > > > > > What's the name for this new convenience package?
> > > > > > spark-avro-sql-kafka? That seems too specific. And what
> > > > > > about the 100 other variations that other apps need?
> > > > > > For example, some apps will not need spark-sql-kafka but
> > > > > > will need spark-streaming-kafka.
> > > > > >
> > > > > > You do not have to depend on exactly the same versions of
> > > > > > dependencies that Spark does, although that's the safest
> > > > > > thing to do. For example, unless you use Avro directly and
> > > > > > its version matters to you, you do not declare it in your
> > > > > > POM. If you do, that's fine; Maven/SBT decides on what
> > > > > > version to use based on what you say and what Spark says.
> > > > > > And this could be wrong, but that's life in the world of
> > > > > > dependencies. Much of the time, it works.
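A side note on the dependency:tree suggestion quoted earlier: sbt has had an equivalent built in since 1.4 (the bundled MiniDependencyTreePlugin), so checking whether your Jackson or Guava choice collides with what Spark pulls in can be as simple as the following. The includes filter is just an example pattern.

# Maven: show only the Jackson part of the resolved dependency tree
mvn dependency:tree -Dincludes=com.fasterxml.jackson.core

# sbt 1.4+: print the full resolved tree
sbt dependencyTree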
> > > > > >
> > > > > > On Tue, Jun 3, 2025 at 1:35 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
> > > > > > > I'll give an example:
> > > > > > > If I have a project that reads Avro messages from a
> > > > > > > Kafka topic and writes them to Delta tables, I would
> > > > > > > expect to set only:
> > > > > > >
> > > > > > > libraryDependencies ++= Seq(
> > > > > > >   "io.delta" %% "delta-spark" % deltaVersion % Provided,
> > > > > > >   "org.apache.spark" %% "spark-avro" % sparkVersion,
> > > > > > >   "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
> > > > > > >   "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion,
> > > > > > >   "za.co.absa" %% "abris" % "6.4.0",
> > > > > > >   "org.apache.avro" % "avro" % apacheAvro,
> > > > > > >   "io.confluent" % "kafka-schema-registry-client" % "7.5.1",
> > > > > > >   "com.github.pureconfig" %% "pureconfig" % "0.17.5"
> > > > > > > )
> > > > > > >
> > > > > > > And not to also add:
> > > > > > > "org.apache.spark" %% "spark-sql" % sparkVersion % Provided
> > > > > > >
> > > > > > > And to be honest, I don't think the users really need to
> > > > > > > understand the internal structure to know which jar they
> > > > > > > need to add to use each feature...
> > > > > > > I don't think they need to know which project they need
> > > > > > > to depend on, as long as it's already provided... They
> > > > > > > just need to configure spark-provided :)
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Nimrod
> > > > > > >
> > > > > > > On Tue, Jun 3, 2025 at 8:57 PM Sean Owen <sro...@gmail.com> wrote:
> > > > > > > > For sure, but that is what Maven/SBT does. It resolves
> > > > > > > > your project dependencies, looking at all their
> > > > > > > > transitive dependencies, according to some rules.
> > > > > > > > You do not need to re-declare Spark's dependencies in
> > > > > > > > your project, no.
> > > > > > > > I'm not quite sure what you mean.
> > > > > > > >
> > > > > > > > On Tue, Jun 3, 2025 at 12:55 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
> > > > > > > > > Thanks Sean.
> > > > > > > > > There are other dependencies that you need to align
> > > > > > > > > with Spark if you need to use them as well, like
> > > > > > > > > Guava, Jackson, etc.
> > > > > > > > > I find them more difficult to use, because you need
> > > > > > > > > to go to the Spark repo to check the correct version
> > > > > > > > > used, and if there are upgrades between Spark
> > > > > > > > > versions you need to check that and upgrade as well.
> > > > > > > > > What do you think?
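On the last question in the quoted thread (aligning Guava or Jackson with Spark): once you have looked up the versions your Spark release actually ships, for example in the pom.xml of that Spark version, you can pin the transitive versions explicitly. A sketch, with placeholder version numbers that must be checked against the real release:

// build.sbt -- the version strings below are placeholders, NOT the values
// of any particular Spark release; look them up in Spark's pom.xml first
dependencyOverrides ++= Seq(
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.15.2",
  "com.google.guava" % "guava" % "14.0.1"
)

Note that dependencyOverrides only steers version resolution; it does not add the artifacts as direct dependencies, which suits the case where Spark itself provides them at runtime.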
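And for the single POM-only artifact Sean floats earlier in the thread: from the user's side it would collapse to one provided-scope line. This is purely hypothetical; no such artifact exists today, and the name spark-all is invented here for illustration.

// build.sbt -- HYPOTHETICAL: "spark-all" does not exist; this only shows what
// the proposed POM-only convenience artifact would look like to a user
libraryDependencies += "org.apache.spark" %% "spark-all" % sparkVersion % Provided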