> I may not need anything from Spark, but if I declare a dependency on
> Jackson or Guava with a different version than Spark already uses and
> packages - I might break things...

In that case I would recommend using assembly / assemblyShadeRules for
sbt-assembly, or the maven-shade-plugin for Maven, to shade dependencies
like Jackson or Guava and avoid conflicts with Spark when you pack
everything into the uber jar.
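
For example, a minimal sketch in build.sbt, assuming the sbt-assembly
plugin is already on the build classpath (the shaded package names are
illustrative - pick your own namespace):

// build.sbt
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.google.common.**" -> "myapp.shaded.guava.@1").inAll,
  ShadeRule.rename("com.fasterxml.jackson.**" -> "myapp.shaded.jackson.@1").inAll
)

This rewrites the Guava and Jackson package names inside your uber jar
so they can no longer clash with the copies Spark ships.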

On Wed, 2025-06-04 at 11:52 +0300, Nimrod Ofek wrote:
> Yes, that was my point.
> Whether I'm directly using something or not - it is really there, so
> it would be beneficial for me to have a way of knowing the exact
> dependencies that I have, even if I don't use them directly in case
> (a) or (b) - because they are there.
> For instance, if I am creating a library for Delta that helps track
> the lag of Structured Streaming Delta-to-Delta table streams - I may
> not need anything from Spark, but if I declare a dependency on
> Jackson or Guava with a different version than Spark already uses and
> packages - I might break things... Because I'll add Jackson or Guava
> to my uber jar - and that will cause issues with the jars deployed
> out of the box...
> 
> On Wed, Jun 4, 2025 at 1:38 AM Sean Owen <sro...@gmail.com> wrote:
> > Yes, you're just saying that if your app depends on Foo, and Spark
> > depends on Foo, then ideally you depend on the exact same version
> > Spark uses. Otherwise it's up to Maven/SBT to pick one or the other
> > version, which might or might not be suitable. Yes, dependency
> > conflicts are painful to deal with and a real thing everywhere, and
> > this gets into discussions like, why isn't everything shaded? but
> > that's not the point here I think.
> > 
> > But if your app depends on Foo, then Foo is in your POM regardless
> > of what Spark does. It gets painful to figure out whether that
> > conflicts with Spark's dependencies, sure, but you can figure it
> > out with dependency:tree or similar. I also don't think adding a
> > POM-only module changes any of that; you still have the same problem
> > even if there is a spark-uber package depending on every module.
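> > 
> > With sbt, a rough equivalent of dependency:tree is the built-in
> > dependency-tree plugin; a minimal sketch, assuming sbt 1.4+:
> > 
> > // project/plugins.sbt
> > addDependencyTreePlugin
> > 
> > // then from the sbt shell, for example:
> > //   dependencyTree
> > //   whatDependsOn com.fasterxml.jackson.core jackson-databind
> > 
> > The second task shows which modules pull a given artifact in.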
> > 
> > Knowing which submodule is of interest - that does take some work.
> > It's hopefully in the docs, and most apps just need spark-sql, but
> > I can see this as an issue.
> > 
> > I could see an argument for declaring a single POM-only artifact
> > that depends on all Spark modules. Then you depend on that as
> > 'provided' and you have all of Spark in compile scope only. (This
> > is almost what spark-parent does but I don't think it works that
> > way). It feels inaccurate, and not helpful for most use cases, but
> > I don't see a major problem with it actually. Your dependency graph
> > gets a lot bigger with stuff you don't need, but it's all in
> > provided scope anyway.
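> > 
> > If such an artifact existed - reusing the hypothetical spark-uber
> > name from above, which Spark does not actually publish - depending
> > on it from sbt would be a one-liner:
> > 
> > // build.sbt - "spark-uber" is hypothetical, not a published artifact
> > libraryDependencies +=
> >   "org.apache.spark" %% "spark-uber" % sparkVersion % Provided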
> > 
> > On Tue, Jun 3, 2025 at 5:23 PM Nimrod Ofek <ofek.nim...@gmail.com>
> > wrote:
> > > You don't add dependencies you don't use - but you do need to
> > > declare dependencies you do use, and if the platform you are
> > > running on uses a specific version, you need to use that version -
> > > you can't break compatibility.
> > > Since Spark uses a lot of dependencies, I don't expect the user
> > > to check whether Spark uses, for instance, Jackson, and which
> > > version.
> > > I also wouldn't expect the ordinary user to know whether Spark
> > > Structured Streaming uses Spark SQL or not when they need both -
> > > especially when they are already packaged together in the Spark
> > > server.
> > > Having said that, I guess they will just try adding packages,
> > > and if something won't compile they will use Coursier to fix the
> > > dependencies...
> > > Thanks anyway!
> > > 
> > > On Tue, Jun 3, 2025 at 10:09 PM Sean Owen <sro...@gmail.com> wrote:
> > > > Do you have an example of what you mean?
> > > > 
> > > > Yes, a deployment of Spark has all the modules. You do not need
> > > > to (should not in fact) deploy Spark code with your Spark app
> > > > for this reason.
> > > > You still need to express dependencies on the Spark code that
> > > > your app uses at compile time however, in order to compile, or
> > > > else how can it compile?
> > > > You do not add dependencies that you do not directly use, no.
> > > > This is like any other multi-module project in the Maven/SBT
> > > > ecosystem.
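> > > > 
> > > > A minimal sketch of that, assuming the usual Provided scope so
> > > > the cluster's own jars are used at runtime:
> > > > 
> > > > // build.sbt
> > > > libraryDependencies +=
> > > >   "org.apache.spark" %% "spark-sql" % sparkVersion % Provided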
> > > > 
> > > > On Tue, Jun 3, 2025 at 1:59 PM Nimrod Ofek
> > > > <ofek.nim...@gmail.com> wrote:
> > > > > It does not compile if I don't add spark-sql.
> > > > > In usual projects I'd agree with you, but since Spark comes
> > > > > complete with all its dependencies - unlike other programs,
> > > > > where you deploy only certain dependencies - I see no reason
> > > > > for users to select, up front, specific dependencies that are
> > > > > already bundled in the Spark server.
> > > > > 
> > > > > On Tue, Jun 3, 2025 at 9:44 PM Sean Owen <sro...@gmail.com> wrote:
> > > > > > I think Spark, like any project, is large enough to
> > > > > > decompose into modules, and it has been. A single app
> > > > > > almost surely doesn't need all the modules. So yes you have
> > > > > > to depend on the modules you actually need, and I think
> > > > > > that's normal. See Jackson for example.
> > > > > > (spark-sql is not necessary as it's required by the modules
> > > > > > you depend on already)
> > > > > > 
> > > > > > What's the name for this new convenience package?
> > > > > > spark-avro-sql-kafka? That seems too specific. And what about
> > > > > > the 100 other variations that other apps need?
> > > > > > For example, some apps will not need spark-sql-kafka but
> > > > > > will need spark-streaming-kafka.
> > > > > > 
> > > > > > You do not have to depend on exactly the same versions of
> > > > > > dependencies that Spark does, although that's the safest
> > > > > > thing to do. For example, unless you use Avro directly and
> > > > > > its version matters to you, you do not declare this in your
> > > > > > POM. If you do, that's fine, Maven/SBT decides on what
> > > > > > version to use based on what you say and what Spark says.
> > > > > > And this could be wrong, but, that's life in the world of
> > > > > > dependencies. Much of the time, it works.
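> > > > > > 
> > > > > > If you do want to force resolution to Spark's version, sbt
> > > > > > has dependencyOverrides for that; a sketch, where the version
> > > > > > below is illustrative only - check the POM of your Spark
> > > > > > release for the real one:
> > > > > > 
> > > > > > // build.sbt
> > > > > > dependencyOverrides +=
> > > > > >   "org.apache.avro" % "avro" % "1.11.2"  // match your Spark release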
> > > > > > 
> > > > > > On Tue, Jun 3, 2025 at 1:35 PM Nimrod Ofek
> > > > > > <ofek.nim...@gmail.com> wrote:
> > > > > > > 
> > > > > > > I'll give an example:
> > > > > > > If I have a project that reads from Kafka topic avro
> > > > > > > messages - and writes them to Delta tables, I would
> > > > > > > expect to set only:
> > > > > > > 
> > > > > > > libraryDependencies ++= Seq(
> > > > > > >   "io.delta" %% "delta-spark" % deltaVersion % Provided,
> > > > > > >   "org.apache.spark" %% "spark-avro" % sparkVersion,
> > > > > > >   "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
> > > > > > >   "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion,
> > > > > > >   "za.co.absa" %% "abris" % "6.4.0",
> > > > > > >   "org.apache.avro" % "avro" % apacheAvro,
> > > > > > >   "io.confluent" % "kafka-schema-registry-client" % "7.5.1",
> > > > > > >   "com.github.pureconfig" %% "pureconfig" % "0.17.5"
> > > > > > > )
> > > > > > > And not to have to add also:
> > > > > > >   "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
> > > > > > > 
> > > > > > > And to be honest - I don't think that users really need
> > > > > > > to understand the internal structure to know which jar
> > > > > > > they need to add to use each feature...
> > > > > > > I don't think they need to know which project they need to
> > > > > > > depend on - as long as it's already provided... They just
> > > > > > > need to mark Spark as provided :)
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Nimrod
> > > > > > > 
> > > > > > > 
> > > > > > > On Tue, Jun 3, 2025 at 8:57 PM Sean Owen
> > > > > > > <sro...@gmail.com> wrote:
> > > > > > > > For sure, but, that is what Maven/SBT do. It resolves
> > > > > > > > your project dependencies, looking at all their
> > > > > > > > transitive dependencies, according to some rules.
> > > > > > > > You do not need to re-declare Spark's dependencies in
> > > > > > > > your project, no.
> > > > > > > > I'm not quite sure what you mean.
> > > > > > > > 
> > > > > > > > On Tue, Jun 3, 2025 at 12:55 PM Nimrod Ofek
> > > > > > > > <ofek.nim...@gmail.com> wrote:
> > > > > > > > > Thanks Sean.
> > > > > > > > > There are other dependencies that you need to align
> > > > > > > > > with Spark if you use them as well - like Guava,
> > > > > > > > > Jackson, etc.
> > > > > > > > > I find them more difficult to use, because you need to
> > > > > > > > > go to the Spark repo to check the correct version used
> > > > > > > > > - and if there are upgrades between Spark versions, you
> > > > > > > > > need to check that and upgrade as well.
> > > > > > > > > What do you think?
