I think Spark, like any project, is large enough to decompose into modules, and it has been. A single app almost surely doesn't need all the modules. So yes, you have to depend on the modules you actually need, and I think that's normal; see Jackson for example. (spark-sql is not necessary, as it's already required by the modules you depend on.)
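A minimal sketch of that point in sbt terms (module names taken from the example quoted below; the version value is a placeholder, not a recommendation):

// build.sbt -- sketch only; "4.0.0" is a placeholder, substitute the Spark
// release you actually run against.
val sparkVersion = "4.0.0"

libraryDependencies ++= Seq(
  // Declare just the modules this app uses; per the note above, spark-sql
  // comes in through these modules' own dependency declarations, so it is
  // not listed explicitly here.
  "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
  "org.apache.spark" %% "spark-avro"           % sparkVersion
)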
What's the name for this new convenience package? spark-avro-sql-kafka? That
seems too specific. And what about the 100 other variations that other apps
need? For example, some apps will not need spark-sql-kafka but will need
spark-streaming-kafka.

You do not have to depend on exactly the same versions of dependencies that
Spark does, although that's the safest thing to do. For example, unless you
use Avro directly and its version matters to you, you do not need to declare
it in your POM. If you do, that's fine; Maven/SBT decides what version to use
based on what you say and what Spark says. That could be wrong, but that's
life in the world of dependencies. Much of the time, it works. (A short
sketch of how to check what your build actually resolved follows below the
quoted thread.)

On Tue, Jun 3, 2025 at 1:35 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>
> I'll give an example:
> If I have a project that reads Avro messages from a Kafka topic and writes
> them to Delta tables, I would expect to set only:
>
> libraryDependencies ++= Seq(
>   "io.delta" %% "delta-spark" % deltaVersion % Provided,
>   "org.apache.spark" %% "spark-avro" % sparkVersion,
>   "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
>   "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion,
>   "za.co.absa" %% "abris" % "6.4.0",
>   "org.apache.avro" % "avro" % apacheAvro,
>   "io.confluent" % "kafka-schema-registry-client" % "7.5.1",
>   "com.github.pureconfig" %% "pureconfig" % "0.17.5"
> )
>
> And not to also add
>
>   "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
>
> And to be honest, I don't think users really need to understand the
> internal structure to know which jar they need to add to use each
> feature...
> I don't think they need to know which project they need to depend on, as
> long as it's already provided... They just need to configure
> spark-provided :)
>
> Thanks,
> Nimrod
>
>
> On Tue, Jun 3, 2025 at 8:57 PM Sean Owen <sro...@gmail.com> wrote:
>
>> For sure, but that is what Maven/SBT do. They resolve your project
>> dependencies, looking at all their transitive dependencies, according to
>> some rules.
>> You do not need to re-declare Spark's dependencies in your project, no.
>> I'm not quite sure what you mean.
>>
>> On Tue, Jun 3, 2025 at 12:55 PM Nimrod Ofek <ofek.nim...@gmail.com>
>> wrote:
>>
>>> Thanks Sean.
>>> There are other dependencies that you need to align with Spark if you
>>> need to use them as well, like Guava, Jackson, etc.
>>> I find them more difficult to use, because you need to go to the Spark
>>> repo to check the correct version used, and if there are upgrades
>>> between versions you need to check that and upgrade as well.
>>> What do you think?
>>
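On the Guava/Jackson alignment question quoted above, here is a sketch of one
way to handle it in sbt, assuming sbt 1.4+ (where the dependencyTree task
ships built in; older versions need the sbt-dependency-graph plugin). The
version numbers below are placeholders, not necessarily what your Spark
release ships:

// build.sbt -- sketch only. Run `sbt dependencyTree` (or check the pom.xml
// of the Spark release you target) to see which Jackson/Guava versions Spark
// pulls in; otherwise sbt resolves the conflict itself (newest requested
// version wins by default), which is the "Maven/SBT decides" behavior
// described above.
dependencyOverrides ++= Seq(
  // Placeholder versions: pin these to whatever your Spark release uses if
  // you want to match it exactly (the "safest thing to do" per the thread).
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.15.2",
  "com.google.guava"           % "guava"            % "33.0.0-jre"
)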