I'll give an example: if I have a project that reads Avro messages from a Kafka topic and writes them to Delta tables, I would expect to set only:
libraryDependencies ++= Seq(
  "io.delta" %% "delta-spark" % deltaVersion % Provided,
  "org.apache.spark" %% "spark-avro" % sparkVersion,
  "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion,
  "za.co.absa" %% "abris" % "6.4.0",
  "org.apache.avro" % "avro" % apacheAvro,
  "io.confluent" % "kafka-schema-registry-client" % "7.5.1",
  "com.github.pureconfig" %% "pureconfig" % "0.17.5"
)

and not have to also add:

  "org.apache.spark" %% "spark-sql" % sparkVersion % Provided

And to be honest, I don't think users really need to understand the internal module structure to know which jar to add for each feature. They shouldn't need to know which project to depend on, as long as it's already provided - they just need to configure spark-provided :)

Thanks,
Nimrod

On Tue, Jun 3, 2025 at 8:57 PM Sean Owen <sro...@gmail.com> wrote:

> For sure, but, that is what Maven/SBT do. It resolves your project
> dependencies, looking at all their transitive dependencies, according to
> some rules.
> You do not need to re-declare Spark's dependencies in your project, no.
> I'm not quite sure what you mean.
>
> On Tue, Jun 3, 2025 at 12:55 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>
>> Thanks Sean.
>> There are other dependencies that you need to align with Spark if you
>> need to use them as well - like Guava, Jackson etc.
>> I find them more difficult to use - because you need to go to Spark repo
>> to check the correct version used - and if there are upgrades between
>> versions you need to check that to upgrade as well.
>> What do you think?
>>
>
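P.S. A rough sketch of what the workaround looks like for me today, with placeholder version numbers rather than the ones Spark actually pins (those I copy by hand from Spark's pom), and assuming sbt 1.4+ so the dependency-tree tasks are available:

  // project/plugins.sbt - only needed if your sbt does not already
  // bundle the dependency-tree tasks (recent sbt 1.x versions do)
  addDependencyTreePlugin

  // build.sbt - pin libraries that have to stay aligned with the Spark runtime.
  // These versions are placeholders, not necessarily what Spark actually ships.
  val sparkGuavaVersion = "14.0.1"     // placeholder: copy from Spark's pom
  val sparkJacksonVersion = "2.15.2"   // placeholder: copy from Spark's pom

  dependencyOverrides ++= Seq(
    "com.google.guava" % "guava" % sparkGuavaVersion,
    "com.fasterxml.jackson.core" % "jackson-databind" % sparkJacksonVersion
  )

Running "Compile / dependencyTree" in the sbt shell then shows what already arrives transitively (for example, whether spark-sql is on the compile classpath at all) - which is exactly the kind of manual checking I'd prefer users didn't have to do.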