I'll give an example:
If I have a project that reads Avro messages from a Kafka topic and writes
them to Delta tables, I would expect to declare only:

libraryDependencies ++= Seq(
  "io.delta" %% "delta-spark" % deltaVersion % Provided,
  "org.apache.spark" %% "spark-avro" % sparkVersion,
  "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion,
  "za.co.absa" %% "abris" % "6.4.0",
  "org.apache.avro" % "avro" % apacheAvro,
  "io.confluent" % "kafka-schema-registry-client" % "7.5.1",
  "com.github.pureconfig" %% "pureconfig" % "0.17.5"
)

And not to have to also add:

"org.apache.spark" %% "spark-sql" % sparkVersion % Provided,

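For contrast, here is a rough sketch of what the full build.sbt tends to look
like today, with the Spark module itself declared explicitly as Provided. The
version values are placeholders I'm putting in just for illustration, not
taken from this thread:

// Placeholder versions - replace with whatever your cluster actually runs.
val sparkVersion = "3.5.1"
val deltaVersion = "3.2.0"
val apacheAvro   = "1.11.3"

libraryDependencies ++= Seq(
  // Spark itself still has to be declared, marked Provided because the
  // cluster supplies it at runtime.
  "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
  "io.delta" %% "delta-spark" % deltaVersion % Provided,
  "org.apache.spark" %% "spark-avro" % sparkVersion,
  "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion,
  "za.co.absa" %% "abris" % "6.4.0",
  "org.apache.avro" % "avro" % apacheAvro,
  "io.confluent" % "kafka-schema-registry-client" % "7.5.1",
  "com.github.pureconfig" %% "pureconfig" % "0.17.5"
)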

And to be honest, I don't think users really need to understand the
internal module structure to know which jar they need to add to use each
feature. They shouldn't need to know which project to depend on, as long as
it's already provided - they just need to configure Spark as provided. :)

Thanks,
Nimrod


On Tue, Jun 3, 2025 at 8:57 PM Sean Owen <sro...@gmail.com> wrote:

> For sure, but that is what Maven/SBT do. It resolves your project
> dependencies, looking at all their transitive dependencies, according to
> some rules.
> You do not need to re-declare Spark's dependencies in your project, no.
> I'm not quite sure what you mean.
>
> On Tue, Jun 3, 2025 at 12:55 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>
>> Thanks Sean.
>> There are other dependencies that you need to align with Spark if you
>> want to use them as well - like Guava, Jackson, etc.
>> I find them more difficult to use, because you need to go to the Spark repo
>> to check the correct version in use - and if those versions change between
>> Spark releases you need to notice that and upgrade as well.
>> What do you think?
>>
>
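
On the version-alignment point quoted above: one way to handle it in sbt is to
pin the shared transitive dependencies to the versions Spark itself ships with,
after looking them up in the Spark POM for the release you run against. A
minimal sketch (the version numbers here are only placeholders, not the real
values from any particular Spark release):

// Pin libraries that are shared with Spark so sbt's eviction does not pick
// an incompatible release. Placeholder versions - check Spark's POM first.
dependencyOverrides ++= Seq(
  "com.google.guava" % "guava" % "14.0.1",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.15.2",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.15.2"
)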
