You don't add dependencies you don't use - but you do need to declare the dependencies you do use, and if the platform you are running on uses a specific version, you need to use that version - you can't break compatibility. Since Spark uses a lot of dependencies, I don't expect the user to check whether Spark uses, for instance, Jackson, and which version. I also don't expect the ordinary user to know whether Spark Structured Streaming uses spark-sql or not when they need both - especially when they are already packaged together in the Spark server.
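(To illustrate the alignment problem, a minimal sketch of what this can look like in sbt - the Jackson version below is only a placeholder and would have to be copied from whatever the Spark runtime you actually deploy on ships:)

// Hypothetical build.sbt fragment: force Jackson to the version the Spark
// runtime ships, so the app and the cluster agree at run time.
// "2.15.2" is a placeholder - check the Spark release you run against.
val sparkJacksonVersion = "2.15.2"

dependencyOverrides ++= Seq(
  "com.fasterxml.jackson.core"   %  "jackson-databind"     % sparkJacksonVersion,
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % sparkJacksonVersion
)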
Having said that, I guess they will just try adding packages, and if something won't compile they will use Coursier to fix the dependencies... Thanks anyway!

On Tue, Jun 3, 2025, 22:09, Sean Owen <sro...@gmail.com> wrote:

> Do you have an example of what you mean?
>
> Yes, a deployment of Spark has all the modules. You do not need to (should
> not in fact) deploy Spark code with your Spark app for this reason.
> You still need to express dependencies on the Spark code that your app
> uses at *compile* time however, in order to compile, or else how can it
> compile?
> You do not add dependencies that you do not directly use, no.
> This is like any other multi-module project in the Maven/SBT ecosystem.
>
> On Tue, Jun 3, 2025 at 1:59 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>
>> It does not compile if I don't add spark-sql.
>> In usual projects I'd agree with you, but since Spark comes complete with
>> all dependencies, unlike other programs where you deploy only certain
>> dependencies - I see no reason for users to select specific dependencies
>> up front that are already bundled in the Spark server.
>>
>> On Tue, Jun 3, 2025, 21:44, Sean Owen <sro...@gmail.com> wrote:
>>
>>> I think Spark, like any project, is large enough to decompose into
>>> modules, and it has been. A single app almost surely doesn't need all the
>>> modules. So yes, you have to depend on the modules you actually need, and
>>> I think that's normal. See Jackson for example.
>>> (spark-sql is not necessary as it's required by the modules you depend
>>> on already)
>>>
>>> What's the name for this new convenience package? spark-avro-sql-kafka?
>>> That seems too specific. And what about the 100 other variations that
>>> other apps need?
>>> For example, some apps will not need spark-sql-kafka but will need
>>> spark-streaming-kafka.
>>>
>>> You do not have to depend on exactly the same versions of dependencies
>>> that Spark does, although that's the safest thing to do. For example,
>>> unless you use Avro directly and its version matters to you, you do not
>>> declare this in your POM. If you do, that's fine, Maven/SBT decides on
>>> what version to use based on what you say and what Spark says. And this
>>> could be wrong, but that's life in the world of dependencies. Much of the
>>> time, it works.
>>>
>>> On Tue, Jun 3, 2025 at 1:35 PM Nimrod Ofek <ofek.nim...@gmail.com>
>>> wrote:
>>>
>>>> I'll give an example:
>>>> If I have a project that reads Avro messages from a Kafka topic and
>>>> writes them to Delta tables, I would expect to set only:
>>>>
>>>> libraryDependencies ++= Seq(
>>>>   "io.delta" %% "delta-spark" % deltaVersion % Provided,
>>>>   "org.apache.spark" %% "spark-avro" % sparkVersion,
>>>>   "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
>>>>   "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion,
>>>>   "za.co.absa" %% "abris" % "6.4.0",
>>>>   "org.apache.avro" % "avro" % apacheAvro,
>>>>   "io.confluent" % "kafka-schema-registry-client" % "7.5.1",
>>>>   "com.github.pureconfig" %% "pureconfig" % "0.17.5"
>>>> )
>>>>
>>>> And not to have to add also:
>>>>
>>>>   "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
>>>>
>>>> And to be honest - I don't think that users really need to understand
>>>> the internal structure to know which jar they need to add to use each
>>>> feature...
>>>> I don't think they need to know which project they need to depend on -
>>>> as long as it's already provided...
>>>> They just need to configure spark-provided :)
>>>>
>>>> Thanks,
>>>> Nimrod
>>>>
>>>> On Tue, Jun 3, 2025 at 8:57 PM Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> For sure, but that is what Maven/SBT do. They resolve your project
>>>>> dependencies, looking at all their transitive dependencies, according
>>>>> to some rules.
>>>>> You do not need to re-declare Spark's dependencies in your project, no.
>>>>> I'm not quite sure what you mean.
>>>>>
>>>>> On Tue, Jun 3, 2025 at 12:55 PM Nimrod Ofek <ofek.nim...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Sean.
>>>>>> There are other dependencies that you need to align with Spark if you
>>>>>> need to use them as well - like Guava, Jackson etc.
>>>>>> I find them more difficult to use - because you need to go to the
>>>>>> Spark repo to check the correct version used - and if there are
>>>>>> upgrades between versions you need to check that as well when you
>>>>>> upgrade.
>>>>>> What do you think?
>>>>>
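(For reference, a minimal sketch of the compile-time-only pattern discussed in this thread: the Spark module the application compiles against is marked Provided, so it resolves at compile time but is not packaged into the application jar, because the cluster already ships it. The version below is a placeholder, not a recommendation.)

// Sketch of a build.sbt: compile against spark-sql, but do not bundle it.
// "3.5.1" is a placeholder - it should match the Spark version of the
// cluster the application is deployed to.
val sparkVersion = "3.5.1"

libraryDependencies ++= Seq(
  // Needed at compile time for the DataFrame/Dataset and SQL APIs;
  // Provided keeps it out of the application jar because the Spark
  // runtime on the cluster already contains these classes.
  "org.apache.spark" %% "spark-sql" % sparkVersion % Provided
)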