You don't add dependencies you don't use - but you do need to declare
dependencies you do use, and if the platform you are running on uses a
specific version, you need to use that version - you can't break
compatibility.
Since Spark uses a lot of dependencies, I don't expect the user to check
whether Spark uses, for instance, Jackson, and which version.
I also don't expect the ordinary user to know whether Spark Structured
Streaming depends on spark-sql or not when they need both - especially when
they are already packaged together in the Spark server.
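
For example, a hypothetical sbt sketch (the versions below are placeholders;
the real ones have to be read from the POM of the Spark release in use):

libraryDependencies ++= Seq(
  // Pinned to the Jackson version the Spark runtime ships: a newer Jackson
  // can compile fine and still fail at runtime against Spark's copy.
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.15.2",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.15.2"
)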

Having said that, I guess they will just try adding packages, and if
something won't compile they will use Coursier to fix the dependencies...
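
For what it's worth, Coursier can also be scripted for that kind of check.
A sketch against its Scala API (using the "io.get-coursier" %% "coursier"
artifact; the Spark coordinates below are placeholders):

import coursier._

object SparkTransitiveDeps {
  def main(args: Array[String]): Unit = {
    // Resolve spark-sql and list everything it pulls in transitively,
    // e.g. to find out which Jackson version it expects:
    val resolution = Resolve()
      .addDependencies(dep"org.apache.spark:spark-sql_2.13:3.5.1")
      .run()
    resolution.minDependencies.toSeq
      .map(d => s"${d.module}:${d.version}")
      .sorted
      .foreach(println)
  }
}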

Thanks anyway!

On Tue, Jun 3, 2025 at 10:09 PM Sean Owen <sro...@gmail.com> wrote:

> Do you have an example of what you mean?
>
> Yes, a deployment of Spark has all the modules. You do not need to (should
> not in fact) deploy Spark code with your Spark app for this reason.
> You still need to express dependencies on the Spark code that your app
> uses at *compile* time however, in order to compile, or else how can it
> compile?
> You do not add dependencies that you do not directly use, no.
> This is like any other multi-module project in the Maven/SBT ecosystem.
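>
> A minimal sketch of that setup in sbt (the Spark version is a placeholder):
>
> libraryDependencies ++= Seq(
>   // Compile against spark-sql, but mark it Provided so it is not packaged
>   // into the application jar: the Spark deployment already has it.
>   "org.apache.spark" %% "spark-sql" % "3.5.1" % Provided
> )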
>
> On Tue, Jun 3, 2025 at 1:59 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>
>> It does not compile if I don't add spark-sql.
>> In usual projects I'd agree with you, but since Spark comes complete with
>> all dependencies - unlike other programs, where you deploy only certain
>> dependencies - I see no reason for users to select up front specific
>> dependencies that are already bundled in the Spark server.
>>
>> On Tue, Jun 3, 2025 at 9:44 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> I think Spark, like any project, is large enough to decompose into
>>> modules, and it has been. A single app almost surely doesn't need all the
>>> modules. So yes you have to depend on the modules you actually need, and I
>>> think that's normal. See Jackson for example.
>>> (spark-sql is not necessary as it's required by the modules you depend
>>> on already)
>>>
>>> What's the name for this new convenience package? spark-avro-sql-kafka?
>>> That seems too specific. And what about the 100 other variations that other
>>> apps need?
>>> For example, some apps will not need spark-sql-kafka but will need
>>> spark-streaming-kafka.
>>>
>>> You do not have to depend on exactly the same versions of dependencies
>>> that Spark does, although that's the safest thing to do. For example,
>>> unless you use Avro directly and its version matters to you, you do not
>>> declare this in your POM. If you do, that's fine, Maven/SBT decides on what
>>> version to use based on what you say and what Spark says. And this could be
>>> wrong, but, that's life in the world of dependencies. Much of the time, it
>>> works.
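>>>
>>> If you do need to force a particular transitive version, sbt has a
>>> dedicated setting for it (a sketch; the Avro version is a placeholder to
>>> be copied from the POM of the Spark release in use):
>>>
>>> // Forces the resolved Avro version to match what Spark ships, without
>>> // making avro a direct dependency of the app:
>>> dependencyOverrides += "org.apache.avro" % "avro" % "1.11.3"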
>>>
>>> On Tue, Jun 3, 2025 at 1:35 PM Nimrod Ofek <ofek.nim...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> I'll give an example:
>>>> If I have a project that reads Avro messages from a Kafka topic and
>>>> writes them to Delta tables, I would expect to set only:
>>>>
>>>> libraryDependencies ++= Seq(
>>>>
>>>>   "io.delta" %% "delta-spark" % deltaVersion % Provided,
>>>>   "org.apache.spark" %% "spark-avro" % sparkVersion,
>>>>   "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
>>>>   "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion,
>>>>   "za.co.absa" %% "abris" % "6.4.0",
>>>>   "org.apache.avro" % "avro" % apacheAvro,
>>>>   "io.confluent" % "kafka-schema-registry-client" % "7.5.1",
>>>>   "com.github.pureconfig" %% "pureconfig" % "0.17.5"
>>>> )
>>>>
>>>> And not to also have to add
>>>>
>>>> "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
>>>>
>>>>
>>>> And to be honest - I don't think users really need to understand the
>>>> internal structure to know which jar they need to add to use each
>>>> feature...
>>>> I don't think they need to know which module they need to depend on - as
>>>> long as it's already provided... They just need to mark Spark as
>>>> Provided :)
>>>>
>>>> Thanks,
>>>> Nimrod
>>>>
>>>>
>>>> On Tue, Jun 3, 2025 at 8:57 PM Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> For sure, but, that is what Maven/SBT do. It resolves your project
>>>>> dependencies, looking at all their transitive dependencies, according to
>>>>> some rules.
>>>>> You do not need to re-declare Spark's dependencies in your project, no.
>>>>> I'm not quite sure what you mean.
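>>>>>
>>>>> For inspecting what those rules decided, sbt 1.4+ bundles a minimal
>>>>> dependency-tree plugin out of the box; from the sbt shell (a sketch):
>>>>>
>>>>> // prints the resolved transitive tree:
>>>>> Compile / dependencyTree
>>>>> // lists versions that were superseded during resolution:
>>>>> evicted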
>>>>>
>>>>> On Tue, Jun 3, 2025 at 12:55 PM Nimrod Ofek <ofek.nim...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Sean.
>>>>>> There are other dependencies that you need to align with Spark if you
>>>>>> need to use them as well - like Guava, Jackson, etc.
>>>>>> I find them more difficult to use - because you need to go to the Spark
>>>>>> repo to check the correct version used - and if the version changes
>>>>>> between Spark releases, you need to check for that and upgrade as well.
>>>>>> What do you think?
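>>>>>>
>>>>>> One way to avoid digging through the Spark repo is to ask the running
>>>>>> server itself. A sketch to paste into spark-shell on the target
>>>>>> deployment (it reads the version from whatever Jackson jar Spark
>>>>>> actually loaded; getPackage can return null under unusual classloaders):
>>>>>>
>>>>>> // Prints the Implementation-Version from the manifest of the
>>>>>> // jackson-databind jar on Spark's classpath:
>>>>>> println(classOf[com.fasterxml.jackson.databind.ObjectMapper]
>>>>>>   .getPackage.getImplementationVersion)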
>>>>>>
>>>>>
