Do you have an example of what you mean?

Yes, a deployment of Spark has all the modules. For this reason you do not
need to (and in fact should not) deploy Spark code with your Spark app.
You still need to declare dependencies on the Spark modules your app uses
at *compile* time, however - otherwise it cannot compile.
You do not add dependencies that you do not directly use, no.
This is like any other multi-module project in the Maven/SBT ecosystem.
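
For example, a minimal build.sbt along these lines - the module list and
version are illustrative, not prescriptive:

// Sketch: declare the Spark modules your code compiles against, but mark
// them Provided so they are not bundled into your artifact - the Spark
// deployment already ships them.
val sparkVersion = "3.5.1"  // illustrative; match your cluster

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % Provided,
  "org.apache.spark" %% "spark-sql"  % sparkVersion % Provided
)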

On Tue, Jun 3, 2025 at 1:59 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:

> It does not compile if I don't add spark-sql.
> In usual projects I'd agree with you, but since a Spark deployment comes
> complete with all of its modules - unlike other programs, where you deploy
> only selected dependencies - I see no reason for users to pick out, up
> front, specific dependencies that are already bundled in the Spark server.
>
> On Tue, Jun 3, 2025 at 9:44 PM Sean Owen <sro...@gmail.com> wrote:
>
>> I think Spark, like any project, is large enough to decompose into
>> modules, and it has been. A single app almost surely doesn't need all the
>> modules. So yes you have to depend on the modules you actually need, and I
>> think that's normal. See Jackson for example.
>> (spark-sql is not necessary as it's required by the modules you depend on
>> already)
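>>
>> If in doubt, you can check what actually arrives transitively - a quick
>> sketch, assuming sbt 1.4+, where the dependency-tree tasks are built in:
>>
>> // from the sbt shell: print the resolved compile-time dependency graph,
>> // then look for spark-sql under the connector modules you declared
>> Compile / dependencyTree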
>>
>> What's the name for this new convenience package? spark-avro-sql-kafka?
>> That seems too specific. And what about the 100 other variations that
>> other apps need?
>> For example, some apps will not need spark-sql-kafka but will need
>> spark-streaming-kafka.
>>
>> You do not have to depend on exactly the same versions of dependencies
>> that Spark does, although that's the safest thing to do. For example,
>> unless you use Avro directly and its version matters to you, you do not
>> declare this in your POM. If you do, that's fine: Maven/SBT decides which
>> version to use based on what you say and what Spark says. The result could
>> be wrong, but that's life in the world of dependencies. Much of the time,
>> it works.
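>>
>> If you do need to force a transitive version to match Spark's, one option
>> is an override rather than a direct dependency - a sketch, with an
>> illustrative version (check Spark's POM for the real one):
>>
>> // build.sbt: pin the transitive Avro version without making it a direct
>> // dependency of your own code
>> dependencyOverrides += "org.apache.avro" % "avro" % "1.11.2"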
>>
>> On Tue, Jun 3, 2025 at 1:35 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>>
>>>
>>> I'll give an example:
>>> If I have a project that reads Avro messages from a Kafka topic and
>>> writes them to Delta tables, I would expect to set only:
>>>
>>> libraryDependencies ++= Seq(
>>>   "io.delta" %% "delta-spark" % deltaVersion % Provided,
>>>   "org.apache.spark" %% "spark-avro" % sparkVersion,
>>>   "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
>>>   "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion,
>>>   "za.co.absa" %% "abris" % "6.4.0",
>>>   "org.apache.avro" % "avro" % apacheAvro,
>>>   "io.confluent" % "kafka-schema-registry-client" % "7.5.1",
>>>   "com.github.pureconfig" %% "pureconfig" % "0.17.5"
>>> )
>>>
>>> And not to also have to add:
>>>
>>> "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
>>>
>>>
>>> And to be honest - I don't think users really need to understand the
>>> internal structure to know which jar they need to add to use each
>>> feature...
>>> I don't think they need to know which module they need to depend on - as
>>> long as it's already provided... They just need to configure the Provided
>>> scope :)
>>>
>>> Thanks,
>>> Nimrod
>>>
>>>
>>> On Tue, Jun 3, 2025 at 8:57 PM Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> For sure, but that is what Maven/SBT do: they resolve your project's
>>>> dependencies, looking at all their transitive dependencies, according to
>>>> some rules.
>>>> You do not need to re-declare Spark's dependencies in your project, no.
>>>> I'm not quite sure what you mean.
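>>>>
>>>> To see how those rules played out in your build, there is a quick check -
>>>> a sketch, assuming a reasonably recent sbt:
>>>>
>>>> // from the sbt shell: list dependencies that were evicted in favor of
>>>> // another version during resolution
>>>> evicted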
>>>>
>>>> On Tue, Jun 3, 2025 at 12:55 PM Nimrod Ofek <ofek.nim...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Sean.
>>>>> There are other dependencies that you need to align with Spark if you
>>>>> use them as well - like Guava, Jackson, etc.
>>>>> I find those more difficult to deal with, because you need to go to the
>>>>> Spark repo to check which version it uses - and if that version changes
>>>>> between Spark releases, you need to track that and upgrade as well.
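>>>>>
>>>>> For Guava and the like, the usual escape hatch is shading - a sketch,
>>>>> assuming the sbt-assembly plugin (the renamed package prefix is just
>>>>> illustrative):
>>>>>
>>>>> // build.sbt: rename our copy of Guava's packages so it cannot collide
>>>>> // with the Guava that ships inside the Spark deployment
>>>>> assembly / assemblyShadeRules += ShadeRule
>>>>>   .rename("com.google.common.**" -> "myapp.shaded.guava.@1")
>>>>>   .inAll
>>>>>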
>>>>> What do you think?
>>>>>
>>>>
