Re: Question Regarding Spark Dependencies in Scala

2025-06-06 Thread Ángel Álvarez Pascua
But... is it not like that in any other Java/Scala/Python/... app that uses dependencies that also have their own dependencies? If you want to provide a library, maybe you should give the user the option to decide whether they want an all-in-one uber jar with shaded (more difficult to debug) dependencies…

Re: Question Regarding Spark Dependencies in Scala

2025-06-06 Thread Sem
> I may not need anything from Spark, but if I declare a dependency on Jackson or Guava with a different version than Spark already uses and packages, I might break things... In that case I would recommend using assembly / assemblyShadeRules for sbt-assembly, or maven-shade-plugin for Maven, and…
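
For reference, a minimal sketch of what such shading could look like with sbt-assembly; the plugin version, package names, and shaded prefix here are illustrative, not taken from the thread:

    // project/plugins.sbt: sbt-assembly provides the assembly task and ShadeRule
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")

    // build.sbt: relocate the app's copy of Guava so it cannot clash with Spark's
    assembly / assemblyShadeRules := Seq(
      ShadeRule.rename("com.google.common.**" -> "myapp.shaded.guava.@1").inAll
    )

    // a common (if crude) merge-strategy tweak for duplicate META-INF entries
    assembly / assemblyMergeStrategy := {
      case PathList("META-INF", _*) => MergeStrategy.discard
      case _                        => MergeStrategy.first
    }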

Re: Question Regarding Spark Dependencies in Scala

2025-06-04 Thread Nimrod Ofek
Yes, that was my point. Whether I'm directly using something or not, it is really there, so it would be beneficial for me to have a way of knowing the exact dependencies I have, even if I don't use them directly, in case a or b, because they are there. For instance, if I am creating a library…
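
One existing way to see exactly what ends up on the classpath transitively (a sketch; sbt 1.4+ ships these tasks behind a one-line plugin switch):

    // project/plugins.sbt: enable sbt's built-in dependency-tree tasks (sbt 1.4+)
    addDependencyTreePlugin

    // then, in the sbt shell:
    //   dependencyTree                          prints the full transitive tree
    //   whatDependsOn com.google.guava guava    shows which paths pull in a given artifact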

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Sean Owen
Yes, you're just saying that if your app depends on Foo, and Spark depends on Foo, then ideally you depend on the exact same version Spark uses. Otherwise it's up to Maven/SBT to pick one or the other version, which might or might not be suitable. Yes, dependency conflicts are painful to deal with
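
In sbt, pinning a transitive version explicitly, rather than letting eviction decide, might look like this (the version numbers are illustrative; they should match whatever your Spark version actually ships):

    // build.sbt: force the versions Spark ships instead of whichever
    // version dependency eviction would otherwise pick
    dependencyOverrides ++= Seq(
      "com.fasterxml.jackson.core" % "jackson-databind" % "2.15.2",  // illustrative
      "com.google.guava"           % "guava"            % "14.0.1"   // illustrative
    )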

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Nimrod Ofek
You don't add dependencies you don't use, but you do need to declare dependencies you do use, and if the platform you are running on uses a specific version, you need to use that version: you can't break compatibility. Since Spark uses a lot of dependencies, I don't expect the user to check whether Spark uses…

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Nimrod Ofek
I'll give an example: if I have a project that reads Avro messages from a Kafka topic and writes them to Delta tables, I would expect to set only: libraryDependencies ++= Seq( "io.delta" %% "delta-spark" % deltaVersion % Provided, "org.apache.spark" %% "spark-avro" % sparkVersion, "org.apac…
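
The snippet above is cut off in the archive; a plausible reconstruction of the full build.sbt intent (the artifacts after the cut and the version values are my guesses, not Nimrod's exact text) might be:

    // build.sbt: Kafka (Avro) -> Delta pipeline, marking everything the
    // Spark runtime already bundles as Provided
    val sparkVersion = "3.5.1"   // illustrative
    val deltaVersion = "3.2.0"   // illustrative

    libraryDependencies ++= Seq(
      "io.delta"         %% "delta-spark"          % deltaVersion % Provided,
      "org.apache.spark" %% "spark-avro"           % sparkVersion,            // not bundled in the Spark distribution
      "org.apache.spark" %% "spark-sql"            % sparkVersion % Provided,
      "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion             // Kafka source, also not bundled
    )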

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Sean Owen
Do you have an example of what you mean? Yes, a deployment of Spark has all the modules. You do not need to (should not, in fact) deploy Spark code with your Spark app for this reason. You still need to express dependencies on the Spark code that your app uses at *compile* time, however, in order to…

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Nimrod Ofek
It does not compile if I don't add spark-sql. In usual projects I'd agree with you, but since Spark comes complete with all dependencies, unlike other programs where you deploy only certain dependencies, I see no reason for users to select specific dependencies that are already bundled in the Spark…

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Sean Owen
For sure, but that is what Maven/SBT do: they resolve your project dependencies, looking at all their transitive dependencies, according to some rules. You do not need to re-declare Spark's dependencies in your project, no. I'm not quite sure what you mean.

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Sean Owen
I think Spark, like any project, is large enough to decompose into modules, and it has been. A single app almost surely doesn't need all the modules. So yes, you have to depend on the modules you actually need, and I think that's normal. See Jackson for example. (spark-sql is not necessary, as it's…
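
The Jackson comparison works the same way in a build (a sketch; version illustrative): you declare only the module you use and resolution supplies the rest.

    // Jackson, like Spark, is split into modules; depending on one high-level
    // module is enough, since sbt resolves the rest transitively
    libraryDependencies +=
      "com.fasterxml.jackson.core" % "jackson-databind" % "2.15.2"
      // pulls in jackson-core and jackson-annotations transitively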

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Nimrod Ofek
Thanks Sean. There are other dependencies that you need to align with Spark if you need to use them as well, like Guava, Jackson, etc. I find them more difficult to use, because you need to go to the Spark repo to check the correct version used, and if there are upgrades between versions you need to…
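
A common workaround (a sketch, not something the thread prescribes) is to centralize the Spark-aligned versions in one place, so an upgrade touches a single file; the numbers still have to be looked up in Spark's pom.xml for the target release:

    // project/Versions.scala: one place to update on a Spark upgrade
    object Versions {
      val spark   = "3.5.1"
      // looked up from that Spark release's pom.xml; re-check on every upgrade
      val jackson = "2.15.2"
      val guava   = "14.0.1"
    }

    // build.sbt
    libraryDependencies ++= Seq(
      "org.apache.spark"           %% "spark-sql"        % Versions.spark   % Provided,
      "com.fasterxml.jackson.core"  % "jackson-databind" % Versions.jackson % Provided,
      "com.google.guava"            % "guava"            % Versions.guava   % Provided
    )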

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Sean Owen
I think this is already how it works. Most apps would depend on just spark-sql (which depends on spark-core, IIRC). Maybe some optionally pull in streaming or mllib. I don't think it's intended that you pull in all submodules for any one app, although you could. I don't know if there's some common…
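
So the typical minimal setup described here would be something like (a sketch; version illustrative):

    // build.sbt: a typical minimal Spark app dependency set
    val sparkVersion = "3.5.1"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-sql" % sparkVersion % Provided   // brings spark-core transitively
      // optionally, for streaming or ML apps:
      // "org.apache.spark" %% "spark-streaming" % sparkVersion % Provided,
      // "org.apache.spark" %% "spark-mllib"     % sparkVersion % Provided
    )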

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Nimrod Ofek
Hi all, Sorry for bumping this again, just trying to understand if it's worth adding a small feature for this. I think it can help Spark users and Spark libraries upgrade to and support Spark versions a lot more easily :) If instead of adding many provided dependencies we had one that included the…
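
As I understand the proposal, the hypothetical aggregate artifact (no such artifact exists today; the name and version below are purely a sketch of the idea) would collapse the list to one line:

    // today: a long list of Provided dependencies, each version-aligned by hand;
    // with a single hypothetical aggregate "provided" artifact instead:
    libraryDependencies +=
      "org.apache.spark" %% "spark-runtime-provided" % "4.0.0" % Provided
    // ...which would transitively declare spark-sql, spark-core, Guava, Jackson, etc.
    // at exactly the versions bundled in the Spark 4.0.0 distribution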

Re: Question Regarding Spark Dependencies in Scala

2025-05-31 Thread Nimrod Ofek
No K8s deployment, nothing special. I just don't see why, when I'm developing and compiling, or let's say upgrading from Spark 3.5 to Spark 4.0, I need to upgrade all the dependencies I use but don't actually deploy, but rather use from the regular Spark runtime... Thanks, Nimrod

Re: Question Regarding Spark Dependencies in Scala

2025-05-31 Thread Mich Talebzadeh
Are you running in YARN mode, and do you want to put these jar files into HDFS in a distributed cluster? HTH, Dr Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR