Hm, I think I may be wrong about PySpark - this could affect the default Scala version in other binary distros too. The intended default Scala version should be 2.11, as you observe. I think the docs are just wrong: that line is filled in by the docs build, and it looks like it ran from the 2.12 build. I'm going to check the 3.2.0 release candidate to see whether this has come up again w.r.t. 2.13.
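(As a side note for anyone who wants to verify this empirically: below is a minimal sketch for checking which Scala runtime a given binary distro actually loads. It goes through PySpark's `_jvm` Py4J gateway, which is an internal interface rather than supported API, so treat it as a diagnostic only; the paths and script name are just examples.)

```python
from pyspark.sql import SparkSession

# Assumes this script is launched with the distro under test, e.g. via
# spark-2.4.5-bin-hadoop2.7/bin/spark-submit check_scala.py (hypothetical
# path and script name).
spark = SparkSession.builder.master("local[1]").getOrCreate()

# scala.util.Properties.versionString reports the Scala library actually
# loaded in the JVM, e.g. "version 2.11.12". Note that _jvm is PySpark's
# internal Py4J gateway, not public API.
print(spark.sparkContext._jvm.scala.util.Properties.versionString())

spark.stop()
```

On a 2.4.5 distro that really bundles Scala 2.11, this should print "version 2.11.12", matching the banner quoted below.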
On Wed, Sep 29, 2021, 7:43 PM Brandon Chinn <[email protected]> wrote:

> What do you mean by PySpark? I downloaded
> https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
> and am just running "spark-submit" (running with spark.master = local,
> spark.deploy-mode = client). I see the output mentioned in the first
> message (Spark 2.4.5, Scala 2.11), which seems to indicate that the Spark
> server is running on Scala 2.11.
>
> In the docs link I initially sent, it says:
>
>> Spark runs on Java 8, Python 2.7+/3.4+ and R 3.1+. For the Scala API,
>> Spark 2.4.5 uses Scala 2.12. You will need to use a compatible Scala
>> version (2.12.x).
>
> Is "Scala API" here different from "the Scala version that the Spark
> server is running in"?
>
> On Wed, Sep 29, 2021 at 5:11 PM Sean Owen <[email protected]> wrote:
>
>> If I recall correctly, this only affected PySpark. There were always 2.11
>> and 2.12 builds of 2.4.x, but the (single) PySpark distro shipped with
>> 2.12 unintentionally, and that was reversed.
>>
>> This comment is referring to the Scala API. In releases where Scala 2.11
>> and 2.12 were supported, it looks like the docs generation process used
>> 2.12 and auto-generated this line. It's "true", but there was also a 2.11
>> build. And it doesn't tell you what PySpark has inside, which might matter
>> a little more, although presumably PySpark users mostly do not care about
>> what's going on in the JVM.
>>
>> It's safe to assume the PySpark distro will probably stick with the older
>> of two Scala versions when two are available, as is about to be the case
>> for Spark 3.2.0 again, which adds 2.13 support. The PySpark distro is
>> still on 2.12.
>>
>> On Wed, Sep 29, 2021 at 6:58 PM Brandon Chinn <[email protected]>
>> wrote:
>>
>>> Hello,
>>>
>>> I'm looking at this SO post: https://stackoverflow.com/a/56197399,
>>> which says that 2.4.1 changed to Scala 2.12, then 2.4.3 changed back to
>>> Scala 2.11, but the docs still say Scala 2.12, e.g.
>>> https://spark.apache.org/docs/2.4.5/#downloading:
>>>
>>>> For the Scala API, Spark 2.4.5 uses Scala 2.12
>>>
>>> This also doesn't match behavior, as I indeed see
>>>
>>> Welcome to Spark version 2.4.5
>>> Using Scala version 2.11.12
>>>
>>> in the Spark output. Are the docs indeed incorrect? Can they be updated?
>>>
>>> --
>>> Brandon Chinn
>>> LeapYear Technologies (http://leapyear.io)
>
> --
> Brandon Chinn
> LeapYear Technologies (http://leapyear.io)
