Hi there,

A few days ago I got an intriguing but hard-to-answer question: "Why does Spark generate Java code and not Scala code?" (https://github.com/bartosz25/spark-scala-playground/issues/18)
Since I'm not sure about the exact answer, I'd like to ask you to confirm or correct my thinking. I looked for the reasons in JIRA and in the research paper "Spark SQL: Relational Data Processing in Spark" (http://people.csail.mit.edu/matei/papers/2015/sigmod_spark_sql.pdf) but found nothing explaining the choice of Java over Scala. The only task I found addressed the opposite question - why Scala and not Java - but only concerning data types (https://issues.apache.org/jira/browse/SPARK-5193). That's why I'm writing here.

My guesses about why Java code is generated are:
- Java runtime compiler libraries are more mature and production-ready than Scala's - or at least they were at implementation time
- the Scala compiler tends to be slower than the Java one (https://stackoverflow.com/questions/3490383/java-compile-speed-vs-scala-compile-speed)
- the Scala compiler seems to be more complex, so debugging and maintaining the code generation would be harder
- it was easier to generate pure Java OO code than mixed FP/OO Scala code?

Thank you for your help.

--
Bartosz Konieczny
data engineer
https://www.waitingforcode.com
https://github.com/bartosz25/
https://twitter.com/waitingforcode
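P.S. To make my first guess concrete: runtime compilation of Java source is a standard, well-trodden JVM capability. The sketch below is my own illustration (Spark itself uses Janino for this step, not the JDK compiler, and the class and method names here are invented), showing the general pattern of building a Java source string, compiling it in-process with the JDK's built-in javax.tools API, then loading and invoking the result:

```java
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.List;

// Sketch only: runtime compilation of generated Java source with the JDK's
// standard javax.tools API. Spark uses Janino instead, but the principle
// (generate a Java source string, compile it in-process, load and call it)
// is the same.
public class RuntimeCompileDemo {

    // An in-memory "source file" wrapping a generated code string.
    static class StringSource extends SimpleJavaFileObject {
        final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    public static int compileAndRun() throws Exception {
        // The "generated" code - in Spark this would be an expression or
        // whole-stage-codegen class derived from the query plan.
        String source =
            "public class Generated {\n" +
            "  public static int addOne(int x) { return x + 1; }\n" +
            "}\n";

        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // null file manager/diagnostics/options mean "use the defaults";
        // the compiled .class file lands in the current working directory.
        boolean ok = compiler.getTask(null, null, null, null, null,
                List.of(new StringSource("Generated", source))).call();
        if (!ok) throw new IllegalStateException("compilation failed");

        // Load the freshly compiled class and invoke the generated method.
        try (URLClassLoader loader = new URLClassLoader(
                new URL[]{Path.of(".").toUri().toURL()})) {
            Class<?> cls = loader.loadClass("Generated");
            return (int) cls.getMethod("addOne", int.class).invoke(null, 41);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(compileAndRun()); // prints 42
    }
}
```

Doing the equivalent with the Scala compiler at runtime is possible (e.g. via scala.tools.reflect), but it drags in the whole scalac toolchain, which supports my guess that the Java route was the more pragmatic choice.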