Hi there,

A few days ago I got an intriguing but hard-to-answer question: "Why does Spark generate Java code and not Scala code?" (https://github.com/bartosz25/spark-scala-playground/issues/18)
Since I'm not sure about the exact answer, I'd like to ask you to confirm or correct my thinking. I looked for the reasons in JIRA and in the research paper "Spark SQL: Relational Data Processing in Spark" (http://people.csail.mit.edu/matei/papers/2015/sigmod_spark_sql.pdf) but found nothing explaining the choice of Java over Scala. The only task I found addressed the opposite question - why Scala and not Java - but only concerning data types (https://issues.apache.org/jira/browse/SPARK-5193). That's why I'm writing here.

My guesses about why Java code is generated are:
- Java runtime compiler libraries are more mature and production-ready than Scala's - or at least they were at implementation time
- the Scala compiler tends to be slower than the Java one (https://stackoverflow.com/questions/3490383/java-compile-speed-vs-scala-compile-speed)
- the Scala compiler seems to be more complex, so debugging and maintaining the code generation would be harder
- it was easier to generate pure Java OO code than mixed FP/OO Scala code?

Thank you for your help.

--
Bartosz Konieczny
data engineer
https://www.waitingforcode.com
https://github.com/bartosz25/
https://twitter.com/waitingforcode
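P.S. To make my first guess concrete: runtime compilation of Java source is a standard, well-trodden JVM capability. The sketch below is my own illustration (Spark itself uses Janino for this step, not the JDK compiler, and the class and method names here are invented), showing the general pattern of building a Java source string, compiling it in-process with the JDK's built-in javax.tools API, then loading and invoking the result:

```java
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.List;

// Sketch only: runtime compilation of generated Java source with the JDK's
// standard javax.tools API. Spark uses Janino instead, but the principle
// (generate a Java source string, compile it in-process, load and call it)
// is the same.
public class RuntimeCompileDemo {

    // An in-memory "source file" wrapping a generated code string.
    static class StringSource extends SimpleJavaFileObject {
        final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    public static int compileAndRun() throws Exception {
        // The "generated" code - in Spark this would be an expression or
        // whole-stage-codegen class derived from the query plan.
        String source =
            "public class Generated {\n" +
            "  public static int addOne(int x) { return x + 1; }\n" +
            "}\n";

        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // null file manager/diagnostics/options mean "use the defaults";
        // the compiled .class file lands in the current working directory.
        boolean ok = compiler.getTask(null, null, null, null, null,
                List.of(new StringSource("Generated", source))).call();
        if (!ok) throw new IllegalStateException("compilation failed");

        // Load the freshly compiled class and invoke the generated method.
        try (URLClassLoader loader = new URLClassLoader(
                new URL[]{Path.of(".").toUri().toURL()})) {
            Class<?> cls = loader.loadClass("Generated");
            return (int) cls.getMethod("addOne", int.class).invoke(null, 41);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(compileAndRun()); // prints 42
    }
}
```

Doing the equivalent with the Scala compiler at runtime is possible (e.g. via scala.tools.reflect), but it drags in the whole scalac toolchain, which supports my guess that the Java route was the more pragmatic choice.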