Well, TIL. For those also newly informed:
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-whole-stage-codegen.html
https://mail-archives.apache.org/mod_mbox/spark-dev/201911.mbox/browser
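
For anyone who wants to see the idea in miniature, here is a rough sketch of "generate Java source as a string, compile it at runtime, then call it". To be clear about the assumptions: this is not Spark's code and it does not use Janino. It uses the JDK's built-in javax.tools compiler as a stand-in so it runs without extra dependencies, and the names GeneratedExpr, eval, generateSource and compileAndRun are made up for illustration.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class RuntimeCodegenSketch {

    // Builds Java source text for a tiny expression evaluator.
    // The class and method names are hypothetical, chosen for this sketch only.
    static String generateSource(String className, String expression) {
        return "public class " + className + " {\n"
             + "  public static int eval(int a, int b) {\n"
             + "    return " + expression + ";\n"
             + "  }\n"
             + "}\n";
    }

    // Writes the generated source to a temp directory, compiles it with the
    // JDK compiler (a stand-in for Janino), loads the class and calls eval(a, b).
    static int compileAndRun(String expression, int a, int b) {
        try {
            String className = "GeneratedExpr";
            Path dir = Files.createTempDirectory("codegen-sketch");
            Path src = dir.resolve(className + ".java");
            Files.write(src, generateSource(className, expression)
                    .getBytes(StandardCharsets.UTF_8));

            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("No system Java compiler; run on a JDK");
            }
            // Emit the .class file next to the source; a non-zero status means failure.
            int status = compiler.run(null, null, null, "-d", dir.toString(), src.toString());
            if (status != 0) {
                throw new IllegalStateException("Compilation of generated code failed");
            }

            // Load the freshly compiled class from the temp directory and invoke it.
            try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
                Class<?> generated = loader.loadClass(className);
                Method eval = generated.getMethod("eval", int.class, int.class);
                return (Integer) eval.invoke(null, a, b);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compileAndRun("a + b", 3, 4));     // prints 7
        System.out.println(compileAndRun("a * b + 1", 3, 4)); // prints 13
    }
}
```

Either way the result is ordinary JVM bytecode. The point of the sketch is only that the intermediate artifact is Java source text, which is what the Scala-vs-Java question below is about.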

On Sun, Nov 10, 2019 at 7:57 AM Holden Karau <hol...@pigscanfly.ca> wrote:

> If you look inside the code generation, we generate Java code and compile it
> with Janino. For interested folks, the conversation moved over to the dev@
> list.
>
> On Sat, Nov 9, 2019 at 10:37 AM Marcin Tustin
> <marcin.tus...@bluevoyant.com.invalid> wrote:
>
>> What do you mean by this? Spark is written in a combination of Scala and
>> Java, and then compiled to Java bytecode, as is typical for both Scala and
>> Java. If there's additional bytecode generation happening, it's Java
>> bytecode, because the platform runs on the JVM.
>>
>> On Sat, Nov 9, 2019 at 12:47 PM Bartosz Konieczny
>> <bartkoniec...@gmail.com> wrote:
>>
>>> Hi there,
>>>
>>> A few days ago I got an intriguing but hard-to-answer question:
>>> "Why Spark generates Java code and not Scala code?"
>>> (https://github.com/bartosz25/spark-scala-playground/issues/18)
>>>
>>> Since I'm not sure about the exact answer, I'd like to ask you to
>>> confirm or correct my thinking. I looked for the reasons in JIRA and
>>> in the research paper "Spark SQL: Relational Data Processing in Spark"
>>> (http://people.csail.mit.edu/matei/papers/2015/sigmod_spark_sql.pdf)
>>> but found nothing explaining why Java was chosen over Scala. The only
>>> ticket I found was about why Scala and not Java, and it concerned data
>>> types (https://issues.apache.org/jira/browse/SPARK-5193).
>>> That's why I'm writing here.
>>>
>>> My guesses about why Java code was chosen are:
>>> - Java runtime compiler libraries are more mature and production-ready
>>>   than Scala's, or at least they were at implementation time
>>> - the Scala compiler tends to be slower than Java's
>>>   https://stackoverflow.com/questions/3490383/java-compile-speed-vs-scala-compile-speed
>>> - the Scala compiler seems to be more complex, so debugging and
>>>   maintaining it would be harder
>>> - it was easier to represent a pure Java OO design than mixed FP/OO in
>>>   Scala
>>> ?
>>>
>>> Thank you for your help.
>>>
>>> --
>>> Bartosz Konieczny
>>> data engineer
>>> https://www.waitingforcode.com
>>> https://github.com/bartosz25/
>>> https://twitter.com/waitingforcode
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau