Hi, I wanted to know more about how Spark supports R and Python, with respect to what gets copied into the language environments.
To clarify: I know that PySpark uses Py4J sockets to pass pickled Python functions between the JVM and the Python daemons. What I want to understand is how the data itself gets from the JVM into the daemon's environment. I assume it has to be copied over into the new environment, since Python can't exactly operate on the JVM heap (or can it?). I have the same question for SparkR, though I'm not entirely familiar with how native R code gets passed through the worker JVMs.

The primary question is: does Spark make a second copy of the data so that the language-specific daemons can operate on it? And what other limitations come up when offering multi-language support, whether in performance or in the overall software architecture? With Python in particular, the result of a collect must first be written to disk and then read back by the Python driver process.

I would appreciate any insight on this, and whether there is any work happening in this area.

Thank you,
Rahul Palamuttam
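P.S. To make the question concrete, here is a minimal sketch of my mental model of the function-shipping part. It is only an illustration, not the actual PySpark worker protocol: I use a named function and a local socket pair, whereas PySpark itself relies on cloudpickle so that closures can be serialized.

import pickle
import socket

# Named module-level function: stdlib pickle can serialize it (by reference);
# PySpark itself uses cloudpickle so that closures and lambdas also work.
def square(x):
    return x * x

def recv_exact(sock, n):
    """Read exactly n bytes from the socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("socket closed before payload was fully read")
        buf += chunk
    return buf

driver_end, worker_end = socket.socketpair()

# "Driver" side: serialize the function together with one partition of data,
# length-prefix the payload, and push it across the socket.
partition = [1, 2, 3, 4]
payload = pickle.dumps((square, partition))
driver_end.sendall(len(payload).to_bytes(4, "big") + payload)

# "Worker" side: read the payload back, deserialize, and apply the function.
# At this point the partition exists as a second, Python-side copy of whatever
# the JVM side was holding.
size = int.from_bytes(recv_exact(worker_end, 4), "big")
received_func, received_partition = pickle.loads(recv_exact(worker_end, size))
print([received_func(x) for x in received_partition])  # [1, 4, 9, 16]

The part I am unsure about is whether the equivalent of "partition" above is always a fresh copy on the Python side, or whether there is any way to avoid that duplication.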