Re: Calling Scala/Java methods which operates on RDD

2014-07-11 Thread Kan Zhang
Hi Jai, your suspicion is correct. In general, Python RDDs are pickled into byte arrays and stored in Java land as RDDs of byte arrays. union/zip operate on those byte arrays directly, without deserializing them. Currently, Python byte arrays only get unpickled into Java objects in special cases, like SQL fu
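
To illustrate the point about byte arrays, here is a rough sketch in plain Python (no Spark involved). The helper names to_byte_rdd, union_bytes and collect are made up for this example and are not PySpark APIs, and real PySpark pickles elements in batches rather than one by one. It only shows the idea that a union can be done on opaque pickled bytes without ever deserializing them:

```python
import pickle


def to_byte_rdd(items):
    # Hypothetical stand-in for how the JVM sees a Python RDD:
    # one opaque pickled byte string per element.
    return [pickle.dumps(item) for item in items]


def union_bytes(left, right):
    # Mimics a JVM-side union: it only concatenates byte strings
    # and never unpickles them.
    return left + right


def collect(byte_rdd):
    # Only the Python side unpickles the bytes back into objects.
    return [pickle.loads(raw) for raw in byte_rdd]


a = to_byte_rdd([1, "two"])
b = to_byte_rdd([{"k": 3}])

merged = union_bytes(a, b)
result = collect(merged)
```

A Scala/Java utility that needs the actual element values (rather than just moving elements around) would see only these byte strings, which is why it cannot be applied to a Python RDD as-is.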

Calling Scala/Java methods which operates on RDD

2014-07-11 Thread Jai Kumar Singh
Hi, I want to write some common utility functions in Scala and call them from the Java/Python Spark API (maybe adding some wrapper code around the Scala calls). Calling Scala functions from Java works fine. I was reading the pyspark rdd code and found out that pyspark is able to call JavaRDD functio