It's a bit tricky - if the user's data is represented as a DataFrame or Dataset then it's much easier. Assuming the function is going to be called from the driver program (i.e. not inside of a transformation or action), you can use the Py4J gateway to make the calls into the JVM. You might find it helpful to look at wrapper.py in the ml directory to see how this is done.
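For example, the pattern looks roughly like this (untested sketch - com.example.MatrixFactorizer and its factorize method are just stand-ins for your own Scala object, which needs to be on the driver's classpath):

from pyspark import SparkContext
from pyspark.sql import SQLContext, DataFrame

def factorize(df):
    # Grab the active SparkContext; sc._jvm is the Py4J gateway's view
    # of the JVM, so any class on the driver's classpath is reachable.
    sc = SparkContext._active_spark_context
    # Hypothetical Scala object -- swap in your own factorization code.
    scala_obj = sc._jvm.com.example.MatrixFactorizer
    # df._jdf is the underlying Java/Scala DataFrame; hand it to the JVM.
    result_jdf = scala_obj.factorize(df._jdf)
    # Wrap the returned Java DataFrame back into a Python DataFrame.
    return DataFrame(result_jdf, SQLContext.getOrCreate(sc))

This is essentially what the wrappers in pyspark/ml do: unwrap the Java object on the way in, call the Scala side, and re-wrap the result on the way out.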
On Tue, Apr 12, 2016 at 4:30 PM, AlexG <swift...@gmail.com> wrote:
> I have Scala Spark code for computing a matrix factorization. I'd like to
> make it possible to use this code from PySpark, so users can pass in a
> Python RDD and receive one back without knowing or caring that Scala code
> is being called.
>
> Please point me to an example of code (e.g. somewhere in the Spark
> codebase, if it's clean enough) from which I can learn how to do this.

--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau