That was indeed the case: using UTF8Deserializer makes everything work
correctly.
Thanks for the tips!
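For the archives, the wrapping that works ends up looking roughly like the
sketch below. The Scala entry point and package name are made up; the
relevant part is passing UTF8Deserializer, which is the same thing
sc.textFile does internally when it wraps a JVM RDD of strings.

    from pyspark import SparkContext
    from pyspark.rdd import RDD
    from pyspark.serializers import UTF8Deserializer

    sc = SparkContext(appName="utf8-wrap")

    # jrdd is a JavaRDD[String] handed back by the custom Scala code over py4j.
    # UTF8Deserializer tells PySpark that the bytes coming across are plain
    # UTF-8 strings rather than pickled Python objects.
    jrdd = sc._jvm.com.example.CustomContext.getCustomRdd(sc._jsc)  # hypothetical entry point
    strings = RDD(jrdd, sc, UTF8Deserializer())
    print(strings.take(5))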
On Thu, Jun 30, 2016 at 3:32 PM, Pedro Rodriguez wrote:
Quick update: I was able to get most of the plumbing working thanks to the
code Holden posted and some more browsing of the source code.
I am running into this error, which makes me think that maybe I shouldn't
leave the default Python RDD serializer/pickler in place and should do
something else instead: https://github.c
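To illustrate what the default path does (a sketch, using the same
hypothetical entry point as above): PySpark's RDD wrapper defaults to the
pickle serializer, which assumes the JVM partitions contain pickled Python
objects, so an RDD[String] built purely on the Scala side fails when
collected.

    from pyspark.rdd import RDD

    jrdd = sc._jvm.com.example.CustomContext.getCustomRdd(sc._jsc)  # hypothetical, as above
    broken = RDD(jrdd, sc)  # default AutoBatchedSerializer(PickleSerializer())
    # broken.take(1)        # fails: the JVM bytes are raw UTF-8, not pickled objects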
Thanks Jeff and Holden,
A little more context here probably helps. I am working on implementing the
idea from this article to make reads from S3 faster:
http://tech.kinja.com/how-not-to-pull-from-s3-using-apache-spark-1704509219
(although my name is Pedro, I am not the author of the article). The
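The gist of the article's approach, roughly sketched below with boto3 (the
article itself uses boto 2; the bucket and prefix names are placeholders):
list the keys yourself, then parallelize the key list and fetch the objects
inside tasks instead of pointing textFile at a wildcard.

    import boto3
    from pyspark import SparkContext

    sc = SparkContext(appName="s3-parallel-read")

    # Enumerate keys on the driver (pagination omitted for brevity).
    s3 = boto3.client("s3")
    keys = [obj["Key"] for obj in
            s3.list_objects_v2(Bucket="my-bucket", Prefix="logs/")["Contents"]]

    def read_key(key):
        # One client per key; a mapPartitions variant would reuse clients.
        body = boto3.client("s3").get_object(Bucket="my-bucket", Key=key)["Body"]
        return body.read().decode("utf-8").splitlines()

    lines = sc.parallelize(keys, len(keys)).flatMap(read_key)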
So I'm a little biased - I think the best bridge between the two is using
DataFrames. I've got some examples in my talk and on the high performance
spark GitHub
https://github.com/high-performance-spark/high-performance-spark-examples/blob/master/high_performance_pyspark/simple_perf_test.py
calls som
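The general shape of that bridge, sketched for Spark 1.6 (the Scala object
and method names here are placeholders; the real examples are in the repo
linked above):

    from pyspark import SparkContext
    from pyspark.sql import DataFrame, SQLContext

    sc = SparkContext(appName="df-bridge")
    sqlContext = SQLContext(sc)
    df = sqlContext.read.json("events.json")

    # Hand the JVM-side DataFrame to Scala and wrap whatever comes back.
    # The data stays in the JVM, so no per-row pickling is needed at the boundary.
    jdf = sc._jvm.com.example.CustomTransforms.enrich(df._jdf)  # hypothetical Scala entry point
    result = DataFrame(jdf, sqlContext)
    result.show()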
Hi Pedro,
Your use case is interesting. I think launching the Java gateway is the same
as for the native SparkContext; the only difference is that you create your
custom SparkContext instead of the native one. You might also need to wrap it
using Java.
https://github.com/apache/spark/blob/v1.6.2/python/pys
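Roughly, that looks like the sketch below (the custom context class is a
placeholder; launch_gateway and the gateway argument to SparkContext are the
same machinery the native context uses):

    from pyspark.java_gateway import launch_gateway
    from pyspark import SparkConf, SparkContext

    # Start the py4j gateway the same way the native SparkContext does,
    # then build everything else through it.
    gateway = launch_gateway()

    conf = SparkConf().setAppName("custom-context")
    sc = SparkContext(conf=conf, gateway=gateway)

    # Instantiate the custom Scala wrapper around the underlying Scala SparkContext.
    custom = gateway.jvm.com.example.CustomSparkContext(sc._jsc.sc())  # hypothetical class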
Hi All,
I have written a Scala package that essentially wraps the SparkContext in a
custom class which adds some functionality specific to our internal use case.
I am trying to figure out the best way to call this from PySpark. I would
like to do this similarly to how Spark itself calls the J
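The shape of what I'd like to end up with on the Python side is roughly the
following (class and method names are made up). Plain strings and numbers
come back over py4j directly; it's RDDs and DataFrames where things get
trickier.

    from pyspark import SparkContext

    sc = SparkContext(appName="wrapper-demo")
    # Reach the Scala wrapper through the running JVM and hand it the Scala SparkContext.
    wrapper = sc._jvm.com.example.CustomSparkContext(sc._jsc.sc())  # hypothetical class
    print(wrapper.version())  # a plain string comes back without any serializer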