Re: PySpark : couldn't pickle object of type class T

2016-02-28 Thread Jeff Zhang
Hi Anoop, I don't see the exception you mentioned in the link. I can use spark-avro to read the sample file users.avro in Spark successfully. Do you have the details of the union issue?
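For reference, a minimal sketch of that check in the pyspark shell, assuming the shell was started with the databricks spark-avro package and that users.avro sits in the working directory (the artifact coordinates and file location are assumptions):

    # e.g. pyspark --packages com.databricks:spark-avro_2.10:2.0.1
    # Read the sample Avro file into a DataFrame via spark-avro.
    df = sqlContext.read.format("com.databricks.spark.avro").load("users.avro")
    df.printSchema()
    df.show()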

Re: PySpark : couldn't pickle object of type class T

2016-02-26 Thread Anoop Shiralige
Hi Jeff, Thank you for looking into the post. I had explored the spark-avro option earlier. Since we have a union of multiple complex data types in our Avro schema, we couldn't use it. A couple of things I tried:
- https://stackoverflow.com/questions/31261376/how-to-read-pyspark-avro-file-and-ext
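For context, that StackOverflow thread generally points to the converter-based approach from Spark's examples/src/main/python/avro_inputformat.py. A sketch of it is below, assuming the spark-examples jar (which provides AvroWrapperToJavaConverter) is on the classpath; the input path is a placeholder:

    # Converter-based Avro read; requires the spark-examples jar so that
    # AvroWrapperToJavaConverter is available to the JVM.
    avro_rdd = sc.newAPIHadoopFile(
        "path/to/data.avro",
        "org.apache.avro.mapreduce.AvroKeyInputFormat",
        "org.apache.avro.mapred.AvroKey",
        "org.apache.hadoop.io.NullWritable",
        keyConverter="org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter")

    # Each element comes back as a (record-as-dict, None) pair; keep the records.
    records = avro_rdd.map(lambda kv: kv[0])
    print(records.first())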

Re: PySpark : couldn't pickle object of type class T

2016-02-24 Thread Jeff Zhang
Avro Record is not supported by the pickler; you would need to create a custom pickler for it, but I don't think it is worth doing that. Actually, you can use the spark-avro package to load Avro data and then convert it to an RDD if necessary. https://github.com/databricks/spark-avro
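A minimal sketch of that suggestion in the pyspark shell, assuming the spark-avro package is on the classpath and with a placeholder input path:

    # e.g. pyspark --packages com.databricks:spark-avro_2.10:2.0.1
    df = sqlContext.read.format("com.databricks.spark.avro").load("path/to/avro/")

    # DataFrame rows come back as pyspark Row objects, which pickle fine,
    # so converting to an RDD is simply:
    rdd = df.rdd
    print(rdd.take(2))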

PySpark : couldn't pickle object of type class T

2016-02-11 Thread Anoop Shiralige
Hi All, I am working with Spark 1.6.0 and the pySpark shell specifically. I have a JavaRDD[org.apache.avro.GenericRecord] which I have converted to a pythonRDD in the following way:

    javaRDD = sc._jvm.java.package.loadJson("path to data", sc._jsc)
    javaPython = sc._jvm.SerDe.javaToPython(javaRDD)
    from
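A hypothetical completion of the truncated snippet above, under the assumption that the next step wraps the resulting JavaRDD in a Python RDD (loadJson here is the poster's own JVM helper, not a Spark API); this reproduces the setup that triggers the pickling error rather than fixing it:

    from pyspark.rdd import RDD
    from pyspark.serializers import AutoBatchedSerializer, PickleSerializer

    # Poster's own JVM helper returning a JavaRDD of Avro GenericRecord.
    javaRDD = sc._jvm.java.package.loadJson("path to data", sc._jsc)

    # Ask the JVM-side SerDe to pickle each record for Python.
    javaPython = sc._jvm.SerDe.javaToPython(javaRDD)

    # Wrap the pickled JavaRDD as a Python RDD.
    pythonRDD = RDD(javaPython, sc, AutoBatchedSerializer(PickleSerializer()))

    # The "couldn't pickle object of type class ..." error surfaces when the
    # RDD is evaluated, because the JVM pickler has no handler for
    # org.apache.avro.GenericRecord:
    pythonRDD.first()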