I have two questions related to PySpark:

1. How do I load an Avro file that is on the local filesystem rather than on
HDFS? I tried the following and just get NullPointerExceptions:

avro_rdd = sc.newAPIHadoopFile(
    "file:///c:/my-file.avro",
    "org.apache.avro.mapreduce.AvroKeyInputFormat",
    "org.apache.avro.mapred.AvroKey",
    "org.apache.hadoop.io.NullWritable",
    keyConverter="org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter",
    conf=None)
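
In case it helps frame the question, the kind of workaround I have been
considering is to skip the Hadoop input format entirely, read the file on the
driver with the plain avro Python package, and then parallelize the records.
This is only a rough, untested sketch (it assumes the avro package is
installed, and the path is just a placeholder):

from avro.datafile import DataFileReader
from avro.io import DatumReader

# Read the Avro container file locally on the driver; each record comes
# back as a Python dict.
with open("c:/my-file.avro", "rb") as f:
    reader = DataFileReader(f, DatumReader())
    records = list(reader)
    reader.close()

# Hand the decoded records to Spark as a plain RDD.
avro_rdd = sc.parallelize(records)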

2. If I already have the Avro data in memory as a stream of bytes
("avrobytes"), is there a way I can load it into Spark (e.g., as an RDD)
directly from those bytes?
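
What I have in mind is roughly the same idea as the sketch above, but feeding
DataFileReader an io.BytesIO wrapper around the bytes instead of an open file.
Again, this is only a sketch and assumes "avrobytes" is a complete Avro
container file (schema header included):

import io
from avro.datafile import DataFileReader
from avro.io import DatumReader

# Wrap the in-memory bytes in a file-like object, decode the Avro container
# on the driver, then parallelize the resulting records.
reader = DataFileReader(io.BytesIO(avrobytes), DatumReader())
records = list(reader)
reader.close()

byte_rdd = sc.parallelize(records)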

Let me know if either of the two above is possible, and if so, how.


