Thanks for the suggestion. I can confirm that my problem is that I have
zero-byte files. It's a known bug and is marked as high priority:
https://issues.apache.org/jira/browse/SPARK-1960
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/EOFException-when
I'm pretty sure my problem is related to this unresolved bug regarding files
with size zero:
https://issues.apache.org/jira/browse/SPARK-1960
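Until SPARK-1960 is fixed, a workaround is to skip the zero-byte files before handing paths to Spark. Below is a minimal local sketch of that filtering; it uses java.nio for illustration (on HDFS the equivalent would be Hadoop's FileSystem.listStatus plus FileStatus.getLen), and the helper name is my own:

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Keep only the regular files in `dir` that contain at least one byte;
// the zero-byte files are the ones that trigger the EOFException.
def nonEmptyFiles(dir: Path): Seq[Path] =
  Files.list(dir).iterator().asScala
    .filter(p => Files.isRegularFile(p) && Files.size(p) > 0)
    .toSeq
```

The surviving paths can then be joined with commas and passed as the input path, since FileInputFormat accepts a comma-separated list.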
I'm trying to list and then process all files in an HDFS directory.
I'm able to run the code below when I supply a specific AvroSequenceFile,
but if I use a wildcard to pick up all Avro sequence files in the directory,
it fails.
Anyone know how to do this?
val avroRdd = sc.newAPIHadoopFile("hdfs://
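On the wildcard itself: Hadoop's FileInputFormat accepts shell-style glob patterns in input paths (e.g. ending the path in *.avro), so the wildcard form is legal, and earlier in this thread the actual culprit turned out to be zero-byte files. The glob semantics are the same shell-style matching that java.nio exposes locally; a small sketch with hypothetical file names:

```scala
import java.nio.file.{FileSystems, Paths}

// Shell-style glob of the kind Hadoop input paths understand:
// "*.avro" matches any single file name ending in .avro.
val avroGlob = FileSystems.getDefault.getPathMatcher("glob:*.avro")

def matchesAvro(name: String): Boolean = avroGlob.matches(Paths.get(name))
```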
For those curious: I was using a KryoRegistrator, and it was causing a null
pointer exception. I removed that code and the problem went away.
Running a simple collect method on a group of Avro objects causes a plain
NullPointerException. Does anyone know what may be wrong?
>files.collect()
Press ENTER or type command to continue
Exception in thread "Executor task launch worker-0"
java.lang.NullPointerException
at
org.apache.sp
For those curious I used the JavaSparkContext and got access to an
AvroSequenceFile (wrapper around Sequence File) using the following:
file = sc.newAPIHadoopFile("",
AvroSequenceFileInputFormat.class, AvroKey.class, AvroValue.class,
new Configuration())
Thanks for the gist. I'm just now learning about Avro. I think when you use
a DataFileWriter you are writing to an Avro container file (which is different
from an Avro sequence file). I have a system where data was written to an
HDFS sequence file using AvroSequenceFile.Writer (which is a wrapper around
a Hadoop SequenceFile).
To be more specific, I'm working with a system that stores data in
org.apache.avro.hadoop.io.AvroSequenceFile format. An AvroSequenceFile is
"A wrapper around a Hadoop SequenceFile that also supports reading and
writing Avro data."
It seems that Spark does not support this out of the box.
--
I see Spark is using AvroRecordReaderBase, which is used to read Avro
container files, which are different from sequence files. If anyone is using
Avro sequence files with success and has an example, please let me know.
Thanks for responding. I tried using the newAPIHadoopFile method and got an
IOException with the message "Not a data file".
If anyone has an example of this working I'd appreciate your input or
examples.
What I entered at the repl and what I got back are below:
val myAvroSequenceFile = sc.
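This "Not a data file" IOException is Avro's DataFileReader rejecting the input: it checks for the magic bytes "Obj" plus 0x01 that begin every Avro container file, while a Hadoop SequenceFile (AvroSequenceFile included) begins with "SEQ", so the container reader refuses it. A quick header check makes the mismatch visible (the file contents below are fabricated for illustration):

```scala
import java.nio.file.{Files, Path}

// Read the first three bytes of a file as a format signature.
def headerOf(path: Path): String =
  new String(Files.readAllBytes(path).take(3), "ISO-8859-1")

// Avro container files start with "Obj" (followed by a version byte 0x01);
// Hadoop SequenceFiles -- AvroSequenceFile included -- start with "SEQ".
def isAvroContainer(path: Path): Boolean = headerOf(path) == "Obj"
def isSequenceFile(path: Path): Boolean  = headerOf(path) == "SEQ"
```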