Re: EOFException when I list all files in hdfs directory

2014-07-25 Thread Sparky
Thanks for the suggestion. I can confirm that my problem is that I have files with zero bytes. It's a known bug and is marked high priority: https://issues.apache.org/jira/browse/SPARK-1960
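Until that bug is fixed, one possible workaround (a sketch only — the namenode URI and directory are placeholders, and this has not been tested against the affected Spark version) is to enumerate the directory with the Hadoop FileSystem API and drop zero-length files before handing the paths to Spark:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical workaround: list the directory ourselves, keep only files
// with non-zero length, and pass an explicit comma-separated path list
// to Spark instead of a wildcard.
val conf = new Configuration()
val fs = FileSystem.get(new java.net.URI("hdfs://namenode:8020"), conf)
val nonEmpty = fs.listStatus(new Path("/data/avro"))
  .filter(s => s.isFile && s.getLen > 0)
  .map(_.getPath.toString)

// newAPIHadoopFile accepts a comma-separated list of input paths.
val pathList = nonEmpty.mkString(",")
```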

Re: EOFException when I list all files in hdfs directory

2014-07-25 Thread Sparky
I'm pretty sure my problem is related to this unresolved bug regarding files with size zero: https://issues.apache.org/jira/browse/SPARK-1960

EOFException when I list all files in hdfs directory

2014-07-25 Thread Sparky
I'm trying to list and then process all files in an HDFS directory. I'm able to run the code below when I supply a specific Avro sequence file, but if I use a wildcard to get all Avro sequence files in the directory, it fails. Does anyone know how to do this?

val avroRdd = sc.newAPIHadoopFile("hdfs://
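For reference, a glob pattern is normally accepted by newAPIHadoopFile, since Hadoop's FileInputFormat expands globs when resolving input paths. A sketch (the path, port, and key/value schema types below are assumptions, not from the original message):

```scala
import org.apache.avro.hadoop.io.{AvroKey, AvroValue}
import org.apache.avro.mapreduce.AvroSequenceFileInputFormat
import org.apache.hadoop.conf.Configuration

// Hypothetical sketch: the trailing "/*" glob should match every file in
// the directory. Substitute your own key/value types for String here.
val avroRdd = sc.newAPIHadoopFile(
  "hdfs://namenode:8020/data/avro/*",
  classOf[AvroSequenceFileInputFormat[AvroKey[String], AvroValue[String]]],
  classOf[AvroKey[String]],
  classOf[AvroValue[String]],
  new Configuration())
```

Note that if the glob matches zero-byte files, this runs into the SPARK-1960 issue discussed above.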

Re: collect() on small list causes NullPointerException

2014-07-22 Thread Sparky
For those curious: I was using a KryoRegistrator, and it was causing the NullPointerException. I removed that code and the problem went away.

collect() on small group of Avro files causes plain NullPointerException

2014-07-22 Thread Sparky
Running a simple collect() on a group of Avro objects causes a plain NullPointerException. Does anyone know what may be wrong?

> files.collect()
Press ENTER or type command to continue
Exception in thread "Executor task launch worker-0" java.lang.NullPointerException
        at org.apache.sp

Re: NullPointerException When Reading Avro Sequence Files

2014-07-21 Thread Sparky
For those curious: I used the JavaSparkContext and got access to an AvroSequenceFile (a wrapper around a Hadoop SequenceFile) using the following:

file = sc.newAPIHadoopFile("", AvroSequenceFileInputFormat.class, AvroKey.class, AvroValue.class, new Configuration())
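One detail worth noting about the call above: the resulting RDD is a pair RDD of AvroKey/AvroValue wrappers, so the Avro payloads need to be unwrapped with .datum() before use. A hedged Scala sketch (the path and String type parameters are placeholders):

```scala
import org.apache.avro.hadoop.io.{AvroKey, AvroValue}
import org.apache.avro.mapreduce.AvroSequenceFileInputFormat
import org.apache.hadoop.conf.Configuration

// Hypothetical Scala equivalent of the Java call; the HDFS path is a placeholder.
val file = sc.newAPIHadoopFile(
  "hdfs://namenode:8020/data/avro/part-00000",
  classOf[AvroSequenceFileInputFormat[AvroKey[String], AvroValue[String]]],
  classOf[AvroKey[String]],
  classOf[AvroValue[String]],
  new Configuration())

// Unwrap the AvroKey/AvroValue wrappers to get at the actual records.
val records = file.map { case (k, v) => (k.datum(), v.datum()) }
```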

Re: NullPointerException When Reading Avro Sequence Files

2014-07-19 Thread Sparky
Thanks for the gist. I'm just now learning about Avro. I think when you use a DataFileWriter you are writing to an Avro container file (which is different from an Avro sequence file). I have a system where data was written to an HDFS sequence file using AvroSequenceFile.Writer (which is a wrapper around SequenceFile.Writer).
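To make the write-side distinction concrete, here is a rough sketch of producing such a file with AvroSequenceFile (a hypothetical example — the output path and schemas are placeholders, and the exact Options builder methods should be checked against the Avro version in use):

```scala
import org.apache.avro.Schema
import org.apache.avro.hadoop.io.AvroSequenceFile
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical sketch: AvroSequenceFile.createWriter produces a Hadoop
// SequenceFile whose keys/values are Avro-serialized. This is NOT an Avro
// container file, so DataFileReader/DataFileWriter cannot read or write it.
val conf = new Configuration()
val fs = FileSystem.get(conf)
val schema = Schema.create(Schema.Type.STRING)
val writer = AvroSequenceFile.createWriter(
  new AvroSequenceFile.Writer.Options()
    .withFileSystem(fs)
    .withConfiguration(conf)
    .withOutputPath(new Path("/data/avro/part-00000"))  // placeholder path
    .withKeySchema(schema)
    .withValueSchema(schema))
```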

Re: NullPointerException When Reading Avro Sequence Files

2014-07-19 Thread Sparky
To be more specific, I'm working with a system that stores data in org.apache.avro.hadoop.io.AvroSequenceFile format. An AvroSequenceFile is "A wrapper around a Hadoop SequenceFile that also supports reading and writing Avro data." It seems that Spark does not support this out of the box.

Re: NullPointerException When Reading Avro Sequence Files

2014-07-19 Thread Sparky
I see Spark is using AvroRecordReaderBase, which reads Avro container files — a different format from sequence files. If anyone is using Avro sequence files with Spark successfully and has an example, please let me know.

Re: NullPointerException When Reading Avro Sequence Files

2014-07-18 Thread Sparky
Thanks for responding. I tried using the newAPIHadoopFile method and got an IOException with the message "Not a data file". If anyone has an example of this working, I'd appreciate your input. What I entered at the REPL and what I got back are below:

val myAvroSequenceFile = sc.
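The "Not a data file" message is what Avro's DataFileReader raises when a file does not start with the Avro container magic bytes ("Obj" followed by 0x01); a Hadoop SequenceFile instead starts with "SEQ". A small diagnostic (a sketch; fileKind is a hypothetical helper, not part of any library) can tell you which reader a given file needs:

```scala
import java.io.{DataInputStream, FileInputStream}

// Hypothetical diagnostic: inspect the first three bytes of a file to
// distinguish an Avro container file from a Hadoop SequenceFile.
def fileKind(path: String): String = {
  val in = new DataInputStream(new FileInputStream(path))
  try {
    val magic = new Array[Byte](3)
    in.readFully(magic)
    new String(magic, "US-ASCII") match {
      case "Obj" => "avro-container"   // DataFileReader can read this
      case "SEQ" => "sequence-file"    // needs a SequenceFile input format
      case _     => "unknown"
    }
  } finally in.close()
}
```

If fileKind reports "sequence-file", the container-oriented Avro input formats will fail with exactly this error, and AvroSequenceFileInputFormat is the appropriate choice.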