Re: Is there a way to write spark RDD to Avro files

2014-08-02 Thread touchdown
YES! This worked! thanks!

Re: Is there a way to write spark RDD to Avro files

2014-08-02 Thread Fengyun RAO
Below works for me:

    val job = Job.getInstance
    val schema = Schema.create(Schema.Type.STRING)
    AvroJob.setOutputKeySchema(job, schema)
    records.map(item => (new AvroKey[String](item.getGridsumId), NullWritable.get()))
      .saveAsNewAPIHadoopFile(args(1),

Re: Is there a way to write spark RDD to Avro files

2014-08-01 Thread touchdown
Yes, I saw that after I looked at it more closely. Thanks! But I am now running into a schema-not-set error: "Writer schema for output key was not set. Use AvroJob.setOutputKeySchema()". I am in the process of figuring out how to set the schema for an AvroJob from an HDFS file, but any pointers are much appreciated!
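
One way to get a schema for AvroJob from a file on HDFS is to read it out of the header of an existing Avro data file. The following is only a sketch under assumptions: the object name SchemaFromAvroFile and the helpers readSchema and configuredJob are hypothetical, and the file passed in is assumed to be an Avro container file, not a JSON .avsc file.

```scala
import org.apache.avro.Schema
import org.apache.avro.file.DataFileStream
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.mapreduce.AvroJob
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job

// Hypothetical helper object, not part of Spark or Avro.
object SchemaFromAvroFile {

  /** Reads the writer schema stored in the header of an Avro container file. */
  def readSchema(file: Path, conf: Configuration): Schema = {
    val in = file.getFileSystem(conf).open(file)
    try {
      // DataFileStream parses the file header in its constructor,
      // so the schema is available without reading any records.
      new DataFileStream[GenericRecord](in, new GenericDatumReader[GenericRecord]()).getSchema
    } finally {
      in.close()
    }
  }

  /** Builds a Job whose configuration carries the output key schema. */
  def configuredJob(file: Path, conf: Configuration): Job = {
    val job = Job.getInstance(conf)
    AvroJob.setOutputKeySchema(job, readSchema(file, conf))
    job
  }
}
```

The resulting job.getConfiguration would then be passed as the last argument of saveAsNewAPIHadoopFile, so that the output format can find the schema.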

Re: Is there a way to write spark RDD to Avro files

2014-08-01 Thread Ron Gonzalez
You have to import org.apache.spark.rdd._, which will automatically make this method available. Thanks, Ron > On Aug 1, 2014, at 3:26 PM, touchdown wrote: > > Hi, I am facing a similar dilemma. I am trying to aggregate a bunch of small > avro files into one avro file. I re
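
A note on the import, hedged: on Spark 1.x releases from around the time of this thread, the implicit conversion that adds the pair-RDD methods (including saveAsNewAPIHadoopFile) to an RDD of tuples was, to my knowledge, defined in the SparkContext companion object. If the method is still not found, importing org.apache.spark.SparkContext._ is worth trying. The sketch below assumes a local master and uses a hypothetical object name.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Assumption: needed on older Spark 1.x releases to bring the
// RDD[(K, V)] => PairRDDFunctions implicit conversion into scope.
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// Hypothetical example object.
object PairRddImplicits {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PairRddImplicits").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val pairs: RDD[(String, Int)] = sc.parallelize(Seq("a" -> 1, "b" -> 2, "a" -> 3))

    // reduceByKey is defined on PairRDDFunctions, not on RDD itself;
    // the same implicit conversion exposes saveAsNewAPIHadoopFile.
    val summed = pairs.reduceByKey(_ + _)
    summed.collect().foreach(println)

    sc.stop()
  }
}
```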

Re: Is there a way to write spark RDD to Avro files

2014-08-01 Thread touchdown
Hi, I am facing a similar dilemma. I am trying to aggregate a bunch of small avro files into one avro file. I read it in with: sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable, AvroKeyInputFormat[GenericRecord]](path) but I can't find saveAsHadoopFile or saveAsNewAPIHadoopFile. Can you ple
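
The merge described here can be sketched end to end as follows. This is a minimal sketch and not a tested recipe: the object name MergeAvroFiles and the two command-line arguments are hypothetical, all input files are assumed to share one schema, and the result is an output directory containing a single part file, not a single named file.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyInputFormat, AvroKeyOutputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on older Spark 1.x

// Hypothetical example object.
object MergeAvroFiles {
  def main(args: Array[String]): Unit = {
    val Array(inputPath, outputPath) = args // assumed: input dir/glob, output dir
    val sc = new SparkContext(new SparkConf().setAppName("MergeAvroFiles"))

    val records = sc.newAPIHadoopFile[
      AvroKey[GenericRecord], NullWritable, AvroKeyInputFormat[GenericRecord]](inputPath)

    // AvroKey is not Serializable, so ship the schema to the driver as JSON.
    val schemaJson = records.map(_._1.datum().getSchema.toString).first()

    val job = Job.getInstance(sc.hadoopConfiguration)
    AvroJob.setOutputKeySchema(job, new Schema.Parser().parse(schemaJson))

    records
      .coalesce(1) // one partition, hence one output part file, without a shuffle
      .saveAsNewAPIHadoopFile(
        outputPath,
        classOf[AvroKey[GenericRecord]],
        classOf[NullWritable],
        classOf[AvroKeyOutputFormat[GenericRecord]],
        job.getConfiguration)

    sc.stop()
  }
}
```

coalesce(1) is used instead of repartition(1) because it avoids a shuffle, which would require the Avro wrapper objects to be serializable.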

Re: Is there a way to write spark RDD to Avro files

2014-07-31 Thread Fengyun RAO
Thanks, Marcelo. It works! 2014-07-31 5:37 GMT+08:00 Marcelo Vanzin: > Hi Fengyun, > > Have you tried to use saveAsHadoopFile() (or > saveAsNewAPIHadoopFile())? You should be able to do something with > that API by using AvroKeyValueOutputFormat. > > The API is defined here: > > http://spark.ap

Re: Is there a way to write spark RDD to Avro files

2014-07-30 Thread Marcelo Vanzin
Hi Fengyun, Have you tried to use saveAsHadoopFile() (or saveAsNewAPIHadoopFile())? You should be able to do something with that API by using AvroKeyValueOutputFormat. The API is defined here: http://spark.apache.org/docs/1.0.0/api/scala/#org.apache.spark.rdd.PairRDDFunctions Lots of RDD types i
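
For illustration, the suggestion could look roughly like the sketch below, which writes (String, Int) pairs through AvroKeyValueOutputFormat. The object name, the sample data, and the output path argument are hypothetical, and the code assumes the avro-mapred artifact built for the new Hadoop API is on the classpath.

```scala
import org.apache.avro.Schema
import org.apache.avro.mapred.{AvroKey, AvroValue}
import org.apache.avro.mapreduce.{AvroJob, AvroKeyValueOutputFormat}
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on older Spark 1.x

// Hypothetical example object.
object WriteKeyValueAvro {
  def main(args: Array[String]): Unit = {
    val outputPath = args(0) // assumed: output directory that does not exist yet
    val sc = new SparkContext(new SparkConf().setAppName("WriteKeyValueAvro"))

    // The output format looks up both schemas in the job configuration.
    val job = Job.getInstance(sc.hadoopConfiguration)
    AvroJob.setOutputKeySchema(job, Schema.create(Schema.Type.STRING))
    AvroJob.setOutputValueSchema(job, Schema.create(Schema.Type.INT))

    sc.parallelize(Seq("spark" -> 1, "avro" -> 2))
      .map { case (word, count) =>
        // Wrap each side in the Avro wrapper types the output format expects.
        (new AvroKey[String](word), new AvroValue[java.lang.Integer](Int.box(count)))
      }
      .saveAsNewAPIHadoopFile(
        outputPath,
        classOf[AvroKey[String]],
        classOf[AvroValue[java.lang.Integer]],
        classOf[AvroKeyValueOutputFormat[AvroKey[String], AvroValue[java.lang.Integer]]],
        job.getConfiguration)

    sc.stop()
  }
}
```

If only one Avro datum per record is needed, AvroKeyOutputFormat with a NullWritable value may be the simpler choice.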

Re: Is there a way to write spark RDD to Avro files

2014-07-30 Thread Lewis John Mcgibbney
Hi, Have you checked out SchemaRDD? There should be an example of writing to Parquet files there. BTW, FYI I was discussing this with the Spark SQL developers last week and possibly using Apache Gora [0] for achieving this. HTH Lewis [0] http://gora.apache.org On Wed, Jul 30, 2014 at 5:14 AM, Fen