YES! This worked! thanks!
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Is-there-a-way-to-write-spark-RDD-to-Avro-files-tp10947p11245.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---
Below works for me:
val job = Job.getInstance
val schema = Schema.create(Schema.Type.STRING)
AvroJob.setOutputKeySchema(job, schema)
records.map(item => (new AvroKey[String](item.getGridsumId), NullWritable.get()))
  .saveAsNewAPIHadoopFile(args(1),
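The call above is cut off in the archive, so its remaining arguments are unknown. For reference, here is a minimal sketch of how an RDD of strings might be written as Avro with the same approach. The object name AvroStringWriter, the parameter names, and the choice of AvroKeyOutputFormat are assumptions, not taken from the original message.

```scala
import org.apache.avro.Schema
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyOutputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
// On Spark 1.x this import brings the pair-RDD implicits into scope;
// on later versions it should be harmless.
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// Hypothetical helper, for illustration only.
object AvroStringWriter {
  def save(ids: RDD[String], outputPath: String): Unit = {
    // The Job is used only as a carrier for the Hadoop configuration.
    val job = Job.getInstance()
    // Without this, the writer fails with "Writer schema for output key was not set".
    AvroJob.setOutputKeySchema(job, Schema.create(Schema.Type.STRING))

    ids
      .map(id => (new AvroKey[String](id), NullWritable.get()))
      .saveAsNewAPIHadoopFile(
        outputPath,
        classOf[AvroKey[String]],
        classOf[NullWritable],
        classOf[AvroKeyOutputFormat[String]], // assumed output format
        job.getConfiguration)                 // carries the schema set above
  }
}
```

Passing job.getConfiguration matters here: the schema is stored in the job's configuration, so the default Hadoop configuration would not carry it.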
Yes, I saw that after I looked at it closer. Thanks! But I am running into a
schema not set error:
Writer schema for output key was not set. Use AvroJob.setOutputKeySchema()
I am in the process of figuring out how to set the schema for an AvroJob from an
HDFS file, but any pointers are much appreciated!
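One possible way to load a schema from HDFS is sketched below. It assumes the file is a JSON schema definition (.avsc) rather than an Avro data file. SchemaLoader and its parameter names are hypothetical.

```scala
import org.apache.avro.Schema
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper, for illustration only.
object SchemaLoader {
  // Parses a JSON schema file from whatever file system the path resolves to
  // (HDFS when the configuration or the path's scheme points at it).
  def fromHdfs(schemaPath: String, conf: Configuration): Schema = {
    val path = new Path(schemaPath)
    val in = path.getFileSystem(conf).open(path)
    try new Schema.Parser().parse(in)
    finally in.close()
  }
}
```

The result can then be passed to AvroJob.setOutputKeySchema(job, schema) before saving.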
You have to import org.apache.spark.rdd._, which will automatically make
this method available.
Thanks,
Ron
> On Aug 1, 2014, at 3:26 PM, touchdown wrote:
>
> Hi, I am facing a similar dilemma. I am trying to aggregate a bunch of small
> avro files into one avro file. I re
Hi, I am facing a similar dilemma. I am trying to aggregate a bunch of small
avro files into one avro file. I read it in with:
sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
AvroKeyInputFormat[GenericRecord]](path)
but I can't find saveAsHadoopFile or saveAsNewAPIHadoopFile. Can you ple
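For context, saveAsHadoopFile and saveAsNewAPIHadoopFile are defined on PairRDDFunctions, so they only appear on an RDD of key/value pairs once the implicit conversion is in scope. A rough sketch of the read-then-merge flow described above follows. AvroMerge, its parameters, and the use of coalesce(1) with AvroKeyOutputFormat are assumptions, and the caller has to supply the record schema.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyInputFormat, AvroKeyOutputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.SparkContext
// Pair-RDD implicits on Spark 1.x; should be harmless on later versions.
import org.apache.spark.SparkContext._

// Hypothetical helper, for illustration only.
object AvroMerge {
  def merge(sc: SparkContext, inputPath: String, outputPath: String, schema: Schema): Unit = {
    val job = Job.getInstance()
    AvroJob.setOutputKeySchema(job, schema)

    sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
        AvroKeyInputFormat[GenericRecord]](inputPath)
      .coalesce(1) // one partition, so one part file is written (no shuffle)
      .saveAsNewAPIHadoopFile(
        outputPath,
        classOf[AvroKey[GenericRecord]],
        classOf[NullWritable],
        classOf[AvroKeyOutputFormat[GenericRecord]],
        job.getConfiguration)
  }
}
```

With coalesce(1) all records pass through a single task, so this suits modest data volumes. The merged data lands as a single part file inside the output directory, not as a file at the output path itself.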
Thanks, Marcelo. It works!
2014-07-31 5:37 GMT+08:00 Marcelo Vanzin :
> Hi Fengyun,
>
> Have you tried to use saveAsHadoopFile() (or
> saveAsNewAPIHadoopFile())? You should be able to do something with
> that API by using AvroKeyValueOutputFormat.
>
> The API is defined here:
>
> http://spark.ap
Hi Fengyun,
Have you tried to use saveAsHadoopFile() (or
saveAsNewAPIHadoopFile())? You should be able to do something with
that API by using AvroKeyValueOutputFormat.
The API is defined here:
http://spark.apache.org/docs/1.0.0/api/scala/#org.apache.spark.rdd.PairRDDFunctions
Lots of RDD types i
Hi,
Have you checked out SchemaRDD?
There should be an example of writing to Parquet files there.
BTW, FYI, I was discussing this with the Spark SQL developers last week,
including possibly using Apache Gora [0] to achieve this.
HTH
Lewis
[0] http://gora.apache.org
On Wed, Jul 30, 2014 at 5:14 AM, Fen