Thanks, Marcelo. It works!
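For anyone landing on this thread later, here is a minimal sketch of the approach Marcelo describes: register the Avro output schemas on a Hadoop `Job` via `AvroJob`, then call `saveAsNewAPIHadoopFile` with `AvroKeyValueOutputFormat`. The schemas, paths, and app name below are hypothetical placeholders, not something from the thread; this assumes Avro's `org.apache.avro.mapreduce` bindings are on the classpath.

```scala
import org.apache.avro.Schema
import org.apache.avro.mapred.{AvroKey, AvroValue}
import org.apache.avro.mapreduce.{AvroJob, AvroKeyValueOutputFormat}
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.{SparkConf, SparkContext}

object AvroOutputSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("avro-output-sketch"))

    // Hypothetical record shape: a string key and a long value.
    val keySchema = Schema.create(Schema.Type.STRING)
    val valueSchema = Schema.create(Schema.Type.LONG)

    // Register the output schemas on a Hadoop Job so the
    // AvroKeyValueOutputFormat can find them in the Configuration.
    val job = Job.getInstance(sc.hadoopConfiguration)
    AvroJob.setOutputKeySchema(job, keySchema)
    AvroJob.setOutputValueSchema(job, valueSchema)

    // Wrap each pair in AvroKey/AvroValue, as the output format expects.
    val pairs = sc.textFile("hdfs:///path/to/input")   // hypothetical path
      .map(line => (new AvroKey[CharSequence](line),
                    new AvroValue(java.lang.Long.valueOf(line.length.toLong))))

    // PairRDDFunctions.saveAsNewAPIHadoopFile writes the Avro container files.
    pairs.saveAsNewAPIHadoopFile(
      "hdfs:///path/to/output",                        // hypothetical path
      classOf[AvroKey[CharSequence]],
      classOf[AvroValue[java.lang.Long]],
      classOf[AvroKeyValueOutputFormat[CharSequence, java.lang.Long]],
      job.getConfiguration)

    sc.stop()
  }
}
```

The resulting files use Avro's generic key/value record schema, which Hive and Impala can read with their Avro SerDe.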

2014-07-31 5:37 GMT+08:00 Marcelo Vanzin <van...@cloudera.com>:

> Hi Fengyun,
>
> Have you tried to use saveAsHadoopFile() (or
> saveAsNewAPIHadoopFile())? You should be able to do something with
> that API by using AvroKeyValueOutputFormat.
>
> The API is defined here:
>
> http://spark.apache.org/docs/1.0.0/api/scala/#org.apache.spark.rdd.PairRDDFunctions
>
> Lots of RDD types include that functionality already.
>
>
> On Wed, Jul 30, 2014 at 2:14 AM, Fengyun RAO <raofeng...@gmail.com> wrote:
> > We used MapReduce for ETL, storing results in Avro files which are
> > loaded into Hive/Impala for querying.
> >
> > Now we are trying to migrate to Spark, but haven't found a way to write
> > the resulting RDD to Avro files.
> >
> > I wonder if there is a way to do it, or if not, why Spark doesn't
> > support Avro as well as MapReduce does? Are there any plans?
> >
> > Or what's the recommended way to output Spark results with a schema? I
> > don't think plain text is a good choice.
>
>
>
> --
> Marcelo
>
