With the Data Source API, Spark SQL can also do this with much less code
<https://twitter.com/michaelarmbrust/status/579346328636891136>.

https://github.com/databricks/spark-avro
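
For what it's worth, here is a minimal PySpark sketch of that conversion, not a definitive recipe. It assumes a current DataFrameReader/DataFrameWriter API and that an Avro data source is on the classpath. The function name, its parameters, and the paths are hypothetical. With the Apache spark-avro module the format name is "avro"; with the Databricks package linked above it is "com.databricks.spark.avro".

```python
def avro_to_parquet(spark, avro_path, parquet_path, avro_format="avro"):
    """Read Avro files and rewrite them as Parquet.

    `spark` is an existing SparkSession. `avro_format` is an assumption:
    use "com.databricks.spark.avro" with the Databricks package instead.
    """
    # The Avro schema is picked up by the data source; no manual mapping here.
    df = spark.read.format(avro_format).load(avro_path)

    # Overwrite any previous output so the conversion can be re-run.
    df.write.mode("overwrite").parquet(parquet_path)
    return df
```

Usage would look something like avro_to_parquet(spark, "/data/events.avro", "/data/events.parquet"), where both paths are made up for illustration.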

On Thu, May 7, 2015 at 8:41 AM, Jonathan Coveney <[email protected]> wrote:

> A helpful example of how to convert:
> http://blog.cloudera.com/blog/2014/05/how-to-convert-existing-data-into-parquet/
>
> As for performance, that depends on your data. If you have a lot of
> columns and use all of them, parquet deserialization is expensive. If you
> have a lot of columns and only need a few (or have some filters you can
> push down), the savings can be huge.
>
> 2015-05-07 11:29 GMT-04:00 ÐΞ€ρ@Ҝ (๏̯͡๏) <[email protected]>:
>
>> 1) What is the best way to convert data from Avro to Parquet so that it
>> can be later read and processed?
>>
>> 2) Will the performance of processing (join, reduceByKey) be better if
>> both datasets are in Parquet format when compared to Avro + Sequence?
>>
>> --
>> Deepak
>>
>>
>
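
On the performance point quoted above (few columns needed, filters that can be pushed down), a read along these lines is roughly what benefits from Parquet. This is only a sketch assuming the PySpark DataFrame API. The function name, column names, and predicate are hypothetical, and how much work is actually skipped depends on your data and Spark version.

```python
def read_needed_columns(spark, parquet_path, columns, predicate):
    """Load only the requested columns and rows from a Parquet dataset.

    `spark` is an existing SparkSession, `columns` is a list of column
    names, and `predicate` is a SQL expression string such as "ts > 100".
    """
    df = spark.read.parquet(parquet_path)

    # Filter first so the predicate may reference columns that are not kept;
    # Spark can then push the filter and the projection down to the scan.
    return df.filter(predicate).select(*columns)
```

Calling .explain() on the returned DataFrame should show which filters and columns reached the Parquet scan.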
