A helpful example of how to convert:
http://blog.cloudera.com/blog/2014/05/how-to-convert-existing-data-into-parquet/

As for performance, that depends on your data. If you have a lot of
columns and use all of them, Parquet deserialization is expensive. If you
have a lot of columns but only need a few (or have some filters you can
push down), the savings can be huge.
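The column-pruning point can be illustrated with a toy sketch in plain
Python (no Spark or Parquet involved; the two layouts below are
hypothetical stand-ins for a row-oriented file like Avro and a
column-oriented file like Parquet):

```python
# Row-oriented storage (Avro-like): each record is stored whole, so
# reading one field still deserializes every record in full.
rows = [{"id": i, "name": "user%d" % i, "score": i * 2} for i in range(1000)]

# Column-oriented storage (Parquet-like): each column is stored
# contiguously, so a query can load just the columns it needs.
columns = {
    "id": [r["id"] for r in rows],
    "name": [r["name"] for r in rows],
    "score": [r["score"] for r in rows],
}

# Query: sum of "score". The row layout touches every field of every
# record; the columnar layout touches only the "score" column.
row_fields_touched = sum(len(r) for r in rows)  # 3 fields x 1000 records
col_fields_touched = len(columns["score"])      # 1000 values, one column

print(row_fields_touched, col_fields_touched)   # 3000 1000
```

The same idea is why predicate pushdown helps: with column statistics,
a columnar reader can skip whole chunks without deserializing them.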

2015-05-07 11:29 GMT-04:00 ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>:

> 1) What is the best way to convert data from Avro to Parquet so that it
> can be later read and processed ?
>
> 2) Will the performance of processing (join, reduceByKey) be better if
> both datasets are in Parquet format when compared to Avro + Sequence ?
>
> --
> Deepak
>
>