A helpful example of how to convert: http://blog.cloudera.com/blog/2014/05/how-to-convert-existing-data-into-parquet/
As for performance, that depends on your data. If you have a lot of columns and use all of them, Parquet deserialization is expensive. If you have a lot of columns but only need a few (or have filters you can push down to the storage layer), the savings can be huge.

2015-05-07 11:29 GMT-04:00 ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>:

> 1) What is the best way to convert data from Avro to Parquet so that it
> can be later read and processed ?
>
> 2) Will the performance of processing (join, reduceByKey) be better if
> both datasets are in Parquet format when compared to Avro + Sequence ?
>
> --
> Deepak