Re: Parquet versus ORC

Uli Bethke Sun, 06 Mar 2016 07:34:34 -0800

Curious why you think that Parquet does not have metadata at file, row group or column level. Please refer to the docs for the type of metadata that Parquet supports: http://parquet.apache.org/documentation/latest/
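
To illustrate why row-group-level statistics matter for query optimisation, here is a minimal sketch in plain Python. It is a toy model only, not the on-disk layout of Parquet or ORC: the names RowGroup, build_row_groups and scan_equal are hypothetical, and the only assumption is that each row group keeps a min and max per column. A reader can then skip any group whose range cannot contain the value being searched for.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Toy row group: one column's values plus its min/max statistics."""
    values: List[int]
    min_value: int
    max_value: int


def build_row_groups(values: List[int], group_size: int) -> List[RowGroup]:
    """Split a column into fixed-size row groups and record min/max for each."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_equal(groups: List[RowGroup], target: int) -> Tuple[List[int], int]:
    """Return the matching values and the number of row groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        # The statistics alone show this group cannot contain the target.
        if target < group.min_value or target > group.max_value:
            continue
        groups_read += 1
        matches.extend(v for v in group.values if v == target)
    return matches, groups_read


column = list(range(100))              # sorted data keeps the ranges disjoint
groups = build_row_groups(column, 25)  # 4 row groups: 0-24, 25-49, 50-74, 75-99
matches, groups_read = scan_equal(groups, 60)
print(matches, groups_read)            # prints: [60] 1
```

Only one of the four groups is read here. On unsorted data the ranges overlap and fewer groups can be skipped, so the benefit depends on how the data is laid out as much as on the file format.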


On 06/03/2016 15:26, Mich Talebzadeh wrote:

Hi.
I have been hearing a fair bit about Parquet versus ORC tables.
In a nutshell I can say that Parquet is a predecessor to ORC (both provide columnar type storage) but I notice that it is still being used, especially by Spark users.
In mitigation it appears that Spark users are reluctant to use ORC despite the fact that with its inbuilt Store Index it offers superior optimisation with data and stats at file, stripe and row group level. Both Parquet and ORC offer SNAPPY compression as well. ORC offers ZLIB as default.
There may be other than technical reasons for this adoption, for example too much reliance on Hive plus the fact that it is easier to flatten Parquet than ORC (whatever that means).
I myself use either text files or ORC with Hive and Spark and don't really see any reason why I should adopt others like Avro, Parquet etc.
Appreciate any verification or experience on this.

Thanks,

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw/
http://talebzadehmich.wordpress.com/



--
___________________________
Uli Bethke
Chair Hadoop User Group Ireland
www.hugireland.org
HUG Ireland is community sponsor of Hadoop Summit Europe in Dublin
http://2016.hadoopsummit.org/dublin/
