ORC has what is called a storage index built in, which provides statistics on the data. It provides stats at the file, stripe and row group (batches of rows) levels. In terms of efficiency, I believe it is the best format for data warehouse applications.

On 18/02/2016 07:38, Abhishek Dubey wrote:

> I think it's fair to say that one of the main differences is the
> representation of nesting structure.
>
> Parquet uses Dremel's repetition and definition levels, which is an
> extremely efficient representation of nested structure that has the
> added benefit of being easy to embed into the column data itself;
> Julien wrote an excellent blog post that explains the details:
> https://blog.twitter.com/2013/dremel-made-simple-with-parquet
>
> ORCFile on the other hand uses separate "counter" columns, which means
> that for nested structures you need to read those counter columns in
> addition to the data columns you care about in order to recreate the
> nesting structure; this increases the required amount of random I/O.
>
> Also, Parquet is natively supported in a number of popular Hadoop
> frameworks: Pig, Impala, Hive, MR, Cascading.
>
> Source: https://groups.google.com/forum/#!topic/parquet-dev/0IdtSLdIINQ [1]
>
> Thanks & Regards,
> Abhishek Dubey
>
> From: Ravi Prasad [mailto:raviprasa...@gmail.com]
> Sent: Thursday, February 18, 2016 9:06 AM
> To: user@hive.apache.org
> Subject: Difference between RC file format & Parquet file format
>
> Hi all,
>
> Can you please let me know how the RC file format is different from
> the Parquet file format?
>
> Both are column-oriented file formats, so what is the difference?
>
> --
> Regards,
> Ravi Prasad. T

--
Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
Links:
------
[1] https://groups.google.com/forum/#!topic/parquet-dev/0IdtSLdIINQ
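
As a rough illustration of why the per-row-group statistics mentioned at the top of this message help, here is a minimal Python sketch of min/max skipping. It is not the ORC reader API: the names `RowGroup`, `build_row_groups` and `scan_equal` are hypothetical, and the column data is made up. It only shows the general idea that a reader can skip a batch of rows when the stored min/max rule out a match.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """A batch of rows plus the min/max statistics kept for it (hypothetical type)."""
    rows: List[int]
    min_val: int
    max_val: int


def build_row_groups(values: List[int], rows_per_group: int) -> List[RowGroup]:
    """Split a column into fixed-size row groups and record min/max for each."""
    groups = []
    for start in range(0, len(values), rows_per_group):
        chunk = values[start:start + rows_per_group]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_equal(groups: List[RowGroup], target: int) -> Tuple[List[int], int]:
    """Return the matching values and how many row groups were actually read."""
    matches: List[int] = []
    groups_read = 0
    for g in groups:
        # Skip the whole group when its statistics rule out a match.
        if target < g.min_val or target > g.max_val:
            continue
        groups_read += 1
        matches.extend(v for v in g.rows if v == target)
    return matches, groups_read


# Made-up sorted column of 100 values, 10 rows per group (assumption for the demo).
column = list(range(100))
groups = build_row_groups(column, rows_per_group=10)
matches, groups_read = scan_equal(groups, target=42)
print(matches, groups_read, len(groups))  # [42] 1 10
```

With sorted data only one of the ten groups is read. On unsorted data the min/max ranges overlap more, so fewer groups can be skipped.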