Thanks Eric and Sharath for the pointers to ORC. Unfortunately ORC would not be an option for us as our cluster still runs Hive 0.9 and we won't be migrating any time soon.
thanks,
Thilina

On Mon, Jan 27, 2014 at 2:35 PM, Sharath Punreddy <srpunre...@gmail.com> wrote:

> Quick insights:
>
> http://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/
>
> On Mon, Jan 27, 2014 at 1:29 PM, Eric Hanson (BIG DATA) <eric.n.han...@microsoft.com> wrote:
>
>> It sounds like ORC would be best.
>>
>> -Eric
>>
>> *From:* Thilina Gunarathne [mailto:cset...@gmail.com]
>> *Sent:* Monday, January 27, 2014 11:05 AM
>> *To:* user@hive.apache.org
>> *Subject:* RCFile vs SequenceFile vs text files
>>
>> Dear all,
>>
>> We are trying to pick the right data storage format for a Hive table
>> with the following requirements and would really appreciate any insights
>> you can provide to help our decision.
>>
>> 1. ~50 billion records per month. ~14 columns per record and each record
>> is ~100 bytes. The table is partitioned by date and gets populated
>> periodically from another Hive query.
>>
>> 2. The columns are dense, so I'm not sure whether we'll get any space
>> savings by using RCFiles.
>>
>> 3. Data needs to be compressed.
>>
>> 4. We will be doing a lot of aggregation queries on selected columns.
>> There will be ad-hoc queries for whole records as well.
>>
>> 5. We need the ability to run Java MapReduce programs on the underlying
>> data. We have existing programs which use custom input formats with
>> compressed text files as input, and we are willing to port them to use
>> other formats. (How easy is it to use Java MapReduce with RCFiles vs
>> SequenceFiles?)
>>
>> 6. Ability to use Hive indexing.
>>
>> thanks a ton in advance,
>>
>> Thilina
>>
>> --
>> https://www.cs.indiana.edu/~tgunarat/
>> http://www.linkedin.com/in/thilina
>> http://thilina.gunarathne.org
>
> --
> Thank you
>
> Sharath Punreddy
> 1201 Golden gate Dr,
> Southlake TX 76092.
> Phone:626-470-7867

--
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina
http://thilina.gunarathne.org