Quick insights:

http://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/
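
For a rough sense of scale, the figures in the original question (about 50 billion records per month at roughly 100 bytes each, partitioned by date) can be turned into raw, uncompressed volumes. The sketch below is only back-of-the-envelope arithmetic: the 30-day month and the helper names are assumptions for illustration, and it says nothing about how well any particular format will compress the data.

```python
# Back-of-the-envelope sizing using the figures quoted in the question.
# Assumptions (not from the thread): a 30-day month and one partition per day.

RECORDS_PER_MONTH = 50_000_000_000  # ~50 billion records per month
BYTES_PER_RECORD = 100              # ~100 bytes per record
DAYS_PER_MONTH = 30                 # assumption for illustration


def raw_bytes_per_month(records=RECORDS_PER_MONTH, record_size=BYTES_PER_RECORD):
    """Uncompressed bytes written per month."""
    return records * record_size


def raw_bytes_per_partition(days=DAYS_PER_MONTH):
    """Uncompressed bytes per daily partition, assuming an even spread."""
    return raw_bytes_per_month() / days


if __name__ == "__main__":
    monthly = raw_bytes_per_month()
    daily = raw_bytes_per_partition()
    print(f"raw data per month: {monthly / 1e12:.1f} TB")
    print(f"raw data per daily partition: {daily / 1e9:.0f} GB")
```

That is on the order of 5 TB of raw data per month before compression, which is why the choice of storage format and codec matters here.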




On Mon, Jan 27, 2014 at 1:29 PM, Eric Hanson (BIG DATA) <
eric.n.han...@microsoft.com> wrote:

>  It sounds like ORC would be best.
>
>
>
>                 -Eric
>
>
>
> *From:* Thilina Gunarathne [mailto:cset...@gmail.com]
> *Sent:* Monday, January 27, 2014 11:05 AM
> *To:* user@hive.apache.org
> *Subject:* RCFile vs SequenceFile vs text files
>
>
>
> Dear all,
>
> We are trying to pick the right data storage format for a Hive table
> with the following requirements, and would really appreciate any insights
> you can provide to help our decision.
>
> 1. ~50 billion records per month, ~14 columns per record, and each record
> is ~100 bytes. The table is partitioned by date and is populated
> periodically from another Hive query.
>
> 2. The columns are dense, so I'm not sure whether we'll get any space
> savings by using RCFiles.
>
> 3. Data needs to be compressed.
>
> 4. We will be doing a lot of aggregation queries on selected columns. There
> will be ad-hoc queries for whole records as well.
>
> 5. We need the ability to run Java MapReduce programs on the underlying
> data. We have existing programs that use custom InputFormats with
> compressed text files as input, and we are willing to port them to other
> formats. (How easy is it to use Java MapReduce with RCFiles vs. SequenceFiles?)
>
> 6. Ability to use Hive indexing.
>
> thanks a ton in advance,
>
> Thilina
>
>
>
> --
> https://www.cs.indiana.edu/~tgunarat/
> http://www.linkedin.com/in/thilina
>
> http://thilina.gunarathne.org
>



-- 
Thank you

Sharath Punreddy
1201 Golden gate Dr,
Southlake TX 76092.
Phone:626-470-7867
