Hi,
We have a requirement to store a large data set (more than 5 TB) mapped to a
Hive table. This Hive table would be populated (and appended to periodically)
using a Hive query from another Hive table. In addition to the Hive queries,
we need to be able to run Java MapReduce jobs, and preferably Pig jobs as
well, on top of this data.

I'm wondering what the best storage format for this Hive table would be.
How easy is it to use Java MapReduce on Hive-generated sequence files (i.e.
tables declared STORED AS SEQUENCEFILE)? How easy is it to use Java
MapReduce on RC files? Any pointers to examples of these would be really
great. Does using compressed text files (deflate) sound like the best option
for this use case?
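
To make the SequenceFile question concrete, here is a rough, untested sketch
of what I imagine the Java MapReduce side would look like, assuming the
default SerDe writes empty BytesWritable keys and Ctrl-A (\u0001) delimited
Text rows (class names and paths below are just placeholders):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SeqFileScan {

  public static class RowMapper
      extends Mapper<BytesWritable, Text, Text, NullWritable> {
    @Override
    protected void map(BytesWritable key, Text row, Context ctx)
        throws IOException, InterruptedException {
      // Assumption: Hive's default LazySimpleSerDe delimits fields with
      // ^A (\u0001), and the key it writes is an empty BytesWritable.
      String[] fields = row.toString().split("\u0001");
      ctx.write(new Text(fields[0]), NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "seqfile-scan");
    job.setJarByClass(SeqFileScan.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setMapperClass(RowMapper.class);
    job.setNumReduceTasks(0);                      // map-only scan
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));  // table/partition dir
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

If that is roughly right, SequenceFile looks workable from the Java side,
though splitting rows on the delimiter ourselves feels a bit fragile.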

BTW, we are stuck with Hive 0.9 for the foreseeable future, so ORC is not an
option.
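
For the RC file case, my (possibly wrong) understanding is that we would
need Hive's RCFileInputFormat, which uses the old mapred API, and pull
columns out of BytesRefArrayWritable ourselves. A rough sketch of what I
mean, with placeholder names, and assuming ColumnProjectionUtils still works
this way in 0.9:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.RCFileInputFormat;
import org.apache.hadoop.hive.serde2.ColumnProjectionUtils;
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class RCFileScan {

  public static class RowMapper extends MapReduceBase
      implements Mapper<LongWritable, BytesRefArrayWritable, Text, NullWritable> {
    public void map(LongWritable key, BytesRefArrayWritable row,
        OutputCollector<Text, NullWritable> out, Reporter reporter)
        throws IOException {
      // Each column of the row comes back as raw bytes.
      BytesRefWritable col0 = row.get(0);
      String value = new String(col0.getData(), col0.getStart(), col0.getLength());
      out.collect(new Text(value), NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(RCFileScan.class);
    conf.setJobName("rcfile-scan");
    conf.setInputFormat(RCFileInputFormat.class);
    // Read all columns; setReadColumnIDs would project a subset instead.
    ColumnProjectionUtils.setFullyReadColumns(conf);
    conf.setMapperClass(RowMapper.class);
    conf.setNumReduceTasks(0);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(NullWritable.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

Does that match how people actually read RCFiles from Java MapReduce, or is
there a simpler route?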

thanks,
Thilina

-- 
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina
http://thilina.gunarathne.org
