When you use the default SerDe (LazySimpleSerDe) with a SequenceFile table (CREATE TABLE x .... STORED AS SEQUENCEFILE), Hive writes a SequenceFile in which the key is null and all of the columns are serialized into a single Text writable, which is easy for other tools to read. Hive does not dictate the input format or the output format; you can usually get Hive to produce exactly what you want by mixing and matching SerDe and output format options.
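For the Java MapReduce question below, here is a minimal sketch (not from this thread) of a map-only job that reads such a Hive-generated SequenceFile. It assumes the key is an empty BytesWritable and each value is a Text row whose columns are separated by LazySimpleSerDe's default field delimiter, Ctrl-A ('\001'); the class name, job name, and path arguments are hypothetical:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ReadHiveSequenceFile {

  public static class RowMapper
      extends Mapper<BytesWritable, Text, NullWritable, Text> {

    @Override
    protected void map(BytesWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Hive leaves the key empty; the whole row lives in the value.
      // Split on the default LazySimpleSerDe delimiter, Ctrl-A ('\001').
      String[] columns = value.toString().split("\u0001", -1);
      // Hypothetical logic: emit the first column of each row.
      context.write(NullWritable.get(), new Text(columns[0]));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "read-hive-seqfile");
    job.setJarByClass(ReadHiveSequenceFile.class);
    job.setMapperClass(RowMapper.class);
    job.setNumReduceTasks(0);  // map-only pass over the table's files
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // Hive table dir
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // job output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

If the table was created with a non-default delimiter (FIELDS TERMINATED BY), substitute that character in the split.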
On Tue, Jan 28, 2014 at 8:05 PM, Thilina Gunarathne <cset...@gmail.com> wrote:
> Hi,
> We have a requirement to store a large data set (more than 5 TB) mapped to
> a Hive table. This Hive table would be populated (and appended to
> periodically) using a Hive query from another Hive table. In addition to
> the Hive queries, we need to be able to run Java MapReduce and preferably
> Pig jobs as well on top of this data.
>
> I'm wondering what would be the best storage format for this Hive table.
> How easy is it to use Java MapReduce on Hive-generated sequence files (eg:
> stored as SequenceFile)? How easy is it to use Java MapReduce on RC files?
> Any pointers to examples of these would be really great. Does using
> compressed text files (Deflate) sound like the best option for this use case?
>
> BTW, we are stuck with Hive 0.9 for the foreseeable future, and ORC is out
> of the question.
>
> thanks,
> Thilina
>
> --
> https://www.cs.indiana.edu/~tgunarat/
> http://www.linkedin.com/in/thilina
> http://thilina.gunarathne.org