Options:
1. Create a managed table and move the files under the table's directory.
2. Create an external table and point its LOCATION at the directory that already holds the files.
3. If the files are small, I recommend creating a new set of files with a simple MR program, specifying the number of reduce tasks. The goal is to make each file larger than the HDFS block size (it saves NameNode memory, and reads will be faster). Sketches of all three are below.
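For options 1 and 2 it looks something like this (a rough sketch; the table name, columns, and paths are made up for illustration, and your real schema and delimiters have to match the files). Hive reads gzipped text files transparently as long as they keep the .gz extension:

-- Option 1: managed table; after creating it, move the files into
-- its warehouse directory, e.g.
--   hadoop fs -mv /data/gzipped/* /user/hive/warehouse/logs/
CREATE TABLE logs (id STRING, ts BIGINT, payload STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Option 2: external table pointing at the existing directory,
-- so nothing is copied and dropping the table leaves the data alone
CREATE EXTERNAL TABLE logs_ext (id STRING, ts BIGINT, payload STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/gzipped';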
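For option 3, one alternative to a hand-written MR job is to let Hive do the rewrite itself once the external table exists: an INSERT OVERWRITE with the reducer count pinned produces that many output files. A sketch, reusing the hypothetical logs/logs_ext tables from above; 16 reducers is an arbitrary example, size it at roughly total data size / block size. (Note that gzip is not splittable, so fewer, larger files mainly save NameNode memory and per-file task overhead.)

-- each reducer writes one output file
SET mapred.reduce.tasks=16;
-- keep the output compressed as gzip
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

INSERT OVERWRITE TABLE logs
SELECT * FROM logs_ext
DISTRIBUTE BY id;   -- forces a reduce phase so the reducer count applies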
On Tue, Oct 2, 2012 at 3:53 PM, zuohua zhang <zuo...@gmail.com> wrote:
> I have millions of gzip files in HDFS (all with the same fields) and would
> like to load them into one table in Hive with a specified schema.
> What is the most efficient way to do that?
> Given that my data is only in HDFS, and also gzipped, does that mean I
> could simply set up the table somehow, bypassing some unnecessary
> overhead of the typical approach?
>
> Thanks!
>