Options:
1. Create a table and put the files under the table's directory (see the
first sketch after this list).

2. Create an external table and point it at the files' directory (second
sketch below).

3. If the files are small, I recommend creating a new set of files with a
simple MR program, specifying the number of reduce tasks. The goal is to
make each file larger than the HDFS block size (it saves NameNode memory,
and reads will be faster); see the MR sketch after this list.
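
A minimal sketch of option 1; the table name, column names, delimiter,
and staging path here are just assumptions for illustration. LOAD DATA
INPATH moves the files under the table's warehouse directory, and Hive
reads .gz text files transparently:

  CREATE TABLE logs (col1 STRING, col2 INT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  STORED AS TEXTFILE;

  -- moves everything from the staging dir into the table's dir
  LOAD DATA INPATH '/user/me/staging' INTO TABLE logs;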
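
Option 2 is the same DDL plus EXTERNAL and LOCATION; the files stay where
they are and nothing is copied or moved:

  CREATE EXTERNAL TABLE logs_ext (col1 STRING, col2 INT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  STORED AS TEXTFILE
  LOCATION '/user/me/staging';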
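
For option 3, here is a minimal sketch of such a compaction job (new
mapreduce API; the Compact class name and the argument layout are made up
for illustration). It pushes every line through identity map/reduce steps
so the reducers merge many small gzip files into a few larger ones; the
shuffle reorders lines, which is harmless for a table load:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class Compact {

    // Pass each input line through unchanged; gzip input is
    // decompressed transparently by the default TextInputFormat.
    public static class PassMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {
      @Override
      protected void map(LongWritable offset, Text line, Context ctx)
          throws java.io.IOException, InterruptedException {
        ctx.write(line, NullWritable.get());
      }
    }

    // Each reducer writes one output file, so the number of reduce
    // tasks controls how many (and how big) the merged files are.
    public static class PassReducer
        extends Reducer<Text, NullWritable, Text, NullWritable> {
      @Override
      protected void reduce(Text line, Iterable<NullWritable> dups,
          Context ctx) throws java.io.IOException, InterruptedException {
        for (NullWritable ignored : dups) {
          ctx.write(line, NullWritable.get());
        }
      }
    }

    public static void main(String[] args) throws Exception {
      Job job = new Job(new Configuration(), "compact");
      job.setJarByClass(Compact.class);
      job.setMapperClass(PassMapper.class);
      job.setReducerClass(PassReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(NullWritable.class);
      // Pick this so total input size / reducers > one HDFS block.
      job.setNumReduceTasks(Integer.parseInt(args[2]));
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      // Optionally re-compress the merged files with
      // FileOutputFormat.setCompressOutput(job, true) and
      // org.apache.hadoop.io.compress.GzipCodec as the codec.
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

Run it as, e.g., hadoop jar compact.jar Compact /user/me/staging
/user/me/compacted 50, then point the table at the output directory.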


On Tue, Oct 2, 2012 at 3:53 PM, zuohua zhang <zuo...@gmail.com> wrote:

> I have millions of gzip files in HDFS (all with the same fields) and
> would like to load them into one Hive table with a specified schema.
> What is the most efficient way to do that?
> Given that my data is already in HDFS, and gzipped, does that mean I
> could simply set up the table somehow, bypassing some unnecessary
> overhead of the typical approach?
>
> Thanks!
>
