Hello,

We have multiple terabytes of data (currently in gzip format, approximately 2 GB per
file). What is the best way to load that data into Hadoop?

We have observed (especially when loading with Hive's LOAD DATA LOCAL INPATH
....) that loading a gz file takes around 12 seconds, while loading the same file
decompressed (around 4~5 GB) takes about 8 minutes.
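For context, a minimal sketch of the kind of statement we run (the table name,
column layout, and file path below are placeholders, not our actual schema):

    -- Hypothetical single-column text table; our real schema differs.
    CREATE TABLE IF NOT EXISTS raw_events (
      line STRING
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    -- Load one gzipped file from the local filesystem into the table.
    -- Hive copies the .gz file into the table's HDFS directory as-is,
    -- which is why this step is fast; but since gzip is not splittable,
    -- the file is later read by a single mapper.
    LOAD DATA LOCAL INPATH '/data/incoming/part-0001.gz'
    INTO TABLE raw_events;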

We want these files to be processed by multiple mappers on Hadoop, not by a
single mapper per file.

What would be the best way to load these files into Hive/HDFS so that loading
takes less time and the files can be processed by multiple mappers?


Thanks and Regards,
Shantian
