Hello,

We have multiple terabytes of data (currently in gzip format, approximately 2 GB per
file). What is the best way to load that data into Hadoop?

We have observed (especially when loading with Hive's LOAD DATA LOCAL INPATH
....) that loading a gz file takes around 12 seconds, while loading the same file
decompressed (around 4~5 GB) takes about 8 minutes.
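For context, a minimal sketch of the kind of statement we run (the table name,
column layout, and file path below are placeholders, not our actual schema):

    -- Hypothetical single-column text table; our real schema differs.
    CREATE TABLE IF NOT EXISTS raw_events (
      line STRING
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    -- Load one gzipped file from the local filesystem into the table.
    -- Hive copies the .gz file into the table's HDFS directory as-is,
    -- which is why this step is fast; but since gzip is not splittable,
    -- the file is later read by a single mapper.
    LOAD DATA LOCAL INPATH '/data/incoming/part-0001.gz'
    INTO TABLE raw_events;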

We want these files to be processed by multiple mappers on Hadoop, not by a
single mapper per file.

What would be the best way to load these files into Hive/HDFS so that loading
takes less time and the files can be processed by multiple mappers?


Thanks and Regards,
Shantian
