bzip2 or snappy-codec will be very useful for that.
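For example, something like this (a rough sketch, assuming Hive 0.7-era
property names) makes Hive write bzip2-compressed output, which Hadoop
versions with splittable-bzip2 support can split:

    -- enable compressed job output and pick bzip2 (splittable);
    -- SnappyCodec is faster but not splittable on its own
    set hive.exec.compress.output=true;
    set mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;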
- Alex
On Wed, Nov 2, 2011 at 11:00 AM, Martin Kuhn wrote:
> You could try to use splittable LZO compression instead:
> https://github.com/kevinweil/hadoop-lzo (a gz file can't be split)
>
>
> > We have multiple terabytes of data (currently in gz format, approx size
> > 2GB per file). What is the best way to load that data into Hadoop?
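
For anyone trying the LZO route: the usual workflow (a sketch; the jar path
and file names below are illustrative) is to upload the .lzo file and then run
the hadoop-lzo indexer so MapReduce can split it:

    # upload the file, then build a .index file next to it
    hadoop fs -put big_file.lzo /data/
    hadoop jar /path/to/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer /data/big_file.lzo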
Subject: What is best way to load data into hive tables/hadoop file system
Hello,
We have multiple terabytes of data (currently in gz format, approx size 2GB per
file). What is the best way to load that data into Hadoop?
We have seen that (especially when loaded using hive's "load data local
inpath") loading a gz file takes around 12 seconds, and when we decompress it
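
For reference, the statement in question is Hive's bulk-load command; it only
moves/copies the file into the table's warehouse directory and does not
decompress or re-split it, which is why a multi-GB gz file loads in seconds
(the table name below is hypothetical):

    LOAD DATA LOCAL INPATH '/tmp/data/part-0001.gz' INTO TABLE raw_logs;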