bzip2 or the snappy codec will be very useful for that.
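(For reference, registering these codecs is typically a matter of listing them in core-site.xml. This is a hedged sketch, not a verified config for any particular Hadoop version; the class names assume the stock Hadoop codec classes, and snappy additionally needs the native libraries installed. Note that of these, only bzip2 is splittable.)

```xml
<!-- core-site.xml sketch: register gzip, bzip2, and snappy codecs.
     BZip2Codec output is splittable; GzipCodec and SnappyCodec output is not. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```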
- Alex
On Wed, Nov 2, 2011 at 11:00 AM, Martin Kuhn wrote:
> You could try to use splittable LZO compression instead:
> https://github.com/kevinweil/hadoop-lzo (a gz file can't be split)
> We have multiple terabytes of data (currently in gz format approx size 2GB
> per file). What is best way to load that data into Hadoop?
> We have seen that (especially
Run multiple concurrent LOAD DATA statements, one per file.
Alternatively, if your TT nodes have access to the source file system, use a
map-only Hadoop job, such as distcp.
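(A distcp invocation along these lines copies the files in parallel, one map task per file, assuming the source filesystem is reachable from the cluster. The hosts, ports, paths, and the -m value below are purely illustrative.)

```shell
# distcp runs as a map-only MapReduce job, so each source file is copied
# by its own map task. -m caps the number of concurrent maps.
# All paths here are hypothetical placeholders.
hadoop distcp -m 20 \
    hftp://source-namenode:50070/data/raw \
    hdfs://target-namenode:8020/data/raw
```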
From: Shantian Purkad [mailto:shantian_pur...@yahoo.com]
Sent: Monday, October 31, 2011 4:34 PM
To: common-u...@hadoop.apache.org;