Re: What is best way to load data into hive tables/hadoop file system

2011-11-02 Thread Alexander C.H. Lorenz
bzip2 or the snappy codec will be very useful for that.

- Alex

On Wed, Nov 2, 2011 at 11:00 AM, Martin Kuhn wrote:
> You could try to use splittable LZO compression instead:
> https://github.com/kevinweil/hadoop-lzo (a gz file can't be split)
>
> > We have multiple terabytes of data (currently in gz format, approx. 2GB
> > per file). ...
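As a reference, a minimal sketch of recompressing the gz files into a splittable format before upload (paths and table location here are hypothetical; note that bzip2 output is splittable by Hadoop, whereas plain Snappy files are not splittable on their own and are typically used inside container formats such as SequenceFiles):

    # Recompress .gz files as splittable bzip2, then upload (illustrative paths)
    for f in /staging/*.gz; do
        gunzip -c "$f" | bzip2 -c > "${f%.gz}.bz2"
    done
    hadoop fs -put /staging/*.bz2 /user/hive/warehouse/mytable/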

Re: What is best way to load data into hive tables/hadoop file system

2011-11-02 Thread Martin Kuhn
You could try to use splittable LZO compression instead:
https://github.com/kevinweil/hadoop-lzo (a gz file can't be split)

> We have multiple terabytes of data (currently in gz format, approx. 2GB
> per file). What is best way to load that data into Hadoop?
> We have seen that (especially ...
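To make LZO files actually splittable, the hadoop-lzo project ships an indexer that must be run once per file. A minimal sketch, assuming the hadoop-lzo jar and native libraries are installed per the project's README (jar path and HDFS paths are illustrative):

    # Upload an .lzo file and build its split index
    hadoop fs -put bigfile.lzo /data/
    hadoop jar /path/to/hadoop-lzo.jar \
        com.hadoop.compression.lzo.DistributedLzoIndexer /data/bigfile.lzo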

RE: What is best way to load data into hive tables/hadoop file system

2011-11-01 Thread Steven Wong
Run multiple concurrent LOAD DATAs, one per file. Alternatively, if your TT
(TaskTracker) nodes have access to the source file system, use a map-only
Hadoop job, such as distcp.

From: Shantian Purkad [mailto:shantian_pur...@yahoo.com]
Sent: Monday, October 31, 2011 4:34 PM
To: common-u...@hadoop.apache.org; ...
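A minimal sketch of both suggestions above (table name, staging paths, and NameNode address are hypothetical):

    # Option 1: one LOAD DATA per file, run concurrently from a shell
    for f in /staging/*.gz; do
        hive -e "LOAD DATA LOCAL INPATH '$f' INTO TABLE my_table;" &
    done
    wait

    # Option 2: map-only copy with distcp, provided every TT node
    # can read the source file system (e.g. an NFS mount)
    hadoop distcp file:///staging hdfs://namenode:8020/user/hive/warehouse/my_table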