Re: What is best way to load data into hive tables/hadoop file system

2011-11-02 Thread Alexander C.H. Lorenz
bzip2 or snappy-codec will be very useful for that.

- Alex

On Wed, Nov 2, 2011 at 11:00 AM, Martin Kuhn wrote:
> You could try to use splittable LZO compression instead:
> https://github.com/kevinweil/hadoop-lzo (a gz file can't be split)
>
> > We have multiple terabytes of data (currently in
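[A minimal sketch of what "use bzip2 or snappy" can look like from a Hive session. The property names below are the pre-YARN, 2011-era `mapred.*` ones and are an assumption about the poster's setup; newer Hadoop versions use `mapreduce.output.fileoutputformat.compress.codec`.]

```sql
-- Compress Hive job output. bzip2 output is splittable, so a later
-- MapReduce job can parallelize over a single large file.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;

-- Snappy is much faster but not splittable on its own; it is normally
-- used inside a container format such as SequenceFile:
-- SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
```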

Re: What is best way to load data into hive tables/hadoop file system

2011-11-02 Thread Martin Kuhn
You could try to use splittable LZO compression instead:
https://github.com/kevinweil/hadoop-lzo (a gz file can't be split)

> We have multiple terabytes of data (currently in gz format approx size 2GB
> per file). What is best way to load that data into Hadoop?
> We have seen that (especially
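[Why "a gz file can't be split": a gzip stream has a single header at byte 0 and no block index, so a mapper assigned a split in the middle of the file has nowhere valid to start decoding. A small Python demonstration of that property:]

```python
import gzip
import zlib

data = b"hello world\n" * 1000
compressed = gzip.compress(data)

# Decoding from byte 0 works: the whole stream is available.
assert gzip.decompress(compressed) == data

# Decoding from an arbitrary mid-file offset fails: there is no gzip
# header or resynchronization point at a split boundary, which is why
# Hadoop must feed an entire .gz file to a single mapper.
try:
    gzip.decompress(compressed[len(compressed) // 2:])
    split_ok = False
except (OSError, EOFError, zlib.error):
    split_ok = True

assert split_ok
```

bzip2 (and LZO with a side index built by hadoop-lzo) avoids this by having block boundaries a reader can seek to.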

RE: What is best way to load data into hive tables/hadoop file system

2011-11-01 Thread Steven Wong
To: user@hive.apache.org
Subject: What is best way to load data into hive tables/hadoop file system

> Hello,
> We have multiple terabytes of data (currently in gz format approx size 2GB
> per file). What is best way to load that data into Hadoop?
> We have seen that (especially when loaded using hive's

What is best way to load data into hive tables/hadoop file system

2011-10-31 Thread Shantian Purkad
Hello,

We have multiple terabytes of data (currently in gz format, approx. 2GB per file). What is the best way to load that data into Hadoop?
We have seen that (especially when loaded using hive's load data local inpath) to load a gz file it takes around 12 seconds and when we decompress it
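[For context, the `load data local inpath` form mentioned above copies the file as-is into the table's HDFS warehouse directory, without decompressing it. A sketch of the statement being discussed; the table, column, and path names here are made up for illustration:]

```sql
-- Hive copies the local .gz file verbatim into the table's HDFS
-- location. Each .gz file is later read by a single mapper, since
-- gzip streams cannot be split.
CREATE TABLE IF NOT EXISTS raw_logs (line STRING)
PARTITIONED BY (dt STRING);

LOAD DATA LOCAL INPATH '/data/incoming/part-0001.gz'
INTO TABLE raw_logs PARTITION (dt='2011-10-31');
```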