There is something you gain and something you lose.
Compression reduces I/O at the cost of extra CPU work. You will also see
different effects for different tasks, i.e. HDFS reads, HDFS writes, and shuffle
and sort. So whether to use compression or not depends on your usage.
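
As a rough sketch of how this can be tuned per stage (these are the Hadoop 1.x
style property names and stock Hadoop codecs, chosen here only for illustration;
check them against your versions), intermediate data and final output can be
compressed independently:

-- illustrative settings, not a recommendation
-- compress data passed between map-reduce stages and during shuffle
SET hive.exec.compress.intermediate=true;
SET mapred.compress.map.output=true;
SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec;
-- compress the final output written to HDFS
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

Running the same query with each group turned on and off is one way to see
where the CPU cost is worth the I/O saved for your data.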



-----Original Message-----
From: Sreenath Menon
Sent: 6/6/2012 8:50:23 AM
To: user@hive.apache.org
Subject: Compressed data storage in HDFS - Error
I would like to compress my data in HDFS using Hive commands.
Steps followed (data already resides in table sample):

create table rc_lzo like sample;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
insert overwrite table rc_lzo select * from sample;

Error:
Compression codec com.hadoop.compression.lzo.LzoCodec was not found

1) What do I need to do to use LZO as well as other compression methods?

2) I heard somewhere that using compressed data will produce better results than
uncompressed data in some cases. How can this be, given that compression methods
always add compression and decompression time? Is there any truth in this, and
if so, how? I can understand how there are better results when using compression
between mappers and reducers and in between map-reduce jobs.

Thanks and Regards
Sreenath Mullassery
