There is something you gain and something you lose. Compression reduces I/O at the cost of extra CPU work. You will also see different results for different tasks, e.g. HDFS reads, HDFS writes, and shuffle and sort. So whether to use compression depends on your usage.
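To make the "less I/O, more CPU" trade-off concrete, here is a rough sketch. It is not Hadoop code. It uses Python's zlib as a stand-in for LZO and assumes a hypothetical disk throughput of 100 MB/s (DISK_BYTES_PER_SEC). It compares the estimated time to read data as-is against the time to read the smaller compressed bytes plus the measured time to decompress them. Whether compression wins depends on how compressible the data is and how slow the disk or network is compared with the CPU.

```python
import time
import zlib

# Hypothetical disk throughput, used only for this estimate (assumption).
DISK_BYTES_PER_SEC = 100 * 1024 * 1024  # 100 MB/s


def read_cost(data: bytes, level: int = 6):
    """Estimate seconds to read `data` uncompressed vs. zlib-compressed.

    Returns (plain_secs, compressed_secs, compressed_size).
    """
    compressed = zlib.compress(data, level)

    # Measure the extra CPU work paid on the read path: decompression.
    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    decompress_secs = time.perf_counter() - start
    assert restored == data  # lossless round trip

    # Modelled I/O time: bytes moved divided by the assumed throughput.
    plain_secs = len(data) / DISK_BYTES_PER_SEC
    compressed_secs = len(compressed) / DISK_BYTES_PER_SEC + decompress_secs
    return plain_secs, compressed_secs, len(compressed)


if __name__ == "__main__":
    # Hypothetical, highly repetitive log-like rows; real tables compress less.
    sample = b"2012-06-06 08:50:23 INFO insert overwrite table rc_lzo\n" * 20000
    plain, packed, size = read_cost(sample)
    print(f"{len(sample)} -> {size} bytes; "
          f"read as-is ~{plain * 1000:.2f} ms, compressed ~{packed * 1000:.2f} ms")
```

On a fast disk with poorly compressible data, the decompression term can outweigh the I/O saved. On a slow disk or network with repetitive data, the smaller byte count usually dominates. This is why the answer differs for HDFS reads, HDFS writes, and shuffle.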
-----Original Message-----
From: Sreenath Menon
Sent: 6/6/2012 8:50:23 AM
To: user@hive.apache.org
Subject: Compressed data storage in HDFS - Error

I would like to compress my data in HDFS using some Hive commands.

Steps followed (data already residing in table sample):

create table rc_lzo like sample;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
insert overwrite table rc_lzo select * from sample;

Error: Compression codec com.hadoop.compression.lzo.LzoCodec was not found

1) What do I need to do to use LZO as well as other compression methods?
2) I heard somewhere that using compressed data will produce better results than uncompressed data in some cases. How can this be, when there is always compression and decompression time involved? Is there any truth in this, and if so, how? I can understand how there are better results when using compression between mappers and reducers and between MapReduce jobs.

Thanks and Regards
Sreenath Mullassery