Hi,

I have a table stored as SEQUENCEFILE in Hive 0.10, facts520_normal_seq.
Now I wish to create another table, also stored as a SEQUENCEFILE, but compressed with the Gzip codec. So I enabled output compression, set the codec to Gzip and the compression type to BLOCK, and then executed the following:

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapred.output.compression.type=BLOCK;
create table test1facts520_gzip_seq as select * from facts520_normal_seq;

The table got created and was compressed as well:

[root@aana1 comp_data]# sudo -u hdfs hadoop fs -ls /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq
Found 5 items
-rw-r--r--   3 admin supergroup   38099145 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000000_0.gz
-rw-r--r--   3 admin supergroup   31450189 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000001_0.gz
-rw-r--r--   3 admin supergroup   20764259 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000002_0.gz
-rw-r--r--   3 admin supergroup   21107597 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000003_0.gz
-rw-r--r--   3 admin supergroup   12202692 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000004_0.gz

However, when I checked the table properties, I was surprised to see that the table had been stored as a TEXTFILE:

hive> show create table test1facts520_gzip_seq;
OK
CREATE TABLE test1facts520_gzip_seq(
  fact_key bigint,
  products_key int,
  retailers_key int,
  suppliers_key int,
  time_key int,
  units int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://aana1.ird.com/user/hive/warehouse/facts_520.db/test1facts520_gzip_seq'
TBLPROPERTIES (
  'numPartitions'='0',
  'numFiles'='5',
  'transient_lastDdlTime'='1370867198',
  'numRows'='0',
  'totalSize'='123623882',
  'rawDataSize'='0')
Time taken: 0.15 seconds

So I tried adding a STORED AS clause to the earlier CREATE TABLE statement and created a new table:

create table test3facts520_gzip_seq STORED AS SEQUENCEFILE as select * from facts520_normal_seq;

This time the output table was stored as a SEQUENCEFILE:

hive> show create table test3facts520_gzip_seq;
OK
CREATE TABLE test3facts520_gzip_seq(
  fact_key bigint,
  products_key int,
  retailers_key int,
  suppliers_key int,
  time_key int,
  units int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'hdfs://aana1.ird.com/user/hive/warehouse/facts_520.db/test3facts520_gzip_seq'
TBLPROPERTIES (
  'numPartitions'='0',
  'numFiles'='5',
  'transient_lastDdlTime'='1370867777',
  'numRows'='0',
  'totalSize'='129811519',
  'rawDataSize'='0')
Time taken: 0.135 seconds
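In case it helps, a quick way to check on my side what format the files really are in would be something like the commands below (just a sketch of the checks; a SequenceFile should begin with the ASCII magic "SEQ", and a bare gzip stream with the bytes 1f 8b):

# Peek at the first bytes of one of the new table's files:
# "SEQ" at the start would mean a SequenceFile, 1f 8b would mean a plain gzip stream.
sudo -u hdfs hadoop fs -cat /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000000_0 | head -c 16 | od -c

# 'hadoop fs -text' can decode SequenceFiles and the common codecs, so readable rows
# here would just mean the file is a valid (possibly compressed) SequenceFile or text file.
sudo -u hdfs hadoop fs -text /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000000_0 | head -5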
But the compression itself did not happen:

[root@aana1 comp_data]# sudo -u hdfs hadoop fs -ls /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq
Found 5 items
-rw-r--r--   3 admin supergroup   40006368 2013-06-10 18:06 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000000_0
-rw-r--r--   3 admin supergroup   33026961 2013-06-10 18:06 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000001_0
-rw-r--r--   3 admin supergroup   21797242 2013-06-10 18:05 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000002_0
-rw-r--r--   3 admin supergroup   22171637 2013-06-10 18:05 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000003_0
-rw-r--r--   3 admin supergroup   12809311 2013-06-10 18:05 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000004_0

Is there anything I have done wrong, or have I missed something? Any help would be greatly appreciated!

Thank you,
Sachin