Hi Stephen,

Thank you for your reply.
But it turns out to be the silliest error on my side. It's a typo! The codec is
org.apache.hadoop.io.compress.GzipCodec and not org.apache.hadoop.io.compress.GZipCodec.

I regret making that mistake.

Thank you,
Sachin

On Thu, Jun 6, 2013 at 10:07 PM, Stephen Sprague <sprag...@gmail.com> wrote:

> Hi Sachin,
>
> Like you say, it looks like something to do with the GZipCodec all right, and
> that would make sense given your original problem.
>
> Yeah, one would think it'd be in there by default, but for whatever reason
> it's not finding it. At least the problem is now identified.
>
> Now _my guess_ is that maybe your hadoop core-site.xml file might need to
> list the codecs available under the property name "io.compression.codecs".
> Can you chase that up as a possibility and let us know what you find out?
>
> On Thu, Jun 6, 2013 at 4:02 AM, Sachin Sudarshana <sachin.had...@gmail.com> wrote:
>
>> Hi Stephen,
>>
>> hive> show create table facts520_normal_text;
>> OK
>> CREATE TABLE facts520_normal_text(
>>   fact_key bigint,
>>   products_key int,
>>   retailers_key int,
>>   suppliers_key int,
>>   time_key int,
>>   units int)
>> ROW FORMAT DELIMITED
>>   FIELDS TERMINATED BY ','
>>   LINES TERMINATED BY '\n'
>> STORED AS INPUTFORMAT
>>   'org.apache.hadoop.mapred.TextInputFormat'
>> OUTPUTFORMAT
>>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>> LOCATION
>>   'hdfs://aana1.ird.com/user/hive/warehouse/facts_520.db/facts520_normal_text'
>> TBLPROPERTIES (
>>   'numPartitions'='0',
>>   'numFiles'='1',
>>   'transient_lastDdlTime'='1369395430',
>>   'numRows'='0',
>>   'totalSize'='545216508',
>>   'rawDataSize'='0')
>> Time taken: 0.353 seconds
>>
>> The syserror log shows this:
>>
>> java.lang.IllegalArgumentException: Compression codec
>> org.apache.hadoop.io.compress.GZipCodec was not found.
>>     at org.apache.hadoop.mapred.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:85)
>>     at org.apache.hadoop.hive.ql.exec.Utilities.getFileExtension(Utilities.java:934)
>>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:469)
>>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:543)
>>     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
>>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>>     at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>>     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
>>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
>>     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
>>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:546)
>>     at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>> Caused by: java.lang.ClassNotFoundException: Class
>> org.apache.hadoop.io.compress.GZipCodec not found
>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
>>     at org.apache.hadoop.mapred.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:82)
>>     ... 21 more
>>
>> [the same IllegalArgumentException / ClassNotFoundException stack is
>> repeated for the FileSinkOperator close path and for each retried task
>> attempt -- snipped]
>>
>> It says that GZipCodec was not found.
>> Aren't the Snappy, GZip and BZip codecs available on Hadoop by default?
>>
>> Thank you,
>> Sachin
>>
>> On Wed, Jun 5, 2013 at 11:58 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>
>>> Well... the HiveException has the word "metadata" in it. Maybe that's
>>> a hint or a red herring. :)  Let's try the following:
>>>
>>> 1. show create table facts520_normal_text;
>>>
>>> 2. Anything useful at this URL, or is it just the same stack dump?
>>> http://aana1.ird.com:50030/taskdetails.jsp?jobid=job_201306051948_0010&tipid=task_201306051948_0010_m_000002
>>>
>>> On Wed, Jun 5, 2013 at 3:17 AM, Sachin Sudarshana <sachin.had...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have Hive 0.10 (+ CDH 4.2.1 patches) installed on my cluster.
>>>>
>>>> I have a table facts520_normal_text stored as a textfile. I'm trying to
>>>> create a compressed table from this table using the GZip codec.
>>>>
>>>> hive> SET hive.exec.compress.output=true;
>>>> hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GZipCodec;
>>>> hive> SET mapred.output.compression.type=BLOCK;
>>>>
>>>> hive> Create table facts520_gzip_text
>>>>     > (fact_key BIGINT,
>>>>     > products_key INT,
>>>>     > retailers_key INT,
>>>>     > suppliers_key INT,
>>>>     > time_key INT,
>>>>     > units INT)
>>>>     > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>>>>     > LINES TERMINATED BY '\n'
>>>>     > STORED AS TEXTFILE;
>>>>
>>>> hive> INSERT OVERWRITE TABLE facts520_gzip_text SELECT * from facts520_normal_text;
>>>>
>>>> When I run the above queries, the MR job fails.
>>>>
>>>> The error that the Hive CLI itself shows is the following:
>>>>
>>>> Total MapReduce jobs = 3
>>>> Launching Job 1 out of 3
>>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>>> Starting Job = job_201306051948_0010, Tracking URL =
>>>> http://aana1.ird.com:50030/jobdetails.jsp?jobid=job_201306051948_0010
>>>> Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306051948_0010
>>>> Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 0
>>>> 2013-06-05 21:09:42,281 Stage-1 map = 0%, reduce = 0%
>>>> 2013-06-05 21:10:11,446 Stage-1 map = 100%, reduce = 100%
>>>> Ended Job = job_201306051948_0010 with errors
>>>> Error during job, obtaining debugging information...
>>>> Job Tracking URL:
>>>> http://aana1.ird.com:50030/jobdetails.jsp?jobid=job_201306051948_0010
>>>> Examining task ID: task_201306051948_0010_m_000004 (and more) from job job_201306051948_0010
>>>> Examining task ID: task_201306051948_0010_m_000001 (and more) from job job_201306051948_0010
>>>>
>>>> Task with the most failures(4):
>>>> -----
>>>> Task ID:
>>>>   task_201306051948_0010_m_000002
>>>>
>>>> URL:
>>>>   http://aana1.ird.com:50030/taskdetails.jsp?jobid=job_201306051948_0010&tipid=task_201306051948_0010_m_000002
>>>> -----
>>>> Diagnostic Messages for this Task:
>>>> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
>>>> Hive Runtime Error while processing row
>>>> {"fact_key":7549094,"products_key":205,"retailers_key":304,"suppliers_key":402,"time_key":103,"units":23}
>>>>     at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
>>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
>>>> Hive Runtime Error while processing row
>>>> {"fact_key":7549094,"products_key":205,"retailers_key":304,"suppliers_key":402,"time_key":103,"units":23}
>>>>     at org.apach
>>>>
>>>> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
>>>> MapReduce Jobs Launched:
>>>> Job 0: Map: 3   HDFS Read: 0 HDFS Write: 0 FAIL
>>>> Total MapReduce CPU Time Spent: 0 msec
>>>>
>>>> I'm unable to figure out why this is happening. It looks like the data
>>>> is not being copied properly.
>>>> Or is it that the GZip codec is not supported on textfiles?
>>>>
>>>> Any help with this issue is greatly appreciated!
>>>>
>>>> Thank you,
>>>> Sachin
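For anyone who lands on this thread with the same stack trace: the root cause is simply that JVM class lookups (which is what Hadoop's Configuration.getClassByName does under the hood) are case-sensitive, so "GZipCodec" and "GzipCodec" are two different names and only the latter exists. Here is a small self-contained sketch using a JDK class, so no Hadoop jars are needed to see the behavior; the two class names below are just stand-ins for GzipCodec vs. GZipCodec:

```java
public class CaseSensitiveLookup {
    public static void main(String[] args) {
        try {
            // Correct casing: the class resolves fine.
            Class<?> ok = Class.forName("java.util.zip.GZIPOutputStream");
            System.out.println("found: " + ok.getName());
        } catch (ClassNotFoundException e) {
            System.out.println("unexpected: " + e.getMessage());
        }
        try {
            // One character of wrong case and the lookup fails, producing
            // the same ClassNotFoundException seen in the Hive task logs.
            Class.forName("java.util.zip.GzipOutputStream");
        } catch (ClassNotFoundException e) {
            System.out.println("not found: " + e.getMessage());
        }
    }
}
```

With the codec spelled org.apache.hadoop.io.compress.GzipCodec in the SET statement, the INSERT above should run (assuming, per Stephen's suggestion, the codec is also listed under io.compression.codecs in core-site.xml if your distribution requires it there).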