I am having trouble reading gzip-compressed input. Is this a known
problem? Are there any workarounds?
(I am using gzip 1.3.3.)
Thanks,
Delip
$ hadoop dfs -ls input
Found 1 items
-rw-r--r--   3 huser supergroup   17532230 2008-12-11 23:52 /user/huser/input/words.gz
$ hadoop jar hadoop-0.19.0-examples.jar wordcount input output
08/12/12 00:23:10 INFO mapred.FileInputFormat: Total input paths to process : 1
08/12/12 00:23:10 INFO mapred.JobClient: Running job: job_200812100142_0072
08/12/12 00:23:11 INFO mapred.JobClient: map 0% reduce 0%
08/12/12 00:23:32 INFO mapred.JobClient: Task Id : attempt_200812100142_0072_m_000000_0, Status : FAILED
java.lang.InternalError
	at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.init(Native Method)
	at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.<init>(ZlibDecompressor.java:114)
	at org.apache.hadoop.io.compress.GzipCodec.createDecompressor(GzipCodec.java:188)
	at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:170)
	at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:82)
	at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:50)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:321)
	at org.apache.hadoop.mapred.Child.main(Child.java:155)
08/12/12 00:23:44 INFO mapred.JobClient: Task Id : attempt_200812100142_0072_m_000000_1, Status : FAILED
...
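One way to tell a corrupt file apart from a codec problem is to decompress the same file outside Hadoop with the JDK's own `java.util.zip.GZIPInputStream`, which does not go through Hadoop's native `ZlibDecompressor` seen in the trace above. This is a minimal sketch, assuming the file has first been copied to local disk (e.g. with `hadoop dfs -get input/words.gz .`); the class name `GzipCheck` is purely illustrative.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/**
 * Hypothetical diagnostic: decompresses a local .gz file with the JDK's
 * GZIPInputStream (pure Java, no Hadoop native zlib involved) and reports
 * the uncompressed size and the number of newline characters.
 */
public class GzipCheck {

    /** Returns {uncompressedBytes, newlineCount} for the given gzip file. */
    public static long[] count(String path) throws IOException {
        long bytes = 0;
        long lines = 0;
        byte[] buf = new byte[64 * 1024];
        try (InputStream in = new GZIPInputStream(
                new BufferedInputStream(new FileInputStream(path)))) {
            int n;
            // Read until end of stream; a corrupt archive throws an IOException here.
            while ((n = in.read(buf)) != -1) {
                bytes += n;
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        lines++;
                    }
                }
            }
        }
        return new long[] {bytes, lines};
    }

    public static void main(String[] args) throws IOException {
        // Usage (assumed): java GzipCheck words.gz
        long[] c = count(args[0]);
        System.out.println("uncompressed bytes: " + c[0] + ", lines: " + c[1]);
    }
}
```

If this prints sensible counts, the archive itself is probably fine, and the `InternalError` more likely points at the native zlib setup on the task nodes rather than at the input.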