[ https://issues.apache.org/jira/browse/HADOOP-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harsh J resolved HADOOP-6817.
-----------------------------

    Resolution: Duplicate

This is being addressed via HADOOP-8582.

> SequenceFile.Reader can't read gzip format compressed sequence file which produce by a mapreduce job without native compression library
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6817
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6817
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.20.2
>         Environment: Cluster: CentOS 5, jdk1.6.0_20
>                      Client: Mac SnowLeopard, jdk1.6.0_20
>            Reporter: Wenjun Huang
>
> A Hadoop job outputs a gzip-compressed sequence file (whether record compressed or block compressed). The client program uses SequenceFile.Reader to read this sequence file; when reading, the client program shows the following exception:
>
> 2090 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2091 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
> Exception in thread "main" java.io.EOFException
>     at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
>     at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
>     at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
>     at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
>     at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
>     at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
>     at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
>     at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:170)
>     at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:180)
>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>     at com.shiningware.intelligenceonline.taobao.mapreduce.HtmlContentSeqOutputView.main(HtmlContentSeqOutputView.java:28)
>
> I studied the code in the org.apache.hadoop.io.SequenceFile.Reader.init method and read:
>
>     // Initialize... *not* if this we are constructing a temporary Reader
>     if (!tempReader) {
>       valBuffer = new DataInputBuffer();
>       if (decompress) {
>         valDecompressor = CodecPool.getDecompressor(codec);
>         valInFilter = codec.createInputStream(valBuffer, valDecompressor);
>         valIn = new DataInputStream(valInFilter);
>       } else {
>         valIn = valBuffer;
>       }
>
> The problem seems to be caused by "valBuffer = new DataInputBuffer();", because GzipCodec.createInputStream creates an instance of GzipInputStream, whose constructor creates an instance of the ResetableGZIPInputStream class. When ResetableGZIPInputStream's constructor calls its base class java.util.zip.GZIPInputStream's constructor, it tries to read the gzip header from the still-empty valBuffer, gets no content, and therefore throws an EOFException.
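
The failure mode described in the report can be reproduced outside Hadoop with the JDK alone. The following is a minimal sketch, not the Hadoop code path itself: it assumes only that java.util.zip.GZIPInputStream reads the gzip header in its constructor, as the stack trace above shows. The class name EmptyGzipHeaderDemo and its helper methods are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; it does not use any Hadoop API.
class EmptyGzipHeaderDemo {

    // Returns true if constructing a GZIPInputStream over `data` fails with
    // EOFException, i.e. the header read ran out of bytes.
    static boolean constructorThrowsEof(byte[] data) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return false; // header was read successfully
        } catch (EOFException e) {
            return true;  // same exception type as in the reported stack trace
        }
    }

    // Compresses `raw` into a complete gzip stream (header, data, trailer).
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(raw);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // An empty buffer stands in for the freshly created valBuffer.
        System.out.println("empty input: " + constructorThrowsEof(new byte[0]));
        // A real gzip stream has a header, so the constructor succeeds.
        byte[] compressed = gzip("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("gzip input:  " + constructorThrowsEof(compressed));
    }
}
```

The empty byte array plays the role of the newly allocated DataInputBuffer: wrapping it fails before any record data could be supplied, whereas the same constructor succeeds once a gzip header is available.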