David Phillips created HIVE-3569:
------------------------------------

             Summary: RCFile requires native Hadoop library
                 Key: HIVE-3569
                 URL: https://issues.apache.org/jira/browse/HIVE-3569
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.10.0
            Reporter: David Phillips
RCFile requires the native Hadoop library. It does not work when the native library is unavailable and {{GzipCodec}} falls back to its pure-Java implementation. The root cause is that the native and Java code paths of {{GzipCodec.createInputStream()}} behave differently. The native version simply saves a reference to the supplied input stream. The Java version wraps the stream in a Java {{GZIPInputStream}}, whose constructor immediately tries to read the gzip header. The problem occurs because the stream passed by the {{RCFile.ValueBuffer}} constructor is empty: the buffer backing the stream has not been filled yet at that point, so the header read hits end-of-stream and throws {{EOFException}}.

{noformat}
12/10/11 10:37:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/10/11 10:37:25 INFO io.CodecPool: Got brand-new decompressor
12/10/11 10:37:25 INFO io.CodecPool: Got brand-new decompressor
Exception in thread "main" java.io.EOFException
	at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:264)
	at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:254)
	at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:163)
	at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:78)
	at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:90)
	at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
	at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
	at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
	at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
	at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.<init>(RCFile.java:451)
	at org.apache.hadoop.hive.ql.io.RCFile$Reader.<init>(RCFile.java:1205)
	at org.apache.hadoop.hive.ql.io.RCFile$Reader.<init>(RCFile.java:1111)
	at org.apache.hadoop.hive.ql.io.RCFileRecordReader.<init>(RCFileRecordReader.java:52)
{noformat}
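A minimal, JDK-only sketch (no Hadoop or Hive classes) of the failure mode described above: {{java.util.zip.GZIPInputStream}} reads the gzip header in its constructor, so wrapping a stream whose backing buffer is still empty fails with {{EOFException}}. {{RefillableStream}} is a hypothetical stand-in for a reusable buffer that is filled only after the decompression stream is created, and {{LazyGzipInputStream}} is a hypothetical wrapper that defers the header read until the first {{read()}}. Neither is a Hive or Hadoop class, and deferring construction is only one possible workaround, not necessarily the fix adopted for this issue.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class EagerGzipHeaderDemo {

    // Compresses a string with the JDK's GZIPOutputStream.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Hypothetical stand-in for a reusable buffer-backed stream:
    // it starts empty and is refilled later, after wrappers exist.
    static final class RefillableStream extends InputStream {
        private byte[] data = new byte[0];
        private int pos = 0;

        void refill(byte[] newData) {
            data = newData;
            pos = 0;
        }

        @Override
        public int read() {
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }
    }

    // Hypothetical workaround: create the GZIPInputStream (and so read
    // the header) on first use instead of at construction time.
    static final class LazyGzipInputStream extends InputStream {
        private final InputStream source;
        private GZIPInputStream delegate;

        LazyGzipInputStream(InputStream source) {
            this.source = source;
        }

        private GZIPInputStream delegate() throws IOException {
            if (delegate == null) {
                delegate = new GZIPInputStream(source);
            }
            return delegate;
        }

        @Override
        public int read() throws IOException {
            return delegate().read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return delegate().read(b, off, len);
        }

        @Override
        public void close() throws IOException {
            if (delegate != null) {
                delegate.close();
            } else {
                source.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("hello, rcfile");

        // Eager: the backing stream is empty at construction time,
        // so the constructor's header read hits end-of-stream.
        RefillableStream eagerBacking = new RefillableStream();
        try {
            new GZIPInputStream(eagerBacking);
            System.out.println("eager: constructed");
        } catch (EOFException e) {
            System.out.println("eager: EOFException");
        }

        // Lazy: construction succeeds on the empty stream; the header
        // is read on the first read(), after the buffer is filled.
        RefillableStream lazyBacking = new RefillableStream();
        InputStream lazy = new LazyGzipInputStream(lazyBacking);
        lazyBacking.refill(compressed);
        String text = new String(lazy.readAllBytes(), StandardCharsets.UTF_8);
        System.out.println("lazy: " + text);
    }
}
```

Running {{main}} should print {{eager: EOFException}} followed by {{lazy: hello, rcfile}}, mirroring the difference between a wrapper that reads the header immediately and one that, like the native code path described above, only holds a reference until data is actually requested. ({{readAllBytes()}} requires Java 9 or later.)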