Venkata Puneet Ravuri created HIVE-7886:
-------------------------------------------
             Summary: Aggregation queries fail with RCFile based Hive tables with S3 storage
                 Key: HIVE-7886
                 URL: https://issues.apache.org/jira/browse/HIVE-7886
             Project: Hive
          Issue Type: Bug
          Components: File Formats
    Affects Versions: 0.13.1
            Reporter: Venkata Puneet Ravuri

Aggregation queries on Hive tables that use the RCFile format with S3 storage are failing. My setup is Hadoop 2.5.0 and Hive 0.13.1.

I created a table with the following schema:

CREATE EXTERNAL TABLE `testtable`(
  `col1` string,
  `col2` tinyint,
  `col3` int,
  `col4` float,
  `col5` boolean,
  `col6` smallint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
WITH SERDEPROPERTIES (
  'serialization.format'='\t',
  'line.delim'='\n',
  'field.delim'='\t')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
LOCATION
  's3n://<testbucket>/testtable';

When I run 'select count(*) from testtable', it fails with the following exception stack:

Error: java.io.IOException: java.io.IOException: java.io.EOFException: Attempted to seek or read past the end of the file
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:256)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:171)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.IOException: java.io.EOFException: Attempted to seek or read past the end of the file
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:254)
    ... 11 more
Caused by: java.io.EOFException: Attempted to seek or read past the end of the file
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:462)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:234)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at org.apache.hadoop.fs.s3native.$Proxy17.retrieve(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:205)
    at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
    at org.apache.hadoop.fs.BufferedFSInputStream.skip(BufferedFSInputStream.java:67)
    at java.io.DataInputStream.skipBytes(DataInputStream.java:220)
    at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.readFields(RCFile.java:739)
    at org.apache.hadoop.hive.ql.io.RCFile$Reader.currentValueBuffer(RCFile.java:1720)
    at org.apache.hadoop.hive.ql.io.RCFile$Reader.getCurrentRow(RCFile.java:1898)
    at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:149)
    at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:44)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
    ... 15 more
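
For what it's worth, the bottom of the trace suggests the EOFException is raised while RCFile$ValueBuffer.readFields skips column data via DataInputStream.skipBytes and the underlying NativeS3FsInputStream ends up seeking past the end of the S3 object. Below is only a rough standalone sketch that exercises the same skip/seek path against s3n; the file name is a placeholder, and whether this minimal skip alone reproduces the failure outside of the RCFile reader is not verified:

// Hypothetical standalone sketch (not from the Hive code base): exercises the same
// skip -> seek path seen at the bottom of the trace (DataInputStream.skipBytes ->
// BufferedFSInputStream.skip -> NativeS3FsInputStream.seek). The object path below
// is a placeholder; point it at any RCFile under the table location.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3nSkipPastEof {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path file = new Path("s3n://<testbucket>/testtable/000000_0"); // placeholder object
        FileSystem fs = file.getFileSystem(conf);
        FileStatus status = fs.getFileStatus(file);

        FSDataInputStream in = fs.open(file);
        try {
            // Skip more bytes than the object contains; on s3n the buffered stream
            // turns this into a seek past the end of the file, which is the same
            // condition reported by the EOFException in the trace above.
            in.skip(status.getLen() + 1);
            in.read(); // force the underlying retrieve at the bad offset
        } finally {
            in.close();
        }
    }
}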

--
This message was sent by Atlassian JIRA
(v6.2#6252)