hi all:

I tested the above example with Hive trunk and it still fails. After some debugging, I finally found the cause of the problem:
Hive uses CombineFileRecordReader, and a single CombineFileSplit often contains more than one path. In this case the schemas of the two paths (dt='20140718' vs dt='20140719') are different, yet both paths end up in the same split. OrcRecordReader#next(NullWritable key, OrcStruct value) is called with a value object that is reused each time we deserialize a row: at first all fields of the value are null, and after deserializing a row the value holds that row. So when we switch from reading one path to the other, the schema changes from {IntWritable} to {LongWritable}. In LongTreeReader#next, if the previous value is not null it is simply cast to LongWritable, even though it is actually an IntWritable:

> Object next(Object previous) throws IOException {
>   super.next(previous);
>   LongWritable result = null;
>   if (valuePresent) {
>     if (previous == null) {
>       result = new LongWritable();
>     } else {
>       result = (LongWritable) previous;
>     }
>     result.set(reader.next());
>   }
>   return result;
> }

which is what causes the exception above (a small standalone illustration of this cast failure is appended at the end of this mail). I think we can reset the value each time we finish reading one path; it is a one-line change and the problem is easily solved:

> diff --git a/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java b/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
> index 7edb3c2..696b1bc 100644
> --- a/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
> +++ b/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
> @@ -154,6 +154,7 @@ public boolean next(NullWritable key, OrcStruct value) throws IOException {
>        progress = reader.getProgress();
>        return true;
>      } else {
> +      value.linkFields(createValue());
>        return false;
>      }
>    }

If the fix is desirable, I can create a ticket in the Hive JIRA and upload a patch for it. Please correct me if I'm wrong. Thanks.

2014-07-31 4:56 GMT+08:00 wzc <wzc1...@gmail.com>:

> hi,
> Currently, if we change the column type of an ORC-format Hive table using "ALTER TABLE orc_table CHANGE c1 c1 bigint", queries throw an exception from the SerDe ("org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable"). This is different from Hive's behavior with other file formats, where it tries to perform the cast (and produces null for incompatible types).
> I find that HIVE-6784 happens to be the same issue for Parquet, although it says that it currently works with partitioned tables:
>
>> The exception raised from changing type actually only happens to non-partitioned tables. For partitioned tables, if there is a type change at the table level, there will be an ObjectInspectorConverter (in parquet's case, StructConverter) to convert types between partition and table. For non-partitioned tables, the ObjectInspectorConverter is always IdentityConverter, which passes the deserialized object as it is, causing a type mismatch between the object and the ObjectInspector.
>
> According to my test with Hive branch-0.13, it still fails with an ORC partitioned table. I think this behavior is unexpected, and I'm digging into the code to find a way to fix it now. Any help is appreciated.
>
> I use the following script to test it with a partitioned table on branch-0.13:
>
>> use test;
>> DROP TABLE if exists orc_change_type_staging;
>> DROP TABLE if exists orc_change_type;
>> CREATE TABLE orc_change_type_staging (
>>   id int
>> );
>> CREATE TABLE orc_change_type (
>>   id int
>> ) PARTITIONED BY (`dt` string)
>> stored as orc;
>> --- load staging table
>> LOAD DATA LOCAL INPATH '../hive/examples/files/int.txt' OVERWRITE INTO TABLE orc_change_type_staging;
>> --- populate orc hive table
>> INSERT OVERWRITE TABLE orc_change_type partition(dt='20140718') select * FROM orc_change_type_staging;
>> --- change column id from int to bigint
>> ALTER TABLE orc_change_type CHANGE id id bigint;
>> INSERT OVERWRITE TABLE orc_change_type partition(dt='20140719') select * FROM orc_change_type_staging;
>> SELECT id FROM orc_change_type where dt between '20140718' and '20140719';
>
> and it throws this exception on branch-0.13:
>
>> Error: java.io.IOException: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
>>   at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>>   at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>>   at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:256)
>>   at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:171)
>>   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197)
>>   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183)
>>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
>> Caused by: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
>>   at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>>   at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
>>   at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>>   at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
>>   at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:254)
>>   ... 11 more
>> Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
>>   at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:717)
>>   at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1788)
>>   at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2997)
>>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:153)
>>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:127)
>>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
>>   ... 15 more
>
> Thanks.
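
P.S. To make the cast failure easier to see outside of Hive, here is a minimal standalone sketch of the mechanism (my own illustration, not code from the Hive tree; the class name ReusedValueCastDemo is made up). It only needs the plain Hadoop Writable classes and fails at runtime the same way LongTreeReader#next does when it is handed a non-null value object produced for the int partition:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;

public class ReusedValueCastDemo {
  public static void main(String[] args) {
    // Row object left over from the dt='20140718' path, where column id was still int.
    Writable previous = new IntWritable(1);

    // What LongTreeReader#next effectively does for the dt='20140719' path, where the
    // table schema now says bigint: a non-null previous value is assumed to already
    // have the right type and is cast directly.
    LongWritable result = (LongWritable) previous;  // java.lang.ClassCastException here
    result.set(1L);
  }
}

IntWritable and LongWritable are sibling classes (neither extends the other), so the cast can never succeed; resetting the reused value when the reader moves to the next path avoids handing the stale object to the new tree reader, which is what the one-line patch above does.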