hi all: I've created a jira for this problem: https://issues.apache.org/jira/browse/HIVE-7847 . Two small illustrative sketches of the failing cast and of the converter path other formats take are appended below the quoted thread.
Thanks.

2014-08-22 1:59 GMT+08:00 wzc <wzc1...@gmail.com>:

> hi all:
>
> I tested the above example with Hive trunk and it still fails. After some
> debugging, I finally found the cause of the problem:
>
> Hive uses CombineFileRecordReader, and one CombineFileSplit often contains
> more than one path. In this case the schemas of the two paths
> (dt='20140718' vs dt='20140719') are different, yet both paths end up in
> the same split. OrcRecordReader#next(NullWritable key, OrcStruct value) is
> called, and the value object is reused each time we deserialize a row.
> Initially all fields of the value are null; after a row is deserialized,
> the value holds that row. When reading switches from one path to the
> other, the schema changes from {IntWritable} to {LongWritable}. In
> LongTreeReader#next, if the previous value is not null it is cast to
> LongWritable, even though it is actually an IntWritable:
>
>     Object next(Object previous) throws IOException {
>       super.next(previous);
>       LongWritable result = null;
>       if (valuePresent) {
>         if (previous == null) {
>           result = new LongWritable();
>         } else {
>           result = (LongWritable) previous;
>         }
>         result.set(reader.next());
>       }
>       return result;
>     }
>
> which causes the above exception.
>
> Here I think we can reset the value each time we finish reading one path;
> it is a one-line change and the problem is easily solved:
>
>     diff --git a/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java b/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
>     index 7edb3c2..696b1bc 100644
>     --- a/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
>     +++ b/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
>     @@ -154,6 +154,7 @@ public boolean next(NullWritable key, OrcStruct value) throws IOException {
>              progress = reader.getProgress();
>              return true;
>            } else {
>     +        value.linkFields(createValue());
>              return false;
>            }
>          }
>
> If the fix is desirable, I can create a ticket in the Hive JIRA and upload
> a patch for it. Please correct me if I'm wrong.
>
> Thanks.
>
>
> 2014-07-31 4:56 GMT+08:00 wzc <wzc1...@gmail.com>:
>
>> hi,
>> Currently, if we change a column type of an ORC-format Hive table with
>> "alter table orc_table change c1 c1 bigint", queries throw an exception
>> from the SerDe ("org.apache.hadoop.io.IntWritable cannot be cast to
>> org.apache.hadoop.io.LongWritable"). This differs from Hive's behavior
>> with other file formats, where it tries to perform the cast (yielding a
>> null value for incompatible types).
>>
>> I find that HIVE-6784 is the same issue for Parquet, although it says the
>> problem currently only affects non-partitioned tables:
>>
>>> The exception raised from changing type actually only happens to
>>> non-partitioned tables. For partitioned tables, if there is a type
>>> change at the table level, there will be an ObjectInspectorConverter
>>> (in Parquet's case, a StructConverter) to convert types between the
>>> partition and the table. For non-partitioned tables, the
>>> ObjectInspectorConverter is always IdentityConverter, which passes the
>>> deserialized object through as is, causing a type mismatch between the
>>> object and the ObjectInspector.
>>
>> According to my test with Hive branch-0.13, it still fails with a
>> partitioned ORC table. I think this behavior is unexpected, and I'm
>> digging into the code to find a way to fix it. Any help is appreciated.
>>
>> I use the following script to test it with a partitioned table on
>> branch-0.13:
>>
>>     use test;
>>     DROP TABLE if exists orc_change_type_staging;
>>     DROP TABLE if exists orc_change_type;
>>     CREATE TABLE orc_change_type_staging (
>>       id int
>>     );
>>     CREATE TABLE orc_change_type (
>>       id int
>>     ) PARTITIONED BY (`dt` string)
>>     stored as orc;
>>     --- load staging table
>>     LOAD DATA LOCAL INPATH '../hive/examples/files/int.txt' OVERWRITE INTO TABLE orc_change_type_staging;
>>     --- populate orc hive table
>>     INSERT OVERWRITE TABLE orc_change_type partition(dt='20140718') select * FROM orc_change_type_staging;
>>     --- change column id from int to bigint
>>     ALTER TABLE orc_change_type CHANGE id id bigint;
>>     INSERT OVERWRITE TABLE orc_change_type partition(dt='20140719') select * FROM orc_change_type_staging;
>>     SELECT id FROM orc_change_type where dt between '20140718' and '20140719';
>>
>> and it throws the following exception on branch-0.13:
>>
>>     Error: java.io.IOException: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
>>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>>         at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:256)
>>         at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:171)
>>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197)
>>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183)
>>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
>>     Caused by: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
>>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>>         at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
>>         at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>>         at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>>         at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
>>         at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:254)
>>         ... 11 more
>>     Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
>>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:717)
>>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1788)
>>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2997)
>>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:153)
>>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:127)
>>         at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
>>         ... 15 more
>>
>> Thanks.
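
To make the failure mode described above concrete, here is a minimal standalone sketch of the failing cast. This is not Hive code; the class name CastSketch and the literal values are made up for illustration, but the cast is the same one LongTreeReader#next performs on the reused row:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Writable;

    public class CastSketch {
      public static void main(String[] args) {
        // The reused "previous" row still holds an IntWritable field that was
        // filled while reading the dt='20140718' path (schema: int).
        Writable previous = new IntWritable(42);

        // After the combined split switches to the dt='20140719' path
        // (schema: bigint), LongTreeReader#next casts the reused field
        // unconditionally, which is exactly the reported exception:
        LongWritable result = (LongWritable) previous;  // java.lang.ClassCastException at runtime

        result.set(100L);  // never reached
      }
    }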
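For comparison, this is a rough sketch of the converter path that the HIVE-6784 comment above describes for partitioned tables in other formats, using the standard serde2 converter API. It is only my illustration of the expected int-to-bigint widening (the class name ConverterSketch and the value 42 are made up), not code taken from Hive's read path:

    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.io.IntWritable;

    public class ConverterSketch {
      public static void main(String[] args) {
        // The partition schema still says int, the table schema now says bigint.
        ObjectInspector partitionOI = PrimitiveObjectInspectorFactory.writableIntObjectInspector;
        ObjectInspector tableOI = PrimitiveObjectInspectorFactory.writableLongObjectInspector;

        // For partitioned tables with a table-level type change, a converter
        // between the two inspectors is used instead of passing the object through.
        Converter converter = ObjectInspectorConverters.getConverter(partitionOI, tableOI);

        // The IntWritable deserialized from the old partition is widened to a
        // LongWritable, so downstream operators never see the type mismatch.
        Object widened = converter.convert(new IntWritable(42));
        System.out.println(widened.getClass().getName() + " = " + widened);
      }
    }

With a converter like this in the loop, the IntWritable from the old partition would be widened rather than cast, matching the cast-or-null behavior described above for non-ORC formats.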