hi all: I've created a jira for this problem: https://issues.apache.org/jira/browse/HIVE-7847 . Two small illustrative sketches of the failing cast and of the converter path other formats take are appended below the quoted thread.
Thanks.

2014-08-22 1:59 GMT+08:00 wzc <wzc1...@gmail.com>:

> hi all:
>
> I tested the above example with Hive trunk and it still fails. After some
> debugging, I finally found the cause of the problem:
>
> Hive uses CombineFileRecordReader, and one CombineFileSplit often contains
> more than one path. In this case the schemas of the two paths
> (dt='20140718' vs dt='20140719') are different, yet both paths end up in
> the same split. OrcRecordReader#next(NullWritable key, OrcStruct value) is
> called, and the value object is reused each time we deserialize a row.
> Initially all fields of the value are null; after a row is deserialized,
> the value holds that row. When reading switches from one path to the
> other, the schema changes from {IntWritable} to {LongWritable}. In
> LongTreeReader#next, if the previous value is not null it is cast to
> LongWritable, even though it is actually an IntWritable:
>
>     Object next(Object previous) throws IOException {
>       super.next(previous);
>       LongWritable result = null;
>       if (valuePresent) {
>         if (previous == null) {
>           result = new LongWritable();
>         } else {
>           result = (LongWritable) previous;
>         }
>         result.set(reader.next());
>       }
>       return result;
>     }
>
> which causes the above exception.
>
> Here I think we can reset the value each time we finish reading one path;
> it is a one-line change and the problem is easily solved:
>
>     diff --git a/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java b/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
>     index 7edb3c2..696b1bc 100644
>     --- a/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
>     +++ b/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
>     @@ -154,6 +154,7 @@ public boolean next(NullWritable key, OrcStruct value) throws IOException {
>              progress = reader.getProgress();
>              return true;
>            } else {
>     +        value.linkFields(createValue());
>              return false;
>            }
>          }
>
> If the fix is desirable, I can create a ticket in the Hive JIRA and upload
> a patch for it. Please correct me if I'm wrong.
>
> Thanks.
>
>
> 2014-07-31 4:56 GMT+08:00 wzc <wzc1...@gmail.com>:
>
>> hi,
>> Currently, if we change a column type of an ORC-format Hive table with
>> "alter table orc_table change c1 c1 bigint", queries throw an exception
>> from the SerDe ("org.apache.hadoop.io.IntWritable cannot be cast to
>> org.apache.hadoop.io.LongWritable"). This differs from Hive's behavior
>> with other file formats, where it tries to perform the cast (yielding a
>> null value for incompatible types).
>>
>> I find that HIVE-6784 is the same issue for Parquet, although it says the
>> problem currently only affects non-partitioned tables:
>>
>>> The exception raised from changing type actually only happens to
>>> non-partitioned tables. For partitioned tables, if there is a type
>>> change at the table level, there will be an ObjectInspectorConverter
>>> (in Parquet's case, a StructConverter) to convert types between the
>>> partition and the table. For non-partitioned tables, the
>>> ObjectInspectorConverter is always IdentityConverter, which passes the
>>> deserialized object through as is, causing a type mismatch between the
>>> object and the ObjectInspector.
>>
>> According to my test with Hive branch-0.13, it still fails with a
>> partitioned ORC table. I think this behavior is unexpected, and I'm
>> digging into the code to find a way to fix it. Any help is appreciated.
>>
>> I use the following script to test it with a partitioned table on
>> branch-0.13:
>>
>>     use test;
>>     DROP TABLE if exists orc_change_type_staging;
>>     DROP TABLE if exists orc_change_type;
>>     CREATE TABLE orc_change_type_staging (
>>       id int
>>     );
>>     CREATE TABLE orc_change_type (
>>       id int
>>     ) PARTITIONED BY (`dt` string)
>>     stored as orc;
>>     --- load staging table
>>     LOAD DATA LOCAL INPATH '../hive/examples/files/int.txt' OVERWRITE INTO TABLE orc_change_type_staging;
>>     --- populate orc hive table
>>     INSERT OVERWRITE TABLE orc_change_type partition(dt='20140718') select * FROM orc_change_type_staging;
>>     --- change column id from int to bigint
>>     ALTER TABLE orc_change_type CHANGE id id bigint;
>>     INSERT OVERWRITE TABLE orc_change_type partition(dt='20140719') select * FROM orc_change_type_staging;
>>     SELECT id FROM orc_change_type where dt between '20140718' and '20140719';
>>
>> and it throws the following exception on branch-0.13:
>>
>>     Error: java.io.IOException: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
>>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>>         at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:256)
>>         at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:171)
>>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197)
>>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183)
>>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
>>     Caused by: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
>>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>>         at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
>>         at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>>         at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>>         at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
>>         at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:254)
>>         ... 11 more
>>     Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
>>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:717)
>>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1788)
>>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2997)
>>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:153)
>>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:127)
>>         at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
>>         ... 15 more
>>
>> Thanks.
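
To make the failure mode described above concrete, here is a minimal standalone sketch of the failing cast. This is not Hive code; the class name CastSketch and the literal values are made up for illustration, but the cast is the same one LongTreeReader#next performs on the reused row:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Writable;

    public class CastSketch {
      public static void main(String[] args) {
        // The reused "previous" row still holds an IntWritable field that was
        // filled while reading the dt='20140718' path (schema: int).
        Writable previous = new IntWritable(42);

        // After the combined split switches to the dt='20140719' path
        // (schema: bigint), LongTreeReader#next casts the reused field
        // unconditionally, which is exactly the reported exception:
        LongWritable result = (LongWritable) previous;  // java.lang.ClassCastException at runtime

        result.set(100L);  // never reached
      }
    }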
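For comparison, this is a rough sketch of the converter path that the HIVE-6784 comment above describes for partitioned tables in other formats, using the standard serde2 converter API. It is only my illustration of the expected int-to-bigint widening (the class name ConverterSketch and the value 42 are made up), not code taken from Hive's read path:

    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.io.IntWritable;

    public class ConverterSketch {
      public static void main(String[] args) {
        // The partition schema still says int, the table schema now says bigint.
        ObjectInspector partitionOI = PrimitiveObjectInspectorFactory.writableIntObjectInspector;
        ObjectInspector tableOI = PrimitiveObjectInspectorFactory.writableLongObjectInspector;

        // For partitioned tables with a table-level type change, a converter
        // between the two inspectors is used instead of passing the object through.
        Converter converter = ObjectInspectorConverters.getConverter(partitionOI, tableOI);

        // The IntWritable deserialized from the old partition is widened to a
        // LongWritable, so downstream operators never see the type mismatch.
        Object widened = converter.convert(new IntWritable(42));
        System.out.println(widened.getClass().getName() + " = " + widened);
      }
    }

With a converter like this in the loop, the IntWritable from the old partition would be widened rather than cast, matching the cast-or-null behavior described above for non-ORC formats.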