hi all:

I tested the above example with Hive trunk and it still fails. After some
debugging, I finally found the cause of the problem:

  Hive uses CombineFileRecordReader, and a single CombineFileSplit often
contains more than one path. In this case the schemas of the two paths
(dt='20140718' vs dt='20140719') are different, yet both paths land in the
same split. OrcRecordReader#next(NullWritable key, OrcStruct value) is
called with a value object that is reused each time we deserialize a row.
Initially all fields of the value are null; after deserializing a row, the
value holds that row. So when we switch from reading one path to the other,
the field type changes from IntWritable to LongWritable, and in
LongTreeReader#next, if the previous value is not null, it is cast to
LongWritable even though it is actually an IntWritable:

>     Object next(Object previous) throws IOException {
>       super.next(previous);
>       LongWritable result = null;
>       if (valuePresent) {
>         if (previous == null) {
>           result = new LongWritable();
>         } else {
>           result = (LongWritable) previous;
>         }
>         result.set(reader.next());
>       }
>       return result;
>     }

This is what causes the exception above.
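
To make the reuse problem concrete, here is a minimal standalone sketch (my
own code, not from Hive; the class and method names are made up) that mimics
the LongTreeReader logic above when a stale IntWritable is left over from the
previous path:

>     import org.apache.hadoop.io.IntWritable;
>     import org.apache.hadoop.io.LongWritable;
>
>     public class ReusedValueSketch {
>       // Same pattern as LongTreeReader#next: reuse "previous" when non-null.
>       static LongWritable nextLong(Object previous, long v) {
>         LongWritable result =
>             (previous == null) ? new LongWritable() : (LongWritable) previous;
>         result.set(v);
>         return result;
>       }
>
>       public static void main(String[] args) {
>         // While reading dt='20140718' (column still int), the reused value
>         // object ends up holding an IntWritable.
>         Object reused = new IntWritable(42);
>
>         // The next path in the same CombineFileSplit has the column as
>         // bigint, so the stale IntWritable is passed as "previous" and the
>         // cast throws java.lang.ClassCastException: IntWritable cannot be
>         // cast to LongWritable.
>         nextLong(reused, 42L);
>       }
>     }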

Here I think we can reset the value each time we finish reading one path; it
is just one line of code and it solves the problem:

> diff --git a/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java b/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
> index 7edb3c2..696b1bc 100644
> --- a/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
> +++ b/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
> @@ -154,6 +154,7 @@ public boolean next(NullWritable key, OrcStruct value) throws IOException {
>          progress = reader.getProgress();
>          return true;
>        } else {
> +        value.linkFields(createValue());
>          return false;
>        }
>      }
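
For clarity, here is roughly what the patched method looks like with that one
line in place (reconstructed from the hunk above and simplified, so the
surrounding details may not match trunk exactly). Once a path in the combined
split is exhausted, the reused OrcStruct is re-linked to a fresh, all-null
value, so the tree readers for the next path allocate new Writables of the
correct type instead of casting the stale ones:

>     @Override
>     public boolean next(NullWritable key, OrcStruct value) throws IOException {
>       if (reader.hasNext()) {
>         reader.next(value);
>         progress = reader.getProgress();
>         return true;
>       } else {
>         // Re-link the reused struct to a fresh, all-null value so the next
>         // path in this split does not see stale Writables from the old schema.
>         value.linkFields(createValue());
>         return false;
>       }
>     }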

If the fix is desirable, I can create a ticket in Hive JIRA and upload a patch
for it. Please correct me if I'm wrong.


Thanks.


2014-07-31 4:56 GMT+08:00 wzc <wzc1...@gmail.com>:
>
> hi,
> Currently, if we change the column type of an ORC-format Hive table using
> "alter table orc_table change c1 c1 bigint", it throws an exception from the
> SerDe ("org.apache.hadoop.io.IntWritable cannot be cast to
> org.apache.hadoop.io.LongWritable") at query time. This is different from
> Hive's behavior with other file formats, where it tries to perform the cast
> (yielding null for incompatible types).
>   I find HIVE-6784 happens to be the same issue with Parquet, although it
> says that it currently works with partitioned tables:
>>>
>>> The exception raised from changing type actually only happens to
non-partitioned tables. For partitioned tables, if there is type change in
table level, there will be an ObjectInspectorConverter (in parquet's case —
StructConverter) to convert type between partition and table. For
non-partitioned tables, the ObjectInspectorConverter is always
IdentityConverter, which passes the deserialized object as it is, causing
type mismatch between object and ObjectInspector.
>>
>>
>
>   According to my test with Hive branch-0.13, it still fails with an ORC
> partitioned table. I think this behavior is unexpected and I'm digging into
> the code to find a way to fix it. Any help is appreciated.
>
>
>
>
>
> I use the following script to test it with a partitioned table on
> branch-0.13:
>
>> use test;
>> DROP TABLE if exists orc_change_type_staging;
>> DROP TABLE if exists orc_change_type;
>> CREATE TABLE orc_change_type_staging (
>>     id int
>> );
>> CREATE TABLE orc_change_type (
>>     id int
>> ) PARTITIONED BY (`dt` string)
>> stored as orc;
>> --- load staging table
>> LOAD DATA LOCAL INPATH '../hive/examples/files/int.txt' OVERWRITE INTO
TABLE orc_change_type_staging;
>> --- populate orc hive table
>> INSERT OVERWRITE TABLE orc_change_type partition(dt='20140718') select *
FROM orc_change_type_staging;
>> --- change column id from int to bigint
>> ALTER TABLE orc_change_type CHANGE id id bigint;
>> INSERT OVERWRITE TABLE orc_change_type partition(dt='20140719') select *
FROM orc_change_type_staging;
>> SELECT id FROM orc_change_type where dt between '20140718' and
'20140719';
>
>
> and it throws an exception with branch-0.13:
>>
>> Error: java.io.IOException: java.io.IOException:
java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be
cast to org.apache.hadoop.io.LongWritable
>>         at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>>         at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>>         at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:256)
>>         at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:171)
>>         at
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197)
>>         at
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183)
>>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>>         at
org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>         at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
>> Caused by: java.io.IOException: java.lang.ClassCastException:
org.apache.hadoop.io.IntWritable cannot be cast to
org.apache.hadoop.io.LongWritable
>>         at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>>         at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>>         at
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
>>         at
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>>         at
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>>         at
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
>>         at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:254)
>>         ... 11 more
>> Caused by: java.lang.ClassCastException:
org.apache.hadoop.io.IntWritable cannot be cast to
org.apache.hadoop.io.LongWritable
>>         at
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:717)
>>         at
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1788)
>>         at
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2997)
>>         at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:153)
>>         at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:127)
>>         at
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
>>         ... 15 more
>
>
>
> Thanks.
>
>
>
