[ 
https://issues.apache.org/jira/browse/HIVE-17448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Sokolov updated HIVE-17448:
-----------------------------------
    Description: 
When ORC files have been created with an older schema, which had a smaller set of 
struct fields, and the schema has been changed to one with more struct fields, and 
there are sibling fields of struct type coming after the struct itself, an 
ArrayIndexOutOfBoundsException is thrown. Steps to reproduce:
{code:none}
create external table test_broken_struct(a struct<f1:int, f2:int>, b int) 
stored as orc;
insert into table test_broken_struct 
    select named_struct("f1", 1, "f2", 2), 3;
drop table test_broken_struct;
create external table test_broken_struct(a struct<f1:int, f2:int, f3:int>, b 
int) stored as orc;
select * from test_broken_struct;
{code}
The same scenario does not cause a crash on Hive 0.14.

Debug log and stack trace:
{code:none}
2017-09-07T00:21:40,266  INFO [main] orc.OrcInputFormat: Using schema evolution 
configuration variables schema.evol
ution.columns [a, b] / schema.evolution.columns.types 
[struct<f1:int,f2:int,f3:int>, int] (isAcidRead false)
2017-09-07T00:21:40,267 DEBUG [main] orc.OrcInputFormat: No ORC pushdown 
predicate
2017-09-07T00:21:40,267  INFO [main] orc.ReaderImpl: Reading ORC rows from 
hdfs://cluster-7199-m/user/hive/warehous
e/test_broken_struct/000000_0 with {include: [true, true, true, true, true], 
offset: 3, length: 159, schema: struct
<a:struct<f1:int,f2:int,f3:int>,b:int>}
Failed with exception 
java.io.IOException:java.lang.ArrayIndexOutOfBoundsException: 5
2017-09-07T00:21:40,273 ERROR [main] CliDriver: Failed with exception 
java.io.IOException:java.lang.ArrayIndexOutOf
BoundsException: 5
java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 5
        at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
        at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2098)
        at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
        at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
        at 
org.apache.orc.impl.SchemaEvolution.buildConversionFileTypesArray(SchemaEvolution.java:195)
        at 
org.apache.orc.impl.SchemaEvolution.buildConversionFileTypesArray(SchemaEvolution.java:253)
        at org.apache.orc.impl.SchemaEvolution.<init>(SchemaEvolution.java:59)
        at 
org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:149)
        at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:63)
        at 
org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:87)
        at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:314)
        at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.<init>(OrcInputFormat.java:225)
        at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1691)
        at 
org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:69
5)
        at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
        at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
        ... 15 more
{code}

  was:
When ORC files have been created with an older schema, which had a smaller set of 
struct fields, and the schema has been changed to one with more struct fields, and 
there are sibling fields of struct type coming after the struct itself, an 
ArrayIndexOutOfBoundsException is thrown. Steps to reproduce:
{code:none}
create external table test_broken_struct(a struct<f1:int, f2:int>, b int);
insert into table test_broken_struct 
    select named_struct("f1", 1, "f2", 2), 3;
drop table test_broken_struct;
create external table test_broken_struct(a struct<f1:int, f2:int, f3:int>, b 
int);
select * from test_broken_struct;
{code}

The same scenario does not cause a crash on Hive 0.14.


> ArrayIndexOutOfBoundsException on ORC tables after adding a struct field
> ------------------------------------------------------------------------
>
>                 Key: HIVE-17448
>                 URL: https://issues.apache.org/jira/browse/HIVE-17448
>             Project: Hive
>          Issue Type: Bug
>          Components: ORC
>    Affects Versions: 2.1.1
>         Environment: Reproduced on Dataproc 1.1, 1.2 (Hive 2.1).
>            Reporter: Nikolay Sokolov
>            Priority: Minor
>         Attachments: HIVE-17448.1-branch-2.1.patch
>
>
> When ORC files have been created with an older schema, which had a smaller set 
> of struct fields, and the schema has been changed to one with more struct 
> fields, and there are sibling fields of struct type coming after the struct 
> itself, an ArrayIndexOutOfBoundsException is thrown. Steps to reproduce:
> {code:none}
> create external table test_broken_struct(a struct<f1:int, f2:int>, b int) 
> stored as orc;
> insert into table test_broken_struct 
>     select named_struct("f1", 1, "f2", 2), 3;
> drop table test_broken_struct;
> create external table test_broken_struct(a struct<f1:int, f2:int, f3:int>, b 
> int) stored as orc;
> select * from test_broken_struct;
> {code}
> The same scenario does not cause a crash on Hive 0.14.
> Debug log and stack trace:
> {code:none}
> 2017-09-07T00:21:40,266  INFO [main] orc.OrcInputFormat: Using schema 
> evolution configuration variables schema.evol
> ution.columns [a, b] / schema.evolution.columns.types 
> [struct<f1:int,f2:int,f3:int>, int] (isAcidRead false)
> 2017-09-07T00:21:40,267 DEBUG [main] orc.OrcInputFormat: No ORC pushdown 
> predicate
> 2017-09-07T00:21:40,267  INFO [main] orc.ReaderImpl: Reading ORC rows from 
> hdfs://cluster-7199-m/user/hive/warehous
> e/test_broken_struct/000000_0 with {include: [true, true, true, true, true], 
> offset: 3, length: 159, schema: struct
> <a:struct<f1:int,f2:int,f3:int>,b:int>}
> Failed with exception 
> java.io.IOException:java.lang.ArrayIndexOutOfBoundsException: 5
> 2017-09-07T00:21:40,273 ERROR [main] CliDriver: Failed with exception 
> java.io.IOException:java.lang.ArrayIndexOutOf
> BoundsException: 5
> java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 5
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
>         at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
>         at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2098)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
>         at 
> org.apache.orc.impl.SchemaEvolution.buildConversionFileTypesArray(SchemaEvolution.java:195)
>         at 
> org.apache.orc.impl.SchemaEvolution.buildConversionFileTypesArray(SchemaEvolution.java:253)
>         at org.apache.orc.impl.SchemaEvolution.<init>(SchemaEvolution.java:59)
>         at 
> org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:149)
>         at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:63)
>         at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:87)
>         at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:314)
>         at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.<init>(OrcInputFormat.java:225)
>         at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1691)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:69
> 5)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
>         ... 15 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to