[ https://issues.apache.org/jira/browse/HIVE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369956#comment-15369956 ]
Eugene Koifman commented on HIVE-14004:
---------------------------------------

[~mmccline] I left a few comments on RB. Mostly superficial, except the "clone()/toString()" one w.r.t. Reader.Options. I understand what you are trying to do and generally it makes sense. One concern I have is that orc.RecordReader needs to know whether it's doing an ACID read vs. a regular read. Given how things are currently implemented, I'm not sure how to avoid that. It would be better if the higher layer just specified which columns it wants - user columns and ACID metadata columns - and interpreted them itself, so that RecordReader doesn't have to. Perhaps down the road we can make the layout just one struct<operation, originalTransaction, ..., currentTransaction, c1, c2, ..., cN>, where the "c" fields are user columns, and include a version number in the ORC footer that records the offset at which the user columns start (in case we add more metadata columns). I think this may have other advantages. I think someone more familiar with this code path should look at this as well.

> Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-14004
>                 URL: https://issues.apache.org/jira/browse/HIVE-14004
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 2.2.0
>            Reporter: Eugene Koifman
>            Assignee: Matt McCline
>         Attachments: HIVE-14004.01.patch, HIVE-14004.02.patch, HIVE-14004.03.patch
>
> Easiest way to repro is to add
> {noformat}
> @Test
> public void testCompactWithDelete() throws Exception {
>   int[][] tableData = {{1,2},{3,4}};
>   runStatementOnDriver("insert into " + Table.ACIDTBL + "(a,b) " + makeValuesClause(tableData));
>   runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>   Worker t = new Worker();
>   t.setThreadId((int) t.getId());
>   t.setHiveConf(hiveConf);
>   AtomicBoolean stop = new AtomicBoolean();
>   AtomicBoolean looped = new AtomicBoolean();
>   stop.set(true);
>   t.init(stop, looped);
>   t.run();
>   runStatementOnDriver("delete from " + Table.ACIDTBL + " where b = 4");
>   runStatementOnDriver("update " + Table.ACIDTBL + " set b = -2 where b = 2");
>   runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MINOR'");
>   t.run();
> }
> {noformat}
> to TestTxnCommands2 and run it.
> The test won't fail, but if you look in target/tmp/log/hive.log you will find the following exception (from Minor compaction):
> {noformat}
> 2016-06-09T18:36:39,071 WARN [Thread-190[]]: mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local1233973168_0005
> java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 7
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.6.1.jar:?]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>         at org.apache.orc.impl.SchemaEvolution.getFileType(SchemaEvolution.java:67) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2031) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:208) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:63) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:365) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:207) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:508) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1977) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:630) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:609) ~[classes/:?]
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[?:1.7.0_71]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[?:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[?:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_71]
> {noformat}
> I observed the same on a real cluster.
> Based on my observations, running Major compaction instead of Minor works fine.
> Replacing the DELETE operation with an UPDATE makes both Major and Minor run fine.
> The issue itself should be addressed by HIVE-13974, but we need to make sure to add this test.
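To make the flattened layout idea above concrete, here is a minimal sketch (not from any of the attached patches) of what building such a schema could look like with the public org.apache.orc.TypeDescription API. The FlatAcidSchemaSketch class, the flatAcidSchema helper, and the ACID_META_COLS constant are hypothetical names; the five metadata columns assume the current ACID layout (operation, originalTransaction, bucket, rowId, currentTransaction), except that user columns are appended directly instead of being nested in a "row" sub-struct.
{noformat}
// Hypothetical sketch: flatten the ACID layout into a single struct with the
// user columns appended after the metadata columns, so a reader only needs a
// column offset (which could be versioned in the ORC footer) rather than
// knowing it is performing an ACID read.
import org.apache.orc.TypeDescription;

public class FlatAcidSchemaSketch {
  // Number of leading ACID metadata columns; a real implementation would
  // derive this from a version number stored in the ORC footer.
  static final int ACID_META_COLS = 5;

  static TypeDescription flatAcidSchema(TypeDescription userRow) {
    TypeDescription schema = TypeDescription.createStruct()
        .addField("operation", TypeDescription.createInt())
        .addField("originalTransaction", TypeDescription.createLong())
        .addField("bucket", TypeDescription.createInt())
        .addField("rowId", TypeDescription.createLong())
        .addField("currentTransaction", TypeDescription.createLong());
    // Append user columns c1..cN directly instead of nesting them in a
    // "row" sub-struct as the current ACID schema does.
    for (int i = 0; i < userRow.getChildren().size(); i++) {
      schema.addField(userRow.getFieldNames().get(i), userRow.getChildren().get(i));
    }
    return schema;
  }

  public static void main(String[] args) {
    TypeDescription user = TypeDescription.fromString("struct<a:int,b:int>");
    System.out.println(flatAcidSchema(user));
    // -> struct<operation:int,originalTransaction:bigint,bucket:int,
    //           rowId:bigint,currentTransaction:bigint,a:int,b:int>
    // A higher layer wanting only user columns could then select column ids
    // [ACID_META_COLS, ACID_META_COLS + N) without the RecordReader being
    // ACID-aware.
  }
}
{noformat}
This is only a sketch of the proposed direction; compatibility with existing ACID files and the footer versioning would need real design work.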