[ https://issues.apache.org/jira/browse/HIVE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369956#comment-15369956 ]
Eugene Koifman commented on HIVE-14004:
---------------------------------------

[~mmccline] I left a few comments on RB. Mostly superficial, except the "clone()/toString()" one w.r.t. Reader.Options. I understand what you are trying to do and generally it makes sense. One concern I have is that orc.RecordReader needs to know whether it's doing an ACID read vs. a regular read. Given how things are currently implemented, I'm not sure how to avoid that. It would be better if the higher layer just specified which columns it wants - user columns and ACID metadata columns - and interpreted them itself, so that RecordReader doesn't have to. Perhaps down the road we can make the layout just one struct<operation, originalTransaction, ..., currentTransaction, c1, c2, ..., cN>, where the "c" fields are user columns, and include a version number in the ORC footer that records the offset at which the user columns start (in case we add more metadata columns). I think this may have other advantages. I think someone more familiar with this code path should look at this as well.

> Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-14004
>                 URL: https://issues.apache.org/jira/browse/HIVE-14004
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 2.2.0
>            Reporter: Eugene Koifman
>            Assignee: Matt McCline
>         Attachments: HIVE-14004.01.patch, HIVE-14004.02.patch, HIVE-14004.03.patch
>
> Easiest way to repro is to add
> {noformat}
> @Test
> public void testCompactWithDelete() throws Exception {
>   int[][] tableData = {{1,2},{3,4}};
>   runStatementOnDriver("insert into " + Table.ACIDTBL + "(a,b) " + makeValuesClause(tableData));
>   runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>   Worker t = new Worker();
>   t.setThreadId((int) t.getId());
>   t.setHiveConf(hiveConf);
>   AtomicBoolean stop = new AtomicBoolean();
>   AtomicBoolean looped = new AtomicBoolean();
>   stop.set(true);
>   t.init(stop, looped);
>   t.run();
>   runStatementOnDriver("delete from " + Table.ACIDTBL + " where b = 4");
>   runStatementOnDriver("update " + Table.ACIDTBL + " set b = -2 where b = 2");
>   runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MINOR'");
>   t.run();
> }
> {noformat}
> to TestTxnCommands2 and run it.
> The test won't fail, but if you look in target/tmp/log/hive.log you will find the following exception (from Minor compaction):
> {noformat}
> 2016-06-09T18:36:39,071 WARN [Thread-190[]]: mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local1233973168_0005
> java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 7
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.6.1.jar:?]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>         at org.apache.orc.impl.SchemaEvolution.getFileType(SchemaEvolution.java:67) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2031) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:208) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:63) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:365) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:207) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:508) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1977) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:630) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:609) ~[classes/:?]
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[?:1.7.0_71]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[?:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[?:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_71]
> {noformat}
> I observed the same on a real cluster.
> Based on my observations, running Major compaction instead of Minor works fine.
> Replacing the DELETE operation with an UPDATE makes both Major and Minor run fine.
> The issue itself should be addressed by HIVE-13974, but we need to make sure to add this test.
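To make the flattened layout idea above concrete, here is a minimal sketch (not from any of the attached patches) of what building such a schema could look like with the public org.apache.orc.TypeDescription API. The FlatAcidSchemaSketch class, the flatAcidSchema helper, and the ACID_META_COLS constant are hypothetical names; the five metadata columns assume the current ACID layout (operation, originalTransaction, bucket, rowId, currentTransaction), except that user columns are appended directly instead of being nested in a "row" sub-struct.
{noformat}
// Hypothetical sketch: flatten the ACID layout into a single struct with the
// user columns appended after the metadata columns, so a reader only needs a
// column offset (which could be versioned in the ORC footer) rather than
// knowing it is performing an ACID read.
import org.apache.orc.TypeDescription;

public class FlatAcidSchemaSketch {
  // Number of leading ACID metadata columns; a real implementation would
  // derive this from a version number stored in the ORC footer.
  static final int ACID_META_COLS = 5;

  static TypeDescription flatAcidSchema(TypeDescription userRow) {
    TypeDescription schema = TypeDescription.createStruct()
        .addField("operation", TypeDescription.createInt())
        .addField("originalTransaction", TypeDescription.createLong())
        .addField("bucket", TypeDescription.createInt())
        .addField("rowId", TypeDescription.createLong())
        .addField("currentTransaction", TypeDescription.createLong());
    // Append user columns c1..cN directly instead of nesting them in a
    // "row" sub-struct as the current ACID schema does.
    for (int i = 0; i < userRow.getChildren().size(); i++) {
      schema.addField(userRow.getFieldNames().get(i), userRow.getChildren().get(i));
    }
    return schema;
  }

  public static void main(String[] args) {
    TypeDescription user = TypeDescription.fromString("struct<a:int,b:int>");
    System.out.println(flatAcidSchema(user));
    // -> struct<operation:int,originalTransaction:bigint,bucket:int,
    //           rowId:bigint,currentTransaction:bigint,a:int,b:int>
    // A higher layer wanting only user columns could then select column ids
    // [ACID_META_COLS, ACID_META_COLS + N) without the RecordReader being
    // ACID-aware.
  }
}
{noformat}
This is only a sketch of the proposed direction; compatibility with existing ACID files and the footer versioning would need real design work.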