[ https://issues.apache.org/jira/browse/HIVE-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449205#comment-15449205 ]
Benjamin BONNET commented on HIVE-14660:
----------------------------------------

Here are the properties we set before creating the table and before inserting/deleting rows:
{code}
set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;
{code}
Apart from those properties, the settings are standard (we use an HDP 2.3 cluster). mapred.reduce.tasks is not set; by default we have mapreduce.job.reduces=-1 and mapreduce.reduce.speculative=true.

The table has 36 columns, is clustered into 4 buckets (on a single column), ORC formatted, transactional, and partitioned by year/month. It holds about half a billion rows. The genuine query that fails is of the form:
{code}
DELETE FROM table WHERE string_operations_on_some_columns IN ( select_from_another_table );
{code}
Concerning the mapred.reduce.tasks setting, I call it a work-around (not a solution) because, reading FileSinkOperator, one can see it has been designed to operate on multiple buckets. In my opinion, the only mistake was the use of an array instead of a map, or of a circular array (the latter only if you are guaranteed that buckets are dealt with in circular order, which I could not assume when I wrote the patch).

> ArrayIndexOutOfBoundsException on delete
> ----------------------------------------
>
>                 Key: HIVE-14660
>                 URL: https://issues.apache.org/jira/browse/HIVE-14660
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor, Transactions
>    Affects Versions: 1.2.1
>            Reporter: Benjamin BONNET
>            Assignee: Benjamin BONNET
>         Attachments: HIVE-14660.1-banch-1.2.patch
>
>
> Hi,
> DELETE on an ACID table may fail with an ArrayIndexOutOfBoundsException.
> The bug occurs in the Reduce phase when there are fewer reducers than the
> table has buckets.
> In order to reproduce, create a simple ACID table:
> {code:sql}
> CREATE TABLE test (`cle` bigint, `valeur` string)
> PARTITIONED BY (`annee` string)
> CLUSTERED BY (cle) INTO 5 BUCKETS
> STORED AS ORC
> TBLPROPERTIES ('transactional'='true');
> {code}
> Populate it with rows distributed among all buckets, with random values and
> a few partitions. Force the number of reducers to be less than the number
> of buckets:
> {code:sql}
> set mapred.reduce.tasks=1;
> {code}
> Then execute a delete that will remove many lines from all the buckets.
> {code:sql}
> DELETE FROM test WHERE valeur < 'some_value';
> {code}
> You will then get an ArrayIndexOutOfBoundsException:
> {code}
> 2016-08-22 21:21:02,500 [FATAL] [TezChild] |tez.ReduceRecordSource|: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":119,"bucketid":0,"rowid":0}},"value":{"_col0":"4"}}
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:252)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> 	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
> 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
> 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
> 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
> 	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:769)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
> 	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
> 	... 17 more
> {code}
> Adding logs into FileSinkOperator, one sees that the operator deals with
> buckets 0, 1, 2, 3, 4, then 0 again, and fails at line 769: each time the
> bucket changes, the operator moves forward in an array sized to the number
> of buckets (5 here). So when bucket 0 arrives for the second time, it steps
> past the end of the array...
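To make that failure mode concrete, here is a minimal, self-contained Java sketch of the bookkeeping pattern described above. This is not the actual FileSinkOperator code: the class, methods, and the Object stand-ins for per-bucket writers are all hypothetical, chosen only to contrast the array indexing that overflows when a bucket id repeats with the map-keyed alternative suggested in the comment.
{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the per-bucket writer bookkeeping described in this
// issue -- NOT the real FileSinkOperator. Object is a stand-in for a writer.
public class BucketWriterSketch {

    static final int NUM_BUCKETS = 5;

    // Buggy pattern: one array slot per bucket, with an index that advances
    // every time the incoming bucket id changes. This only works if each
    // bucket is seen at most once per reducer.
    static void arrayIndexed(int[] bucketIds) {
        Object[] writers = new Object[NUM_BUCKETS];
        int idx = -1;
        int prevBucket = -1;
        for (int bucket : bucketIds) {
            if (bucket != prevBucket) {
                idx++;                        // walks past the end when a
                writers[idx] = new Object();  // bucket id repeats -> AIOOBE
                prevBucket = bucket;
            }
            // ... write the row through writers[idx] ...
        }
    }

    // Map-keyed pattern (the fix direction the comment argues for): look the
    // writer up by bucket id, so a repeated bucket reuses its writer instead
    // of claiming a fresh slot.
    static void mapKeyed(int[] bucketIds) {
        Map<Integer, Object> writers = new HashMap<>();
        for (int bucket : bucketIds) {
            Object writer = writers.computeIfAbsent(bucket, b -> new Object());
            // ... write the row through writer ...
        }
    }

    public static void main(String[] args) {
        // With a single reducer, rows arrive sorted by (txnid, bucketid,
        // rowid), so bucket 0 can reappear after buckets 1..4 -- for example
        // with rows of a later transaction -- exactly as the logs above show.
        int[] stream = {0, 1, 2, 3, 4, 0};
        mapKeyed(stream);        // handles the repeated bucket fine
        try {
            arrayIndexed(stream);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("arrayIndexed fails as in the report: " + e);
        }
    }
}
{code}
A circular array sized to the bucket count would also avoid the overflow, but, as noted in the comment, only under the guarantee that buckets recur in strict circular order; a map keyed by bucket id makes no such assumption.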