[ 
https://issues.apache.org/jira/browse/HIVE-28790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28790:
----------------------------------
    Labels: hive-4.1.0-must  (was: )

> ACID deletes are failing with ArrayIndexOutOfBoundsException when direct 
> insert is enabled 
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-28790
>                 URL: https://issues.apache.org/jira/browse/HIVE-28790
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Marta Kuczora
>            Assignee: Kokila N
>            Priority: Major
>              Labels: hive-4.1.0-must
>
> *Steps to reproduce:*
> {code:java}
>     set mapreduce.job.reduces=7;
>     create external table ext(a int) stored as textfile;
>     insert into table ext values (1),(2),(3),(4),(5),(6),(7),(8),(9),(12);
>     create table full_acid(a int) stored as orc tblproperties("transactional"="true");
>     insert into table full_acid select * from ext where a != 3 and a <= 7 group by a;
>     insert into table full_acid select * from ext where a > 7 group by a;
>     set mapreduce.job.reduces=1;
>     delete from full_acid where a in (2, 12);
> {code}
> The delete will fail with the following exception:
> {code:java}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 6
>       at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:258)
> {code}
>  
> *The problem* is in the FileSinkOperator.createDynamicBucket method:
> {code:java}
>    public int createDynamicBucket(int bucketNum) {
>       // this assumes all paths are bucket names (which means no lookup is needed)
>       int writerOffset = bucketNum;
>       if (updaters.length <= writerOffset) {
>         this.updaters = Arrays.copyOf(updaters, writerOffset + 1);
>         this.outPaths = Arrays.copyOf(outPaths, writerOffset + 1);
>         this.finalPaths = Arrays.copyOf(finalPaths, writerOffset + 1);
>       }
>       if (this.finalPaths[writerOffset] == null) {
>         if (conf.isDirectInsert()) {
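>           // BUG (see the analysis below the code): the copy on the next line
>           // is made without checking whether outPathsCommitted is already
>           // longer than writerOffset + 1, so the array can be shrunk here.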
>           this.outPathsCommitted = Arrays.copyOf(outPathsCommitted, writerOffset + 1);
>           this.finalPaths[writerOffset] = buildTmpPath();
>           this.outPaths[writerOffset] = buildTmpPath();
>         } else {
>           // uninitialized bucket
>           String bucketName =
>               Utilities.replaceTaskIdFromFilename(Utilities.getTaskId(hconf), bucketNum);
>           this.finalPaths[writerOffset] = new Path(bDynParts ? buildTmpPath() : parent, bucketName);
>           this.outPaths[writerOffset] = new Path(buildTaskOutputTempPath(), bucketName);
>         }
>       }
>       return writerOffset;
>     }
>   } // class FSPaths
> {code}
> In the first part, the updaters, outPaths and finalPaths arrays are copied if 
> the writerOffset is not smaller than their length, so these arrays are 
> extended. But in the second part, when the outPathsCommitted array is copied, 
> its length is not compared with the writerOffset at all. As a result, the 
> outPathsCommitted array can actually be shrunk. When that happens, closing 
> the writers throws the ArrayIndexOutOfBoundsException, because the 
> outPathsCommitted array is shorter than the updaters array.
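> A minimal sketch of one possible fix (an illustration based on the analysis 
> above, not necessarily the committed patch): guard the outPathsCommitted copy 
> with the same kind of length check that already protects the updaters, 
> outPaths and finalPaths arrays, so the array can only grow:
> {code:java}
>         if (conf.isDirectInsert()) {
>           // Extend outPathsCommitted only when it is too short for this
>           // writerOffset; never shrink it.
>           if (outPathsCommitted.length <= writerOffset) {
>             this.outPathsCommitted = Arrays.copyOf(outPathsCommitted, writerOffset + 1);
>           }
>           this.finalPaths[writerOffset] = buildTmpPath();
>           this.outPaths[writerOffset] = buildTmpPath();
>         }
> {code}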
>  
> *About the reproduction:*
> The first insert into the full_acid table creates files for buckets 1, 2, 3, 
> 5 and 6.
> The second insert creates files for buckets 1, 4 and 6.
> The encoded bucket property for bucket 6 is 537264128 and for bucket 4 it is 
> 537133056, so the value for bucket 4 is smaller than the one for bucket 6.
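> As a standalone illustration of these values (assuming the BucketCodec V1 
> layout, where the bucket id occupies bits 16-27 of the encoded bucket 
> property), the bucket ids can be decoded like this:
> {code:java}
> int bucket6 = 537264128; // 0x20060000
> int bucket4 = 537133056; // 0x20040000
> // Shift the bucket id down to bit 0 and mask the 12 bucket id bits.
> System.out.println((bucket6 >>> 16) & 0xFFF); // prints 6
> System.out.println((bucket4 >>> 16) & 0xFFF); // prints 4
> {code}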
> To reproduce the issue, we need to delete a row from bucket 6 and a row from 
> bucket 4 together, and make sure that both rows are processed by the same 
> FileSinkOperator. It processes the row from bucket 6 first, so it performs 
> the array copy in createDynamicBucket with writerOffset 6. Then comes the row 
> for bucket 4, and createDynamicBucket does the second array copy incorrectly: 
> the finalPaths array has size 7, but outPathsCommitted gets copied to size 
> 4 + 1 = 5. This causes the exception when the writers are closed.
> By setting the reducer number to 1 before the delete, both rows are processed 
> by the same FileSinkOperator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
