[ https://issues.apache.org/jira/browse/HIVE-25071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krisztian Kasa reassigned HIVE-25071:
-------------------------------------

> Number of reducers limited to fixed 1 when updating/deleting
> ------------------------------------------------------------
>
>                 Key: HIVE-25071
>                 URL: https://issues.apache.org/jira/browse/HIVE-25071
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>
> When updating or deleting bucketed tables, an extra ReduceSink operator is
> created to enforce bucketing. Since HIVE-22538, the number of reducers in
> these RS operators is limited to a fixed 1.
> This can lead to performance degradation.
> Prior to HIVE-22538, multiple reducers were available in such cases. The
> reason for limiting the number of reducers is to ensure ascending RowId
> order in the delete delta files produced by update/delete statements.
> This is the plan of a delete statement like:
> {code}
> DELETE FROM t1 WHERE a = 1;
> {code}
> {code}
> TS[0]-FIL[8]-SEL[2]-RS[3]-SEL[4]-RS[5]-SEL[6]-FS[7]
> {code}
> RowId order is ensured by RS[3] and bucketing is enforced by RS[5]: the
> number of reducers was limited to the number of buckets in the table or to
> hive.exec.reducers.max. However, RS[5] does not provide any ordering, so the
> above plan may generate unsorted delete deltas, which leads to corrupted
> data reads.
> Prior to HIVE-22538, these RS operators were merged by
> ReduceSinkDeduplication, and the resulting RS kept the ordering and enabled
> multiple reducers. It could do so because ReduceSinkDeduplication was
> prepared for ACID writes. This was removed by HIVE-22538 to obtain a more
> generic ReduceSinkDeduplication.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
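
[Editor's note] The bucketed-table scenario the issue describes can be set up with DDL along the following lines. This is a sketch: the table definition (column names, bucket count, ORC storage) is illustrative and not taken from the issue; only the DELETE statement itself appears in the report. Updates/deletes require a transactional (ACID) table, which is where the delete delta files mentioned above are written.

{code}
-- Hypothetical bucketed ACID table (names and bucket count are assumptions).
-- A DELETE/UPDATE on such a table goes through the extra bucketing-enforcement
-- ReduceSink (RS[5] in the plan above).
CREATE TABLE t1 (a INT, b STRING)
CLUSTERED BY (a) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

DELETE FROM t1 WHERE a = 1;
{code}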