[ https://issues.apache.org/jira/browse/HIVE-17010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075839#comment-16075839 ]
liyunzhang_intel commented on HIVE-17010:
-----------------------------------------

[~csun]: the explain of query17 without HIVE-17010.patch is at [query17_explain.log|https://issues.apache.org/jira/secure/attachment/12875204/query17_explain.log]. Reducer 3's data size is 9223372036854775807:
{code}
Reducer 3
    Reduce Operator Tree:
      Join Operator
        condition map:
             Inner Join 0 to 1
        keys:
          0 _col28 (type: bigint), _col27 (type: bigint)
          1 cs_bill_customer_sk (type: bigint), cs_item_sk (type: bigint)
        outputColumnNames: _col1, _col2, _col6, _col8, _col9, _col22, _col27, _col28, _col34, _col35, _col45, _col51, _col63, _col66, _col82
        Statistics: Num rows: 9223372036854775807 Data size: 9223372036854775807 Basic stats: COMPLETE Column stats: PARTIAL
        Reduce Output Operator
          key expressions: _col22 (type: bigint)
          sort order: +
          Map-reduce partition columns: _col22 (type: bigint)
          Statistics: Num rows: 9223372036854775807 Data size: 9223372036854775807 Basic stats: COMPLETE Column stats: PARTIAL
          value expressions: _col1 (type: bigint), _col2 (type: bigint), _col6 (type: bigint), _col8 (type: bigint), _col9 (type: int), _col27 (type: bigint), _col28 (type: bigint), _col34 (type: bigint), _col35 (type: int), _col45 (type: bigint), _col51 (type: bigint), _col63 (type: bigint), _col66 (type: int), _col82
{code}
Map 9's data size is 1022672:
{code}
Map 9
    Map Operator Tree:
        TableScan
          alias: d1
          filterExpr: (d_date_sk is not null and (d_quarter_name = '2000Q1')) (type: boolean)
          Statistics: Num rows: 73049 Data size: 2045372 Basic stats: COMPLETE Column stats: NONE
          Filter Operator
            predicate: (d_date_sk is not null and (d_quarter_name = '2000Q1')) (type: boolean)
            Statistics: Num rows: 36524 Data size: 1022672 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: d_date_sk (type: bigint)
              sort order: +
              Map-reduce partition columns: d_date_sk (type: bigint)
              Statistics: Num rows: 36524 Data size: 1022672 Basic stats: COMPLETE Column stats: NONE
{code}
There is a join of Map 9 and Reducer 3:
{code}
Reducer 4 <- Map 9 (PARTITION-LEVEL SORT, 1), Reducer 3 (PARTITION-LEVEL SORT, 1)
{code}
Adding 9223372036854775807 + 1022672 causes the problem.

> Fix the overflow problem of Long type in SetSparkReducerParallelism
> -------------------------------------------------------------------
>
>                 Key: HIVE-17010
>                 URL: https://issues.apache.org/jira/browse/HIVE-17010
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-17010.1.patch
>
>
> We use [numberOfBytes|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L129] to collect the numberOfBytes of the siblings of a specified RS. We use the Long type, and it overflows when the data is too big. When this happens, the parallelism is decided by [sparkMemoryAndCores.getSecond()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L184]. If spark.dynamic.allocation.enabled is true, sparkMemoryAndCores.getSecond() is a dynamic value decided by the Spark runtime; for example, it may be 5 or 15 at random, and the value may even be 1. The main problem here is the overflow of Long addition. You can reproduce the overflow with the following code:
> {code}
> public static void main(String[] args) {
>     long a1 = 9223372036854775807L;
>     long a2 = 1022672;
>     long res = a1 + a2;
>     System.out.println(res); // -9223372036853753137 (wrapped around)
>
>     BigInteger b1 = BigInteger.valueOf(a1);
>     BigInteger b2 = BigInteger.valueOf(a2);
>     BigInteger bigRes = b1.add(b2);
>     System.out.println(bigRes); // 9223372036855798479
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
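Besides the BigInteger approach in the repro above, one possible way to avoid the wrap-around is to saturate the accumulated data size at Long.MAX_VALUE. This is only an illustrative sketch (the helper name is hypothetical, not taken from HIVE-17010.1.patch):

{code}
import java.math.BigInteger;

public class SaturatingAddDemo {
    // Saturating addition: clamps at Long.MAX_VALUE instead of wrapping.
    // (Hypothetical helper for illustration; non-negative inputs assumed,
    // which holds for data-size statistics.)
    static long saturatingAdd(long a, long b) {
        try {
            return Math.addExact(a, b);   // throws ArithmeticException on overflow
        } catch (ArithmeticException e) {
            return Long.MAX_VALUE;        // clamp instead of wrapping negative
        }
    }

    public static void main(String[] args) {
        long a1 = 9223372036854775807L;   // Reducer 3 data size from the explain plan
        long a2 = 1022672L;               // Map 9 data size
        System.out.println(a1 + a2);                 // wraps: -9223372036853753137
        System.out.println(saturatingAdd(a1, a2));   // clamps: 9223372036854775807
        // BigInteger, as in the repro, yields the exact sum instead:
        System.out.println(BigInteger.valueOf(a1).add(BigInteger.valueOf(a2))); // 9223372036855798479
    }
}
{code}

A clamped value keeps the "data is huge" signal intact, so the parallelism computation still takes the bytes-based branch instead of falling through to the runtime-dependent sparkMemoryAndCores.getSecond() value.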