[ 
https://issues.apache.org/jira/browse/HIVE-17010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075839#comment-16075839
 ] 

liyunzhang_intel commented on HIVE-17010:
-----------------------------------------

[~csun]: 
the explain output of query17 without HIVE-17010.patch is at 
[link|https://issues.apache.org/jira/secure/attachment/12875204/query17_explain.log].
 Reducer 3's data size is 9223372036854775807 (Long.MAX_VALUE):
{code}
   Reducer 3 
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 _col28 (type: bigint), _col27 (type: bigint)
                  1 cs_bill_customer_sk (type: bigint), cs_item_sk (type: 
bigint)
                outputColumnNames: _col1, _col2, _col6, _col8, _col9, _col22, 
_col27, _col28, _col34, _col35, _col45, _col51, _col63, _col66, _col82
                Statistics: Num rows: 9223372036854775807 Data size: 
9223372036854775807 Basic stats: COMPLETE Column stats: PARTIAL
                Reduce Output Operator
                  key expressions: _col22 (type: bigint)
                  sort order: +
                  Map-reduce partition columns: _col22 (type: bigint)
                  Statistics: Num rows: 9223372036854775807 Data size: 
9223372036854775807 Basic stats: COMPLETE Column stats: PARTIAL
                  value expressions: _col1 (type: bigint), _col2 (type: 
bigint), _col6 (type: bigint), _col8 (type: bigint), _col9 (type: int), _col27 
(type: bigint), _col28 (type: bigint), _col34 (type: bigint), _col35 (type: 
int), _col45 (type: bigint), _col51 (type: bigint), _col63 (type: bigint), 
_col66 (type: int), _col82 
{code}

Map 9's data size is 1022672:
{code}
        Map 9 
            Map Operator Tree:
                TableScan
                  alias: d1
                  filterExpr: (d_date_sk is not null and (d_quarter_name = 
'2000Q1')) (type: boolean)
                  Statistics: Num rows: 73049 Data size: 2045372 Basic stats: 
COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: (d_date_sk is not null and (d_quarter_name = 
'2000Q1')) (type: boolean)
                    Statistics: Num rows: 36524 Data size: 1022672 Basic stats: 
COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: d_date_sk (type: bigint)
                      sort order: +
                      Map-reduce partition columns: d_date_sk (type: bigint)
                      Statistics: Num rows: 36524 Data size: 1022672 Basic 
stats: COMPLETE Column stats: NONE
{code}
Reducer 4 joins the outputs of Map 9 and Reducer 3:
{code}
        Reducer 4 <- Map 9 (PARTITION-LEVEL SORT, 1), Reducer 3 
(PARTITION-LEVEL SORT, 1)
{code}
The addition 9223372036854775807 + 1022672 overflows the Long type, which 
causes the problem.
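A minimal sketch of one possible fix, assuming we clamp the accumulated data 
size at Long.MAX_VALUE instead of letting it wrap around (the helper name 
saturatingAdd is hypothetical, not necessarily what HIVE-17010.1.patch does):
{code}
public class SaturatingAddSketch {
  // Returns a + b, clamped to Long.MAX_VALUE instead of wrapping around.
  // Data sizes are non-negative, so only positive overflow is handled here.
  static long saturatingAdd(long a, long b) {
    long sum = a + b;
    // Two non-negative operands whose sum went negative means overflow.
    if (a >= 0 && b >= 0 && sum < 0) {
      return Long.MAX_VALUE;
    }
    return sum;
  }

  public static void main(String[] args) {
    long reducer3 = 9223372036854775807L; // Reducer 3's data size
    long map9 = 1022672L;                 // Map 9's data size
    // Stays at Long.MAX_VALUE instead of going negative.
    System.out.println(saturatingAdd(reducer3, map9));
  }
}
{code}
With a clamped sum the parallelism computation sees a huge-but-valid estimate 
rather than a negative one.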

> Fix the overflow problem of Long type in SetSparkReducerParallelism
> -------------------------------------------------------------------
>
>                 Key: HIVE-17010
>                 URL: https://issues.apache.org/jira/browse/HIVE-17010
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-17010.1.patch
>
>
> We use 
> [numberOfBytes|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L129]
>  to collect the numberOfBytes of the siblings of the specified RS. We use the 
> Long type, and it overflows when the data is too big. When this happens, the 
> parallelism is decided by 
> [sparkMemoryAndCores.getSecond()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L184].
>  If spark.dynamic.allocation.enabled is true, sparkMemoryAndCores.getSecond 
> is a dynamic value decided by the Spark runtime. For example, the value of 
> sparkMemoryAndCores.getSecond may be 5 or 15, and it may even be 1. The main 
> problem here is the overflow of the Long addition. You can reproduce the 
> overflow problem with the following code:
> {code}
>     import java.math.BigInteger;
> 
>     public static void main(String[] args) {
>       long a1 = 9223372036854775807L;  // Long.MAX_VALUE
>       long a2 = 1022672;
>       long res = a1 + a2;
>       System.out.println(res);  // -9223372036853753137 (wrapped around)
>       BigInteger b1 = BigInteger.valueOf(a1);
>       BigInteger b2 = BigInteger.valueOf(a2);
>       BigInteger bigRes = b1.add(b2);
>       System.out.println(bigRes);  // 9223372036855798479 (correct sum)
>     }
> {code}
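As an alternative to BigInteger, Math.addExact (available since JDK 8) throws 
ArithmeticException on Long overflow, which a caller can catch and clamp. This 
is only a sketch of the idea, not necessarily the approach taken in the patch:
{code}
public class AddExactSketch {
  // Return a + b, clamping to Long.MAX_VALUE when the exact sum overflows.
  static long clampedAdd(long a, long b) {
    try {
      return Math.addExact(a, b);
    } catch (ArithmeticException e) {
      // Overflow detected by addExact; saturate instead of wrapping.
      return Long.MAX_VALUE;
    }
  }

  public static void main(String[] args) {
    // The two data sizes from the explain output above.
    System.out.println(clampedAdd(9223372036854775807L, 1022672L));
  }
}
{code}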



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
