[ 
https://issues.apache.org/jira/browse/HIVE-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174341#comment-14174341
 ] 

Chao commented on HIVE-8486:
----------------------------

OK, I debugged this query. To estimate the number of reducers,
{{SetSparkReducerParallelism}} obtains statistics from the siblings of the
current reduce sink and adds up their total number of bytes. However, for some
reason the {{statistics}} field is null for all of the siblings, so the total
number of bytes ends up as 0. As a result, only one reducer is used.
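To make the failure mode concrete, here is a minimal Java sketch of the estimate
as I understand it. It is not the actual {{SetSparkReducerParallelism}} code:
the {{Stats}} and {{Sibling}} classes, the bytes-per-reducer value, the reducer
cap, and the ceil-and-clamp formula are all assumptions for illustration. It
shows that when every sibling's statistics are null, the summed size is 0 and
the clamp always yields exactly one reducer.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical model of the reducer estimate; NOT Hive's actual
 * SetSparkReducerParallelism implementation.
 */
public class ReducerEstimateSketch {

    /** Hypothetical stand-in for operator statistics; only the data size matters here. */
    static final class Stats {
        final long dataSize;
        Stats(long dataSize) { this.dataSize = dataSize; }
    }

    /** Hypothetical stand-in for a sibling reduce sink; statistics may be null, as in this bug. */
    static final class Sibling {
        final Stats statistics;
        Sibling(Stats statistics) { this.statistics = statistics; }
    }

    /** Sums the data sizes of all siblings; a null statistics field contributes 0 bytes. */
    static long totalBytes(List<Sibling> siblings) {
        long total = 0;
        for (Sibling s : siblings) {
            if (s.statistics != null) {
                total += s.statistics.dataSize;
            }
        }
        return total;
    }

    /**
     * Assumed formula: ceil(totalBytes / bytesPerReducer), clamped to [1, maxReducers].
     * With totalBytes == 0 the ceiling is 0, which the clamp raises to exactly 1.
     */
    static int estimateReducers(long totalBytes, long bytesPerReducer, int maxReducers) {
        long reducers = (totalBytes + bytesPerReducer - 1) / bytesPerReducer;
        reducers = Math.max(1, Math.min(maxReducers, reducers));
        return (int) reducers;
    }

    public static void main(String[] args) {
        long bytesPerReducer = 256L * 1024 * 1024; // assumed: 256 MiB per reducer
        int maxReducers = 1009;                    // assumed reducer cap

        // Siblings that carry statistics: 10 GiB + 2 GiB = 12 GiB in total.
        List<Sibling> withStats = Arrays.asList(
                new Sibling(new Stats(10L * 1024 * 1024 * 1024)),
                new Sibling(new Stats(2L * 1024 * 1024 * 1024)));
        // Siblings whose statistics field is null, as observed for this query.
        List<Sibling> nullStats = Arrays.asList(new Sibling(null), new Sibling(null));

        System.out.println(estimateReducers(totalBytes(withStats), bytesPerReducer, maxReducers));
        System.out.println(estimateReducers(totalBytes(nullStats), bytesPerReducer, maxReducers));
    }
}
```

Running it prints 48 for the siblings with statistics and 1 for the siblings
whose statistics are null, which matches the single reducer we see.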

I'm wondering whether the missing statistics are something we haven't
implemented yet, or a bug.

> TPC-DS Query 96 parallelism is not set correctly
> ------------------------------------------------
>
>                 Key: HIVE-8486
>                 URL: https://issues.apache.org/jira/browse/HIVE-8486
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Brock Noland
>            Assignee: Chao
>
> When we run the query on a 20B we only have a parallelism factor of 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
