[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

Rui Li (JIRA) Thu, 05 Feb 2015 18:09:47 -0800

    [ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308462#comment-14308462 ]


Rui Li commented on HIVE-9561:
------------------------------

Yes, I'm trying to avoid using SHUFFLE_SORT for any non-order-by query. But in 
this particular case, QueryProperties.hasSortBy and QueryProperties.hasOrderBy 
are both true, so we can't really tell the cases apart.

As for removing the unnecessary ordering in subqueries, that sounds 
interesting. Maybe we can traverse SparkWork to look for SHUFFLE_SORT edges. If 
the downstream work on such an edge is not a leaf work, we can remove the sort 
from that edge. What do you think?
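
A minimal sketch of that traversal, only to make the idea concrete. It does not 
use the real SparkWork API: WorkGraph, ShuffleType and downgradeNonLeafSorts are 
hypothetical stand-ins, and replacing the sort with a plain GROUP shuffle is an 
assumption, since which shuffle type should take its place is still open.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ShuffleSortPruner {

    // Hypothetical shuffle kinds; stand-ins, not Hive's SparkEdgeProperty.
    enum ShuffleType { NONE, GROUP, SORT }

    // Toy work graph keyed by work name; a stand-in for SparkWork.
    static class WorkGraph {
        // parent work -> (child work -> shuffle type on that edge)
        private final Map<String, Map<String, ShuffleType>> edges = new LinkedHashMap<>();

        void connect(String parent, String child, ShuffleType type) {
            edges.computeIfAbsent(parent, k -> new LinkedHashMap<>());
            edges.computeIfAbsent(child, k -> new LinkedHashMap<>());
            edges.get(parent).put(child, type);
        }

        // A leaf work has no outgoing edges.
        boolean isLeaf(String work) {
            return edges.get(work).isEmpty();
        }

        ShuffleType edgeType(String parent, String child) {
            return edges.get(parent).get(child);
        }
    }

    // Visits every edge; a SORT edge whose downstream work is not a leaf is
    // downgraded to GROUP (assumed replacement). Returns the number of edges changed.
    static int downgradeNonLeafSorts(WorkGraph graph) {
        int changed = 0;
        for (Map<String, ShuffleType> children : graph.edges.values()) {
            for (Map.Entry<String, ShuffleType> edge : children.entrySet()) {
                boolean downstreamIsLeaf = graph.isLeaf(edge.getKey());
                if (edge.getValue() == ShuffleType.SORT && !downstreamIsLeaf) {
                    edge.setValue(ShuffleType.GROUP);
                    changed++;
                }
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Inner ORDER BY feeds reduce1 (not a leaf); outer ORDER BY feeds reduce2 (leaf).
        WorkGraph graph = new WorkGraph();
        graph.connect("map1", "reduce1", ShuffleType.SORT);
        graph.connect("reduce1", "reduce2", ShuffleType.SORT);
        System.out.println("edges changed: " + downgradeNonLeafSorts(graph));
        System.out.println("map1 -> reduce1: " + graph.edgeType("map1", "reduce1"));
        System.out.println("reduce1 -> reduce2: " + graph.edgeType("reduce1", "reduce2"));
    }
}
```

In this toy graph only the edge into the final (leaf) work keeps its sort, which 
is the case where the ordering is visible in the query result.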

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> ------------------------------------------------------------------
>
>                 Key: HIVE-9561
>                 URL: https://issues.apache.org/jira/browse/HIVE-9561
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-9561.1-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
> and are difficult to control. So we should limit the use of {{sortByKey}} to 
> order by queries only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
