[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308462#comment-14308462 ]
Rui Li commented on HIVE-9561:
------------------------------

Yes I'm trying to avoid using SHUFFLE_SORT for any non-orderBy query. But for this particular case, we'll have QueryProperties.hasSortBy and QueryProperties.hasOrderBy both true. So we can't really distinguish.

As to removing the unnecessary ordering for subqueries, that sounds interesting. Maybe we can traverse SparkWork to look for SHUFFLE_SORT edges. If the downstream work on that edge is not a leaf work, we can remove it. What do you think?

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> ------------------------------------------------------------------
>
>                 Key: HIVE-9561
>                 URL: https://issues.apache.org/jira/browse/HIVE-9561
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-9561.1-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance
> and are difficult to control. So we should limit the use of {{sortByKey}} to
> order by query only.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
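
The traversal suggested in the comment (walk the work graph, find SHUFFLE_SORT edges, and drop the sort when the downstream work is not a leaf) might look roughly like the sketch below. It is only an illustration: `ShuffleSortPruner`, its `Shuffle` enum and the `connect`, `isLeaf`, `pruneIntermediateSorts` and `shuffleBetween` methods are hypothetical stand-ins, not Hive's SparkWork or SparkEdgeProperty API. It also assumes that "remove it" means downgrading the edge to a non-sorting shuffle, which the comment does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a SparkWork-like DAG; these are not Hive's real classes. */
public class ShuffleSortPruner {

    /** Shuffle kinds on an edge; the names follow the discussion and are illustrative. */
    enum Shuffle { NONE, GROUP, SORT }

    /** Directed edge from an upstream work to a downstream work. */
    static final class Edge {
        final String parent;
        final String child;
        Shuffle shuffle;

        Edge(String parent, String child, Shuffle shuffle) {
            this.parent = parent;
            this.child = child;
            this.shuffle = shuffle;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    void connect(String parent, String child, Shuffle shuffle) {
        edges.add(new Edge(parent, child, shuffle));
    }

    /** A work is a leaf when no edge starts from it. */
    boolean isLeaf(String work) {
        for (Edge e : edges) {
            if (e.parent.equals(work)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Downgrades every SORT edge whose downstream work is not a leaf.
     * Using GROUP as the replacement is an assumption made for this sketch.
     * Returns the number of edges that were changed.
     */
    int pruneIntermediateSorts() {
        int changed = 0;
        for (Edge e : edges) {
            if (e.shuffle == Shuffle.SORT && !isLeaf(e.child)) {
                e.shuffle = Shuffle.GROUP;
                changed++;
            }
        }
        return changed;
    }

    /** Returns the shuffle kind on the edge parent -> child, or null if there is no such edge. */
    Shuffle shuffleBetween(String parent, String child) {
        for (Edge e : edges) {
            if (e.parent.equals(parent) && e.child.equals(child)) {
                return e.shuffle;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ShuffleSortPruner work = new ShuffleSortPruner();
        work.connect("Map 1", "Reducer 2", Shuffle.SORT);     // sort inside a subquery
        work.connect("Reducer 2", "Reducer 3", Shuffle.SORT); // final order by
        System.out.println(work.pruneIntermediateSorts() + " edge(s) downgraded");
        System.out.println("Map 1 -> Reducer 2: " + work.shuffleBetween("Map 1", "Reducer 2"));
        System.out.println("Reducer 2 -> Reducer 3: " + work.shuffleBetween("Reducer 2", "Reducer 3"));
    }
}
```

In this toy graph only the edge feeding the final leaf work keeps its sort. Whether the leaf check alone is safe for every plan shape is the open question raised in the comment.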