[ https://issues.apache.org/jira/browse/HIVE-13293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238556#comment-15238556 ]
Rui Li commented on HIVE-13293:
-------------------------------

Thanks [~xuefuz] for the review.
I mean it can work with queries that have only one ShuffleMapStage. It will definitely work with queries that have multiple ShuffleMapStages too. But as I said in my previous comment, what we care about here is just the last ShuffleMapStage, because that's what gets re-computed in parallel order by.
On the other hand, splitting a task that has only one ShuffleMapStage seems weird and may be bad for performance. That's why I chose to cache the RDD.

> Query occurs performance degradation after enabling parallel order by for Hive on Spark
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-13293
>                 URL: https://issues.apache.org/jira/browse/HIVE-13293
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 2.0.0
>            Reporter: Lifeng Wang
>            Assignee: Rui Li
>         Attachments: HIVE-13293.1.patch
>
>
> I used TPCx-BB to run some performance tests on the Hive on Spark engine, and found that
> query 10 has performance degradation when parallel order by is enabled.
> It seems that sampling costs a lot of time before the real query runs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
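
The re-computation described in the comment above can be illustrated with a small toy model. This is only a sketch in plain Python, not Hive or Spark code: `LazyDataset`, `parallel_order_by`, and `upstream_stage` are hypothetical names invented for illustration, and the model assumes that a range-partitioned sort reads its input twice (once to sample keys for partition boundaries, once to do the actual sort). Under that assumption, an uncached input is computed twice, while a cached input is computed once.

```python
import bisect
import random


class LazyDataset:
    """Toy stand-in for an RDD: recomputes its rows on every read unless cached."""

    def __init__(self, compute):
        self._compute = compute      # function that produces the rows (the "upstream stage")
        self._use_cache = False
        self._cached = None
        self.compute_count = 0       # how many times the upstream stage actually ran

    def cache(self):
        self._use_cache = True
        return self

    def rows(self):
        if self._use_cache and self._cached is not None:
            return self._cached
        self.compute_count += 1
        data = self._compute()
        if self._use_cache:
            self._cached = data
        return data


def parallel_order_by(dataset, num_partitions, sample_size=20, seed=0):
    """Range-partitioned sort: one pass samples keys, a second pass partitions and sorts."""
    rng = random.Random(seed)

    # Pass 1: sample the input to choose partition boundaries.
    rows = dataset.rows()
    sample = sorted(rng.sample(rows, min(sample_size, len(rows))))
    if sample:
        step = len(sample) / num_partitions
        bounds = [sample[int(step * i)] for i in range(1, num_partitions)]
    else:
        bounds = []

    # Pass 2: read the input again and route each row to its range partition.
    partitions = [[] for _ in range(num_partitions)]
    for row in dataset.rows():
        partitions[bisect.bisect_right(bounds, row)].append(row)

    # Each partition is sorted independently; concatenated, they are globally sorted.
    return [sorted(p) for p in partitions]


def upstream_stage():
    # Hypothetical expensive upstream computation.
    return [(i * 37) % 101 for i in range(100)]


plain = LazyDataset(upstream_stage)
parallel_order_by(plain, num_partitions=4)
print(plain.compute_count)    # 2: sampling pass + sort pass

cached = LazyDataset(upstream_stage).cache()
parallel_order_by(cached, num_partitions=4)
print(cached.compute_count)   # 1: the second pass reuses the cached rows
```

In this toy model, caching trades memory for avoiding the second computation of the input; whether that trade-off pays off in a real query depends on the data size and available storage.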