[ https://issues.apache.org/jira/browse/HIVE-13293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280347#comment-15280347 ]
Rui Li commented on HIVE-13293:
-------------------------------

Hi [~xuefuz], yes, order by is mostly at the end of stages. But that doesn't mean the amount of data is small, which is why we need parallel order by. During our benchmark we hit OOM in several cases, due to a bug in Spark 1.6.0, so I thought using a memory-level cache might make it even worse.
To your second question, we unpersist cached RDDs at the end of each job. You can refer to {{RemoteDriver#JobWrapper}} for that.

> Query occurs performance degradation after enabling parallel order by for
> Hive on Spark
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-13293
>                 URL: https://issues.apache.org/jira/browse/HIVE-13293
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 2.0.0
>            Reporter: Lifeng Wang
>            Assignee: Rui Li
>         Attachments: HIVE-13293.1.patch, HIVE-13293.1.patch
>
>
> I used TPCx-BB to run some performance tests on the Hive on Spark engine and
> found that query 10 shows performance degradation when parallel order by is enabled.
> It seems that sampling takes a lot of time before the real query runs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)