[ https://issues.apache.org/jira/browse/HIVE-13293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197341#comment-15197341 ]
Xuefu Zhang commented on HIVE-13293:
------------------------------------

Yes, the result can vary from query to query due to the cost of a separate sampling Spark job. I'd imagine that the benefit diminishes as the data to be processed gets smaller.

While we are here, [~Lifeng Wang], could you check whether SamplingOptimizer, which does a similar thing for MR, has the same effect for the MapReduce engine? Thanks.

> Query occurs performance degradation after enabling parallel order by for Hive on sprak
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-13293
>                 URL: https://issues.apache.org/jira/browse/HIVE-13293
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 2.0.0
>            Reporter: Lifeng Wang
>
> I used TPCx-BB to run some performance tests on the Hive on Spark engine, and found
> that query 10 shows performance degradation when parallel order by is enabled.
> It seems that sampling takes a long time before the real query runs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)