Thanks for sharing the benchmark results. May I ask why you chose ORC?

2017-01-30 19:57 GMT+09:00 김동원 <eastcirc...@gmail.com>:
> Hi,
>
> Recently I ran some experiments with Hive, Spark, and Presto on the TPC-DS
> benchmark, and I'd like to share the results with the community:
> http://www.slideshare.net/ssuser6bb12d/hive-presto-and-spark-on-tpcds-benchmark
> I relied entirely on the benchmark kit from Hortonworks:
> https://github.com/hortonworks/hive-testbench
>
> I have a question about query 72.
> Hive LLAP shows better performance than Presto and Spark for most queries,
> but it performs very poorly on query 72.
> While Presto also struggles with query 72, Spark finishes it a lot faster
> than Hive (pages 9 and 10).
> I've observed a weird pattern in CPU utilization when Presto and Hive
> execute query 72 (page 11).
> When I turned off Spark's WholeStageCodeGen, Spark also took a very long
> time to finish query 72 (page 12).
> Did I miss some Hive feature that could improve the performance of this
> kind of query?
> I used the following settings for the Hive experiments:
> https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/testbench.settings
>
> Except for query 72, Hive with LLAP shows very good performance for both
> small and large workloads.
>
> - Dongwon Kim
>