Thanks for sharing the benchmark results. May I ask why you chose ORC?

2017-01-30 19:57 GMT+09:00 김동원 <eastcirc...@gmail.com>:

> Hi,
>
> Recently I ran some experiments with Hive, Spark, and Presto using the TPC-DS
> benchmark, and I'd like to share the results with the community:
> http://www.slideshare.net/ssuser6bb12d/hive-presto-and-spark-on-tpcds-benchmark
> I relied entirely on the benchmark kit from Hortonworks:
> https://github.com/hortonworks/hive-testbench
>
> Here I have a question about query 72.
> Hive LLAP outperforms Presto and Spark on most queries,
> but it performs very poorly on query 72.
> Presto also struggles with query 72, whereas Spark finishes it much faster
> than Hive (pages 9 and 10).
> I've observed a strange CPU utilization pattern when Presto and Hive
> execute query 72 (page 11).
> When I turned off Spark's WholeStageCodeGen, Spark also took a very long
> time to finish query 72 (page 12).
> Did I miss some Hive feature that would improve the performance of this kind
> of query?
> I use the following settings for the Hive experiments:
> https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/testbench.settings
>
> Apart from query 72, Hive with LLAP shows very good performance on both small
> and large workloads.
>
> - Dongwon Kim
>
