Re: Re: Performance evaluation of Trino 468, Spark 4.0.0-RC2, and Hive 4 on Tez/MR3

Sungwoo Park Wed, 23 Apr 2025 01:14:17 -0700

Evaluating Hive4-LLAP would be of interest to many users on this mailing
list, but it is a lot of work, and we are not sure we could finish the
parameter tuning needed to achieve the best performance.
For Hive-Tez, DAGAppMaster is reused across queries; only worker containers
are not reused.


--- Sungwoo


On Tue, Apr 22, 2025 at 6:25 PM lisoda <[email protected]> wrote:

> Hello Sungwoo
> BTW, would you consider adding Hive4-LLAP as a control group for the trial?
> Thanks.
> Lisoda
>
>
>
>
>
> At 2025-04-22 16:37:29, "Sungwoo Park" <[email protected]> wrote:
>
> From average response time analysis:
>
> Spark performs better than its total execution time suggests: its
> average response time is significantly lower than that of Hive on Tez.
>
> For long-running complex queries (like query 24) on large datasets, Hive
> on Tez can be a better choice than Spark, even with its initial overhead of
> starting YARN containers.
>
> --- Sungwoo
>
> On Tue, Apr 22, 2025 at 2:52 PM ypeng <[email protected]> wrote:
>
>>
>> Thanks for the doc.
>> I am surprised to see that Spark 4 is even slower than Hive on Tez.
>>
>>
>> [Total Execution Time (Sequential). Trino is the fastest, followed
>> closely by Hive on MR3, which significantly outperformed Hive on Tez.
>> Spark is the slowest, skewed by a few outlier queries.]
>>
>>
>> Sungwoo Park:
>> > We published a blog that reports the performance evaluation of Trino
>> > 468, Spark 4.0.0-RC2, and Hive 4 on Tez/MR3 2.0 using the TPC-DS
>> > Benchmark, 10TB scale factor. Hope you find it useful.
>>
>>
