Evaluating Hive4-LLAP could be of interest to many users on this mailing list, but it is a lot of work, and we are not sure we could finish parameter tuning to achieve the best performance. For Hive on Tez, the DAGAppMaster is reused across queries; only the worker containers are not reused across queries.
---
Sungwoo

On Tue, Apr 22, 2025 at 6:25 PM lisoda <lis...@yeah.net> wrote:
> Hello Sungwoo,
> BTW, would you consider adding Hive4-LLAP as a control group for the trial?
> Thanks.
> Lisoda
>
> On 2025-04-22 16:37:29, "Sungwoo Park" <glap...@gmail.com> wrote:
> > From the average response time analysis:
> >
> > Spark performs better than its total execution time suggests, with
> > an average response time significantly lower than that of Hive on Tez.
> >
> > For long-running complex queries (like query 24) on large datasets, Hive
> > on Tez can be a better choice than Spark, even with its initial overhead of
> > starting YARN containers.
> >
> > ---
> > Sungwoo
> >
> > On Tue, Apr 22, 2025 at 2:52 PM ypeng <yp...@t-online.de> wrote:
> >>
> >> Thanks for the doc.
> >> I am surprised to see that Spark 4 is even slower than Hive on Tez.
> >>
> >> [Total Execution Time (Sequential). Trino is the fastest, followed
> >> closely by Hive on MR3, which significantly outperformed Hive on Tez.
> >> Spark is the slowest, skewed by a few outlier queries.]
> >>
> >> Sungwoo Park:
> >> > We published a blog post that reports the performance evaluation of Trino
> >> > 468, Spark 4.0.0-RC2, and Hive 4 on Tez/MR3 2.0 using the TPC-DS
> >> > benchmark at the 10TB scale factor. Hope you find it useful.