Good point! I think we can run the TPC-DS benchmark multiple times, until LLAP has cached enough of the data on the SSD, and then observe whether the test performance improves. If I remember correctly, LLAP has a page where you can check the cache hit rate.

Thanks,
Butao Zhang

On 2025/09/08 09:12:59 Denys Kuzmenko wrote:
> hi Sungwoo,
>
> I don’t believe the TPC-DS benchmark is the best way to demonstrate the
> advantages of Hive LLAP’s distributed cache.
> TPC-DS is primarily designed to measure query optimization and overall system
> performance across a wide variety of complex workloads, but it doesn’t
> necessarily highlight scenarios where LLAP’s in-memory caching of frequently
> accessed data provides clear benefits.
> A more targeted benchmark or workload that emphasizes repeated access to the
> same datasets would be a better fit to showcase the strengths of LLAP’s
> distributed caching capabilities.
>
> Regards,
> Denys
>