Re: [DISCUSS] Optimize for CBO

2024-09-26 Thread Xingyuan Lin
Oh, good to know about the multi-layer proposal. Could you share a link to it if there is one? I will also draft a short proposal on the manifest-level stats topic in a Google Doc so that folks can review and comment. Thank you, Yufei, for your time and input. On Thu, Sep 26, 2024 at 4:18 PM Yufei

Re: [DISCUSS] Optimize for CBO

2024-09-26 Thread Yufei Gu
I agree, this approach makes sense and could align well with the multi-layer manifest file proposal. Each layer's manifest file could potentially hold aggregated metrics, which would streamline the process. However, so far, there have only been offline discussions, and no formal proposal has been d

Re: [DISCUSS] Optimize for CBO

2024-09-26 Thread Xingyuan Lin
Thanks, Yufei, for taking a look. Yes, I think adding the min/max values to partition-level statistics would also work. In fact, this has been proposed in [1]. However, my concern was that calculating partition-level min/max values would be an expensive operation because of the row-level deletes support (

Re: [DISCUSS] Optimize for CBO

2024-09-26 Thread Yufei Gu
Hi Xingyuan, I've been reviewing the partition statistics file, and it seems that adding partition-level min/max values would be a natural fit within the Partition Statistics File [1], of which there is one per snapshot. We could introduce a few new fields to accommodate these values. While this addition c

Re: [DISCUSS] Optimize for CBO

2024-09-26 Thread Xingyuan Lin
Hi team, just bumping this up. What do you think? Does the alternative solution make sense, or is it too much of a spec change? The goal is to improve the efficiency and effectiveness of the engine's CBO. Today, it is a fairly expensive operation for an engine's CBO to get table stats: https://github.com/trin