Does the doc suggest that it is too expensive to aggregate min/max stats after planning files (i.e., after loading the matching files into memory)? Do we have any benchmarks to refer to? We will have to read manifests for planning anyway, right?
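To make the question concrete, here is a minimal sketch of what I mean by aggregating after planning. It is plain Python, not Iceberg code: `FileStats`, `plan_files`, and `aggregate_bounds` are hypothetical names, the column is assumed to be an integer, and the predicate is assumed to be a simple range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical stand-in for the per-file column stats kept in a manifest entry."""
    path: str
    record_count: int
    lower: int  # lower bound of the column in this file
    upper: int  # upper bound of the column in this file


def plan_files(files, lo, hi):
    """Keep files whose [lower, upper] range may overlap the predicate lo <= col <= hi."""
    return [f for f in files if f.upper >= lo and f.lower <= hi]


def aggregate_bounds(files):
    """Fold per-file stats into (min, max, row_count); None if no files matched."""
    if not files:
        return None
    return (
        min(f.lower for f in files),
        max(f.upper for f in files),
        sum(f.record_count for f in files),
    )


files = [
    FileStats("a.parquet", 100, 0, 99),
    FileStats("b.parquet", 200, 100, 499),
    FileStats("c.parquet", 300, 500, 999),
]

matched = plan_files(files, 120, 130)
print(aggregate_bounds(matched))  # (100, 499, 200)
print(aggregate_bounds(files))    # (0, 999, 600)
```

The aggregation itself is a single pass over the files that planning already returned, so the open question is whether that pass is measurably expensive for large scans.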
Also, the doc proposes adding column-level stats to the manifest list. I remember Dan mentioning the idea of getting rid of the manifest list in V4 and allowing manifests to point to other manifests. While having lower and upper bounds at the top level sounds promising, I am not sure we would want to use them for CBO. What if we have a selective predicate that drastically narrows down the scope of the operation? In that case, the file stats will give us much more precise information.

- Anton

On Tue, Oct 15, 2024 at 17:01 Xingyuan Lin <linxingyuan1...@gmail.com> wrote:

> Hi everyone,
>
> Here's a doc for [Proposal] Add manifest-level statistics for CBO estimation
> <https://docs.google.com/document/d/1NMsS4dg_AXh_abVfzx24VBOmLPIPaHzZRa2g5uUUHDI/edit?usp=sharing>.
> It's for more efficient derivation of stats for the CBO process. Original discussion thread
> <https://lists.apache.org/thread/jkt1g4vzjgjbtd5b5dwqs9dzo1ndrwkt.>.
>
> Please feel free to take a look and comment.
>
> Thanks,
> Xingyuan