Re: [PROPOSAL] Add manifest-level statistics for CBO estimation

Anton Okolnychyi Wed, 16 Oct 2024 16:08:29 -0700

Does the doc suggest it is too expensive to aggregate min/max stats after
planning files (i.e., after loading the matching files into memory)? Do we
have any benchmarks to refer to? We will have to read manifests for
planning anyway, right?
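For reference, here is a minimal sketch of what aggregating min/max stats
after planning could look like. It assumes a hypothetical, simplified
`FileStats` record per planned data file (a record count plus a lower and
upper bound for one column), loosely modeled on the per-file stats Iceberg
manifests already track; it is not the Iceberg API. Once planning has
loaded the matching entries, the roll-up itself is a single pass over data
that is already in memory.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FileStats:
    """Hypothetical, simplified stats for one column of one data file."""
    record_count: int
    lower: int
    upper: int


@dataclass(frozen=True)
class ColumnSummary:
    """Aggregated stats a CBO could consume for one column."""
    row_count: int
    min_value: Optional[int]
    max_value: Optional[int]


def aggregate(planned: Iterable[FileStats]) -> ColumnSummary:
    """Fold stats of the files that survived planning into one summary.

    Single linear pass; returns None bounds when no files matched.
    """
    rows = 0
    lo: Optional[int] = None
    hi: Optional[int] = None
    for f in planned:
        rows += f.record_count
        lo = f.lower if lo is None else min(lo, f.lower)
        hi = f.upper if hi is None else max(hi, f.upper)
    return ColumnSummary(rows, lo, hi)
```

For example, two planned files with 100 rows in [1, 50] and 200 rows in
[40, 90] roll up to 300 rows with bounds [1, 90].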


Also, the doc proposes adding column-level stats to the manifest list. I
remember Dan mentioned the idea of getting rid of the manifest list in V4
and allowing manifests to point to other manifests. While having lower and
upper bounds at the top level sounds promising, I am not sure we would want
to use them for CBO. What if we have a selective predicate that drastically
narrows down the scope of the operation? In that case, the file stats will
give us much more precise information.
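A minimal sketch of the concern, using hypothetical, simplified per-file
stats (record count and lower/upper bound for one column; not the Iceberg
API) and assuming values are uniformly distributed within each range. A
single top-level range spreads skewed data over the whole bound, while
pruning files first and estimating per file follows where the matching rows
actually are.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileStats:
    """Hypothetical, simplified stats for one column of one data file."""
    record_count: int
    lower: int
    upper: int


def overlap_fraction(lower: int, upper: int, lo: int, hi: int) -> float:
    """Fraction of [lower, upper] covered by the predicate lo <= x <= hi,
    assuming values are uniformly distributed in [lower, upper]."""
    start, end = max(lower, lo), min(upper, hi)
    if start > end:
        return 0.0  # ranges do not overlap: file would be pruned
    if upper == lower:
        return 1.0  # single-valued range that lies inside the predicate
    return (end - start) / (upper - lower)


def estimate_from_top_level(files: List[FileStats], lo: int, hi: int) -> float:
    """Estimate matching rows from one summary range over all files,
    simulating bounds stored once at the top level."""
    if not files:
        return 0.0
    total = sum(f.record_count for f in files)
    lower = min(f.lower for f in files)
    upper = max(f.upper for f in files)
    return total * overlap_fraction(lower, upper, lo, hi)


def estimate_from_files(files: List[FileStats], lo: int, hi: int) -> float:
    """Estimate matching rows per file; non-overlapping files contribute 0."""
    return sum(f.record_count * overlap_fraction(f.lower, f.upper, lo, hi)
               for f in files)
```

With one large file of 1,000,000 rows in [0, 100] and a tiny file of 10
rows in [900, 1000], the predicate 950 <= x <= 1000 yields roughly 50,000
rows from the top-level bounds but only 5 from the file stats.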

- Anton

On Tue, 15 Oct 2024 at 17:01, Xingyuan Lin <linxingyuan1...@gmail.com> wrote:

> Hi everyone,
>
> Here's a doc for [Proposal] Add manifest-level statistics for CBO
> estimation
> <https://docs.google.com/document/d/1NMsS4dg_AXh_abVfzx24VBOmLPIPaHzZRa2g5uUUHDI/edit?usp=sharing>.
> It aims to make deriving stats for the CBO process more efficient. The
> original discussion thread is here:
> <https://lists.apache.org/thread/jkt1g4vzjgjbtd5b5dwqs9dzo1ndrwkt>.
>
> Please feel free to take a look and comment.
>
> Thanks,
> Xingyuan
>
