Hi Ajantha, this is very interesting document, thank you for your work on this! I've added a few comments there.
I have one high-level design comment so I thought it would be nicer to everyone if I re-post it here is "partition" the right level of keeping the stats? > We do this in Hive, but was it an accidental choice? or just the only > thing that was possible to be implemented many years ago? > Iceberg allows to have higher number of partitions compared to Hive, > because it scales better. But that means partition-level may or may not be > the right granularity. > A self-optimizing system would gather stats on "per query unit" basis -- > for example if i partition by [ day x country ], but usually query by day, > the days are the "query unit" and from stats perspective country can be > ignored. > Having more fine-grained partitions may lead to expensive planning time, > so it's not theoretical problem. > I am not saying we should implement all this logic right now, but I think > we should decouple partitioning scheme from stats partitions, to allow > query engine to become smarter. cc @Alexander Jo <alex...@starburstdata.com> Best PF On Mon, Nov 14, 2022 at 12:47 PM Ajantha Bhat <ajanthab...@gmail.com> wrote: > Hi Community, > I did a proposal write-up for the partition stats in Iceberg. > Please have a look and let me know what you think. I would like to work on > it. > > > https://docs.google.com/document/d/1vaufuD47kMijz97LxM67X8OX-W2Wq7nmlz3jRo8J5Qk/edit?usp=sharing > > Requirement background snippet from the above document. > >> For some query engines that use cost-based-optimizer instead or along >> with rule-based-optimizer (like Dremio, Trino, etc), at the planning time, >> it is good to know the partition level stats like total rows per >> partition and total files per partition to take decisions for CBO ( >> like deciding on the join reordering and join type, identifying the >> parallelism). >> Currently, the only way to do this is to read the partition info from >> data_file >> in manifest_entry of the manifest file and compute partition-level >> statistics (the same thing that ‘partitions’ metadata table is doing [see >> Appendix A >> <https://docs.google.com/document/d/1vaufuD47kMijz97LxM67X8OX-W2Wq7nmlz3jRo8J5Qk/edit#heading=h.s8iywtu7x8m6> >> ]). >> Doing this on each query is expensive. Hence, this is a proposal for >> computing and storing partition-level stats for Iceberg tables and using >> them during queries. > > > > Thanks, > Ajantha >