Hi Ajantha,

this is very interesting document, thank you for your work on this!
I've added a few comments there.

I have one high-level design comment so I thought it would be nicer to
everyone if I re-post it here

is "partition" the right level of keeping the stats?
> We do this in Hive, but was it an accidental choice? or just the only
> thing that was possible to be implemented many years ago?


> Iceberg allows to have higher number of partitions compared to Hive,
> because it scales better. But that means partition-level may or may not be
> the right granularity.


> A self-optimizing system would gather stats on "per query unit" basis --
> for example if i partition by [ day x country ], but usually query by day,
> the days are the "query unit" and from stats perspective country can be
> ignored.
> Having more fine-grained partitions may lead to expensive planning time,
> so it's not theoretical problem.


> I am not saying we should implement all this logic right now, but I think
> we should decouple partitioning scheme from stats partitions, to allow
>  query engine to become smarter.



cc @Alexander Jo <alex...@starburstdata.com>

Best
PF




On Mon, Nov 14, 2022 at 12:47 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> Hi Community,
> I did a proposal write-up for the partition stats in Iceberg.
> Please have a look and let me know what you think. I would like to work on
> it.
>
>
> https://docs.google.com/document/d/1vaufuD47kMijz97LxM67X8OX-W2Wq7nmlz3jRo8J5Qk/edit?usp=sharing
>
> Requirement background snippet from the above document.
>
>> For some query engines that use cost-based-optimizer instead or along
>> with rule-based-optimizer (like Dremio, Trino, etc), at the planning time,
>> it is good to know the partition level stats like total rows per
>> partition and total files per partition to take decisions for CBO (
>> like deciding on the join reordering and join type, identifying the
>> parallelism).
>> Currently, the only way to do this is to read the partition info from 
>> data_file
>> in manifest_entry of the manifest file and compute partition-level
>> statistics (the same thing that ‘partitions’ metadata table is doing [see
>> Appendix A
>> <https://docs.google.com/document/d/1vaufuD47kMijz97LxM67X8OX-W2Wq7nmlz3jRo8J5Qk/edit#heading=h.s8iywtu7x8m6>
>> ]).
>> Doing this on each query is expensive. Hence, this is a proposal for
>> computing and storing partition-level stats for Iceberg tables and using
>> them during queries.
>
>
>
> Thanks,
> Ajantha
>

Reply via email to