Hi Xingyuan,
I've been reviewing the partition statistics file, and adding
partition-level min/max values seems like a natural fit for the Partition
Statistics File[1], of which there is one per snapshot. We could introduce
a few new fields to accommodate these values.
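
To make that concrete, here is a rough sketch, expressed with Iceberg's
Java type API, of what such fields could look like. The field names, IDs,
and doc strings below are purely illustrative, not a spec proposal:

  import org.apache.iceberg.types.Types;

  public class PartitionStatsMinMaxSketch {
    // Hypothetical extra columns for the partition statistics file.
    // Bounds are kept per column id, as serialized binary values, mirroring
    // the lower_bounds/upper_bounds maps on data file entries.
    static final Types.NestedField LOWER_BOUNDS =
        Types.NestedField.optional(
            100, "lower_bounds",
            Types.MapType.ofRequired(
                101, 102, Types.IntegerType.get(), Types.BinaryType.get()),
            "Map of column id to partition-level lower bound");

    static final Types.NestedField UPPER_BOUNDS =
        Types.NestedField.optional(
            103, "upper_bounds",
            Types.MapType.ofRequired(
                104, 105, Types.IntegerType.get(), Types.BinaryType.get()),
            "Map of column id to partition-level upper bound");
  }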

While this addition could increase the size of the partition statistics
file, I believe the impact on readers will be minimal since the file is
stored in Parquet format. On the writer's side, collecting these metrics
may require some additional work, but since the stats are generated
asynchronously, it should be manageable.

One minor limitation is that the async generation may not always reflect
the latest snapshot, but overall, I think this approach should work well.

[1] https://iceberg.apache.org/spec/#partition-statistics-file

Yufei


On Thu, Sep 26, 2024 at 9:59 AM Xingyuan Lin <linxingyuan1...@gmail.com>
wrote:

> Hi team,
>
> Just bumping this up. What do you think of this? Does the alternative
> solution make sense or is it too much of a spec change?
>
> The goal is to improve the engine CBO's efficiency and effectiveness.
> Today, getting table stats is a fairly expensive operation for the engine
> CBO:
> https://github.com/trinodb/trino/blob/117c74e54131b47adf8c74ee1b6d98bafd1fe859/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TableStatisticsReader.java#L158-L172.
> In one of our production environments (Trino 423 + Iceberg 1.3.0), turning
> on CBO made some queries' planning time exceptionally long, even timing out
> in some cases (exceeding 180 seconds), and put memory pressure on the Trino
> coordinator. The Trino community has made many improvements to the status
> quo, but a more radical solution would be to reduce the amount of
> computation needed to get table stats during the CBO process.
>
> Thanks,
> Xingyuan
>
> On Mon, Sep 23, 2024 at 8:32 PM Xingyuan Lin <linxingyuan1...@gmail.com>
> wrote:
>
>> Hi Iceberg dev team,
>>
>> Cost-based optimizers need these stats:
>>
>>    - record count
>>    - null count
>>    - number of distinct values (NDVs)
>>    - min/max values
>>    - column sizes
>>
>> Today, to get these stats, an engine must process manifest files, which
>> can be an expensive operation when the table is large (many data files
>> and, consequently, many or very large manifest files).
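>>
>> For context, here is a rough sketch (simplified, not Trino's actual
>> implementation) of what deriving these stats with the Iceberg Java API
>> looks like today: plan a scan and aggregate the per-data-file metrics,
>> which means opening every manifest reachable from the snapshot.
>>
>>   import org.apache.iceberg.DataFile;
>>   import org.apache.iceberg.FileScanTask;
>>   import org.apache.iceberg.Table;
>>   import org.apache.iceberg.io.CloseableIterable;
>>
>>   public class TableStatsTodaySketch {
>>     // Walks every data file in the current snapshot, which requires
>>     // reading every manifest listed in the manifest list.
>>     static long totalRecords(Table table) throws Exception {
>>       long records = 0;
>>       try (CloseableIterable<FileScanTask> tasks =
>>           table.newScan().planFiles()) {
>>         for (FileScanTask task : tasks) {
>>           DataFile file = task.file();
>>           records += file.recordCount();
>>           // file.nullValueCounts(), file.lowerBounds(), file.upperBounds()
>>           // and file.columnSizes() would be merged per column similarly.
>>         }
>>       }
>>       return records;
>>     }
>>   }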
>>
>> There has been a proposal to add partition-level stats to improve this:
>> https://github.com/apache/iceberg/issues/8450. However, the current
>> solution doesn't yet include min/max values, etc. Furthermore, because of
>> row-level delete support, calculating accurate partition-level min/max
>> values seems non-trivial: an engine would have to process the data files
>> themselves.
>>
>> I think an alternative solution is to add these stats at the manifest-file
>> level, so that an engine only needs to process a single manifest list file
>> to get them. Furthermore, since estimated stats are good enough for CBOs,
>> engines can ignore manifest files that track delete files.
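>>
>> To illustrate, here is a rough sketch of the read path under this
>> alternative (the per-column bounds accessor is hypothetical since the
>> spec change doesn't exist; everything else is the existing Iceberg Java
>> API):
>>
>>   import java.util.List;
>>   import org.apache.iceberg.ManifestContent;
>>   import org.apache.iceberg.ManifestFile;
>>   import org.apache.iceberg.Table;
>>
>>   public class ManifestListStatsSketch {
>>     // Reads only the manifest list of the current snapshot; no manifest
>>     // files or data files are opened.
>>     static long estimatedRecords(Table table) {
>>       List<ManifestFile> manifests =
>>           table.currentSnapshot().allManifests(table.io());
>>       long records = 0;
>>       for (ManifestFile manifest : manifests) {
>>         // Estimated stats are good enough for CBO, so skip manifests
>>         // that track delete files.
>>         if (manifest.content() == ManifestContent.DELETES) {
>>           continue;
>>         }
>>         Long added = manifest.addedRowsCount();
>>         Long existing = manifest.existingRowsCount();
>>         records += (added == null ? 0 : added)
>>             + (existing == null ? 0 : existing);
>>         // With the proposed fields, per-column min/max (and null/NDV
>>         // counts) carried on the manifest list entry would be merged
>>         // here, e.g. via a hypothetical manifest.columnBounds() accessor.
>>       }
>>       return records;
>>     }
>>   }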
>>
>> Another benefit of this alternative is that updating manifest-file-level
>> stats is a lightweight, additive operation that writers can perform as
>> they ingest data.
>>
>> What do you think?
>>
>> Thanks,
>> Xingyuan
>>
>
