Great ideas! Looking forward to the community-focused proposal in detail and
how it can benefit Iceberg!

Thanks,
Steve Zhang

> On May 10, 2024, at 10:06 PM, Tyler Akidau 
> <tyler.aki...@snowflake.com.INVALID> wrote:
> 
> Subcolumnarization of variant columns allows query engines to efficiently 
> prune datasets when subcolumns (i.e., nested fields) within a variant column 
> are queried, and also allows optionally materializing some of the nested 
> fields as a column on their own, affording queries on these subcolumns the 
> ability to read less data and spend less CPU on extraction. When 
> subcolumnarizing, the system managing table metadata and data tracks 
> individual pruning statistics (min, max, null, etc.) for some subset of the 
> nested fields within a variant, and also manages any optional 
> materialization. Without subcolumnarization, any query which touches a 
> variant column must read, parse, extract, and filter every row for which that 
> column is non-null. Thus, by providing a standardized way of tracking 
> subcolumn metadata and data for variant columns, Iceberg can make subcolumnar 
> optimizations accessible across various catalogs and query engines.
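
To make the pruning idea above concrete, here is a minimal sketch, not taken from the proposal itself. It assumes a variant value can be modeled as a nested Python dict, that a nested field is addressed by a dotted path, and that values under one path are mutually comparable. The names SubcolumnStats, extract, collect_stats, and may_contain_greater_than are hypothetical and do not correspond to any Iceberg API.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SubcolumnStats:
    """Hypothetical per-file pruning stats for one nested field."""
    min_value: Optional[Any]
    max_value: Optional[Any]
    null_count: int


def extract(variant: Optional[Dict[str, Any]], path: str) -> Optional[Any]:
    """Walk a dotted path such as 'user.age' through a nested dict."""
    current: Any = variant
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def collect_stats(rows: List[Optional[Dict[str, Any]]], path: str) -> SubcolumnStats:
    """Compute min, max, and null count for one nested field in one file.

    Assumption: all non-null values under the path are mutually comparable.
    """
    values = [extract(row, path) for row in rows]
    present = [v for v in values if v is not None]
    return SubcolumnStats(
        min_value=min(present) if present else None,
        max_value=max(present) if present else None,
        null_count=len(values) - len(present),
    )


def may_contain_greater_than(stats: SubcolumnStats, threshold: Any) -> bool:
    """Return False only when no row in the file can satisfy field > threshold."""
    if stats.max_value is None:
        return False  # the field is missing or null in every row
    return stats.max_value > threshold


# Two hypothetical data files, each a list of variant values (None = null row).
file_a = [{"user": {"age": 25}}, {"user": {"age": 31}}, None]
file_b = [{"user": {"age": 52}}, {"user": {}}]

stats = {
    name: collect_stats(rows, "user.age")
    for name, rows in [("file_a", file_a), ("file_b", file_b)]
}

# A query filtering on user.age > 40 only needs to open files that may match.
files_to_scan = [name for name, s in stats.items() if may_contain_greater_than(s, 40)]
```

In this sketch the filter user.age > 40 skips file_a entirely using only the tracked statistics; without them, every non-null variant row in both files would have to be read and parsed.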
