Great ideas! Looking forward to the detailed, community-focused proposal and to seeing how it can benefit Iceberg!
Thanks,
Steve Zhang

> On May 10, 2024, at 10:06 PM, Tyler Akidau
> <tyler.aki...@snowflake.com.INVALID> wrote:
>
> Subcolumnarization of variant columns allows query engines to efficiently
> prune datasets when subcolumns (i.e., nested fields) within a variant column
> are queried, and also allows optionally materializing some of the nested
> fields as a column on their own, affording queries on these subcolumns the
> ability to read less data and spend less CPU on extraction. When
> subcolumnarizing, the system managing table metadata and data tracks
> individual pruning statistics (min, max, null, etc.) for some subset of the
> nested fields within a variant, and also manages any optional
> materialization. Without subcolumnarization, any query which touches a
> variant column must read, parse, extract, and filter every row for which that
> column is non-null. Thus, by providing a standardized way of tracking
> subcolumn metadata and data for variant columns, Iceberg can make subcolumnar
> optimizations accessible across various catalogs and query engines.
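
To make the pruning idea in the quoted paragraph concrete, here is a minimal Python sketch of how per-subcolumn statistics could let an engine skip files. Everything in it is an assumption for illustration: the `SubcolumnStats` and `DataFile` classes, the dotted field path `event.user_id`, and the file names are hypothetical and are not part of any Iceberg spec or of the proposal itself.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SubcolumnStats:
    """Hypothetical pruning stats for one nested field inside a variant column."""
    min_value: Any
    max_value: Any
    null_count: int


@dataclass
class DataFile:
    """Hypothetical data file entry; stats exist only for some nested fields."""
    path: str
    row_count: int
    subcolumn_stats: Dict[str, SubcolumnStats]


def may_contain(file: DataFile, field_path: str, value: Any) -> bool:
    """Return False only when the stats prove no row can match field == value."""
    stats = file.subcolumn_stats.get(field_path)
    if stats is None:
        # No stats tracked for this nested field: the file must be read and parsed.
        return True
    if stats.null_count == file.row_count:
        # Every row is null for this field, so an equality match is impossible.
        return False
    return stats.min_value <= value <= stats.max_value


def prune(files: List[DataFile], field_path: str, value: Any) -> List[DataFile]:
    """Keep only the files that might contain a matching row."""
    return [f for f in files if may_contain(f, field_path, value)]


files = [
    DataFile("a.parquet", 100, {"event.user_id": SubcolumnStats(1, 50, 0)}),
    DataFile("b.parquet", 100, {"event.user_id": SubcolumnStats(51, 99, 0)}),
    DataFile("c.parquet", 100, {}),  # no subcolumn stats tracked
]

survivors = prune(files, "event.user_id", 60)
print([f.path for f in survivors])  # ['b.parquet', 'c.parquet']
```

In this sketch `a.parquet` is skipped because 60 falls outside its tracked range, while `c.parquet` must still be read because nothing is tracked for that field, which is the "read, parse, extract, and filter every row" case described above.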