Hey Iceberg Nation,

Here are the minutes from the last few meetings. I've had to make some adjustments after moving on from Tabular; thanks for bearing with me.

Transcription/Recording
https://youtu.be/jAWka8g0o7c

Summary
0:11 Significant progress on geospatial support proposal, addressing key aspects like geometry types, encoding, partition transforms, predicate pushdown etc.
0:16 Variant data type proposal from Snowflake team to add support for JSON superset "variant" type in Iceberg
8:44 Discussion on 1.6 release blockers and next steps for other releases (Python, Rust)

Recent Updates
0:23 RevAPI plugin forked to new repo to continue using with Gradle
1:00 Parallel file listings for Snapshot/Migrate commands
1:29 Data file content filters pushdown for metadata table scans
3:00 Aggregate pushdown for incremental scans
3:27 Flink performance improvements
4:15 Removing credential override in table sessions

Geospatial Support Overview
14:21 Background on open-source geospatial support
14:56 Geometry as a complex type following OGC standards
16:40 WKB encoding mapped to Parquet byte arrays
18:36 Parquet logical type for geometry stats (bounding boxes)
20:25 XZ2 partition transform to map geometries deterministically
21:22 Geospatial sort orders like Hilbert curve
22:12 Integration with Sedona for Spark expressions

Variant Data Type Proposal
33:46 Variant as JSON superset with richer types (timestamps etc.)
40:06 Encoding discussion - leaning towards Spark variant binary
44:12 Support both JSON and variant, with variant as superset
52:21 Potential performance improvements with sub-column extraction

Releases
7:25 1.6 release blockers: commit-isCommitted PR, Avro release
9:25 Rust release pending Avro 1.11/1.12
10:47 Python 0.7 release wrapping up arrow integration

Next Steps
* Review and approve pull request for Iceberg 1.6 release
* Consider implications of supporting both well-known binary and Parquet logical type for geospatial, and push for Parquet logical type if possible
* Evaluate potential use cases for bounding box intersections at manifest list level, beyond just geospatial
* Get more details from data types team on variant proposal, including how to map variant to JSON and any performance implications of separating subcolumns
* Continue discussion on whether to support variant type, JSON type, or both in Iceberg, considering benefits and downsides of each option

Notes
* Highlights
  * Java
    * Thanks Ajantha, JB and Eduard for [working on](https://github.com/apache/iceberg/pull/8486/) the [RevAPI upgrade](https://github.com/apache/iceberg/pull/10631)
    * RevAPI moved now from https://github.com/palantir/gradle-revapi to https://github.com/revapi/gradle-revapi
    * Spark snapshot and migrate procedures can parallelize file listing (Thanks, Manu!)
    * Added pushdown for data file content filters in entries metadata table (Thanks, Steve Zhang!)
    * Added aggregate pushdown for incremental scans (Thanks, Huaxin!)
    * Flink performance improved by pre-creating getters (Thanks, @fengjiajie!)
    * Fixed credential override in table sessions (Thanks, Alexandre!)
  * Python
    * Thanks Sung for [streaming data](https://github.com/apache/iceberg-python/pull/786) through an arrow batch reader.
    * Thanks Honah for adding [merge-appends](https://github.com/apache/iceberg-python/pull/569).
  * Rust
    * Thanks Zenotme for writing [Field-IDs](https://github.com/apache/iceberg-rust/pull/411) to the Avro files.
  * Spark
    * Can pass read properties via [Spark SQL](https://github.com/apache/spark/pull/46707) (Szehon)
* Releases
  * Iceberg 1.6.0 Release ([devlist](https://lists.apache.org/thread/ymx4kbbfmndmhlrzfrpgzj3hmo6294pv), [milestone](https://github.com/apache/iceberg/milestone/44)).
    * [Kafka Connect: Commit coordination](https://github.com/apache/iceberg/pull/10351)
  * Iceberg-rust 0.3.0 release ([devlist](https://lists.apache.org/thread/x1kn3oq5lv6hllf1d50pbyrwcwthy4t1), [tracking issue](https://github.com/apache/iceberg-rust/issues/348), [project](https://github.com/orgs/apache/projects/339/views/1)). Waiting for the Avro release.
  * PyIceberg heading to 0.7.0 ([tracking issue](https://github.com/apache/iceberg-python/issues/736)), wrapping up the final PRs
* Progress for project guidelines
* Discussion ([all proposals](https://github.com/apache/iceberg/labels/proposal))
  * Geo Support Overview (if helpful?)
  * Column ranges in ManifestFile metadata (stored in manifest list)
  * Spec changes
    * Relative paths in Iceberg metadata (revisit this [proposal](https://docs.google.com/document/u/0/d/1RDEjJAVEXg1csRzyzTuM634L88vvI0iDHNQQK3kOVR0/edit))
    * Variant [proposal](https://docs.google.com/document/d/1QjhpG_SVNPZh3anFcpicMQx90ebwjL7rmzFYfUP89Iw/edit#heading=h.rt0cvesdzsj7)
    * Additional types
      * timestamp_ns
      * Variant
      * Null type?
      * Geo
      * TimeUUID (in binary comparable representation)
      * Blob?
    * Default values
    * Type promotion
      * Long to timestamp
      * int/long/bool to string
    * Multi-column transforms
    * Delete vectors ([discussion](https://lists.apache.org/thread/gr3g5rrr60fhvy0mrdj4s4w9x8c3v58g))
    * Path/prefix ownership ([discussion](https://lists.apache.org/thread/3fx8povnsq0f4g1xzj38snplr6d3ch1r))
    * Location/path requirements or recommendations