Hey Iceberg Nation,

Here are the minutes from the last few meetings. I've had some adjustments
to make after moving on from Tabular; thanks for bearing with me.

Transcription/Recording

https://youtu.be/jAWka8g0o7c

Summary

0:11 Significant progress on the geospatial support proposal, addressing key
aspects like geometry types, encoding, partition transforms, predicate
pushdown, etc.
0:16 Variant data type proposal from the Snowflake team to add support for a
JSON-superset "variant" type in Iceberg
8:44 Discussion on 1.6 release blockers and next steps for other releases
(Python, Rust)

Recent Updates

0:23 RevAPI plugin forked to new repo to continue using with Gradle
1:00 Parallel file listings for Snapshot/Migrate commands
1:29 Data file content filters pushdown for metadata table scans
3:00 Aggregate pushdown for incremental scans
3:27 Flink performance improvements
4:15 Removing credential override in table sessions

Geospatial Support Overview

14:21 Background on open-source geospatial support
14:56 Geometry as a complex type following OGC standards
16:40 WKB encoding mapped to Parquet byte arrays
18:36 Parquet logical type for geometry stats (bounding boxes)
20:25 XZ2 partition transform to map geometries deterministically
21:22 Geospatial sort orders like Hilbert curve
22:12 Integration with Sedona for Spark expressions
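
For anyone new to the encoding and stats items above, here is a minimal
sketch of a WKB-encoded point stored as raw bytes and a bounding-box overlap
check used to skip files. It is an illustration only, not taken from the
proposal: the names BBox, point_to_wkb, wkb_to_point and prune_files are
hypothetical, and it covers only 2D little-endian points.

```python
import struct
from typing import List, NamedTuple, Tuple


class BBox(NamedTuple):
    """Axis-aligned bounding box, the kind of stat a geometry column could carry."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        # Boxes that only touch on an edge or corner count as overlapping.
        return not (
            self.xmax < other.xmin or other.xmax < self.xmin
            or self.ymax < other.ymin or other.ymax < self.ymin
        )


def point_to_wkb(x: float, y: float) -> bytes:
    # OGC WKB point: 1-byte byte order (1 = little-endian),
    # uint32 geometry type (1 = Point), then x and y as float64.
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(wkb: bytes) -> Tuple[float, float]:
    byte_order, geom_type, x, y = struct.unpack("<BIdd", wkb)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("this sketch only handles little-endian 2D points")
    return (x, y)


def prune_files(files: List[Tuple[str, BBox]], query: BBox) -> List[str]:
    # Keep only files whose bounding box could contain matching geometries.
    return [name for name, bbox in files if bbox.intersects(query)]


if __name__ == "__main__":
    files = [
        ("a.parquet", BBox(0.0, 0.0, 10.0, 10.0)),
        ("b.parquet", BBox(50.0, 50.0, 60.0, 60.0)),
    ]
    print(prune_files(files, BBox(5.0, 5.0, 6.0, 6.0)))
    print(len(point_to_wkb(1.0, 2.0)))
```

The same overlap test would apply at any level where per-column ranges are
kept, so a real implementation would also need to handle other geometry
types, byte orders and coordinate reference systems.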

Variant Data Type Proposal

33:46 Variant as JSON superset with richer types (timestamps etc.)
40:06 Encoding discussion - leaning towards Spark variant binary
44:12 Support both JSON and variant, with variant as superset
52:21 Potential performance improvements with sub-column extraction

Releases

7:25 1.6 release blockers: commit-isCommitted PR, Avro release
9:25 Rust release pending Avro 1.11/1.12
10:47 Python 0.7 release wrapping up Arrow integration

Next Steps

* Review and approve the pull requests blocking the Iceberg 1.6 release
* Consider implications of supporting both well-known binary and Parquet
logical type for geospatial, and push for Parquet logical type if possible
* Evaluate potential use cases for bounding box intersections at manifest
list level, beyond just geospatial
* Get more details from data types team on variant proposal, including how
to map variant to JSON and any performance implications of separating
subcolumns
* Continue discussion on whether to support variant type, JSON type, or
both in Iceberg, considering benefits and downsides of each option

Notes

* Highlights
  * Java
    * Thanks Ajantha, JB and Eduard for [working on](
https://github.com/apache/iceberg/pull/8486/) the [RevAPI upgrade](
https://github.com/apache/iceberg/pull/10631)
      * RevAPI has moved from https://github.com/palantir/gradle-revapi to
https://github.com/revapi/gradle-revapi
    * Spark snapshot and migrate procedures can parallelize file listing
(Thanks, Manu!)
    * Added pushdown for data file content filters in entries metadata
table (Thanks, Steve Zhang!)
    * Added aggregate pushdown for incremental scans (Thanks, Huaxin!)
    * Flink performance improved by pre-creating getters (Thanks,
@fengjiajie!)
    * Fixed credential override in table sessions (Thanks, Alexandre!)
  * Python
    * Thanks Sung for [streaming data](
https://github.com/apache/iceberg-python/pull/786) through an Arrow batch
reader.
    * Thanks Honah for adding [merge-appends](
https://github.com/apache/iceberg-python/pull/569).
  * Rust
    * Thanks Zenotme for writing [Field-IDs](
https://github.com/apache/iceberg-rust/pull/411) to the Avro files.
  * Spark
    * Can pass read properties via [Spark SQL](
https://github.com/apache/spark/pull/46707) (Szehon)
* Releases
  * Iceberg 1.6.0 Release ([devlist](
https://lists.apache.org/thread/ymx4kbbfmndmhlrzfrpgzj3hmo6294pv),
[milestone](https://github.com/apache/iceberg/milestone/44)).
    * [Kafka Connect: Commit coordination](
https://github.com/apache/iceberg/pull/10351)
  * Iceberg-rust 0.3.0 release ([devlist](
https://lists.apache.org/thread/x1kn3oq5lv6hllf1d50pbyrwcwthy4t1),
[tracking issue](https://github.com/apache/iceberg-rust/issues/348),
[project](https://github.com/orgs/apache/projects/339/views/1)). Waiting
for the Avro release.
  * PyIceberg heading to 0.7.0 ([tracking issue](
https://github.com/apache/iceberg-python/issues/736)), wrapping up the
final PRs
  * Progress for project guidelines
* Discussion ([all proposals](
https://github.com/apache/iceberg/labels/proposal))
  * Geo Support Overview (if helpful?)
    * Column ranges in ManifestFile metadata (stored in manifest list)
  * Spec changes
    * Relative paths in Iceberg metadata (revisit this [proposal](
https://docs.google.com/document/u/0/d/1RDEjJAVEXg1csRzyzTuM634L88vvI0iDHNQQK3kOVR0/edit
))
    * Variant [proposal](
https://docs.google.com/document/d/1QjhpG_SVNPZh3anFcpicMQx90ebwjL7rmzFYfUP89Iw/edit#heading=h.rt0cvesdzsj7
)
    * Additional types
      * timestamp_ns
      * Variant
      * Null type?
      * Geo
      * TimeUUID (in binary comparable representation)
      * Blob?
    * Default values
    * Type promotion
      * Long to timestamp
      * int/long/bool to string
    * Multi-column transforms
    * Delete vectors ([discussion](
https://lists.apache.org/thread/gr3g5rrr60fhvy0mrdj4s4w9x8c3v58g))
    * Path/prefix ownership ([discussion](
https://lists.apache.org/thread/3fx8povnsq0f4g1xzj38snplr6d3ch1r))
    * Location/path requirements or recommendations
