Hey Iceberg Nation,

Here are the meeting minutes from the Aug 21st meeting.

Transcription/Recording

https://youtu.be/bN8OSHPApSk

Summary

Project Updates
3:19 Flink support added for 2020 copy and range distribution
4:30 Java 8 support dropped; now targeting Java 11
5:10 V3 format progress: metadata classes copied, upgrade testing added
6:00 S3 recovery operations implemented for repair table action
6:56 Deprecated APIs removed for 1.7 release
7:02 Column stats now exposed to Spark for better query planning
7:44 Rust: 0.3 release out, SQL catalog contribution, manifest list caching
added
8:19 Python: 0.7.1 release out

Upcoming Releases
9:11 1.6.1 vote thread ongoing - focuses on reducing Trino memory
consumption
9:35 1.7 preparing proposals for V3 spec changes
10:25 Default value support needed for Parquet and ORC formats

Row Lineage Proposal
11:13 Aims to identify and track changes to individual rows over time
12:30 Leaning towards global identifier approach
12:41 Two key fields: row identifier and row version (likely using sequence
number)
13:31 Open questions on additional versioning information

Row-level Deletes Improvements
22:17 Proposal addresses shortcomings in current implementation
22:33 Suggests synchronous maintenance of delete files
22:41 Splits metadata tracking at file level, but rolls up to larger files
22:53 Would require synchronous delete maintenance from V3 forward
28:24 Helps with CDC and change log capabilities

Type Promotion
32:16 Current stats lack original type information for promoted types
32:37 Proposal to limit scope for V3 to promotions determinable by byte
count
32:48 Int/long to string promotion tricky; may use lower/upper bound byte
count heuristic
33:21 Community feedback requested on GitHub PR with spec changes

UI for Iceberg
34:11 Prototype UI developed to visualize namespaces, tables, properties,
etc.
35:18 Discussion on whether to include in core project or keep separate
35:35 Consensus leans towards maintaining as separate community project
41:47 Suggestion to create an "awesome list" for Iceberg-related tools/projects

REST Catalog Testing
46:31 PR adds lightweight server to run existing catalog tests against REST
implementations
46:40 Provides standardized behavior testing across implementations
46:51 Can be pointed at any REST server exposing Iceberg protocol
47:25 ~100 tests available out-of-the-box

Geo Type Specification
53:58 Email sent with geo spec details for review
51:25 Questions raised about deriving bounding box values from well-known
binary format
52:11 May need clarification in Iceberg spec on extracting values from
Parquet stats
