Hey Iceberg Nation,

Here are the meeting minutes from the Aug 21st meeting.
Transcription/Recording: https://youtu.be/bN8OSHPApSk

Summary

Project Updates
3:19 Flink support added for 2020 copy and range distribution
4:30 Java 8 support dropped; now targeting Java 11
5:10 V3 format progress: metadata classes copied, upgrade testing added
6:00 S3 recovery operations implemented for repair table action
6:56 Deprecated APIs removed for 1.7 release
7:02 Column stats now exposed to Spark for better query planning
7:44 Rust: 0.3 release out, SQL catalog contribution, manifest list caching added
8:19 Python: 0.7.1 release out

Upcoming Releases
9:11 1.6.1 vote thread ongoing; focuses on reducing Trino memory consumption
9:35 1.7: preparing proposals for V3 spec changes
10:25 Default value support needed for Parquet and ORC formats

Row Lineage Proposal
11:13 Aims to identify and track changes to individual rows over time
12:30 Leaning towards global identifier approach
12:41 Two key fields: row identifier and row version (likely using sequence number)
13:31 Open questions on additional versioning information

Row-level Deletes Improvements
22:17 Proposal addresses shortcomings in current implementation
22:33 Suggests synchronous maintenance of delete files
22:41 Splits metadata tracking at file level, but rolls up to larger files
22:53 Would require synchronous delete maintenance from V3 forward
28:24 Helps with CDC and change log capabilities

Type Promotion
32:16 Current stats lack original type information for promoted types
32:37 Proposal to limit V3 scope to promotions determinable by byte count
32:48 Int/long to string promotion is tricky; may use a lower/upper bound byte-count heuristic
33:21 Community feedback requested on the GitHub PR with spec changes

UI for Iceberg
34:11 Prototype UI developed to visualize namespaces, tables, properties, etc.
35:18 Discussion on whether to include it in the core project or keep it separate
35:35 Consensus leans towards maintaining it as a separate community project
41:47 Suggestion to create an "awesome list" for Iceberg-related tools/projects

REST Catalog Testing
46:31 PR adds lightweight server to run existing catalog tests against REST implementations
46:40 Provides standardized behavior testing across implementations
46:51 Can be pointed at any REST server exposing the Iceberg protocol
47:25 ~100 tests available out of the box

Geo Type Specification
53:58 Email sent with geo spec details for review
51:25 Questions raised about deriving bounding box values from well-known binary (WKB) format
52:11 May need clarification in the Iceberg spec on extracting values from Parquet stats
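To make the Row Lineage item (a row identifier plus a row version, likely based on sequence numbers) a little more concrete, here is a toy in-memory model in Python. It assumes, as an illustration only and not as the proposal's settled design, that the identifier comes from a table-wide counter assigned when a row is first written, and that the version is the sequence number of the last commit that touched the row. `Row`, `ToyTable`, and `changed_since` are hypothetical names, not Iceberg APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    row_id: int   # hypothetical global identifier, assigned once at first write
    version: int  # sequence number of the commit that last changed this row
    data: dict


@dataclass
class ToyTable:
    next_row_id: int = 0
    sequence_number: int = 0
    rows: dict = field(default_factory=dict)  # row_id -> Row

    def commit(self, inserts=(), updates=None):
        """Apply one commit; every row it touches gets the new sequence number."""
        self.sequence_number += 1
        seq = self.sequence_number
        for data in inserts:
            # New rows receive a fresh identifier from the table-wide counter.
            row = Row(self.next_row_id, seq, dict(data))
            self.rows[row.row_id] = row
            self.next_row_id += 1
        for row_id, data in (updates or {}).items():
            # An update keeps the identifier and only bumps the version.
            self.rows[row_id] = Row(row_id, seq, dict(data))
        return seq

    def changed_since(self, seq):
        """Rows inserted or updated by commits after sequence number `seq`."""
        return [r for r in self.rows.values() if r.version > seq]


table = ToyTable()
s1 = table.commit(inserts=[{"a": 1}, {"a": 2}])
table.commit(updates={1: {"a": 20}})
print([r.row_id for r in table.changed_since(s1)])
```

The point of the sketch is that a stable identifier lets a reader follow one logical row across commits, while the sequence-number version answers "did this row change after snapshot X", which is the question change tracking needs.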
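The Type Promotion item proposes limiting V3 to promotions that can be resolved from the byte count of the stored stats. A minimal Python sketch of that idea, assuming Iceberg's single-value binary serialization for lower/upper bounds (int as 4-byte little-endian, long as 8-byte little-endian, float/double as 4/8-byte IEEE 754, strings as UTF-8 bytes). The helper names are hypothetical and not part of any Iceberg library.

```python
import struct


def decode_long_bound(raw: bytes) -> int:
    """Decode a bound for a column whose current type is long.

    Files written before an int -> long promotion hold 4-byte bounds,
    files written afterwards hold 8-byte bounds, so the length alone
    identifies the original encoding.
    """
    if len(raw) == 4:
        return struct.unpack("<i", raw)[0]  # written while the column was int
    if len(raw) == 8:
        return struct.unpack("<q", raw)[0]  # written as long
    raise ValueError(f"unexpected bound length {len(raw)} for a long column")


def decode_double_bound(raw: bytes) -> float:
    """Same idea for a float -> double promotion (4 vs 8 bytes)."""
    if len(raw) == 4:
        return struct.unpack("<f", raw)[0]
    if len(raw) == 8:
        return struct.unpack("<d", raw)[0]
    raise ValueError(f"unexpected bound length {len(raw)} for a double column")


# Why int/long -> string is harder: a 4-byte UTF-8 string such as "abcd"
# has the same length as an int bound, so the byte count cannot tell
# the two encodings apart.
ambiguous = "abcd".encode("utf-8")
print(len(ambiguous), decode_long_bound(ambiguous))
```

Byte count disambiguates numeric widening because each numeric type has a fixed width, whereas string bounds have variable length and can collide with either numeric width; that collision is presumably why the meeting described the int/long to string case as needing a heuristic.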
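The Geo Type item raised how bounding box values would be derived from well-known binary. As a minimal sketch, assuming plain 2D geometries in the OGC WKB layout (1 byte order flag, a uint32 geometry type, then IEEE 754 double coordinates), the Python below computes `(xmin, ymin, xmax, ymax)` for a Point, LineString, or Polygon. `wkb_bbox` is a hypothetical helper; it deliberately ignores Z/M coordinates, multi-geometries, and geography-specific concerns such as antimeridian crossing.

```python
import struct


def wkb_bbox(wkb: bytes) -> tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax) for a 2D WKB Point, LineString or Polygon."""
    order = "<" if wkb[0] == 1 else ">"  # 1 = little-endian (NDR), 0 = big-endian (XDR)
    (geom_type,) = struct.unpack_from(order + "I", wkb, 1)
    offset = 5
    xs: list[float] = []
    ys: list[float] = []

    def read_points(count: int, offset: int) -> int:
        # Each 2D point is two doubles (x, y); return the offset after them.
        for _ in range(count):
            x, y = struct.unpack_from(order + "dd", wkb, offset)
            xs.append(x)
            ys.append(y)
            offset += 16
        return offset

    if geom_type == 1:  # Point
        read_points(1, offset)
    elif geom_type == 2:  # LineString: uint32 point count, then points
        (n,) = struct.unpack_from(order + "I", wkb, offset)
        read_points(n, offset + 4)
    elif geom_type == 3:  # Polygon: uint32 ring count, each ring a point list
        (rings,) = struct.unpack_from(order + "I", wkb, offset)
        offset += 4
        for _ in range(rings):
            (n,) = struct.unpack_from(order + "I", wkb, offset)
            offset = read_points(n, offset + 4)
    else:
        raise ValueError(f"unsupported WKB geometry type {geom_type}")

    if not xs:
        raise ValueError("empty geometry has no bounding box")
    return min(xs), min(ys), max(xs), max(ys)


# A little-endian LineString with three points.
line = struct.pack("<BII", 1, 2, 3) + struct.pack("<6d", 0, 0, 2, 5, -1, 3)
print(wkb_bbox(line))
```

Walking the coordinates directly is straightforward for simple shapes; the open question in the meeting was more about how such values map onto Parquet column stats, which this sketch does not address.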