Hey Iceberg Nation,

Here are the minutes from the last few meetings. I've had some adjustments
to make after moving on from Tabular; thanks for bearing with me.

Transcription/Recording

https://youtu.be/ekR0HOvjvI4

Summary
0:18 Geo support proposal has been added, community feedback is requested
3:55 View support for Hive catalog is in progress, reviewers needed
18:00 Rewrite data files API needs improvements to handle large numbers of
small files and reduce memory pressure
26:38 Discussion around having an open source reference implementation of
the Iceberg REST catalog spec
36:14 Reminder about the upcoming Iceberg Summit virtual conference

Geo Support Proposal

3:15 Geo support proposal has been posted; the community is asked to review
and provide comments
3:43 Exciting new avenue to explore geometric types and geographic
partitioning functions natively in Iceberg

View Support for Hive Catalog

4:06 Work is ongoing to add view support to the Hive catalog
4:33 High-level review is done, but reviewers familiar with the Hive
codebase are needed

Rewrite Data Files API

19:16 Current implementation can lead to high memory pressure when
rewriting large numbers of small files
19:12 Suggestions:
    18:10 Add limit on number of files/bytes to rewrite in one operation
    18:34 Flush data files after certain threshold instead of writing all
at commit time
    19:12 Compact delete files first before rewriting data files
21:51 Need to explore Spark's data source V2 API for potential improvements
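
To make the first suggestion concrete, here is a minimal sketch of capping a
rewrite by file count and bytes. It is illustrative only and is not the
Iceberg API: the function name `plan_rewrite_groups`, the `(path, size)`
tuples, and both limit parameters are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical (path, size_in_bytes) pair; not an Iceberg type.
FileEntry = Tuple[str, int]


def plan_rewrite_groups(
    files: Iterable[FileEntry], max_files: int, max_bytes: int
) -> Iterator[List[FileEntry]]:
    """Yield groups of files, each bounded by a file count and a byte total."""
    group: List[FileEntry] = []
    group_bytes = 0
    for path, size in files:
        # Close the current group if adding this file would exceed a limit.
        if group and (len(group) >= max_files or group_bytes + size > max_bytes):
            yield group
            group, group_bytes = [], 0
        group.append((path, size))
        group_bytes += size
    if group:
        yield group


small_files = [("a", 40), ("b", 40), ("c", 40), ("d", 10)]
groups = list(plan_rewrite_groups(small_files, max_files=3, max_bytes=100))
```

Each group could then be rewritten and flushed on its own, so only one
group's worth of data needs to be held at a time. A file larger than
`max_bytes` ends up alone in its own group.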

Reference REST Implementation

26:38 Discussion on having an open source reference implementation of
Iceberg's REST catalog spec
31:42 Concerns around making it an official Apache project and maintaining
it
19:12 Suggestions:
    32:27 Start as a community project outside Apache, potentially under an
Apple open source repo
    32:48 Make it thin, without opinionated features like metrics/security
    34:03 Once mature, consider contributing to Apache Iceberg

Iceberg Summit

36:14 Reminder about the upcoming Iceberg Summit virtual conference on May
14-15
36:20 Great lineup of speakers from the community and Iceberg users

Notes

* Highlights
  * [Iceberg sink has been added](https://github.com/apache/beam/pull/30797)
to the Beam project
  * [Add pagination when listing namespaces/tables/views](
https://github.com/apache/iceberg/pull/9782) (Thanks Rahil)
  * [Stale PR management](https://github.com/apache/iceberg/pull/10134)
(Thanks JB)
  * [Use 'delete' / 'append' if OverwriteFiles only deletes/appends data
files](https://github.com/apache/iceberg/pull/10150) (Thanks Eduard)
  * [Use ‘delete’ if RowDelta only has deletes](
https://github.com/apache/iceberg/pull/10123) (Thanks Eduard)
  * [JDBC: Fix escape character used in Namespace SQL](
https://github.com/apache/iceberg/pull/10167) (Thanks Chauncy, JB)
  * [JDBC: Fix issue when migrating from V0 schema to V1](
https://github.com/apache/iceberg/pull/10152) (Thanks JB)
  * [Add support for Flink 1.19](
https://github.com/apache/iceberg/pull/10112) (Thanks Rodrigo)
  * [Flink: Apply delete granularity for writes](
https://github.com/apache/iceberg/pull/10200) (Thanks Peter)
  * Iceberg Summit is Next Week!
* Releases
  * Java 1.5.2 Release
    * Please vote!
    * 1.5.2 has the same source code as 1.5.1.
    * This release is being performed due to an issue with the released
Spark artifacts in 1.5.1. Some artifacts were compiled with the incorrect
Scala version, for reasons that are still unclear.
  * Wrapping up the PyIceberg 0.7.0 release. If there is anything that you
want to get in, please add it to the [0.7.0 Milestone](
https://github.com/apache/iceberg-python/milestone/2).
* Discussion
  * Geo support proposal is up: [github.com/apache/iceberg/issues/10260](http://github.com/apache/iceberg/issues/10260).
Please take a look. We can discuss more in the next sync.
  * View support for Hive-Catalog:
https://github.com/apache/iceberg/pull/9852
