Two Iceberg talks at Flink Forward San Francisco 2022:

- Batch Processing at Scale with Flink & Iceberg (Andreas Hailu)
- Tame the small files problem and optimize data layout for streaming ingestion to Iceberg (Steven Wu, Gang Ye)

On Wed, Oct 12, 2022 at 12:28 PM Ryan Blue <b...@tabular.io> wrote:

> Awesome, thanks Szehon!
>
> I'll definitely include these. Any other talks that we should highlight?
>
> On Wed, Oct 12, 2022 at 11:18 AM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> Hi Ryan,
>>
>> Do you mention Iceberg-related talks in the board report? There were
>> four Iceberg talks at ApacheCon2022 (somehow the event schedule is hidden
>> only to participants, not sure why):
>>
>> - Accelerate Data Lakehouse deployment with Apache Iceberg in
>>   Cloudera Data Platform (Attila Turoczy, Bill Zhang)
>> - Apache Iceberg's REST Catalog - A Gateway to Enriching Data Access
>>   via the Simplicity of an HTTP Service (Sam Redai)
>> - Iceberg's Best Secret: Exploring Metadata Tables (Szehon Ho)
>> - Integrated Audits: Streamlined Data Observability with Apache
>>   Iceberg (Sam Redai)
>>
>> If not, feel free to ignore.
>> Thanks,
>> Szehon
>>
>> On Wed, Oct 12, 2022 at 9:36 AM Ryan Blue <b...@apache.org> wrote:
>>
>>> Hi everyone,
>>>
>>> Here’s the board report I just posted. If you have anything to add,
>>> please reply to let me know!
>>>
>>> Description:
>>>
>>> Apache Iceberg is a table format for huge analytic datasets that is
>>> designed for high performance and ease of use.
>>>
>>> Issues:
>>>
>>> There are no issues requiring board attention.
>>>
>>> Membership Data:
>>>
>>> Apache Iceberg was founded 2020-05-19 (2 years ago)
>>> There are currently 22 committers and 12 PMC members in this project.
>>> The Committer-to-PMC ratio is roughly 2:1.
>>>
>>> Community changes, past quarter:
>>>
>>> - No new PMC members. Last addition was Jack Ye on 2021-11-14.
>>> - Fokko Driesprong was added as committer on 2022-08-21
>>> - Steven Wu was added as committer on 2022-10-07
>>> - Yufei Gu was added as committer on 2022-08-25
>>>
>>> Project Activity:
>>>
>>> The community had 2 releases in the 0.14.x line and an initial Python
>>> release, 0.1.0. In addition, the vote for a 1.0.0 release is currently
>>> passing.
>>>
>>> The Python release is the result of significant community effort and
>>> includes a new CLI utility (pyiceberg), support for Hive and REST
>>> catalogs, and the ability to read table metadata. The next goal is a
>>> 0.2.0 release that can handle query planning to enable reads in Python
>>> and Python-based engines.
>>>
>>> The 1.0.0 JVM release adds API guarantees to the API module, but is
>>> closely based on 0.14.1 to make transitioning to a new major version
>>> simple.
>>>
>>> Next, the community is preparing a 1.1.0 release with significant new
>>> updates:
>>>
>>> - The ability to read and write table branches
>>> - Scan metrics reporting
>>> - Support for Spark FunctionCatalog
>>> - FLIP-27 reader support in Flink SQL
>>> - Z-order support when rewriting or compacting data files
>>> - Support for Puffin stats in table metadata
>>>
>>> Community Health:
>>>
>>> The community continues to be healthy in terms of commits. The number of
>>> unique contributors decreased slightly, which indicates the community
>>> should ensure pull requests from contributors are getting enough
>>> attention.
>>>
>>> The increase of issues closed is due to setting up a stale issues bot to
>>> help keep issues fresh and relevant. The community also added issue
>>> templates to make bug reports and feature requests better and more clear.
>>>
>>> --
>>> Ryan Blue
>
> --
> Ryan Blue
> Tabular