Hi Chen,

Here is the doc on the remaining tasks for format V2. I updated it today with the latest status, including the individual PRs pending review and the tasks that are V2-blocking:
https://docs.google.com/document/d/1FyLJyvzcZbfbjwDMEZd6Dj-LYCfrzK1zC-Bkb3OiICc/edit

Please feel free to comment/edit as needed.
As mentioned in Anton's email, it would be great if more people can review the pending PRs.

Thank you!
Yan

On Tue, Mar 16, 2021 at 8:06 AM Chen Song <chen.song...@gmail.com> wrote:

> Thanks for the summary. On V2 format. Is there a google doc to review, or
> any sort of backlog of tickets to track?
>
> Chen
>
> On Mon, Mar 15, 2021 at 10:34 PM Anton Okolnychyi
> <aokolnyc...@apple.com.invalid> wrote:
>
>> Hey everyone,
>>
>> Thanks to folks who attended. I added my notes from the last sync. Please
>> feel free to add/correct if I missed anything.
>>
>> Main points
>>
>> - Highlights
>>   - StreamingOffset for Structured Streaming in Spark
>>   - New Actions API
>>   - Spark procedure for partial import of existing tables
>>   - Subsurface talks are online
>>   - Call for papers is open at ApacheCon and Subsurface
>> - Releases
>>   - 0.11.1
>>     - Waiting for the fix on handling situations when the metastore
>>       fails during commit (#2317).
>>   - 0.12.0
>>     - Should include Spark 3.1 support
>>     - V2 format items should be included whenever possible but
>>       should not block the release
>>     - No new blockers
>>     - Ideally, end of March
>> - Table corruption issue (#2317
>>   <https://github.com/apache/iceberg/issues/2317>)
>>   - We may corrupt tables if the metastore fails during commit and
>>     the commit state is unknown. Iceberg may delete files that were
>>     actually committed.
>>   - A lot of folks have seen this issue.
>>   - Parth has shared some thoughts from a discussion they had
>>     internally here
>>     <https://docs.google.com/document/d/1dN7gZwXmlI6Nl4RToAWgsMIsiJUCRSpfFfIL9Kr8s0k>.
>>   - We can handle this issue in two phases:
>>     - Don’t corrupt the table (Russell has a PR)
>>     - Avoid duplicated results if operations are blindly retried
>>       (can be done in a follow-up PR)
>>   - Seems worth including the first part in 0.11.1
>> - V2 format
>>   - Open points:
>>     - Primary key or row id for upserts
>>     - Propagating the sort order id for files on write
>>   - Need more reviewers
>> - Encryption
>>   - Multiple people expressed interest in data encryption.
>>   - Existing work by John here
>>     <https://github.com/apache/iceberg/pull/1918>.
>>   - Ideally, should leverage as much as possible of modular
>>     encryption in Parquet 1.12 discussed here
>>     <https://github.com/apache/iceberg/issues/1413>.
>>   - Agreed to start a thread on the dev list.
>> - CachingCatalog issues (#2319
>>   <https://github.com/apache/iceberg/issues/2319>)
>>   - The current behavior leads to stale data if multiple sessions
>>     are used.
>>   - No ideal solution due to Spark limitations. Agreed to discuss in
>>     the issue.
>> - Multi-table transactions
>>   - Jacques has proposed an API here
>>     <https://github.com/apache/iceberg/pull/1849> and is about to
>>     start working on an implementation.
>>   - Agreed to collaborate on the dev list. More eyes would be great.
>>
>>
>> The link to the doc:
>> https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg
>>
>> Thanks,
>> Anton
>>
>
>
> --
> Chen Song