Thanks for the summary. On the V2 format: is there a Google Doc to review, or a backlog of tickets to track?
Chen

On Mon, Mar 15, 2021 at 10:34 PM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:

> Hey everyone,
>
> Thanks to folks who attended. I added my notes from the last sync. Please
> feel free to add/correct if I missed anything.
>
> Main points
>
>    - Highlights
>       - StreamingOffset for Structured Streaming in Spark
>       - New Actions API
>       - Spark procedure for partial import of existing tables
>       - Subsurface talks are online
>       - Call for papers is open at ApacheCon and Subsurface
>    - Releases
>       - 0.11.1
>          - Waiting for the fix on handling situations when the metastore
>            fails during commit (#2317).
>       - 0.12.0
>          - Should include Spark 3.1 support
>          - V2 format items should be included whenever possible but
>            should not block the release
>          - No new blockers
>          - Ideally, end of March
>    - Table corruption issue (#2317
>      <https://github.com/apache/iceberg/issues/2317>)
>       - We may corrupt tables if the metastore fails during commit and
>         the commit state is unknown. Iceberg may delete files that were
>         actually committed.
>       - A lot of folks have seen this issue.
>       - Parth has shared some thoughts from a discussion they had
>         internally here
>         <https://docs.google.com/document/d/1dN7gZwXmlI6Nl4RToAWgsMIsiJUCRSpfFfIL9Kr8s0k>.
>       - We can handle this issue in two phases:
>          - Don’t corrupt the table (Russell has a PR)
>          - Avoid duplicated results if operations are blindly retried
>            (can be done in a follow-up PR)
>       - Seems worth including the first part in 0.11.1
>    - V2 format
>       - Open points:
>          - Primary key or row id for upserts
>          - Propagating the sort order id for files on write
>       - Need more reviewers
>    - Encryption
>       - Multiple people expressed interest in data encryption.
>       - Existing work by John here
>         <https://github.com/apache/iceberg/pull/1918>.
>       - Ideally, should leverage as much as possible of modular
>         encryption in Parquet 1.12 discussed here
>         <https://github.com/apache/iceberg/issues/1413>.
>       - Agreed to start a thread on the dev list.
>    - CachingCatalog issues (#2319
>      <https://github.com/apache/iceberg/issues/2319>)
>       - The current behavior leads to stale data if multiple sessions are
>         used.
>       - No ideal solution due to Spark limitations. Agreed to discuss in
>         the issue.
>    - Multi-table transactions
>       - Jacques has proposed an API here
>         <https://github.com/apache/iceberg/pull/1849> and is about to start
>         working on an implementation.
>       - Agreed to collaborate on the dev list. More eyes would be great.
>
>
> The link to the doc:
> https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg
>
> Thanks,
> Anton
>

--
Chen Song