Hey everyone,

Thanks to folks who attended. I added my notes from the last sync. Please feel 
free to add/correct if I missed anything.

Main points
Highlights
StreamingOffset for Structured Streaming in Spark
New Actions API
Spark procedure for partial import of existing tables
Subsurface talks are online
Call for papers is open at ApacheCon and Subsurface
Releases
0.11.1
Waiting for the fix on handling situations when the metastore fails during 
commit (#2317).
0.12.0
Should include Spark 3.1 support
V2 format items should be included whenever possible but should not block the 
release
No new blockers
Ideally, end of March
Table corruption issue (#2317 <https://github.com/apache/iceberg/issues/2317>)
We may corrupt tables if the metastore fails during commit and the commit state 
is unknown. Iceberg may delete files that were actually committed.
A lot of folks have seen this issue.
Parth has shared some thoughts from a discussion they had internally here 
<https://docs.google.com/document/d/1dN7gZwXmlI6Nl4RToAWgsMIsiJUCRSpfFfIL9Kr8s0k>.
We can handle this issue in two phases:
Don’t corrupt the table (Russell has a PR)
Avoid duplicated results if operations are blindly retried (can be done in a 
follow-up PR)
Seems worth including the first part in 0.11.1
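To make the first phase concrete, here is a minimal sketch of the idea, not the actual fix from the PR. It assumes the committer can tell a definite rejection apart from an unknown outcome. All names (CommitFailed, CommitStateUnknown, commit, delete_file) are hypothetical and are not Iceberg APIs. The point is that newly written files are only cleaned up when the failure is certain:

```python
class CommitFailed(Exception):
    """Hypothetical: the metastore definitely rejected the commit."""


class CommitStateUnknown(Exception):
    """Hypothetical: the call failed and the commit may or may not have landed."""


def commit(metastore_update, new_files, delete_file):
    """Attempt a commit; delete the new files only when failure is certain."""
    try:
        metastore_update()
    except CommitFailed:
        # Known failure: no table metadata references the new files,
        # so removing them cannot corrupt the table.
        for path in new_files:
            delete_file(path)
        raise
    except CommitStateUnknown:
        # Unknown outcome: the commit may have succeeded, so the files
        # must be kept. Orphans can be cleaned up later by other means.
        raise


if __name__ == "__main__":
    storage = {"a.parquet", "b.parquet"}

    def timed_out_update():
        raise CommitStateUnknown("metastore timed out")

    try:
        commit(timed_out_update, sorted(storage), storage.discard)
    except CommitStateUnknown:
        print("state unknown, files kept:", sorted(storage))
```

The second phase (avoiding duplicated results on blind retries) is not covered by this sketch.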
V2 format
Open points:
Primary key or row id for upserts
Propagating the sort order id for files on write
Need more reviewers
Encryption
Multiple people expressed interest in data encryption.
Existing work by John here <https://github.com/apache/iceberg/pull/1918>.
Ideally, we should leverage as much of the modular encryption in Parquet 
1.12 as possible, as discussed here <https://github.com/apache/iceberg/issues/1413>.
Agreed to start a thread on the dev list.
CachingCatalog issues (#2319 <https://github.com/apache/iceberg/issues/2319>)
The current behavior leads to stale data if multiple sessions are used.
No ideal solution due to Spark limitations. Agreed to discuss in the issue.
Multi-table transactions
Jacques has proposed an API here <https://github.com/apache/iceberg/pull/1849> 
and is about to start working on an implementation.
Agreed to collaborate on the dev list. More eyes would be great.

The link to the doc: 
https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg

Thanks,
Anton
