Hi everyone, here are my notes from the sync last night. Feel free to add clarifications or corrections. If you’d like to be added to the invite, just send me an email and I’ll add you. Everyone is welcome.
*Topics*:
- 0.8.0 release update: it's released!
  - Will update the ASF site and announce tomorrow
- Flink sink update
- Deletes: How will clients write delete files?
- Delete format questions
  - Should manifest_entry and data_file merge?
  - How should planning work? 2-phase?
  - Should delete files and data files be allowed in the same manifest?
  - Should the same schema be used for delete and data manifests?

*Discussion*:
- 0.8.0 release is out
  - Powered by page: who would like to be listed? Gautam/Adobe, Anton and Owen will check.
- Flink sink update
  - Tencent is working on a Flink sink in PR #856, maintained against the master branch by @waterlx
  - Nearly all comments from a Flink committer and Ryan have been addressed; it is getting close to ready to merge
  - Next step is to enhance and polish
  - Ryan: if this is getting close to merging, can we start breaking it down into smaller commits to make review easier?
  - Will start opening small PRs to get the work into master
  - Thanks to Steven Wu at Netflix for his comments and reviews!
- Row-level delete questions: How will clients write deletes?
  - Junjie: How will clients write delete files with file/position? Writing delete files depends on Spark metadata columns (PR #28027 <https://github.com/apache/spark/pull/28027>)
  - Ryan: metadata columns are special columns that can be requested to get metadata about a row, like the source data file
    - MERGE INTO in Spark would use a metadata column to get the file and position for each row and pass this back to the source to delete
    - This is described in a draft doc for Spark: https://docs.google.com/document/d/1W1niaD0X5jV7r5uIp380ZfTW8sL5fL2fy176aoaHCj8/edit#
    - Iceberg would need to produce a metadata column with row position using a counter value (a small illustrative sketch is included at the end of these notes)
- Row-level delete questions: Junjie: How will compaction work?
  - Ryan: Anton recently added an Actions framework that can use Spark to do parallel operations, like rewriting metadata and finding orphan files
    - Compactions could be written using this framework, then built into maintenance services or used as reference implementations
  - Anton: We have additional actions to contribute, like bin packing and merging, which are similar
  - Ryan: There would also be an action to change equality deletes into position deletes
  - Anton: And one to compact delete files into larger delete files (minor compaction)
- Row-level delete questions: Junjie: How will metadata columns work for Flink or Hive?
  - Junjie: The metadata column PR is for Spark
  - Owen: Hive has metadata columns
  - Ryan: Flink would implement something similar
- Row-level delete questions: Anton: How will job planning work with 2 phases? What is parallelized?
  - Ryan: We want to stream through files and not keep lots of state in memory while planning, and we want to be able to scan manifests in parallel. To release a data file as a task as soon as it is seen, we need to know all of the delete files that apply to it, so we need some strategy that ensures all deletes are known for a data file when it is seen. That requires 2-phase planning, where we find all of the delete files first, and then find and release data file tasks just like the current planning. The sorting strategy doesn't always solve the problem and complicates parallelization, because the delete file for a task might be found in another thread, so we would need coordination and might have to wait until another thread produces possible delete files.
  - Anton: So the first phase produces a list of delete files?
  - Ryan: Yes, but we would probably index them by partition. (An illustrative sketch of this two-phase flow is included at the end of these notes.)
- There was discussion about whether we need to keep data and delete files in separate manifests. The discussion noted benefits to compaction, write amplification, and simplicity. Rough consensus was to keep them separate.
- Anton: Can we make it part of the spec that a position delete file can't contain deletes for a different partition?
- Ryan: Any delete file can be encoded for at most one partition because the partition is part of the delete file's metadata, so it won't be found or considered for any data file that doesn't have a matching partition. The only exception is when the partition spec evolves: a file stored in an hourly partition might be affected by a delete stored in a daily partition if the partition spec changed from hourly to daily. This should be something we can handle in planning.
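For anyone who wants to see the shape of the two-phase planning idea discussed above, here is a minimal illustrative sketch in Java. This is not Iceberg code: DeleteFile, DataFile, ScanTask, and the string partition keys are simplified stand-ins, and a real implementation would scan manifests in parallel and would also need to handle the partition spec evolution case mentioned above. The point is only the ordering: all delete files are indexed before any data file is turned into a task, so each data file can be released as soon as it is seen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for Iceberg's file metadata; not the real classes.
record DeleteFile(String partition, String path) {}
record DataFile(String partition, String path) {}
record ScanTask(DataFile file, List<DeleteFile> deletes) {}

class TwoPhasePlanningSketch {
  static List<ScanTask> plan(Iterable<DeleteFile> deleteFiles, Iterable<DataFile> dataFiles) {
    // Phase 1: read the delete manifests first and index delete files by partition,
    // so that every delete that might apply to a data file is known up front.
    Map<String, List<DeleteFile>> deletesByPartition = new HashMap<>();
    for (DeleteFile delete : deleteFiles) {
      deletesByPartition
          .computeIfAbsent(delete.partition(), p -> new ArrayList<>())
          .add(delete);
    }

    // Phase 2: stream through the data manifests (this phase can run in parallel,
    // one manifest per thread) and emit each data file as a task as soon as it is
    // seen, attaching the matching delete files from the index.
    List<ScanTask> tasks = new ArrayList<>();
    for (DataFile file : dataFiles) {
      List<DeleteFile> deletes = deletesByPartition.getOrDefault(file.partition(), List.of());
      tasks.add(new ScanTask(file, deletes));
    }
    return tasks;
  }
}
```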
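And here is a small sketch of the row-position idea from the metadata column discussion above: the position comes from a counter that advances as rows are read from a data file, not from anything stored in the file. The names (PositionedRow, PositionTrackingReader) and the reader shape are made up for illustration and are not an existing Iceberg or Spark API.

```java
import java.util.Iterator;

// Illustrative pairing of a row with its ordinal position in its data file.
record PositionedRow<T>(long pos, T row) {}

// Sketch of a reader wrapper that derives each row's position from a simple
// counter as rows are read; a row-position metadata column could be built on
// top of something like this.
class PositionTrackingReader<T> implements Iterator<PositionedRow<T>> {
  private final Iterator<T> rows;
  private long nextPos = 0; // incremented once per row produced

  PositionTrackingReader(Iterator<T> rows) {
    this.rows = rows;
  }

  @Override
  public boolean hasNext() {
    return rows.hasNext();
  }

  @Override
  public PositionedRow<T> next() {
    return new PositionedRow<>(nextPos++, rows.next());
  }
}
```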
--
Ryan Blue
Software Engineer
Netflix