Hi everyone, here are my notes from the sync last night. Feel free to add clarifications or corrections. If you’d like to be added to the invite, just send me an email and I’ll add you. Everyone is welcome.
*Topics*:
- 0.8.0 release update: it's released!
  - Will update the ASF site and announce tomorrow
- Flink sink update
- Deletes: How will clients write delete files?
- Delete format questions
  - Should manifest_entry and data_file merge?
  - How should planning work? 2-phase?
  - Should delete files and data files be allowed in the same manifest?
  - Should the same schema be used for delete and data manifests?

*Discussion*:
- 0.8.0 release is out
  - Powered by page: who would like to be listed? Gautam/Adobe, Anton and Owen will check.
- Flink sink update
  - Tencent is working on a Flink sink in PR #856, maintained against the master branch by @waterlx
  - Nearly all comments from a Flink committer and Ryan have been addressed; it is getting close to ready to merge
  - Next step is to enhance and polish
  - Ryan: if this is getting close to merging, can we start breaking it down into smaller commits to make review easier?
  - Will start opening small PRs to get the work into master
  - Thanks to Steven Wu at Netflix for his comments and reviews!
- Row-level delete questions: How will clients write deletes?
  - Junjie: How will clients write delete files with file/position? Writing delete files depends on Spark metadata columns (PR #28027 <https://github.com/apache/spark/pull/28027>)
  - Ryan: metadata columns are special columns that can be requested to get metadata about a row, like the source data file
    - MERGE INTO in Spark would use a metadata column to get the file and position for each row and pass this back to the source to delete
    - This is described in a draft doc for Spark: https://docs.google.com/document/d/1W1niaD0X5jV7r5uIp380ZfTW8sL5fL2fy176aoaHCj8/edit#
    - Iceberg would need to produce a metadata column with row position using a counter value (a small illustrative sketch is included at the end of these notes)
- Row-level delete questions: Junjie: How will compaction work?
  - Ryan: Anton recently added an Actions framework that can use Spark to do parallel operations, like rewriting metadata and finding orphan files
    - Compactions could be written using this framework, then built into maintenance services or used as reference implementations
  - Anton: We have additional actions to contribute, like bin packing and merging, which are similar
  - Ryan: There would also be an action to change equality deletes into position deletes
  - Anton: And one to compact delete files into larger delete files (minor compaction)
- Row-level delete questions: Junjie: How will metadata columns work for Flink or Hive?
  - Junjie: The metadata column PR is for Spark
  - Owen: Hive has metadata columns
  - Ryan: Flink would implement something similar
- Row-level delete questions: Anton: How will job planning work with 2 phases? What is parallelized?
  - Ryan: We want to stream through files and not keep lots of state in memory while planning, and we want to be able to scan manifests in parallel. To release a data file as a task as soon as it is seen, we need to know all of the delete files that apply to it, so we need some strategy that ensures all deletes are known for a data file when it is seen. That requires 2-phase planning, where we find all of the delete files first, and then find and release data file tasks just like the current planning. The sorting strategy doesn't always solve the problem and complicates parallelization, because the delete file for a task might be found in another thread, so we would need coordination and might have to wait until another thread produces possible delete files.
  - Anton: So the first phase produces a list of delete files?
  - Ryan: Yes, but we would probably index them by partition. (An illustrative sketch of this two-phase flow is included at the end of these notes.)
- There was discussion about whether we need to keep data and delete files in separate manifests. The discussion noted benefits to compaction, write amplification, and simplicity. Rough consensus was to keep them separate.
- Anton: Can we make it part of the spec that a position delete file can't contain deletes for a different partition?
- Ryan: Any delete file can be encoded for at most one partition because the partition is part of the delete file's metadata, so it won't be found or considered for any data file that doesn't have a matching partition. The only exception is when the partition spec evolves: a file stored in an hourly partition might be affected by a delete stored in a daily partition if the partition spec changed from hourly to daily. This should be something we can handle in planning.
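For anyone who wants to see the shape of the two-phase planning idea discussed above, here is a minimal illustrative sketch in Java. This is not Iceberg code: DeleteFile, DataFile, ScanTask, and the string partition keys are simplified stand-ins, and a real implementation would scan manifests in parallel and would also need to handle the partition spec evolution case mentioned above. The point is only the ordering: all delete files are indexed before any data file is turned into a task, so each data file can be released as soon as it is seen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for Iceberg's file metadata; not the real classes.
record DeleteFile(String partition, String path) {}
record DataFile(String partition, String path) {}
record ScanTask(DataFile file, List<DeleteFile> deletes) {}

class TwoPhasePlanningSketch {
  static List<ScanTask> plan(Iterable<DeleteFile> deleteFiles, Iterable<DataFile> dataFiles) {
    // Phase 1: read the delete manifests first and index delete files by partition,
    // so that every delete that might apply to a data file is known up front.
    Map<String, List<DeleteFile>> deletesByPartition = new HashMap<>();
    for (DeleteFile delete : deleteFiles) {
      deletesByPartition
          .computeIfAbsent(delete.partition(), p -> new ArrayList<>())
          .add(delete);
    }

    // Phase 2: stream through the data manifests (this phase can run in parallel,
    // one manifest per thread) and emit each data file as a task as soon as it is
    // seen, attaching the matching delete files from the index.
    List<ScanTask> tasks = new ArrayList<>();
    for (DataFile file : dataFiles) {
      List<DeleteFile> deletes = deletesByPartition.getOrDefault(file.partition(), List.of());
      tasks.add(new ScanTask(file, deletes));
    }
    return tasks;
  }
}
```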
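And here is a small sketch of the row-position idea from the metadata column discussion above: the position comes from a counter that advances as rows are read from a data file, not from anything stored in the file. The names (PositionedRow, PositionTrackingReader) and the reader shape are made up for illustration and are not an existing Iceberg or Spark API.

```java
import java.util.Iterator;

// Illustrative pairing of a row with its ordinal position in its data file.
record PositionedRow<T>(long pos, T row) {}

// Sketch of a reader wrapper that derives each row's position from a simple
// counter as rows are read; a row-position metadata column could be built on
// top of something like this.
class PositionTrackingReader<T> implements Iterator<PositionedRow<T>> {
  private final Iterator<T> rows;
  private long nextPos = 0; // incremented once per row produced

  PositionTrackingReader(Iterator<T> rows) {
    this.rows = rows;
  }

  @Override
  public boolean hasNext() {
    return rows.hasNext();
  }

  @Override
  public PositionedRow<T> next() {
    return new PositionedRow<>(nextPos++, rows.next());
  }
}
```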
--
Ryan Blue
Software Engineer
Netflix