Re: [DISCUSS] Changes for row-level deletes

Junjie Chen Wed, 06 May 2020 09:05:20 -0700

Hi Ryan

Besides the reading and merging of delete files, can we talk a bit about the
write side of delete files? For example, generating delete files in a Spark
action, metadata column support, a service to convert equality
delete files to position delete files, etc.


On Wed, May 6, 2020 at 1:34 PM Miao Wang <miw...@adobe.com.invalid> wrote:

> Hi Ryan,
>
>
>
> “Tables must be manually upgraded to version 2 in order to use any of the
> metadata changes we are making” If I understand correctly, for an existing
> Iceberg table in v1, we have to run some CLI/script to rewrite the
> metadata.
>
>
>
> “Next, we've added sequence numbers and the proposed inheritance scheme to
> v2, along with tests to ensure that v1 is written without sequence numbers
> and that when reading v1 metadata, the sequence numbers are all 0.” To me,
> this means a V2 reader should be able to read V1 table metadata. Therefore,
> the step above is not required; we only need to use a V2 reader
> on a V1 table.
>
>
>
> However, if a table has been written in V1 and we want to save it as V2, I
> expect only the metadata will be rewritten into V2, and the V1 metadata will
> be vacuumed once the V2 write succeeds.
>
>
>
> Is my understanding correct?
>
>
>
> Thanks!
>
>
>
> Miao
>
> *From: *Ryan Blue <rb...@netflix.com.INVALID>
> *Reply-To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>, "
> rb...@netflix.com" <rb...@netflix.com>
> *Date: *Tuesday, May 5, 2020 at 5:03 PM
> *To: *Iceberg Dev List <dev@iceberg.apache.org>
> *Subject: *[DISCUSS] Changes for row-level deletes
>
>
>
> Hi, everyone,
>
>
>
> I know several people that are planning to attend the sync tomorrow are
> interested in the row-level delete work, so I wanted to share some of the
> progress and my current thinking ahead of time.
>
>
>
> The codebase now supports a new version number, 2. Tables must be manually
> upgraded to version 2 in order to use any of the metadata changes we are
> making; v1 readers cannot read v2 tables. When a write takes place, the
> version number is now passed to the manifest writer, manifest list writer,
> etc. and the right schema for the table's current version is used. We've
> also frozen the v1 schemas and added wrappers to ensure that even as the
> internal classes, like DataFile, evolve, the exact same data is written to
> v1.
>
>
>
> Next, we've added sequence numbers and the proposed inheritance scheme to
> v2, along with tests to ensure that v1 is written without sequence numbers
> and that when reading v1 metadata, the sequence numbers are all 0. This
> gives us the ability to track "when" a row-level delete occurred in a v2
> table.
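
To make the sequence number behavior above concrete, here is a minimal
sketch in Python. It is not Iceberg code. The names (Entry,
effective_sequence_number) are hypothetical, and the rule that a missing
sequence number is inherited from the manifest is my assumption about what
the proposed inheritance scheme does. The only part taken directly from the
text is that v1 metadata always reads as sequence number 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical manifest entry; not an Iceberg class."""
    path: str
    # None means "not written explicitly, inherit it" (assumed scheme).
    sequence_number: Optional[int]


def effective_sequence_number(entry: Entry,
                              manifest_sequence_number: int,
                              format_version: int) -> int:
    """Resolve the sequence number a reader would see for an entry."""
    if format_version < 2:
        # From the thread: v1 metadata reads with all sequence numbers 0.
        return 0
    if entry.sequence_number is None:
        # Assumption: inherit the number tracked for the whole manifest.
        return manifest_sequence_number
    return entry.sequence_number
```

With this, effective_sequence_number(Entry("a.parquet", None), 5, 2) resolves
to the manifest's number, and any entry read with format_version=1 resolves
to 0.
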
>
>
>
> The next steps are to start making larger changes to metadata files.
>
>
>
> One change that I've been considering is getting rid of manifest_entry. In
> v1, a manifest stored a manifest_entry that wrapped a data_file. The intent
> was to separate data that API users needed to supply -- fields in data_file
> -- from data that was tracked internally by Iceberg -- the snapshot_id and
> status fields of manifest_entry. If we want to combine these so that a
> manifest stores one top-level data_file struct, then now is the time to
> make that change. I've prototyped this in #963
> <https://github.com/apache/incubator-iceberg/pull/963>.
> The benefit is that the schema is flatter so we wouldn't need two metadata
> tables (entries and files). The main drawback is that we aren't going to
> stop using v1 tables, so we would effectively have two different manifest
> schemas instead of v2 as an evolution of v1. I'd love to hear more opinions
> on whether to do this. I'm leaning toward not merging the two.
>
>
>
> Another change is to start adding tracking fields for delete files and
> updating the APIs. The metadata for this is fairly simple: an enum that
> stores whether the file is data, position deletes, or equality deletes. The
> main decision point is whether to allow mixing data files and delete files
> together in manifests. I don't think that we should allow manifests with
> both delete files and data files. The reason is job planning: we want to
> start emitting splits immediately so that we can stream them, instead of
> holding them all in memory. That means we need some way to guarantee that
> we know all of the delete files to apply to a data file before we encounter
> the data file.
>
>
>
> OpenInx suggested sorting by sequence number to see delete files before
> data files, but it still requires holding all splits in memory in the worst
> case due to overlapping sequence number ranges. I think Iceberg should plan
> a scan in two phases: one to find matching delete files (held in memory)
> and one to find matching data files. That solves the problem of having all
> deletes available so a split can be immediately emitted, and also allows
> parallelizing both phases without coordination across threads.
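
The two-phase plan described above could be sketched roughly as follows.
This is only an illustration in Python, not the Iceberg API: every name
here (Content, ManifestContent, FileEntry, Manifest, plan_scan) is
hypothetical. Matching by exact partition value, and applying a delete file
to a data file when the delete's sequence number is greater than or equal
to the data file's, are both simplifying assumptions; the thread does not
settle either rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterator, List, Sequence, Tuple


class Content(Enum):
    DATA = 0
    POSITION_DELETES = 1
    EQUALITY_DELETES = 2


class ManifestContent(Enum):
    DATA = 0
    DELETES = 1


@dataclass(frozen=True)
class FileEntry:
    path: str
    content: Content
    partition: str
    sequence_number: int


@dataclass(frozen=True)
class Manifest:
    # Each manifest holds only data files or only delete files.
    content: ManifestContent
    entries: Tuple[FileEntry, ...]


def plan_scan(manifests: Sequence[Manifest],
              partition: str) -> Iterator[Tuple[FileEntry, List[FileEntry]]]:
    # Phase 1: read only delete manifests; keep matching deletes in memory.
    deletes = [entry
               for manifest in manifests
               if manifest.content is ManifestContent.DELETES
               for entry in manifest.entries
               if entry.partition == partition]

    # Phase 2: stream data manifests. Every delete is already known, so each
    # data file can be emitted as soon as it is encountered.
    for manifest in manifests:
        if manifest.content is not ManifestContent.DATA:
            continue
        for data_file in manifest.entries:
            if data_file.partition != partition:
                continue
            # Assumed rule: a delete applies to data that is not newer.
            applicable = [d for d in deletes
                          if d.sequence_number >= data_file.sequence_number]
            yield data_file, applicable
```

Because plan_scan is a generator, a caller can consume (data file, deletes)
pairs one at a time instead of holding all of them in memory; only the
phase 1 delete list is kept. With separate manifests, each manifest is
also read in exactly one phase.
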
>
>
>
> For the two-phase approach, mixing delete files and data files in a
> manifest would require reading that manifest twice, once in each phase. I
> think it makes the most sense to keep delete files and data files in
> separate manifests. But the trade-off is that Iceberg will need to track
> the content of a manifest (deletes or data) and perform actions on separate
> manifest groups.
>
>
>
> Also, because with separate delete and data manifests we _could_ use
> separate manifest schemas, I went through and wrote out a schema for a
> delete file manifest. That schema was so similar to the current data file
> schema that I think it's simpler to use the same one for both.
>
>
>
> In summary, here are the things that we need to decide and what I think we
> should do:
>
>
>
> * Merge manifest_entry and data_file? I think we should not, to
> avoid additional complexity.
>
> * How should planning with delete files work? The two-phase approach is
> the only one I think is viable.
>
> * Mix delete files and data files in manifests? I think we should not, to
> support the two-phase planning approach.
>
> * If delete files and data files are separate, should manifests use the
> same schema? Yes, because it is simpler.
>
>
>
> Let's plan on talking about these questions in tomorrow's sync. And if you
> have other topics, please send them to me!
>
>
>
> rb
>
>
>
> --
>
> Ryan Blue
>
> Software Engineer
>
> Netflix
>


-- 
Best Regards
