Hi Ryan Besides the reading and merging of delete files, can we talk a bit about write side of delete files? For example, generate delete files in a spark action, the metadata column support, the service to transfer equality delete files to position delete files etc..
On Wed, May 6, 2020 at 1:34 PM Miao Wang <miw...@adobe.com.invalid> wrote: > Hi Ryan, > > > > “Tables must be manually upgraded to version 2 in order to use any of the > metadata changes we are making” If I understand correctly, for exist > iceberg table in v1, we have to run some CLI/script to rewrite the > metadata. > > > > “Next, we've added sequence numbers and the proposed inheritance scheme to > v2, along with tests to ensure that v1 is written without sequence numbers > and that when reading v1 metadata, the sequence numbers are all 0.” To me, > this means V2 reader should be able to read V1 table metadata. Therefore, > the step above is not required, which only requires us to use a V2 reader > on a V1 table. > > > > However, if a table has been written in V1, we want to save it as V2. I > expect only metadata data will be rewritten into V2 and V1 metadata will be > vacuumed upon V2 success. > > > > Is my understanding correct? > > > > Thanks! > > > > Miao > > *From: *Ryan Blue <rb...@netflix.com.INVALID> > *Reply-To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>, " > rb...@netflix.com" <rb...@netflix.com> > *Date: *Tuesday, May 5, 2020 at 5:03 PM > *To: *Iceberg Dev List <dev@iceberg.apache.org> > *Subject: *[DISCUSS] Changes for row-level deletes > > > > Hi, everyone, > > > > I know several people that are planning to attend the sync tomorrow are > interested in the row-level delete work, so I wanted to share some of the > progress and my current thinking ahead of time. > > > > The codebase now supports a new version number, 2. Tables must be manually > upgraded to version 2 in order to use any of the metadata changes we are > making; v1 readers cannot read v2 tables. When a write takes place, the > version number is now passed to the manifest writer, manifest list writer, > etc. and the right schema for the table's current version is used. We've > also frozen the v1 schemas and added wrappers to ensure that even as the > internal classes, like DataFile, evolve, the exact same data is written to > v1. > > > > Next, we've added sequence numbers and the proposed inheritance scheme to > v2, along with tests to ensure that v1 is written without sequence numbers > and that when reading v1 metadata, the sequence numbers are all 0. This > gives us the ability to track "when" a row-level delete occurred in a v2 > table. > > > > The next steps are to start making larger changes to metadata files. > > > > One change that I've been considering is getting rid of manifest_entry. In > v1, a manifest stored a manifest_entry that wrapped a data_file. The intent > was to separate data that API users needed to supply -- fields in data_file > -- from data that was tracked internally by Iceberg -- the snapshot_id and > status fields of manifest_entry. If we want to combine these so that a > manifest stores one top-level data_file struct, then now is the time to > make that change. I've prototyped this in #963 > <https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-iceberg%2Fpull%2F963&data=02%7C01%7Cmiwang%40adobe.com%7C6deae35f2a5b47fd3dbb08d7f150e20d%7Cfa7b1b5a7b34438794aed2c178decee1%7C0%7C0%7C637243202006254913&sdata=BF4quqX2Cn%2FL3Ckyi1cpr6h3rkUnWf8MYbCTUugYXgw%3D&reserved=0>. > The benefit is that the schema is flatter so we wouldn't need two metadata > tables (entries and files). The main drawback is that we aren't going to > stop using v1 tables, so we would effectively have two different manifest > schemas instead of v2 as an evolution of v1. I'd love to hear more opinions > on whether to do this. I'm leaning toward not merging the two. > > > > Another change is to start adding tracking fields for delete files and > updating the APIs. The metadata for this is fairly simple: an enum that > stores whether the file is data, position deletes, or equality deletes. The > main decision point is whether to allow mixing data files and delete files > together in manifests. I don't think that we should allow manifests with > both delete files and data files. The reason is job planning: we want to > start emitting splits immediately so that we can stream them, instead of > holding them all in memory. That means we need some way to guarantee that > we know all of the delete files to apply to a data file before we encounter > the data file. > > > > OpenInx suggested sorting by sequence number to see delete files before > data files, but it still requires holding all splits in memory in the worst > case due to overlapping sequence number ranges. I think Iceberg should plan > a scan in two phases: one to find matching delete files (held in memory) > and one to find matching data files. That solves the problem of having all > deletes available so a split can be immediately emitted, and also allows > parallelizing both phases without coordination across threads. > > > > For the two-phase approach, mixing delete files and data files in a > manifest would require reading that manifest twice, once in each phase. I > think it makes the most sense to keep delete files and data files in > separate manifests. But the trade-off is that Iceberg will need to track > the content of a manifest (deletes or data) and perform actions on separate > manifest groups. > > > > Also, because with separate delete and data manifests we _could_ use > separate manifest schemas, I went through and wrote out a schema for a > delete file manifest. That schema was so similar to the current data file > schema that I think it's simpler to use the same one for both. > > > > In summary, here are the things that we need to decide and what I think we > should do: > > > > * Merge manifest_entry and data_file? I think we should not, to > avoid additional complexity. > > * How should planning with delete files work? The two-phase approach is > the only one I think is viable. > > * Mix delete files and data files in manifests? I think we should not, to > support the two-phase planning approach. > > * If delete files and data files are separate, should manifests use the > same schema? Yes, because it is simpler. > > > > Let's plan on talking about these questions in tomorrow's sync. And if you > have other topics, please send them to me! > > > > rb > > > > -- > > Ryan Blue > > Software Engineer > > Netflix > -- Best Regards