I opened more issues under the row-level delete milestone <https://github.com/apache/incubator-iceberg/milestone/4>. Hopefully that's more helpful for tracking tasks and for everyone that would like to contribute!
On Thu, May 7, 2020 at 11:52 AM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:

> I am going to write down a short doc with the current idea on how to do job planning based on what we discussed yesterday. In addition, I would like to cover what minor/major compaction could look like. I want this to be quite detailed and focused on the implementation so that we can review it properly and catch any issues early. I think we had a consensus on the conceptual approach yesterday.
>
> It would be great to update the milestone for row-level deletes and add more granular tasks so that we can parallelize the work among the community.
> https://github.com/apache/incubator-iceberg/milestone/4
>
> - Anton
>
> On 7 May 2020, at 07:12, Ryan Murray <rym...@dremio.com> wrote:
>
> FWIW, I agree with Gautam on the changes. Keeping complexity down and easing the transition to V2 should be a goal for this work.
>
> Is there a list of items that need to be finished for V2 schema/row-level deletes to be ready? I would love to help but am not sure what is missing or in progress.
>
> Best,
> Ryan
>
> On Thu, May 7, 2020 at 2:52 AM Gautam <gautamkows...@gmail.com> wrote:
>
>> My 2 cents:
>>
>> > * Merge manifest_entry and data_file?
>>
>> -1. Keeping the difference between v1 and v2 metadata to a minimum would be my preference, by keeping manifest_entry the same way in both v1 and v2. People using either flow would want to modify and contribute, and shouldn't have to worry about porting things over every time.
>>
>> > * How should planning with delete files work?
>>
>> +1 on keeping these independent and in two phases, as you mentioned. That allows processing in parallel. Could we make this a SparkAction too at some point?
>>
>> > * Mix delete files and data files in manifests? I think we should not, to support the two-phase planning approach.
>>
>> -1. We should not, for the reason you mention.
>>
>> > * If delete files and data files are separate, should manifests use the same schema?
>>
>> +1.
>>
>> On Wed, May 6, 2020 at 10:39 AM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
>>
>>> We won't have to rewrite V1 metadata when migrating to V2. The format is backward compatible, and we can read V1 manifests just fine in V2. For example, V1 metadata will not have a sequence number, and V2 would interpret that as sequence number = 0. The only thing we need to prohibit is V1 writers writing to V2 tables. That check is already in place, and such attempts will fail. Recent changes ensure that V1 and V2 co-exist in the same codebase. As of now, we have a format version in TableMetadata. I think the manual change Ryan was referring to would simply mean updating that version flag, not rewriting the metadata. That change can be done via TableOperations.
>>>
>>> One change that I've been considering is getting rid of manifest_entry. In v1, a manifest stored a manifest_entry that wrapped a data_file. The intent was to separate data that API users needed to supply (fields in data_file) from data that was tracked internally by Iceberg (the snapshot_id and status fields of manifest_entry). If we want to combine these so that a manifest stores one top-level data_file struct, then now is the time to make that change. I've prototyped this in #963 <https://github.com/apache/incubator-iceberg/pull/963>. The benefit is that the schema is flatter, so we wouldn't need two metadata tables (entries and files). The main drawback is that we aren't going to stop using v1 tables, so we would effectively have two different manifest schemas instead of v2 as an evolution of v1. I'd love to hear more opinions on whether to do this. I'm leaning toward not merging the two.
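Anton's description of the V1 to V2 transition (no metadata rewrite, V1 sequence numbers read as 0, V1 writers rejected on V2 tables, upgrade as a version-flag change) can be modeled in a few lines. This is a hypothetical Python sketch, not Iceberg's Java code: `TableMetadata` here is a toy dataclass, and `upgrade_to_v2`, `check_writer` and `read_sequence_number` are invented names. It also assumes that the "proposed inheritance scheme" means a v2 entry written without an explicit sequence number takes its manifest's sequence number.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TableMetadata:
    # Hypothetical stand-in for Iceberg's TableMetadata; only what the rules need.
    format_version: int
    manifest_list: str


def upgrade_to_v2(meta: TableMetadata) -> TableMetadata:
    # Upgrading only changes the version flag; the v1 manifests are reused as-is.
    return meta if meta.format_version >= 2 else replace(meta, format_version=2)


def check_writer(writer_version: int, meta: TableMetadata) -> None:
    # A v1 writer must not commit to a v2 table.
    if writer_version < meta.format_version:
        raise ValueError(
            f"v{writer_version} writer cannot write to a v{meta.format_version} table"
        )


def read_sequence_number(
    manifest_version: int,
    manifest_seq: Optional[int],
    entry_seq: Optional[int],
) -> int:
    if manifest_version == 1:
        # v1 metadata has no sequence numbers; a v2 reader treats them as 0.
        return 0
    if entry_seq is not None:
        return entry_seq
    # Assumed inheritance: an entry written without a sequence number takes
    # the sequence number assigned to its manifest at commit time.
    if manifest_seq is None:
        raise ValueError("v2 manifest is missing a sequence number")
    return manifest_seq
```

Because `upgrade_to_v2` keeps `manifest_list` unchanged, existing V1 manifests stay valid after the upgrade; only `check_writer` starts rejecting V1 writers.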
>>>
>>> As mentioned earlier, I'd rather keep ManifestEntry to reduce the number of differences between V1 and V2. I feel it will be easier to follow for other people who want to contribute to the core metadata management. That being said, I do get the intention of merging the two.
>>>
>>> Another change is to start adding tracking fields for delete files and updating the APIs. The metadata for this is fairly simple: an enum that stores whether the file is data, position deletes, or equality deletes. The main decision point is whether to allow mixing data files and delete files together in manifests. I don't think that we should allow manifests with both delete files and data files. The reason is job planning: we want to start emitting splits immediately so that we can stream them, instead of holding them all in memory. That means we need some way to guarantee that we know all of the delete files to apply to a data file before we encounter the data file.
>>>
>>> I don't see a good reason to mix delete and data files in a single manifest now. In our original idea, we wanted to keep deletes separate, as it felt it would be easier to come up with an efficient job planning approach later on. I think once we know the approach we want to take for planning input splits and doing compaction, we can revisit this point.
>>>
>>> - Anton
>>>
>>> On 6 May 2020, at 09:04, Junjie Chen <chenjunjied...@gmail.com> wrote:
>>>
>>> Hi Ryan,
>>>
>>> Besides the reading and merging of delete files, can we talk a bit about the write side of delete files? For example, generating delete files in a Spark action, metadata column support, a service to convert equality delete files to position delete files, etc.
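Junjie's last item, converting equality delete files into position delete files, reduces to one scan per data file. A minimal sketch, assuming rows are read in file order so that the 0-based row ordinal is the position, and that the caller has already decided which equality deletes apply to this file (for example, by sequence number). `PositionDelete` and `equality_to_position_deletes` are hypothetical names, not Iceberg APIs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class PositionDelete:
    # A position delete identifies a row by data file path and row ordinal.
    file_path: str
    pos: int


def equality_to_position_deletes(
    file_path: str,
    rows: Iterable[dict[str, Any]],
    equality_columns: tuple[str, ...],
    deleted_keys: set[tuple[Any, ...]],
) -> list[PositionDelete]:
    """Rewrite equality deletes as position deletes for one data file.

    Scans the file once and records the position of every row whose values
    in equality_columns match one of the deleted keys.
    """
    result = []
    for pos, row in enumerate(rows):
        key = tuple(row[c] for c in equality_columns)
        if key in deleted_keys:
            result.append(PositionDelete(file_path, pos))
    return result
```

For a file whose `id` column is 1, 2, 3, 2, deleting key `(2,)` yields positions 1 and 3. Once converted, a reader can skip those positions without comparing column values, which is presumably why such a conversion service is attractive.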
>>>
>>> On Wed, May 6, 2020 at 1:34 PM Miao Wang <miw...@adobe.com.invalid> wrote:
>>>
>>>> Hi Ryan,
>>>>
>>>> “Tables must be manually upgraded to version 2 in order to use any of the metadata changes we are making.” If I understand correctly, for an existing Iceberg table in v1, we have to run some CLI/script to rewrite the metadata.
>>>>
>>>> “Next, we've added sequence numbers and the proposed inheritance scheme to v2, along with tests to ensure that v1 is written without sequence numbers and that when reading v1 metadata, the sequence numbers are all 0.” To me, this means a V2 reader should be able to read V1 table metadata. Therefore, the step above is not required; we only need to use a V2 reader on a V1 table.
>>>>
>>>> However, if a table has been written in V1 and we want to save it as V2, I expect only the metadata will be rewritten into V2, and the V1 metadata will be vacuumed once the V2 write succeeds.
>>>>
>>>> Is my understanding correct?
>>>>
>>>> Thanks!
>>>>
>>>> Miao
>>>>
>>>> From: Ryan Blue <rb...@netflix.com.INVALID>
>>>> Reply-To: "dev@iceberg.apache.org" <dev@iceberg.apache.org>, "rb...@netflix.com" <rb...@netflix.com>
>>>> Date: Tuesday, May 5, 2020 at 5:03 PM
>>>> To: Iceberg Dev List <dev@iceberg.apache.org>
>>>> Subject: [DISCUSS] Changes for row-level deletes
>>>>
>>>> Hi, everyone,
>>>>
>>>> I know several people who are planning to attend the sync tomorrow are interested in the row-level delete work, so I wanted to share some of the progress and my current thinking ahead of time.
>>>>
>>>> The codebase now supports a new version number, 2. Tables must be manually upgraded to version 2 in order to use any of the metadata changes we are making; v1 readers cannot read v2 tables.
>>>> When a write takes place, the version number is now passed to the manifest writer, manifest list writer, etc., and the right schema for the table's current version is used. We've also frozen the v1 schemas and added wrappers to ensure that even as the internal classes, like DataFile, evolve, the exact same data is written to v1.
>>>>
>>>> Next, we've added sequence numbers and the proposed inheritance scheme to v2, along with tests to ensure that v1 is written without sequence numbers and that when reading v1 metadata, the sequence numbers are all 0. This gives us the ability to track "when" a row-level delete occurred in a v2 table.
>>>>
>>>> The next steps are to start making larger changes to metadata files.
>>>>
>>>> One change that I've been considering is getting rid of manifest_entry. In v1, a manifest stored a manifest_entry that wrapped a data_file. The intent was to separate data that API users needed to supply (fields in data_file) from data that was tracked internally by Iceberg (the snapshot_id and status fields of manifest_entry). If we want to combine these so that a manifest stores one top-level data_file struct, then now is the time to make that change. I've prototyped this in #963 <https://github.com/apache/incubator-iceberg/pull/963>. The benefit is that the schema is flatter, so we wouldn't need two metadata tables (entries and files). The main drawback is that we aren't going to stop using v1 tables, so we would effectively have two different manifest schemas instead of v2 as an evolution of v1. I'd love to hear more opinions on whether to do this.
>>>> I'm leaning toward not merging the two.
>>>>
>>>> Another change is to start adding tracking fields for delete files and updating the APIs. The metadata for this is fairly simple: an enum that stores whether the file is data, position deletes, or equality deletes. The main decision point is whether to allow mixing data files and delete files together in manifests. I don't think that we should allow manifests with both delete files and data files. The reason is job planning: we want to start emitting splits immediately so that we can stream them, instead of holding them all in memory. That means we need some way to guarantee that we know all of the delete files to apply to a data file before we encounter the data file.
>>>>
>>>> OpenInx suggested sorting by sequence number to see delete files before data files, but it still requires holding all splits in memory in the worst case due to overlapping sequence number ranges. I think Iceberg should plan a scan in two phases: one to find matching delete files (held in memory) and one to find matching data files. That solves the problem of having all deletes available so a split can be immediately emitted, and also allows parallelizing both phases without coordination across threads.
>>>>
>>>> For the two-phase approach, mixing delete files and data files in a manifest would require reading that manifest twice, once in each phase. I think it makes the most sense to keep delete files and data files in separate manifests. But the trade-off is that Iceberg will need to track the content of a manifest (deletes or data) and perform actions on separate manifest groups.
>>>>
>>>> Also, because with separate delete and data manifests we _could_ use separate manifest schemas, I went through and wrote out a schema for a delete file manifest.
>>>> That schema was so similar to the current data file schema that I think it's simpler to use the same one for both.
>>>>
>>>> In summary, here are the things that we need to decide and what I think we should do:
>>>>
>>>> * Merge manifest_entry and data_file? I think we should not, to avoid additional complexity.
>>>> * How should planning with delete files work? The two-phase approach is the only one I think is viable.
>>>> * Mix delete files and data files in manifests? I think we should not, to support the two-phase planning approach.
>>>> * If delete files and data files are separate, should manifests use the same schema? Yes, because it is simpler.
>>>>
>>>> Let's plan on talking about these questions in tomorrow's sync. And if you have other topics, please send them to me!
>>>>
>>>> rb
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>
>>> --
>>> Best Regards
>

--
Ryan Blue
Software Engineer
Netflix
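The two-phase planning Ryan proposes, over separate delete and data manifests, can be illustrated with a small in-memory model. This is a hypothetical Python sketch, not Iceberg's planner: `FileContent`, `ManifestContent`, `FileEntry`, `Manifest`, `ScanTask`, `read_manifest` and `plan` are invented names; manifests are in-memory tuples rather than Avro files; and a delete file is assumed to apply to data files in the same partition with a strictly lower sequence number, a simplification of rules that were still being designed in this thread.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from enum import Enum
from typing import Iterator


class FileContent(Enum):
    DATA = "data"
    POSITION_DELETES = "position_deletes"
    EQUALITY_DELETES = "equality_deletes"


class ManifestContent(Enum):
    DATA = "data"
    DELETES = "deletes"


@dataclass(frozen=True)
class FileEntry:
    path: str
    content: FileContent
    partition: str
    sequence_number: int


@dataclass(frozen=True)
class Manifest:
    content: ManifestContent
    entries: tuple[FileEntry, ...]

    def __post_init__(self) -> None:
        # Rule from the thread: a manifest never mixes data and delete files.
        want_data = self.content is ManifestContent.DATA
        for e in self.entries:
            if (e.content is FileContent.DATA) != want_data:
                raise ValueError(f"{e.path} does not belong in a {self.content.value} manifest")


@dataclass(frozen=True)
class ScanTask:
    data_file: FileEntry
    deletes: tuple[FileEntry, ...]


def read_manifest(manifest: Manifest) -> tuple[FileEntry, ...]:
    # Stand-in for reading a manifest file from storage.
    return manifest.entries


def plan(manifests: list[Manifest], workers: int = 4) -> Iterator[ScanTask]:
    delete_manifests = [m for m in manifests if m.content is ManifestContent.DELETES]
    data_manifests = [m for m in manifests if m.content is ManifestContent.DATA]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: read all delete manifests in parallel and keep an in-memory
        # index of delete files by partition.
        deletes_by_partition: dict[str, list[FileEntry]] = {}
        for entries in pool.map(read_manifest, delete_manifests):
            for d in entries:
                deletes_by_partition.setdefault(d.partition, []).append(d)

        # Phase 2: read data manifests in parallel. Every delete file is already
        # known, so each task is complete when yielded; no data file is held
        # back waiting for deletes discovered later.
        for entries in pool.map(read_manifest, data_manifests):
            for f in entries:
                # Assumption: a delete applies to data files in the same
                # partition with a strictly lower sequence number.
                applicable = tuple(
                    d for d in deletes_by_partition.get(f.partition, [])
                    if d.sequence_number > f.sequence_number
                )
                yield ScanTask(f, applicable)
```

For example, with a data manifest holding `d1` (sequence 1) and `d2` (sequence 3) in partition `p=1`, and a delete manifest holding one equality delete at sequence 2 in `p=1`, `plan` attaches the delete to `d1` only. Constructing a `Manifest` that mixes both kinds raises `ValueError`, which is what forces the separate manifest groups the planner relies on.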