FWIW, I agree with Gautam on the changes. Keeping complexity down and easing the transition to V2 should be a goal for this work.
Is there a list of items that need to be finished for V2 schema/row-level deletes to be ready? I would love to help but am not sure what is missing/in-progress.

Best,
Ryan

On Thu, May 7, 2020 at 2:52 AM Gautam <gautamkows...@gmail.com> wrote:

> My 2 cents:
>
> > * Merge manifest_entry and data_file?
>
> -1 ... keeping the difference between v1 and v2 metadata to a minimum would be my preference, by keeping manifest_entries the same way in both v1 and v2. People using either flow would want to modify and contribute, and shouldn't have to worry about porting things over every time.
>
> > * How should planning with delete files work?
>
> +1 on keeping these independent and in two phases, as you mentioned. Allows processing in parallel. Could make this a SparkAction too at some point?
>
> > * Mix delete files and data files in manifests? I think we should not, to support the two-phase planning approach.
>
> -1 ... We should not, for the reason you mention.
>
> > * If delete files and data files are separate, should manifests use the same schema?
>
> +1.
>
> On Wed, May 6, 2020 at 10:39 AM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
>
>> We won't have to rewrite V1 metadata when migrating to V2. The format is backward compatible and we can read V1 manifests just fine in V2. For example, V1 metadata will not have a sequence number and V2 would interpret that as sequence number = 0. The only thing we need to prohibit is V1 writers writing to V2 tables. That check is already in place and such attempts will fail. Recent changes that went in ensure that V1 and V2 co-exist in the same codebase. As of now, we have a format version in TableMetadata. I think the manual change Ryan was referring to would simply mean updating that version flag, not rewriting the metadata. That change can be done via TableOperations.
>>
>> One change that I've been considering is getting rid of manifest_entry. In v1, a manifest stored a manifest_entry that wrapped a data_file. The intent was to separate data that API users needed to supply -- fields in data_file -- from data that was tracked internally by Iceberg -- the snapshot_id and status fields of manifest_entry. If we want to combine these so that a manifest stores one top-level data_file struct, then now is the time to make that change. I've prototyped this in #963 <https://github.com/apache/incubator-iceberg/pull/963>. The benefit is that the schema is flatter so we wouldn't need two metadata tables (entries and files). The main drawback is that we aren't going to stop using v1 tables, so we would effectively have two different manifest schemas instead of v2 as an evolution of v1. I'd love to hear more opinions on whether to do this. I'm leaning toward not merging the two.
>>
>> As mentioned earlier, I'd rather keep ManifestEntry to reduce the number of changes between V1 and V2. I feel it will be easier for other people who want to contribute to the core metadata management to follow. That being said, I do get the intention of merging the two.
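For readers following the manifest_entry discussion above, here is a rough sketch of the two layouts being weighed. These are placeholder Java classes with abbreviated field lists, not the real Iceberg classes or the exact Avro schemas:

    // v1 layout: each manifest row is a manifest_entry wrapping the user-supplied data_file
    class ManifestEntry {
      int status;          // ADDED / EXISTING / DELETED, managed by Iceberg
      long snapshotId;     // snapshot that added the file, managed by Iceberg
      DataFile dataFile;   // the metadata that API users supply
    }

    class DataFile {
      String filePath;
      String fileFormat;   // avro, orc, parquet
      long recordCount;
      long fileSizeInBytes;
      // partition tuple, column sizes, value counts, bounds, etc. elided
    }

    // The #963 proposal flattens the wrapper: the manifest stores one top-level
    // data_file struct that carries status and snapshot_id directly.

Flattening would collapse the separate "entries" and "files" metadata tables into one, at the cost of v1 and v2 manifests having two distinct schemas rather than v2 being an evolution of v1, which is the trade-off Ryan lays out below.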
>> Another change is to start adding tracking fields for delete files and updating the APIs. The metadata for this is fairly simple: an enum that stores whether the file is data, position deletes, or equality deletes. The main decision point is whether to allow mixing data files and delete files together in manifests. I don't think that we should allow manifests with both delete files and data files. The reason is job planning: we want to start emitting splits immediately so that we can stream them, instead of holding them all in memory. That means we need some way to guarantee that we know all of the delete files to apply to a data file before we encounter the data file.
>>
>> I don't see a good reason to mix delete and data files in a single manifest now. In our original idea, we wanted to keep deletes separate because we felt it would be easier to come up with an efficient job planning approach later on. I think once we know the approach we want to take for planning input splits and doing compaction, we can revisit this point again.
>>
>> - Anton
>>
>> On 6 May 2020, at 09:04, Junjie Chen <chenjunjied...@gmail.com> wrote:
>>
>> Hi Ryan,
>>
>> Besides the reading and merging of delete files, can we talk a bit about the write side of delete files? For example, generating delete files in a Spark action, the metadata column support, the service to convert equality delete files to position delete files, etc.
>>
>> On Wed, May 6, 2020 at 1:34 PM Miao Wang <miw...@adobe.com.invalid> wrote:
>>
>>> Hi Ryan,
>>>
>>> "Tables must be manually upgraded to version 2 in order to use any of the metadata changes we are making." If I understand correctly, for an existing Iceberg table in v1, we have to run some CLI/script to rewrite the metadata.
>>>
>>> "Next, we've added sequence numbers and the proposed inheritance scheme to v2, along with tests to ensure that v1 is written without sequence numbers and that when reading v1 metadata, the sequence numbers are all 0." To me, this means a V2 reader should be able to read V1 table metadata. Therefore, the step above is not required; we only need to use a V2 reader on a V1 table.
>>>
>>> However, if a table has been written in V1 and we want to save it as V2, I expect only the metadata will be rewritten into V2, and the V1 metadata will be vacuumed once the V2 write succeeds.
>>>
>>> Is my understanding correct?
>>>
>>> Thanks!
>>>
>>> Miao
>>>
>>> From: Ryan Blue <rb...@netflix.com.INVALID>
>>> Reply-To: "dev@iceberg.apache.org" <dev@iceberg.apache.org>, "rb...@netflix.com" <rb...@netflix.com>
>>> Date: Tuesday, May 5, 2020 at 5:03 PM
>>> To: Iceberg Dev List <dev@iceberg.apache.org>
>>> Subject: [DISCUSS] Changes for row-level deletes
>>>
>>> Hi, everyone,
>>>
>>> I know several people who are planning to attend the sync tomorrow are interested in the row-level delete work, so I wanted to share some of the progress and my current thinking ahead of time.
>>>
>>> The codebase now supports a new version number, 2. Tables must be manually upgraded to version 2 in order to use any of the metadata changes we are making; v1 readers cannot read v2 tables. When a write takes place, the version number is now passed to the manifest writer, manifest list writer, etc., and the right schema for the table's current version is used. We've also frozen the v1 schemas and added wrappers to ensure that even as the internal classes, like DataFile, evolve, the exact same data is written to v1.
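To make the "manual upgrade" above concrete: as Anton notes, it should amount to bumping the format-version flag in TableMetadata, not rewriting manifests or data. A minimal sketch of what that could look like through TableOperations, assuming a helper along the lines of upgradeToFormatVersion (that name is an assumption, not necessarily the real method):

    import org.apache.iceberg.HasTableOperations;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.TableMetadata;
    import org.apache.iceberg.TableOperations;

    public class UpgradeToV2 {
      public static void upgrade(Table table) {
        TableOperations ops = ((HasTableOperations) table).operations();
        TableMetadata base = ops.current();
        // hypothetical helper: same metadata, format version bumped to 2
        TableMetadata upgraded = base.upgradeToFormatVersion(2);
        // commit swaps the metadata pointer; no manifests or data files are rewritten
        ops.commit(base, upgraded);
      }
    }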
>>> Next, we've added sequence numbers and the proposed inheritance scheme to v2, along with tests to ensure that v1 is written without sequence numbers and that when reading v1 metadata, the sequence numbers are all 0. This gives us the ability to track "when" a row-level delete occurred in a v2 table.
>>>
>>> The next steps are to start making larger changes to metadata files.
>>>
>>> One change that I've been considering is getting rid of manifest_entry. In v1, a manifest stored a manifest_entry that wrapped a data_file. The intent was to separate data that API users needed to supply -- fields in data_file -- from data that was tracked internally by Iceberg -- the snapshot_id and status fields of manifest_entry. If we want to combine these so that a manifest stores one top-level data_file struct, then now is the time to make that change. I've prototyped this in #963 <https://github.com/apache/incubator-iceberg/pull/963>. The benefit is that the schema is flatter so we wouldn't need two metadata tables (entries and files). The main drawback is that we aren't going to stop using v1 tables, so we would effectively have two different manifest schemas instead of v2 as an evolution of v1. I'd love to hear more opinions on whether to do this. I'm leaning toward not merging the two.
>>>
>>> Another change is to start adding tracking fields for delete files and updating the APIs. The metadata for this is fairly simple: an enum that stores whether the file is data, position deletes, or equality deletes. The main decision point is whether to allow mixing data files and delete files together in manifests. I don't think that we should allow manifests with both delete files and data files. The reason is job planning: we want to start emitting splits immediately so that we can stream them, instead of holding them all in memory. That means we need some way to guarantee that we know all of the delete files to apply to a data file before we encounter the data file.
>>>
>>> OpenInx suggested sorting by sequence number to see delete files before data files, but it still requires holding all splits in memory in the worst case due to overlapping sequence number ranges. I think Iceberg should plan a scan in two phases: one to find matching delete files (held in memory) and one to find matching data files. That solves the problem of having all deletes available so a split can be immediately emitted, and also allows parallelizing both phases without coordination across threads.
>>>
>>> For the two-phase approach, mixing delete files and data files in a manifest would require reading that manifest twice, once in each phase. I think it makes the most sense to keep delete files and data files in separate manifests. But the trade-off is that Iceberg will need to track the content of a manifest (deletes or data) and perform actions on separate manifest groups.
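A minimal sketch of the two-phase planning idea described above, using hypothetical generic types rather than the real planner API; the "applies" predicate stands in for the partition and sequence-number check, whose exact semantics are still being settled in this thread:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.BiPredicate;
    import java.util.function.Consumer;
    import java.util.function.Predicate;

    // Hypothetical sketch, not Iceberg's planner. DATA and DELETE stand in for
    // data-file and delete-file metadata types.
    class TwoPhasePlanner<DATA, DELETE> {
      interface Manifest<F> {                 // a manifest holding only one kind of file
        List<F> matchingFiles(Predicate<F> filter);
      }

      static class Split<F, D> {              // a data file plus the deletes to apply to it
        final F file;
        final List<D> deletes;
        Split(F file, List<D> deletes) { this.file = file; this.deletes = deletes; }
      }

      void plan(List<Manifest<DELETE>> deleteManifests, List<Manifest<DATA>> dataManifests,
                Predicate<DELETE> deleteFilter, Predicate<DATA> dataFilter,
                BiPredicate<DELETE, DATA> applies, Consumer<Split<DATA, DELETE>> emit) {
        // Phase 1: read only delete manifests; matching delete files are held in memory.
        List<DELETE> deletes = new ArrayList<>();
        for (Manifest<DELETE> manifest : deleteManifests) {
          deletes.addAll(manifest.matchingFiles(deleteFilter));
        }

        // Phase 2: read data manifests and emit each split immediately -- every delete
        // that could apply is already known, so nothing has to be buffered.
        for (Manifest<DATA> manifest : dataManifests) {
          for (DATA file : manifest.matchingFiles(dataFilter)) {
            List<DELETE> applicable = new ArrayList<>();
            for (DELETE delete : deletes) {
              if (applies.test(delete, file)) {  // e.g. same partition, newer sequence number
                applicable.add(delete);
              }
            }
            emit.accept(new Split<>(file, applicable));
          }
        }
      }
    }

Each phase only touches its own manifests, so both can be parallelized without coordination; mixing delete and data files in one manifest would force the planner to read that manifest in both phases.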
>>> Also, because with separate delete and data manifests we _could_ use separate manifest schemas, I went through and wrote out a schema for a delete file manifest. That schema was so similar to the current data file schema that I think it's simpler to use the same one for both.
>>>
>>> In summary, here are the things that we need to decide and what I think we should do:
>>>
>>> * Merge manifest_entry and data_file? I think we should not, to avoid additional complexity.
>>> * How should planning with delete files work? The two-phase approach is the only one I think is viable.
>>> * Mix delete files and data files in manifests? I think we should not, to support the two-phase planning approach.
>>> * If delete files and data files are separate, should manifests use the same schema? Yes, because it is simpler.
>>>
>>> Let's plan on talking about these questions in tomorrow's sync. And if you have other topics, please send them to me!
>>>
>>> rb
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>
>> --
>> Best Regards