I am going to write a short doc with the current idea on how to do job planning, based on what we discussed yesterday. In addition, I would like to cover what minor/major compaction can look like. I want this to be quite detailed and focused on the implementation, so that we can review it properly and catch any issues early. I think we reached consensus on the conceptual approach yesterday.
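As a concrete reference point for the planning discussion in the thread below, here is a minimal Python sketch of the two-phase scan planning idea (phase 1 collects delete files in memory, phase 2 streams data files and emits splits immediately). All structures, field names, and the applicability rule `delete.sequence_number >= data.sequence_number` are illustrative assumptions, not Iceberg's actual API:

```python
from collections import defaultdict

def plan_scan(delete_manifests, data_manifests, partition_filter):
    """Two-phase planning sketch: find matching delete files first, then
    stream matching data files, emitting each split as soon as it is seen."""
    # Phase 1: read delete manifests and hold matching delete files in
    # memory, grouped by partition so lookups in phase 2 are cheap.
    deletes_by_partition = defaultdict(list)
    for manifest in delete_manifests:
        for delete_file in manifest:
            if partition_filter(delete_file["partition"]):
                deletes_by_partition[delete_file["partition"]].append(delete_file)

    # Phase 2: read data manifests; every delete file that may apply is
    # already known, so splits can be yielded without buffering the rest.
    for manifest in data_manifests:
        for data_file in manifest:
            if partition_filter(data_file["partition"]):
                applicable = [
                    d for d in deletes_by_partition[data_file["partition"]]
                    # assumed rule: a delete only affects rows written at or
                    # before the delete's own sequence number
                    if d["sequence_number"] >= data_file["sequence_number"]
                ]
                yield {"data_file": data_file["path"],
                       "deletes": [d["path"] for d in applicable]}

# Toy example: one delete file at sequence number 2 and two data files.
delete_manifests = [[{"partition": "p1", "path": "d1.parquet", "sequence_number": 2}]]
data_manifests = [[{"partition": "p1", "path": "f1.parquet", "sequence_number": 1},
                   {"partition": "p1", "path": "f2.parquet", "sequence_number": 3}]]
splits = list(plan_scan(delete_manifests, data_manifests, lambda p: p == "p1"))
```

In this toy run, the delete applies to `f1.parquet` (written before the delete) but not to `f2.parquet` (written after it).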
It would be great to update the milestone for row-level deletes and add more granular tasks so that we can parallelize the work among the community: https://github.com/apache/incubator-iceberg/milestone/4

- Anton

> On 7 May 2020, at 07:12, Ryan Murray <rym...@dremio.com> wrote:
>
> FWIW, I agree with Gautam on the changes. Keeping complexity down and easing the transition to V2 should be a goal for this work.
>
> Is there a list of items that need to be finished for the V2 schema / row-level deletes to be ready? I would love to help but am not sure what is missing/in progress.
>
> Best,
> Ryan
>
> On Thu, May 7, 2020 at 2:52 AM Gautam <gautamkows...@gmail.com> wrote:
> My 2 cents:
>
> > * Merge manifest_entry and data_file?
> -1. Keeping the difference between v1 and v2 metadata to a minimum would be my preference: keep manifest_entries the same in both v1 and v2. People using either flow will want to modify and contribute, and shouldn't have to worry about porting things over every time.
>
> > * How should planning with delete files work?
> +1 on keeping these independent and in two phases, as you mentioned. That allows processing in parallel. Could this become a SparkAction too at some point?
>
> > * Mix delete files and data files in manifests? I think we should not, to support the two-phase planning approach.
> -1. We should not, for the reason you mention.
>
> > * If delete files and data files are separate, should manifests use the same schema?
> +1.
>
> On Wed, May 6, 2020 at 10:39 AM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
> We won't have to rewrite V1 metadata when migrating to V2. The format is backward compatible, and we can read V1 manifests just fine in V2. For example, V1 metadata will not have a sequence number, and V2 will interpret that as sequence number = 0.
> The only thing we need to prohibit is V1 writers writing to V2 tables. That check is already in place, and such attempts will fail. Recent changes that went in ensure that V1 and V2 co-exist in the same codebase. As of now, we have a format version in TableMetadata. I think the manual change Ryan was referring to would simply mean updating that version flag, not rewriting the metadata. That change can be done via TableOperations.
>
>> One change that I've been considering is getting rid of manifest_entry. In v1, a manifest stored a manifest_entry that wrapped a data_file. The intent was to separate data that API users needed to supply -- fields in data_file -- from data that was tracked internally by Iceberg -- the snapshot_id and status fields of manifest_entry. If we want to combine these so that a manifest stores one top-level data_file struct, then now is the time to make that change. I've prototyped this in #963 <https://github.com/apache/incubator-iceberg/pull/963>. The benefit is that the schema is flatter, so we wouldn't need two metadata tables (entries and files). The main drawback is that we aren't going to stop using v1 tables, so we would effectively have two different manifest schemas instead of v2 as an evolution of v1. I'd love to hear more opinions on whether to do this. I'm leaning toward not merging the two.
>
> As mentioned earlier, I'd rather keep ManifestEntry to reduce the number of changes between V1 and V2. I feel it will be easier for other people who want to contribute to the core metadata management to follow. That being said, I do get the intention behind merging the two.
>
>> Another change is to start adding tracking fields for delete files and updating the APIs. The metadata for this is fairly simple: an enum that stores whether the file is data, position deletes, or equality deletes.
>> The main decision point is whether to allow mixing data files and delete files together in manifests. I don't think that we should allow manifests with both delete files and data files. The reason is job planning: we want to start emitting splits immediately so that we can stream them, instead of holding them all in memory. That means we need some way to guarantee that we know all of the delete files to apply to a data file before we encounter the data file.
>
> I don't see a good reason to mix delete and data files in a single manifest now. In our original idea, we wanted to keep deletes separate, as it felt it would be easier to come up with an efficient job planning approach later on. I think once we know the approach we want to take for planning input splits and doing compaction, we can revisit this point.
>
> - Anton
>
>> On 6 May 2020, at 09:04, Junjie Chen <chenjunjied...@gmail.com> wrote:
>>
>> Hi Ryan,
>>
>> Besides the reading and merging of delete files, can we talk a bit about the write side of delete files? For example, generating delete files in a Spark action, the metadata column support, the service to convert equality delete files to position delete files, etc.
>>
>> On Wed, May 6, 2020 at 1:34 PM Miao Wang <miw...@adobe.com.invalid> wrote:
>> Hi Ryan,
>>
>> "Tables must be manually upgraded to version 2 in order to use any of the metadata changes we are making" -- If I understand correctly, for an existing Iceberg table in v1, we have to run some CLI/script to rewrite the metadata.
>>
>> "Next, we've added sequence numbers and the proposed inheritance scheme to v2, along with tests to ensure that v1 is written without sequence numbers and that when reading v1 metadata, the sequence numbers are all 0." -- To me, this means a V2 reader should be able to read V1 table metadata.
>> Therefore, the step above is not required; we only need to use a V2 reader on the V1 table.
>>
>> However, if a table has been written in V1 and we want to save it as V2, I expect only the metadata will be rewritten into V2, and the V1 metadata will be vacuumed once the V2 rewrite succeeds.
>>
>> Is my understanding correct?
>>
>> Thanks!
>>
>> Miao
>>
>> From: Ryan Blue <rb...@netflix.com.INVALID>
>> Reply-To: "dev@iceberg.apache.org" <dev@iceberg.apache.org>, "rb...@netflix.com" <rb...@netflix.com>
>> Date: Tuesday, May 5, 2020 at 5:03 PM
>> To: Iceberg Dev List <dev@iceberg.apache.org>
>> Subject: [DISCUSS] Changes for row-level deletes
>>
>> Hi, everyone,
>>
>> I know several people that are planning to attend the sync tomorrow are interested in the row-level delete work, so I wanted to share some of the progress and my current thinking ahead of time.
>>
>> The codebase now supports a new version number, 2. Tables must be manually upgraded to version 2 in order to use any of the metadata changes we are making; v1 readers cannot read v2 tables. When a write takes place, the version number is now passed to the manifest writer, manifest list writer, etc., and the right schema for the table's current version is used. We've also frozen the v1 schemas and added wrappers to ensure that even as the internal classes, like DataFile, evolve, the exact same data is written to v1.
>>
>> Next, we've added sequence numbers and the proposed inheritance scheme to v2, along with tests to ensure that v1 is written without sequence numbers and that when reading v1 metadata, the sequence numbers are all 0. This gives us the ability to track "when" a row-level delete occurred in a v2 table.
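The inheritance scheme described above can be sketched in a few lines. This is an illustrative Python sketch of the resolution rule, not Iceberg's actual (Java) implementation; the function and field names are assumptions:

```python
def entry_sequence_number(format_version, entry, manifest_sequence_number):
    """Resolve the effective sequence number of a manifest entry.

    - v1 metadata has no sequence numbers, so every entry reads as 0;
    - in v2, an entry written without an explicit sequence number inherits
      the sequence number of the manifest that committed it.
    """
    if format_version == 1:
        return 0
    if entry.get("sequence_number") is None:
        return manifest_sequence_number
    return entry["sequence_number"]
```

For example, a v1 entry always resolves to 0, while a v2 entry with no explicit sequence number inherits the manifest's number.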
>> The next steps are to start making larger changes to metadata files.
>>
>> One change that I've been considering is getting rid of manifest_entry. In v1, a manifest stored a manifest_entry that wrapped a data_file. The intent was to separate data that API users needed to supply -- fields in data_file -- from data that was tracked internally by Iceberg -- the snapshot_id and status fields of manifest_entry. If we want to combine these so that a manifest stores one top-level data_file struct, then now is the time to make that change. I've prototyped this in #963 <https://github.com/apache/incubator-iceberg/pull/963>. The benefit is that the schema is flatter, so we wouldn't need two metadata tables (entries and files). The main drawback is that we aren't going to stop using v1 tables, so we would effectively have two different manifest schemas instead of v2 as an evolution of v1. I'd love to hear more opinions on whether to do this. I'm leaning toward not merging the two.
>>
>> Another change is to start adding tracking fields for delete files and updating the APIs. The metadata for this is fairly simple: an enum that stores whether the file is data, position deletes, or equality deletes. The main decision point is whether to allow mixing data files and delete files together in manifests. I don't think that we should allow manifests with both delete files and data files. The reason is job planning: we want to start emitting splits immediately so that we can stream them, instead of holding them all in memory.
>> That means we need some way to guarantee that we know all of the delete files to apply to a data file before we encounter the data file.
>>
>> OpenInx suggested sorting by sequence number to see delete files before data files, but that still requires holding all splits in memory in the worst case, due to overlapping sequence number ranges. I think Iceberg should plan a scan in two phases: one to find matching delete files (held in memory) and one to find matching data files. That solves the problem of having all deletes available so a split can be immediately emitted, and it also allows parallelizing both phases without coordination across threads.
>>
>> For the two-phase approach, mixing delete files and data files in a manifest would require reading that manifest twice, once in each phase. I think it makes the most sense to keep delete files and data files in separate manifests. The trade-off is that Iceberg will need to track the content of a manifest (deletes or data) and perform actions on separate manifest groups.
>>
>> Also, because with separate delete and data manifests we _could_ use separate manifest schemas, I went through and wrote out a schema for a delete file manifest. That schema was so similar to the current data file schema that I think it's simpler to use the same one for both.
>>
>> In summary, here are the things that we need to decide, and what I think we should do:
>>
>> * Merge manifest_entry and data_file? I think we should not, to avoid additional complexity.
>> * How should planning with delete files work? The two-phase approach is the only one I think is viable.
>> * Mix delete files and data files in manifests? I think we should not, to support the two-phase planning approach.
>> * If delete files and data files are separate, should manifests use the same schema? Yes, because it is simpler.
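The content tracking discussed above, a per-file enum plus per-manifest grouping so that each manifest is read exactly once in the two-phase plan, could look roughly like this. The enum values and field names here are illustrative assumptions, not the committed spec:

```python
from enum import Enum

# Per-file content field, as described above: records whether a file holds
# data, position deletes, or equality deletes.
class FileContent(Enum):
    DATA = 0
    POSITION_DELETES = 1
    EQUALITY_DELETES = 2

# Per-manifest content, so planning can pick the right manifest group for
# each phase without opening the manifest itself.
class ManifestContent(Enum):
    DATA = 0
    DELETES = 1

def manifests_for_phase(manifests, phase):
    """Phase 1 reads only delete manifests, phase 2 only data manifests,
    so no manifest is ever read twice."""
    wanted = ManifestContent.DELETES if phase == 1 else ManifestContent.DATA
    return [m for m in manifests if m["content"] == wanted]

# Toy manifest list with mixed content.
manifests = [
    {"path": "m1.avro", "content": ManifestContent.DATA},
    {"path": "m2.avro", "content": ManifestContent.DELETES},
    {"path": "m3.avro", "content": ManifestContent.DATA},
]
phase1 = manifests_for_phase(manifests, 1)
phase2 = manifests_for_phase(manifests, 2)
```

Because the groups are disjoint, keeping delete files out of data manifests is what lets each phase skip the other group entirely.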
>> Let's plan on talking about these questions in tomorrow's sync. And if you have other topics, please send them to me!
>>
>> rb
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>> --
>> Best Regards
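As a footnote on Junjie's question earlier in the thread about a service that converts equality delete files to position delete files: conceptually, the conversion scans the data files the equality delete applies to and records the positions of matching rows. A toy Python sketch, with a made-up row model and file layout (nothing here is Iceberg's actual API):

```python
def equality_to_position_deletes(data_files, equality_predicate):
    """Rewrite an equality delete (a predicate on column values) into
    position deletes (explicit (file_path, row_position) pairs) by scanning
    the affected data files."""
    position_deletes = []
    for path, rows in data_files.items():
        for pos, row in enumerate(rows):
            if equality_predicate(row):
                position_deletes.append((path, pos))
    return position_deletes

# Toy example: delete all rows with id == 2 across two data files.
data_files = {
    "f1.parquet": [{"id": 1}, {"id": 2}],
    "f2.parquet": [{"id": 2}],
}
position_deletes = equality_to_position_deletes(data_files, lambda row: row["id"] == 2)
```

The trade-off this illustrates: equality deletes are cheap to write (no scan needed at delete time) but push the matching work to readers, while position deletes cost a scan up front and are cheap to apply.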