Upserts in Iceberg

Anton Okolnychyi Wed, 03 Jul 2019 11:27:57 -0700
Works for me too.

> On 3 Jul 2019, at 19:09, Erik Wright <erik.wri...@shopify.com.INVALID> wrote:
> 
> That works for me.
> 
> On Wed, Jul 3, 2019 at 2:01 PM Ryan Blue <rb...@netflix.com.invalid> wrote:
> How about 9AM PDT on Friday, 5 July then?
> 
> On Wed, Jul 3, 2019 at 10:55 AM Owen O'Malley <owen.omal...@gmail.com> wrote:
> I'd like to call in, but I'm out Thursday. Friday would work except 11am to 
> 1pm PDT.
> 
> .. Owen
> 
> On Wed, Jul 3, 2019 at 10:42 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
> I'm available Thursday and Friday this week as well, but it's a holiday in 
> the US so some people may be out. If there are no objections from anyone that 
> would like to attend, then I'm up for that.
> 
> On Wed, Jul 3, 2019 at 10:40 AM Anton Okolnychyi <aokolnyc...@apple.com> wrote:
> I apologize for the delay on my side. I’ll still have to go through the last 
> emails. I am available on Thursday/Friday this week and it would be great to 
> sync.
> 
> Thanks,
> Anton
> 
>> On 3 Jul 2019, at 01:29, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>> 
>> Sorry I didn't get back to this thread last week. Let's try to have a video 
>> call to sync up on this next week. What days would work for everyone?
>> 
>> rb
>> 
>> On Fri, Jun 21, 2019 at 9:06 AM Erik Wright <erik.wri...@shopify.com> wrote:
>> With regard to operation values, currently they are:
>> 
>> - append: data files were added and no files were removed.
>> - replace: data files were rewritten with the same data; i.e., compaction, 
>>   changing the data file format, or relocating data files.
>> - overwrite: data files were deleted and added in a logical overwrite 
>>   operation.
>> - delete: data files were removed and their contents logically deleted.
>> 
>> If deletion files (with or without data files) are appended to the dataset, 
>> will we consider that an `append` operation? If so, if deletion and/or data 
>> files are appended, and whole files are also deleted, will we consider that 
>> an `overwrite`?
>> 
>> Given that the only apparent purpose of the operation field is to optimize 
>> snapshot expiration, the above seems to meet its needs. An incremental reader 
>> can also skip `replace` snapshots but no others. Once it decides to read a 
>> snapshot, I don't think there's any difference in how it processes the data 
>> for the append/overwrite/delete cases.
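The incremental-reader rule above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `Snapshot` record with `snapshot_id` and `operation` fields; these names are made up for the example and are not Iceberg's API.

```python
from dataclasses import dataclass


# Hypothetical, simplified snapshot record; not Iceberg's actual API.
@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    operation: str  # one of: "append", "replace", "overwrite", "delete"


def snapshots_to_read(snapshots):
    """Return the snapshots an incremental reader must process.

    Only `replace` snapshots are skipped, because they rewrite files
    without changing the table's logical contents.
    """
    return [s for s in snapshots if s.operation != "replace"]


history = [
    Snapshot(1, "append"),
    Snapshot(2, "replace"),
    Snapshot(3, "delete"),
    Snapshot(4, "overwrite"),
]
to_read = [s.snapshot_id for s in snapshots_to_read(history)]
```

Here `to_read` keeps snapshots 1, 3 and 4 and drops the compaction-style snapshot 2.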
>> 
>> On Thu, Jun 20, 2019 at 8:55 PM Ryan Blue <rb...@netflix.com> wrote:
>> I don’t see that we need [sequence numbers] for file/offset-deletes, since 
>> they apply to a specific file. They’re not harmful, but they don’t seem 
>> relevant.
>> 
>> These delete files will probably contain a path and an offset and could 
>> contain deletes for multiple files. In that case, the sequence number can be 
>> used to eliminate delete files that don’t need to be applied to a particular 
>> data file, just like the column equality deletes. Likewise, it can be used 
>> to drop the delete files when there are no data files with an older sequence 
>> number.
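A rough sketch of how a sequence number could filter delete files for one data file. The `DataFile` and `DeleteFile` records and the strict "newer than" comparison are assumptions made for illustration; the thread only says a delete is irrelevant when no data file has an older sequence number.

```python
from dataclasses import dataclass


# Hypothetical records for illustration only.
@dataclass(frozen=True)
class DataFile:
    path: str
    seq: int  # sequence number of the commit that added the file


@dataclass(frozen=True)
class DeleteFile:
    path: str
    seq: int


def applicable_deletes(data_file, delete_files):
    """Keep only delete files that could affect `data_file`.

    Assumption: a delete only affects data written before it, i.e. data
    files with a strictly older (smaller) sequence number.
    """
    return [d for d in delete_files if d.seq > data_file.seq]


deletes = [DeleteFile("del-a", 3), DeleteFile("del-b", 7)]
for_old_file = [d.path for d in applicable_deletes(DataFile("data-1", 2), deletes)]
for_new_file = [d.path for d in applicable_deletes(DataFile("data-2", 5), deletes)]
```

The older data file has to consider both delete files, while the newer one only needs `del-b`.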
>> 
>> I don’t understand the purpose of the min sequence number, nor what the “min 
>> data seq” is.
>> 
>> Min sequence number would be used for pruning delete files without reading 
>> all the manifests to find out if there are old data files. If no manifest 
>> with data for a partition contains a file older than some sequence number N, 
>> then any delete file with a sequence number < N can be removed.
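The pruning rule just described could look roughly like this. The `Manifest` and `DeleteFile` records, the `min_data_seq` field name, and the string partition keys are all hypothetical; the sketch only assumes that each manifest records the smallest sequence number among its data files.

```python
from dataclasses import dataclass


# Hypothetical records for illustration only.
@dataclass(frozen=True)
class Manifest:
    path: str
    partition: str
    min_data_seq: int  # smallest sequence number among its data files


@dataclass(frozen=True)
class DeleteFile:
    path: str
    partition: str
    seq: int


def removable_deletes(manifests, delete_files):
    """Find delete files that no remaining data file can be affected by."""
    # N per partition: the oldest data sequence number across its manifests.
    min_seq = {}
    for m in manifests:
        current = min_seq.get(m.partition, m.min_data_seq)
        min_seq[m.partition] = min(current, m.min_data_seq)

    removable = []
    for d in delete_files:
        n = min_seq.get(d.partition)
        # Partitions without data manifests are left alone in this sketch.
        if n is not None and d.seq < n:
            removable.append(d)
    return removable


manifests = [
    Manifest("m1", "date=2019-06-20", 5),
    Manifest("m2", "date=2019-06-20", 8),
    Manifest("m3", "date=2019-06-21", 3),
]
deletes = [
    DeleteFile("d1", "date=2019-06-20", 4),
    DeleteFile("d2", "date=2019-06-20", 6),
    DeleteFile("d3", "date=2019-06-21", 3),
]
removed = [d.path for d in removable_deletes(manifests, deletes)]
```

Only `d1` is removable here: its sequence number 4 is below the partition minimum of 5, and the decision needed the manifest-level minimum only, not the individual data file entries.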
>> 
>> OK, so the minimum sequence number is an attribute of manifest files. Sounds 
>> good. It can likely permit us to optimize compaction operations as well 
>> (i.e., you can easily limit the operation to a subset of manifest files as 
>> long as they are the oldest ones).
>>  
>> The “min data seq” is the minimum sequence number of a data file. That seems 
>> like what we actually want for the pruning I described above.
>> 
>> I would expect a data file (appended rows or deletions by column value) to 
>> have a single sequence number that applies to the whole file. Even a 
>> delete-by-file-and-offset file can do with only a single sequence number 
>> (which must be larger than the sequence numbers of all deleted files). Why 
>> do we need a "minimum" data sequence per file?
>> 
>> Off the top of my head [supporting non-key delete] requires adding 
>> additional information to the manifest file, indicating the columns that are 
>> used for the deletion. Only equality would be supported; if multiple columns 
>> were used, they would be combined with boolean-and. I don’t see anything too 
>> tricky about it.
>> 
>> Yes, exactly. I actually phrased it wrong initially. I think it would be 
>> simple to extend the equality deletes to do this. We just need a way to have 
>> global scope, not just partition scope.
>> 
>> I don't think anything special needs to be done with regard to 
>> scoping/partitioning of delete files. When scanning one or more data files, 
>> one must also consider any and all deletion files that could apply to them. 
>> A deletion file can be pruned from consideration only if all of the 
>> following hold:
>> 
>> 1. All of your data files have at least one partition column in common.
>> 2. The deletion file is also partitioned on that column (at least).
>> 3. The value sets of the data files do not overlap the value sets of the 
>>    deletion files in that column.
>> 
>> So given a dataset of sessions that is partitioned by device form factor 
>> and date, for example, you could have a delete (user_id=9876) in a deletion 
>> file that is not partitioned. And it would be "in scope" for all of those 
>> data files.
>> 
>> If you had the same dataset partitioned by hash(user_id) and your deletes 
>> were _also_ partitioned by hash(user_id) you would be able to prune those 
>> deletes while scanning the sessions.
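The scoping conditions above can be sketched as a single check. Partition metadata is modeled here as a plain dict from column name to the set of values present in the file; that representation and the column names (`form_factor`, `date`, `user_id_bucket`) are assumptions made for the example.

```python
def in_scope(delete_partition, data_partition):
    """Decide whether a deletion file must be considered for a data file.

    Both arguments map a partition column name to the set of values the
    file contains for that column. A deletion file is pruned only when
    the two share a partition column whose value sets do not overlap.
    """
    for column, delete_values in delete_partition.items():
        data_values = data_partition.get(column)
        if data_values is not None and delete_values.isdisjoint(data_values):
            return False  # shared column, disjoint values: safe to prune
    return True  # no shared column proves the delete irrelevant


# Sessions partitioned by form factor and date; the delete is unpartitioned.
sessions = {"form_factor": {"phone"}, "date": {"2019-06-20"}}
unpartitioned_delete = {}
global_scope = in_scope(unpartitioned_delete, sessions)

# Data and deletes both partitioned by a bucket of hash(user_id).
bucketed_sessions = {"user_id_bucket": {3}}
same_bucket = in_scope({"user_id_bucket": {3}}, bucketed_sessions)
other_bucket = in_scope({"user_id_bucket": {7}}, bucketed_sessions)
```

The unpartitioned delete stays in scope for every session file, while a delete in a different `hash(user_id)` bucket is pruned.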
>> 
>> If we add this on a per-deletion-file basis, it is not clear whether there 
>> is any relevance in preserving the concept of a unique row ID.
>> 
>> Agreed. That’s why I’ve been steering us away from the debate about whether 
>> keys are unique or not. Either way, a natural key delete must delete all of 
>> the records it matches.
>> 
>> I would assume that the maximum sequence number should appear in the table 
>> metadata
>> 
>> Agreed.
>> 
>> [W]ould you make it optional to assign a sequence number to a snapshot? 
>> “Replace” snapshots would not need one.
>> 
>> The only requirement is that it is monotonically increasing. If one isn’t 
>> used, we don’t have to increment. I’d say it is up to the implementation to 
>> decide. I would probably increment it every time to avoid errors.
>> 
>> -- 
>> Ryan Blue
>> Software Engineer
>> Netflix
> 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix
Re: Updates/Deletes/Upserts in Iceberg
