That works for me.

On Wed, Jul 3, 2019 at 2:01 PM Ryan Blue <rb...@netflix.com.invalid> wrote:

> How about 9AM PDT on Friday, 5 July then?
>
> On Wed, Jul 3, 2019 at 10:55 AM Owen O'Malley <owen.omal...@gmail.com>
> wrote:
>
>> I'd like to call in, but I'm out Thursday. Friday would work except 11am
>> to 1pm PDT.
>>
>> .. Owen
>>
>> On Wed, Jul 3, 2019 at 10:42 AM Ryan Blue <rb...@netflix.com.invalid>
>> wrote:
>>
>>> I'm available Thursday and Friday this week as well, but it's a holiday
>>> in the US so some people may be out. If there are no objections from anyone
>>> that would like to attend, then I'm up for that.
>>>
>>> On Wed, Jul 3, 2019 at 10:40 AM Anton Okolnychyi <aokolnyc...@apple.com>
>>> wrote:
>>>
>>>> I apologize for the delay on my side. I’ll still have to go through the
>>>> last emails. I am available on Thursday/Friday this week and it would be
>>>> great to sync.
>>>>
>>>> Thanks,
>>>> Anton
>>>>
>>>> On 3 Jul 2019, at 01:29, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>>>>
>>>> Sorry I didn't get back to this thread last week. Let's try to have a
>>>> video call to sync up on this next week. What days would work for everyone?
>>>>
>>>> rb
>>>>
>>>> On Fri, Jun 21, 2019 at 9:06 AM Erik Wright <erik.wri...@shopify.com>
>>>> wrote:
>>>>
>>>>> With regards to operation values, currently they are:
>>>>>
>>>>>    - append: data files were added and no files were removed.
>>>>>    - replace: data files were rewritten with the same data; i.e.,
>>>>>    compaction, changing the data file format, or relocating data files.
>>>>>    - overwrite: data files were deleted and added in a logical
>>>>>    overwrite operation.
>>>>>    - delete: data files were removed and their contents logically
>>>>>    deleted.
>>>>>
>>>>> If deletion files (with or without data files) are appended to the
>>>>> dataset, will we consider that an `append` operation? If so, if deletion
>>>>> and/or data files are appended, and whole files are also deleted, will we
>>>>> consider that an `overwrite`?
>>>>>
>>>>> Given that the only apparent purpose of the operation field is to
>>>>> optimize snapshot expiration, the above seems to meet its needs. An
>>>>> incremental reader can also skip `replace` snapshots but no others. Once
>>>>> it decides to read a snapshot, I don't think there's any difference in
>>>>> how it processes the data for append/overwrite/delete cases.
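>>>>>
>>>>> To make the incremental-reader point concrete, here is a minimal
>>>>> sketch (the names are hypothetical, not the actual Iceberg API):
>>>>>
>>>>> def snapshots_to_read(snapshots):
>>>>>     """Yield the snapshots an incremental reader must process."""
>>>>>     for snapshot in snapshots:
>>>>>         if snapshot.operation == "replace":
>>>>>             continue  # same logical data, just rewritten files
>>>>>         yield snapshot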
>>>>>
>>>>> On Thu, Jun 20, 2019 at 8:55 PM Ryan Blue <rb...@netflix.com> wrote:
>>>>>
>>>>>> I don’t see that we need [sequence numbers] for file/offset-deletes,
>>>>>> since they apply to a specific file. They’re not harmful, but they don’t
>>>>>> seem relevant.
>>>>>>
>>>>>> These delete files will probably contain a path and an offset and
>>>>>> could contain deletes for multiple files. In that case, the sequence
>>>>>> number can be used to eliminate delete files that don’t need to be
>>>>>> applied to a particular data file, just like the column equality
>>>>>> deletes. Likewise, it can be used to drop the delete files when there
>>>>>> are no data files with an older sequence number.
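>>>>>>
>>>>>> As a rough sketch of that elimination (hypothetical names, and
>>>>>> assuming a delete only applies to data files with an older sequence
>>>>>> number):
>>>>>>
>>>>>> def deletes_for(data_file, delete_files):
>>>>>>     """Delete files that could contain deletes against data_file."""
>>>>>>     return [d for d in delete_files
>>>>>>             if d.sequence_number > data_file.sequence_number]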
>>>>>>
>>>>>> I don’t understand the purpose of the min sequence number, nor what
>>>>>> the “min data seq” is.
>>>>>>
>>>>>> Min sequence number would be used for pruning delete files without
>>>>>> reading all the manifests to find out if there are old data files. If no
>>>>>> manifest with data for a partition contains a file older than some
>>>>>> sequence number N, then any delete file with a sequence number < N can
>>>>>> be removed.
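>>>>>>
>>>>>> In sketch form (hypothetical names; this assumes delete files carry a
>>>>>> partition and a sequence number):
>>>>>>
>>>>>> def expired_delete_files(manifests, delete_files):
>>>>>>     """Delete files older than every data file in their partition."""
>>>>>>     min_data_seq = {}  # partition -> oldest data-file sequence number
>>>>>>     for manifest in manifests:
>>>>>>         for f in manifest.data_files:
>>>>>>             prev = min_data_seq.get(f.partition, f.sequence_number)
>>>>>>             min_data_seq[f.partition] = min(prev, f.sequence_number)
>>>>>>     # a delete with sequence number < N can be removed when no data
>>>>>>     # file in its partition is older than N
>>>>>>     return [d for d in delete_files
>>>>>>             if d.sequence_number < min_data_seq.get(d.partition,
>>>>>>                                                     float("inf"))]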
>>>>>>
>>>>> OK, so the minimum sequence number is an attribute of manifest files.
>>>>> Sounds good. It can likely permit us to optimize compaction operations as
>>>>> well (i.e., you can easily limit the operation to a subset of manifest
>>>>> files as long as they are the oldest ones).
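>>>>>
>>>>> For example (a hypothetical sketch, assuming each manifest tracks its
>>>>> own min sequence number), picking compaction candidates could be as
>>>>> simple as:
>>>>>
>>>>> def compaction_candidates(manifests, k):
>>>>>     """The k oldest manifests by min sequence number."""
>>>>>     return sorted(manifests, key=lambda m: m.min_sequence_number)[:k]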
>>>>>
>>>>>
>>>>>> The “min data seq” is the minimum sequence number of a data file.
>>>>>> That seems like what we actually want for the pruning I described above.
>>>>>>
>>>>> I would expect a data file (appended rows or deletions by column
>>>>> value) to have a single sequence number that applies to the whole file.
>>>>> Even a delete-by-file-and-offset file can do with only a single sequence
>>>>> number (which must be larger than the sequence numbers of all deleted
>>>>> files). Why do we need a "minimum" data sequence per file?
>>>>>
>>>>>> Off the top of my head [supporting non-key delete] requires adding
>>>>>> additional information to the manifest file, indicating the columns that
>>>>>> are used for the deletion. Only equality would be supported; if multiple
>>>>>> columns were used, they would be combined with boolean-and. I don’t see
>>>>>> anything too tricky about it.
>>>>>>
>>>>>> Yes, exactly. I actually phrased it wrong initially. I think it would
>>>>>> be simple to extend the equality deletes to do this. We just need a way
>>>>>> to have global scope, not just partition scope.
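>>>>>>
>>>>>> A sketch of the boolean-and semantics (hypothetical names, with rows
>>>>>> as plain dicts):
>>>>>>
>>>>>> def matches(delete_row, data_row, equality_columns):
>>>>>>     """A delete row matches only if every listed column is equal."""
>>>>>>     return all(data_row[c] == delete_row[c] for c in equality_columns)
>>>>>>
>>>>>> def apply_equality_deletes(data_rows, delete_rows, equality_columns):
>>>>>>     return [r for r in data_rows
>>>>>>             if not any(matches(d, r, equality_columns)
>>>>>>                        for d in delete_rows)]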
>>>>>>
>>>>> I don't think anything special needs to be done with regards to
>>>>> scoping/partitioning of delete files. When scanning one or more data
>>>>> files, one must also consider any and all deletion files that could
>>>>> apply to them.
>>>>> The only way to prune deletion files from consideration is when:
>>>>>
>>>>>    1. All of your data files have at least one partition column in
>>>>>    common.
>>>>>    2. The deletion file is also partitioned on that column (at least).
>>>>>    3. The value sets of the data files do not overlap the value sets
>>>>>    of the deletion files in that column.
>>>>>
>>>>> So given a dataset of sessions that is partitioned by device form
>>>>> factor and date, for example, you could have a delete (user_id=9876) in a
>>>>> deletion file that is not partitioned, and it would be "in scope" for all
>>>>> of those data files.
>>>>>
>>>>> If you had the same dataset partitioned by hash(user_id) and your
>>>>> deletes were _also_ partitioned by hash(user_id), you would be able to
>>>>> prune those deletes while scanning the sessions.
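>>>>>
>>>>> Putting that pruning rule into a sketch (hypothetical names, treating
>>>>> each file's partition as a dict of column -> value):
>>>>>
>>>>> def delete_file_may_apply(data_file, delete_file):
>>>>>     """True unless a shared partition column proves the files disjoint."""
>>>>>     shared = data_file.partition.keys() & delete_file.partition.keys()
>>>>>     if not shared:
>>>>>         return True  # an unpartitioned delete is in scope everywhere
>>>>>     return all(data_file.partition[c] == delete_file.partition[c]
>>>>>                for c in shared)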
>>>>>
>>>>>> If we add this on a per-deletion-file basis, it is not clear if there
>>>>>> is any relevance in preserving the concept of a unique row ID.
>>>>>>
>>>>>> Agreed. That’s why I’ve been steering us away from the debate about
>>>>>> whether keys are unique or not. Either way, a natural key delete must
>>>>>> delete all of the records it matches.
>>>>>>
>>>>>> I would assume that the maximum sequence number should appear in the
>>>>>> table metadata
>>>>>>
>>>>>> Agreed.
>>>>>>
>>>>>> [W]ould you make it optional to assign a sequence number to a
>>>>>> snapshot? “Replace” snapshots would not need one.
>>>>>>
>>>>>> The only requirement is that it is monotonically increasing. If one
>>>>>> isn’t used, we don’t have to increment. I’d say it is up to the
>>>>>> implementation to decide. I would probably increment it every time to
>>>>>> avoid errors.
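>>>>>>
>>>>>> In sketch form (hypothetical names), incrementing on every commit
>>>>>> keeps the invariant trivial:
>>>>>>
>>>>>> def commit_snapshot(table_metadata, snapshot):
>>>>>>     """Assign the next sequence number; strictly monotonic."""
>>>>>>     snapshot.sequence_number = table_metadata.last_sequence_number + 1
>>>>>>     table_metadata.last_sequence_number = snapshot.sequence_number
>>>>>>     table_metadata.snapshots.append(snapshot)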
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Software Engineer
>>>>>> Netflix
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
