Re: Sequence number for ContentFiles

Gabor Kaszab Fri, 28 Apr 2023 01:43:52 -0700

Good to hear that the community is open for this! Thanks everyone for the
feedback, and thanks Anton for creating a ticket!


Cheers,
Gabor


On Thu, Apr 27, 2023 at 6:02 PM Anton Okolnychyi
<aokolnyc...@apple.com.invalid> wrote:

> I created #7449 and added it to our 1.3.0 milestone to make sure it gets
> delivered.
>
> - Anton
>
> On Apr 26, 2023, at 1:59 PM, Steven Wu <stevenz...@gmail.com> wrote:
>
> > Will a clock skew cause any issues w.r.t. relying on the snapshot commit
> time? I think we allow a mismatch up to a minute in TableMetadata.
>
> This is probably not a problem. typically the max allowed misalignment is
> much longer than 1 minute.
>
> > We also planned to expose file sequence number (different from data
> sequence number). I believe you could lookup snapshot using that info.
>
> That would work too.
>
>
> On Wed, Apr 26, 2023 at 1:10 PM Jack Ye <yezhao...@gmail.com> wrote:
>
>> +1 for using file sequence number. This work has been discussed for a
>> long time but never got picked up, would be great if someone can drive it
>> to completion.
>>
>> -Jack
>>
>> On Wed, Apr 26, 2023 at 12:03 PM Anton Okolnychyi <
>> aokolnyc...@apple.com.invalid> wrote:
>>
>>> Will a clock skew cause any issues w.r.t. relying on the snapshot commit
>>> time? I think we allow a mismatch up to a minute in TableMetadata.
>>>
>>> We also planned to expose file sequence number (different from data
>>> sequence number). I believe you could lookup snapshot using that info.
>>>
>>> https://iceberg.apache.org/spec/#manifest-entry-fields
>>>
>>> - Anton
>>>
>>> On Apr 26, 2023, at 11:52 AM, Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>> piggyback on this thread since we are discussing exposing more metadata
>>> in ContentFile or FileScanTask. Flink source watermark alignment can
>>> potentially leverage the snapshot timestamp (when data files are
>>> committed/appended to the table).  Is it reasonable to expose some snapshot
>>> metadata in the FileScanTask?
>>>
>>> This can help the Flink job to ensure two (or more) sources are
>>> proceeding at similar paces. Some use cases may require column stats
>>> (min-max values) for the watermark alignment. Some use cases can leverage
>>> snapshot timestamps for the alignment purpose.
>>>
>>> On Wed, Apr 26, 2023 at 11:15 AM Anton Okolnychyi <
>>> aokolnyc...@apple.com.invalid> wrote:
>>>
>>>> My initial thinking is that exposing sequence numbers on ContentFile is
>>>> preferable (we would get it for free in scan tasks). That said, I’ll need
>>>> to see how complicated the implementation would be. Exposing it on
>>>> ContentScanTask is a viable alternative. However, we already have a
>>>> precedent for assigning specId in InheritableMetadata.
>>>>
>>>> - Anton
>>>>
>>>> On Apr 26, 2023, at 10:41 AM, Ryan Blue <b...@tabular.io> wrote:
>>>>
>>>> Exposing sequence number makes sense for use cases like this. I also
>>>> like the idea of exposing it through FileScanTask. That might be easier
>>>> than trying to add it to ContentFile.
>>>>
>>>> Anton, what do you think about adding it to FileScanTask?
>>>>
>>>> On Wed, Apr 26, 2023 at 7:50 AM Anton Okolnychyi <
>>>> aokolnyc...@apple.com.invalid> wrote:
>>>>
>>>>> It is actually my bad not following up on that after #5913 and #6002.
>>>>> I’ll take a look at #5760 referenced below by the end of this week.
>>>>>
>>>>> The plan was to expose sequence numbers on ContentFile. It is needed
>>>>> in a number of use cases.
>>>>>
>>>>> - Anton
>>>>>
>>>>> On Apr 26, 2023, at 4:55 AM, Gabor Kaszab <gaborkas...@apache.org>
>>>>> wrote:
>>>>>
>>>>> Hey Iceberg Community,
>>>>>
>>>>> I know there has been a discussion previously about exposing the
>>>>> sequence number on a ContentFile level, but if I'm not mistaken that
>>>>> conversation didn't end with a consensus. I found some relevant PRs that
>>>>> has been open for a while:
>>>>> https://github.com/apache/iceberg/pull/5760
>>>>> https://github.com/apache/iceberg/pull/4769 (merged into the above PR)
>>>>>
>>>>> The reason I bring this topic up is that we started investigating
>>>>> recently how to add read support for equality deletes to Impala.
>>>>> Apparently, implementation-wise we could save a lot of hassle if sequence
>>>>> numbers were exposed on a file level through the API, preferably somewhere
>>>>> around calling planFiles(). We could then have a virtual 'SEQUENCE_NUMBER'
>>>>> when scanning the data and delete files (separate scanners) and could
>>>>> easily filter the rows in the JOIN node that joins the rows from the data
>>>>> files with the ones from the delete files. (wouldn't go into more depth 
>>>>> atm)
>>>>>
>>>>> With this mail I'd like to revive this conversation with the hope of
>>>>> eventually coming to a solution that satisfies all participants. I've been
>>>>> thinking of implementation choices we have to somehow provide sequence
>>>>> numbers for the files:
>>>>> - Extending ContentFile with sequence number: I checked the above PRs
>>>>> and IIUC the issue with this approach is that ContentFile is meant to be
>>>>> immutable and by the time they are created we don't have sequence numbers
>>>>> to populate the ContentFile object.
>>>>> - Extend FileScanTask with the file-level sequence numbers so after
>>>>> calling planFiles() we could retrieve these numbers via a new API call on
>>>>> the FileScanTask.
>>>>>
>>>>> There might be many other ways to implement this and I'd love to hear
>>>>> what people think and would be great to find a way that would help us out
>>>>> on Impala.
>>>>>
>>>>> Cheers,
>>>>> Gabor
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>>
>>>>
>>>>
>>>
>

Re: Sequence number for ContentFiles

Reply via email to