Re: Sequence number for ContentFiles

Anton Okolnychyi Thu, 27 Apr 2023 09:02:29 -0700

I created #7449 and added it to our 1.3.0 milestone to make sure it gets 
delivered.


- Anton

> On Apr 26, 2023, at 1:59 PM, Steven Wu <stevenz...@gmail.com> wrote:
> 
> > Will a clock skew cause any issues w.r.t. relying on the snapshot commit 
> > time? I think we allow a mismatch up to a minute in TableMetadata.
> 
> This is probably not a problem. typically the max allowed misalignment is 
> much longer than 1 minute.
> 
> > We also planned to expose file sequence number (different from data 
> > sequence number). I believe you could lookup snapshot using that info.
> 
> That would work too. 
> 
> 
> On Wed, Apr 26, 2023 at 1:10 PM Jack Ye <yezhao...@gmail.com 
> <mailto:yezhao...@gmail.com>> wrote:
> +1 for using file sequence number. This work has been discussed for a long 
> time but never got picked up, would be great if someone can drive it to 
> completion.
> 
> -Jack
> 
> On Wed, Apr 26, 2023 at 12:03 PM Anton Okolnychyi 
> <aokolnyc...@apple.com.invalid> wrote:
> Will a clock skew cause any issues w.r.t. relying on the snapshot commit 
> time? I think we allow a mismatch up to a minute in TableMetadata.
> 
> We also planned to expose file sequence number (different from data sequence 
> number). I believe you could lookup snapshot using that info.
> 
> https://iceberg.apache.org/spec/#manifest-entry-fields 
> <https://iceberg.apache.org/spec/#manifest-entry-fields>
> 
> - Anton
> 
>> On Apr 26, 2023, at 11:52 AM, Steven Wu <stevenz...@gmail.com 
>> <mailto:stevenz...@gmail.com>> wrote:
>> 
>> piggyback on this thread since we are discussing exposing more metadata in 
>> ContentFile or FileScanTask. Flink source watermark alignment can 
>> potentially leverage the snapshot timestamp (when data files are 
>> committed/appended to the table).  Is it reasonable to expose some snapshot 
>> metadata in the FileScanTask?
>> 
>> This can help the Flink job to ensure two (or more) sources are proceeding 
>> at similar paces. Some use cases may require column stats (min-max values) 
>> for the watermark alignment. Some use cases can leverage snapshot timestamps 
>> for the alignment purpose.
>> 
>> On Wed, Apr 26, 2023 at 11:15 AM Anton Okolnychyi 
>> <aokolnyc...@apple.com.invalid <mailto:aokolnyc...@apple.com.invalid>> wrote:
>> My initial thinking is that exposing sequence numbers on ContentFile is 
>> preferable (we would get it for free in scan tasks). That said, I’ll need to 
>> see how complicated the implementation would be. Exposing it on 
>> ContentScanTask is a viable alternative. However, we already have a 
>> precedent for assigning specId in InheritableMetadata.
>> 
>> - Anton
>> 
>>> On Apr 26, 2023, at 10:41 AM, Ryan Blue <b...@tabular.io 
>>> <mailto:b...@tabular.io>> wrote:
>>> 
>>> Exposing sequence number makes sense for use cases like this. I also like 
>>> the idea of exposing it through FileScanTask. That might be easier than 
>>> trying to add it to ContentFile.
>>> 
>>> Anton, what do you think about adding it to FileScanTask?
>>> 
>>> On Wed, Apr 26, 2023 at 7:50 AM Anton Okolnychyi 
>>> <aokolnyc...@apple.com.invalid <mailto:aokolnyc...@apple.com.invalid>> 
>>> wrote:
>>> It is actually my bad not following up on that after #5913 and #6002. I’ll 
>>> take a look at #5760 referenced below by the end of this week. 
>>> 
>>> The plan was to expose sequence numbers on ContentFile. It is needed in a 
>>> number of use cases.
>>> 
>>> - Anton
>>> 
>>>> On Apr 26, 2023, at 4:55 AM, Gabor Kaszab <gaborkas...@apache.org 
>>>> <mailto:gaborkas...@apache.org>> wrote:
>>>> 
>>>> Hey Iceberg Community,
>>>> 
>>>> I know there has been a discussion previously about exposing the sequence 
>>>> number on a ContentFile level, but if I'm not mistaken that conversation 
>>>> didn't end with a consensus. I found some relevant PRs that has been open 
>>>> for a while:
>>>> https://github.com/apache/iceberg/pull/5760 
>>>> <https://github.com/apache/iceberg/pull/5760>
>>>> https://github.com/apache/iceberg/pull/4769 
>>>> <https://github.com/apache/iceberg/pull/4769> (merged into the above PR)
>>>> 
>>>> The reason I bring this topic up is that we started investigating recently 
>>>> how to add read support for equality deletes to Impala. Apparently, 
>>>> implementation-wise we could save a lot of hassle if sequence numbers were 
>>>> exposed on a file level through the API, preferably somewhere around 
>>>> calling planFiles(). We could then have a virtual 'SEQUENCE_NUMBER' when 
>>>> scanning the data and delete files (separate scanners) and could easily 
>>>> filter the rows in the JOIN node that joins the rows from the data files 
>>>> with the ones from the delete files. (wouldn't go into more depth atm)
>>>> 
>>>> With this mail I'd like to revive this conversation with the hope of 
>>>> eventually coming to a solution that satisfies all participants. I've been 
>>>> thinking of implementation choices we have to somehow provide sequence 
>>>> numbers for the files:
>>>> - Extending ContentFile with sequence number: I checked the above PRs and 
>>>> IIUC the issue with this approach is that ContentFile is meant to be 
>>>> immutable and by the time they are created we don't have sequence numbers 
>>>> to populate the ContentFile object.
>>>> - Extend FileScanTask with the file-level sequence numbers so after 
>>>> calling planFiles() we could retrieve these numbers via a new API call on 
>>>> the FileScanTask.
>>>> 
>>>> There might be many other ways to implement this and I'd love to hear what 
>>>> people think and would be great to find a way that would help us out on 
>>>> Impala.
>>>> 
>>>> Cheers,
>>>> Gabor
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Ryan Blue
>>> Tabular
>> 
>

Re: Sequence number for ContentFiles

Reply via email to