Re: Sequence number for ContentFiles

2023-04-28 Thread Gabor Kaszab
Good to hear that the community is open for this! Thanks everyone for the feedback, and thanks Anton for creating a ticket! Cheers, Gabor On Thu, Apr 27, 2023 at 6:02 PM Anton Okolnychyi wrote: > I created #7449 and added it to our 1.3.0 milestone to make sure it gets > delivered. > > - Anton

Re: Sequence number for ContentFiles

2023-04-27 Thread Anton Okolnychyi
I created #7449 and added it to our 1.3.0 milestone to make sure it gets delivered. - Anton > On Apr 26, 2023, at 1:59 PM, Steven Wu wrote: > > > Will a clock skew cause any issues w.r.t. relying on the snapshot commit > > time? I think we allow a mismatch up to a minute in TableMetadata. >

Re: Sequence number for ContentFiles

2023-04-26 Thread Steven Wu
> Will a clock skew cause any issues w.r.t. relying on the snapshot commit time? I think we allow a mismatch up to a minute in TableMetadata. This is probably not a problem. typically the max allowed misalignment is much longer than 1 minute. > We also planned to expose file sequence number (diff

Re: Sequence number for ContentFiles

2023-04-26 Thread Jack Ye
+1 for using file sequence number. This work has been discussed for a long time but never got picked up, would be great if someone can drive it to completion. -Jack On Wed, Apr 26, 2023 at 12:03 PM Anton Okolnychyi wrote: > Will a clock skew cause any issues w.r.t. relying on the snapshot commi

Re: Sequence number for ContentFiles

2023-04-26 Thread Anton Okolnychyi
Will a clock skew cause any issues w.r.t. relying on the snapshot commit time? I think we allow a mismatch up to a minute in TableMetadata. We also planned to expose file sequence number (different from data sequence number). I believe you could lookup snapshot using that info. https://iceberg.

Re: Sequence number for ContentFiles

2023-04-26 Thread Steven Wu
piggyback on this thread since we are discussing exposing more metadata in ContentFile or FileScanTask. Flink source watermark alignment can potentially leverage the snapshot timestamp (when data files are committed/appended to the table). Is it reasonable to expose some snapshot metadata in the F

Re: Sequence number for ContentFiles

2023-04-26 Thread Anton Okolnychyi
My initial thinking is that exposing sequence numbers on ContentFile is preferable (we would get it for free in scan tasks). That said, I’ll need to see how complicated the implementation would be. Exposing it on ContentScanTask is a viable alternative. However, we already have a precedent for a

Re: Sequence number for ContentFiles

2023-04-26 Thread Ryan Blue
Exposing sequence number makes sense for use cases like this. I also like the idea of exposing it through FileScanTask. That might be easier than trying to add it to ContentFile. Anton, what do you think about adding it to FileScanTask? On Wed, Apr 26, 2023 at 7:50 AM Anton Okolnychyi wrote: >

Re: Sequence number for ContentFiles

2023-04-26 Thread Anton Okolnychyi
It is actually my bad not following up on that after #5913 and #6002. I’ll take a look at #5760 referenced below by the end of this week. The plan was to expose sequence numbers on ContentFile. It is needed in a number of use cases. - Anton > On Apr 26, 2023, at 4:55 AM, Gabor Kaszab wrote:

Sequence number for ContentFiles

2023-04-26 Thread Gabor Kaszab
Hey Iceberg Community, I know there has been a discussion previously about exposing the sequence number on a ContentFile level, but if I'm not mistaken that conversation didn't end with a consensus. I found some relevant PRs that has been open for a while: https://github.com/apache/iceberg/pull/57