It is actually my bad for not following up on that after #5913 and #6002. I’ll take a look at #5760, referenced below, by the end of this week.
The plan was to expose sequence numbers on ContentFile. It is needed in a number of use cases.

- Anton

> On Apr 26, 2023, at 4:55 AM, Gabor Kaszab <gaborkas...@apache.org> wrote:
>
> Hey Iceberg Community,
>
> I know there has been a discussion previously about exposing the sequence
> number on a ContentFile level, but if I'm not mistaken that conversation
> didn't end with a consensus. I found some relevant PRs that have been open
> for a while:
> https://github.com/apache/iceberg/pull/5760
> https://github.com/apache/iceberg/pull/4769 (merged into the above PR)
>
> The reason I bring this topic up is that we recently started investigating
> how to add read support for equality deletes to Impala. Apparently,
> implementation-wise we could save a lot of hassle if sequence numbers were
> exposed on a file level through the API, preferably somewhere around calling
> planFiles(). We could then have a virtual 'SEQUENCE_NUMBER' column when
> scanning the data and delete files (separate scanners) and could easily
> filter the rows in the JOIN node that joins the rows from the data files
> with the ones from the delete files. (I wouldn't go into more depth atm.)
>
> With this mail I'd like to revive this conversation with the hope of
> eventually coming to a solution that satisfies all participants. I've been
> thinking of the implementation choices we have to somehow provide sequence
> numbers for the files:
> - Extend ContentFile with a sequence number: I checked the above PRs and
>   IIUC the issue with this approach is that ContentFile objects are meant
>   to be immutable, and by the time they are created we don't have sequence
>   numbers to populate them with.
> - Extend FileScanTask with the file-level sequence numbers, so after calling
>   planFiles() we could retrieve these numbers via a new API call on the
>   FileScanTask.
>
> There might be many other ways to implement this and I'd love to hear what
> people think. It would be great to find a way that would help us out on
> Impala.
>
> Cheers,
> Gabor
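
For illustration, the sequence-number filter described above for the JOIN node could be sketched roughly as follows. This is only a sketch under stated assumptions, not Impala's or Iceberg's implementation: DataRow, EqualityDelete, apply_equality_deletes and the sequence_number fields are hypothetical names standing in for the proposed virtual 'SEQUENCE_NUMBER' column, and a single integer key stands in for the equality columns. The one rule it relies on comes from the Iceberg spec: an equality delete applies only to data files whose data sequence number is strictly smaller than that of the delete file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRow:
    key: int              # stand-in for the equality columns (assumption)
    value: str
    sequence_number: int  # hypothetical virtual column: data sequence number of the row's file


@dataclass(frozen=True)
class EqualityDelete:
    key: int
    sequence_number: int  # data sequence number of the delete file


def apply_equality_deletes(data_rows, deletes):
    """Drop every data row matched by an equality delete with a newer sequence number."""
    # For each key, remember the highest sequence number of any delete on that key.
    newest_delete = {}
    for d in deletes:
        current = newest_delete.get(d.key, d.sequence_number)
        newest_delete[d.key] = max(current, d.sequence_number)

    # Keep a row unless a delete on the same key has a strictly greater sequence number.
    return [
        row
        for row in data_rows
        if not (row.key in newest_delete
                and row.sequence_number < newest_delete[row.key])
    ]


rows = [
    DataRow(key=1, value="a", sequence_number=1),
    DataRow(key=2, value="b", sequence_number=1),
    DataRow(key=1, value="a2", sequence_number=3),  # re-inserted after the delete
]
deletes = [EqualityDelete(key=1, sequence_number=2)]

live_rows = apply_equality_deletes(rows, deletes)
```

Here the first row is removed because its sequence number (1) is below that of the delete (2), while the row re-inserted at sequence number 3 survives. Without file-level sequence numbers available after planFiles(), the two rows with key 1 could not be told apart.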