That is actually my fault for not following up after #5913 and #6002. I’ll take 
a look at #5760, referenced below, by the end of this week. 

The plan was to expose sequence numbers on ContentFile. This is needed in a 
number of use cases.
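
To illustrate one such use case, here is a rough sketch of the equality-delete
filtering Gabor describes below. It assumes the rule from the Iceberg table spec
that an equality delete applies only to data files with a strictly smaller data
sequence number. SeqFile and EqualityDeleteFilter are hypothetical names made up
for this example, not Iceberg classes, and partition matching is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a planned file plus its data sequence number.
// This is NOT an Iceberg class; it exists only for illustration.
final class SeqFile {
    final String path;
    final long dataSequenceNumber;

    SeqFile(String path, long dataSequenceNumber) {
        this.path = path;
        this.dataSequenceNumber = dataSequenceNumber;
    }
}

// Hypothetical helper showing the sequence-number check only.
final class EqualityDeleteFilter {
    private EqualityDeleteFilter() {}

    // An equality delete applies to a data file only if the data file's
    // sequence number is strictly less than the delete file's.
    static boolean applies(SeqFile dataFile, SeqFile deleteFile) {
        return dataFile.dataSequenceNumber < deleteFile.dataSequenceNumber;
    }

    // Returns the equality delete files that can affect the given data file.
    static List<SeqFile> applicableDeletes(SeqFile dataFile, List<SeqFile> deletes) {
        List<SeqFile> result = new ArrayList<>();
        for (SeqFile delete : deletes) {
            if (applies(dataFile, delete)) {
                result.add(delete);
            }
        }
        return result;
    }
}
```

An engine that joins data rows with delete rows could apply the same comparison
per row if each row carried the sequence number of the file it came from.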

- Anton

> On Apr 26, 2023, at 4:55 AM, Gabor Kaszab <gaborkas...@apache.org> wrote:
> 
> Hey Iceberg Community,
> 
> I know there has been a discussion previously about exposing the sequence 
> number on a ContentFile level, but if I'm not mistaken that conversation 
> didn't end with a consensus. I found some relevant PRs that have been open for 
> a while:
> https://github.com/apache/iceberg/pull/5760
> https://github.com/apache/iceberg/pull/4769 (merged into the above PR)
> 
> The reason I bring this topic up is that we started investigating recently 
> how to add read support for equality deletes to Impala. Apparently, 
> implementation-wise we could save a lot of hassle if sequence numbers were 
> exposed on a file level through the API, preferably somewhere around calling 
> planFiles(). We could then have a virtual 'SEQUENCE_NUMBER' column when 
> scanning the data and delete files (separate scanners) and could easily filter 
> the rows in the JOIN node that joins the rows from the data files with the ones 
> from the delete files. (I won't go into more depth at the moment.)
> 
> With this mail I'd like to revive this conversation with the hope of 
> eventually coming to a solution that satisfies all participants. I've been 
> thinking of implementation choices we have to somehow provide sequence 
> numbers for the files:
> - Extending ContentFile with a sequence number: I checked the above PRs and, 
> IIUC, the issue with this approach is that ContentFile objects are meant to be 
> immutable, and by the time they are created we don't have sequence numbers to 
> populate them.
> - Extending FileScanTask with file-level sequence numbers, so that after 
> calling planFiles() we could retrieve these numbers via a new API call on the 
> FileScanTask.
> 
> There might be many other ways to implement this, and I'd love to hear what 
> people think. It would be great to find a way that would help us out on 
> Impala.
> 
> Cheers,
> Gabor
> 
> 
