Hi, I have a PR open to add changelog support for the case where delete files are present (https://github.com/apache/iceberg/pull/10935). I have a question about what the changelog should emit in the following scenario:
The table has a schema with a primary key/identifier column PK and an additional column V.

In snapshot 1, we write a data file DF1 with rows
  PK1, V1
  PK2, V2
  etc.

In snapshot 2, we write an equality delete file ED1 with PK=PK1, and a new data file DF2 with rows
  PK1, V1b
  (possibly other rows)

In snapshot 3, we write an equality delete file ED2 with PK=PK1, and a new data file DF3 with rows
  PK1, V1c
  (possibly other rows)

Thus, in snapshot 2 and snapshot 3, we update the row identified by PK1 with new values by using an equality delete and writing new data for the row.

These are the files present in snapshot 3:
  DF1 (sequence number 1)
  DF2 (sequence number 2)
  DF3 (sequence number 3)
  ED1 (sequence number 2)
  ED2 (sequence number 3)

The question I have is: what should the changelog emit for snapshot 3?

For snapshot 1, the changelog should emit a row for each row in DF1 as INSERTED.
For snapshot 2, it should emit a row for PK1, V1 as DELETED, and a row for PK1, V1b as INSERTED.
For snapshot 3, I see two possibilities:

(a)
  PK1,V1b,DELETED
  PK1,V1c,INSERTED

(b)
  PK1,V1,DELETED
  PK1,V1b,DELETED
  PK1,V1c,INSERTED

The interpretation for (b) is that both ED1 and ED2 apply to DF1, with ED1 being an existing delete file and ED2 being an added delete file for it. We discount ED1, apply ED2, and get a DELETED row for PK1,V1. ED2 also applies to DF2, from which we get a DELETED row for PK1,V1b.

The interpretation for (a) is that ED1 is an existing delete file for DF1, so the row PK1,V1 already does not exist before snapshot 3. Thus we do not emit a row for it. (We can think of it as ED1 already being applied to DF1, so that we only consider any additional rows that get deleted when ED2 is applied.)

I lean towards (a), as I think it is more reflective of net changes. I am interested to hear what folks think.

Thank you,
Wing Yew
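
For concreteness, the two interpretations can be sketched as a toy model in Python. This is only an illustration of the scenario above, not Iceberg code: the `DataFile` and `EqDelete` classes, the `changelog` function, and the `honor_existing_deletes` flag are all hypothetical names, and DF1 is assumed to contain just the two rows listed. The one rule borrowed from the table spec is that an equality delete applies only to data files with a strictly smaller sequence number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    name: str
    seq: int      # data sequence number
    rows: tuple   # tuple of (pk, value) pairs


@dataclass(frozen=True)
class EqDelete:
    name: str
    seq: int        # data sequence number
    pks: frozenset  # primary keys deleted by this file


# Files present in snapshot 3 (toy data; DF1 is assumed to hold only two rows).
DATA_FILES = [
    DataFile("DF1", 1, (("PK1", "V1"), ("PK2", "V2"))),
    DataFile("DF2", 2, (("PK1", "V1b"),)),
    DataFile("DF3", 3, (("PK1", "V1c"),)),
]
EQ_DELETES = [
    EqDelete("ED1", 2, frozenset({"PK1"})),
    EqDelete("ED2", 3, frozenset({"PK1"})),
]


def applies(delete, data_file, pk):
    # An equality delete applies to strictly older data files only.
    return data_file.seq < delete.seq and pk in delete.pks


def changelog(snapshot_seq, honor_existing_deletes):
    """Changelog rows for one snapshot.

    honor_existing_deletes=True  models interpretation (a)
    honor_existing_deletes=False models interpretation (b)
    """
    added = [d for d in EQ_DELETES if d.seq == snapshot_seq]
    existing = [d for d in EQ_DELETES if d.seq < snapshot_seq]
    out = []
    # DELETED rows: rows of pre-existing data files hit by newly added deletes.
    for df in DATA_FILES:
        if df.seq >= snapshot_seq:
            continue
        for pk, value in df.rows:
            already_gone = any(applies(d, df, pk) for d in existing)
            if honor_existing_deletes and already_gone:
                continue  # (a): the row was not live before this snapshot
            if any(applies(d, df, pk) for d in added):
                out.append((pk, value, "DELETED"))
    # INSERTED rows: every row of the data files added in this snapshot.
    for df in DATA_FILES:
        if df.seq == snapshot_seq:
            out.extend((pk, value, "INSERTED") for pk, value in df.rows)
    return out


for label, flag in (("(a)", True), ("(b)", False)):
    print(label, changelog(3, honor_existing_deletes=flag))
```

In this model the two interpretations agree for snapshots 1 and 2 and differ only for snapshot 3, where (b) additionally reports PK1,V1 as DELETED.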