Say I have a table that contains the following data rows.

date, content
20210301, "a1"
20210302, "a2"
20210303, "a3"
...
20210401, "b1"
20210402, "b2"
20210403, "b3"
20210404, "b4"
20210405, "b5"

The table is partitioned by month(date), and the data is written in sorted order into the partitioned data files. Suppose I want to delete the rows in the date range [20210402, 20210404] from partition 202104, as shown below. *Assuming I can only use the Iceberg core API*:

date, content
20210301, "a1"
20210302, "a2"
20210303, "a3"
...
20210401, "b1"
20210402, "b2"   <- delete
20210403, "b3"   <- delete
20210404, "b4"   <- delete
20210405, "b5"

I can think of the following options.

1. Rewrite the entire partition: read the data, remove the rows in the range, and write new data files that replace the old ones.

2. Use the V2 row-level delete files. I looked a bit into position delete files <https://iceberg.apache.org/spec/#position-delete-files> and equality delete files <https://iceberg.apache.org/spec/#equality-delete-files> to see if I can use them to list the rows to be deleted. Equality deletes won't work here because I need to match a range (or some predicate), not a single value. Position deletes don't seem to work either, because I would not know beforehand the exact positions of the rows to be deleted within the data file (I only know the key range). I know I can read the data file and then figure out the positions, but that is effectively the same as re-reading the data.

My question is: when using the Iceberg core API, is there a way to compose a range delete like the above without overwriting the entire partition or reading back the data?

Any thoughts?

-- 
Chen Song
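
P.S. To make the position lookup in option 2 concrete, here is a rough sketch in plain Java (no Iceberg calls). It assumes the sorted date column of one data file is already available in memory, which is exactly the part that normally requires reading the file back. The class name, method names, and file path are made up for illustration. Because the file is sorted by date, the rows in a key range form one contiguous run, so two binary searches give the 0-based positions that a position delete file would list as (file_path, pos) pairs.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeToPositions {

    /** Returns the first index i with sorted[i] >= key, or sorted.length if none. */
    public static int lowerBound(int[] sorted, int key) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /**
     * Returns the 0-based row positions whose date falls in [from, to], inclusive.
     * Assumes sortedDates is ascending and to < Integer.MAX_VALUE.
     */
    public static List<Long> positionsInRange(int[] sortedDates, int from, int to) {
        int start = lowerBound(sortedDates, from);
        int end = lowerBound(sortedDates, to + 1); // first row past the range
        List<Long> positions = new ArrayList<>();
        for (int pos = start; pos < end; pos++) {
            positions.add((long) pos);
        }
        return positions;
    }

    public static void main(String[] args) {
        // Date column of the single 202104 data file from the example above.
        int[] dates = {20210401, 20210402, 20210403, 20210404, 20210405};
        // Hypothetical file path, for illustration only.
        String file = "data/date_month=2021-04/00000-0.parquet";
        for (long pos : positionsInRange(dates, 20210402, 20210404)) {
            System.out.println(file + "\t" + pos);
        }
    }
}
```

The lookup itself is cheap; the cost is in obtaining the sorted date column for each data file, unless the writer kept that information around when the files were produced.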