Hi all,

I wanted to discuss changing the default position delete file granularity
for Spark from partition to file level for any newly created V2 tables. See
this PR [1].

Context on delete file granularity:

   - Partition granularity: Writers group position deletes for multiple
   data files from the same partition into a single delete file. This leads
   to fewer files on disk, but higher read amplification, since a scan also
   reads delete information for data files that are irrelevant to it.
   - File granularity: Writers write a separate delete file for every
   changed data file. Reads are more targeted since only the relevant
   delete information is read, but this can lead to more files on disk.

With the recent merge of synchronous position delete maintenance on write
in Spark [2], file granularity as a default is more compelling, since reads
would be more targeted *and* delete files would be maintained on disk as
part of the write. I also recommend folks go through the deletion vector
design doc for more details [3].
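
For anyone who wants to opt a table into this behavior today, the
granularity can already be set per table via the write.delete.granularity
property. A minimal sketch from spark-shell (where the spark session is
predefined), assuming an Iceberg catalog named my_catalog and a placeholder
table db.events:

  // Opt an existing table into file-level position deletes. Catalog and
  // table names here are placeholders for illustration.
  spark.sql(
    "ALTER TABLE my_catalog.db.events " +
      "SET TBLPROPERTIES ('write.delete.granularity' = 'file')")

The same property can also be set at CREATE TABLE time through
TBLPROPERTIES.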

Note that for existing tables with high delete-to-data file ratios,
Iceberg's rewrite position deletes procedure can compact the existing
position delete files, and every subsequent write would then continuously
maintain them. Additionally, note that in V3 at most one Puffin position
delete file is allowed per data file; what's being discussed here is
changing the default granularity for new V2 tables, since it should
generally be better after the sync maintenance addition.
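
For completeness, a sketch of kicking off that compaction from spark-shell
(same placeholder catalog and table names as above):

  // Compact existing position delete files; subsequent writes then keep
  // the position deletes maintained going forward.
  spark.sql(
    "CALL my_catalog.system.rewrite_position_delete_files(" +
      "table => 'db.events')")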

What are folks' thoughts on this?

[1] https://github.com/apache/iceberg/pull/11478
[2] https://github.com/apache/iceberg/pull/11273
[3]
https://docs.google.com/document/d/18Bqhr-vnzFfQk1S4AgRISkA_5_m5m32Nnc2Cw0zn2XM/edit?tab=t.0#heading=h.193fl7s89tcg

Thanks,

Amogh Jahagirdar
