Hi community,

We are seeing Spark Iceberg table metadata deletion consume very high driver memory, which appears to be causing S3 client failures. I would like to present my findings and seek comments from the community:
- My table is a v1 table with ~3k manifests; each manifest is around 20-30 MB, so roughly 77 GB of manifest metadata in total. It is partitioned by one column (dt) using the identity transform.
- My Spark query is "DELETE FROM table WHERE dt <= 'xyz'", and from my analysis of the manifests metadata table, only 1 manifest should need to be rewritten (i.e. only one manifest contains entries whose partitions are <= xyz).

I am observing that the Spark driver needs 46+ GB of memory, and it appears to hit GC pressure that then causes the S3 client failures. During my investigation I found that, when performing a metadata delete, the Spark driver goes through all manifests and all of their data entries to perform manifest filtering, and the resulting manifests are cached in "filteredManifests" (https://github.com/apache/iceberg/blob/1f8cad3c71850645f6656df652df523a5d8108a7/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L293). I suspect this is what drives memory consumption so high, with all of the manifests cached in driver memory.

Has the community experienced the same issue or seen similar behavior? Has an optimization such as distributing the manifest filtering and writing been considered?

Best,
Pucheng
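P.S. For anyone who wants to reproduce the analysis, my check against the manifests metadata table was along the lines of the sketch below (database/table names are placeholders). It lists each manifest with the partition bounds for the first (and only) partition field, dt, and only one manifest's range overlaps the delete predicate:

    -- list every manifest with its dt lower/upper bounds from the partition summaries
    SELECT
      path,
      length,
      partition_summaries[0].lower_bound AS dt_lower,
      partition_summaries[0].upper_bound AS dt_upper
    FROM db.my_table.manifests;

From that output, only a single manifest has a dt_lower at or below the delete boundary, which is why I expected the delete to rewrite just one manifest rather than load and filter all of them on the driver.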