Hi community,

We are seeing Spark Iceberg table metadata deletion consume very high driver memory, which appears to be causing S3 client failures. I would like to present my findings and seek comments from the community:
- My table is a v1 table with ~3k manifests; each manifest is around 20-30 MB, so roughly 77 GB of manifest metadata in total. It is partitioned by one column (dt) using the identity transform.
- My Spark query is "DELETE FROM table WHERE dt <= 'xyz'", and from my analysis of the manifests metadata table, only 1 manifest should need to be rewritten (i.e. only one manifest contains entries whose partitions are <= xyz).

I am observing that the Spark driver needs 46+ GB of memory, and it appears to hit GC pressure that then causes the S3 client failures. During my investigation I found that, when performing a metadata delete, the Spark driver goes through all manifests and all of their data entries to perform manifest filtering, and the resulting manifests are cached in "filteredManifests" (https://github.com/apache/iceberg/blob/1f8cad3c71850645f6656df652df523a5d8108a7/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L293). I suspect this is what drives memory consumption so high, with all of the manifests cached in driver memory.

Has the community experienced the same issue or seen similar behavior? Has an optimization such as distributing the manifest filtering and writing been considered?

Best,
Pucheng
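P.S. For anyone who wants to reproduce the analysis, my check against the manifests metadata table was along the lines of the sketch below (database/table names are placeholders). It lists each manifest with the partition bounds for the first (and only) partition field, dt, and only one manifest's range overlaps the delete predicate:

    -- list every manifest with its dt lower/upper bounds from the partition summaries
    SELECT
      path,
      length,
      partition_summaries[0].lower_bound AS dt_lower,
      partition_summaries[0].upper_bound AS dt_upper
    FROM db.my_table.manifests;

From that output, only a single manifest has a dt_lower at or below the delete boundary, which is why I expected the delete to rewrite just one manifest rather than load and filter all of them on the driver.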