Hi,

I've been investigating an OOM issue during planning on the Trino coordinator. The main cause is how the DeleteFileIndex class handles column stats: it loads all delete files into memory. Rewriting delete files is one option, but I'd like to explore reducing memory usage within the Iceberg library itself.

I've opened a PR (#13161 <https://github.com/apache/iceberg/pull/13161>) that reduces memory usage on the Trino coordinator from 12.8 GB to 2.5 GB in my benchmark. When a file is a positional delete file, the change copies only the file_path stats in DeleteFileIndex.

I'd appreciate your feedback on whether this is an acceptable approach, or any other suggestions you may have. I understand that v4 will improve stats handling as part of #13153 <https://github.com/apache/iceberg/issues/13153>, but the Trino community is also interested in reducing memory usage for tables that use format versions earlier than v4.

Thanks,
Yuya