What kind of stats do we produce for position delete files beyond the file
path and row positions? Are we dealing with a writer that persists the
entire row in the position delete file? So far we modified the writer in
Iceberg core to discard all bounds if a position delete file references
more tha
I think we can discard column stats for position deletes, as long as the
data file path is preserved (as it is in #13161). For equality deletes, we
need to preserve the stats for any equality ID columns. That reduces false
positives by ensuring that the IDs being deleted might be in the data file
t
It seems like a reasonable approach for DeleteFileIndex. I saw equality
delete file matching uses column stats. But it seems that column stats
(like lower/upper bounds) aren't used for associating position delete files
with a data file. Plus with file-scoped position delete files (V2),
matching wo
Hi,
I've been investigating an OOM issue during planning in the Trino
coordinator, and I've found that the main cause is the column stats
handling in the DeleteFileIndex class, which loads all delete files into
memory.
While rewriting delete files is one option, I'd like to explore reducing
memory u