I would love to see more flexibility in file stats. Together with the
change that allows storing metadata in columnar formats, this will open up
many new possibilities: Bloom filters in metadata that could be used for
filtering out files, HLL sketches, etc.
+1 for the change
On Tue, Jun 3, 2025, 0
Hi Xiaoxuan,
> 2. File-Level Indexing
> [..]
> To make this efficient, the table should be partitioned and sorted by the
> PK.
If the table is partitioned and sorted by the PK, we don't really need to
have any index. We can find the data file containing the record based on
the Content File statisti
A quick update on the release. We're seeing an issue publishing to crates.io
using GitHub Actions: the required secret token seems to be empty.
I opened https://issues.apache.org/jira/browse/INFRA-26882 to coordinate
with Apache Infra and set the secret.
Best,
Kevin Liu
On Sat, May 31, 2025 at 1
Hi Peter,
> If the table is partitioned and sorted by the PK, we don't really need to
> have any index. We can find the data file containing the record based on
> the Content File statistics, and the RowGroup containing the record based
> on the Parquet metadata.
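
For readers following along, the file-level lookup described in the quote
above can be sketched roughly as below. This is only an illustration under
stated assumptions: `FileStats`, `pk_min`, `pk_max` and `candidate_files`
are hypothetical names, not Iceberg or Parquet APIs, and the PK is assumed
to be a single integer column with per-file min/max bounds available.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file statistics: min/max of the PK column."""
    path: str
    pk_min: int
    pk_max: int


def candidate_files(files, pk):
    """Return paths of files whose [pk_min, pk_max] range may contain pk.

    When the table is sorted by the PK, the ranges barely overlap, so this
    usually narrows the lookup to a single file.
    """
    return [f.path for f in files if f.pk_min <= pk <= f.pk_max]


# Illustrative data only; the file names and bounds are made up.
files = [
    FileStats("a.parquet", 0, 99),
    FileStats("b.parquet", 100, 199),
    FileStats("c.parquet", 200, 299),
]
```

The same range check could then be repeated on the row-group min/max
values inside the selected file to find the row group to read.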
Our primary strategy for accelerating l
Hi,
I've been investigating an OOM issue during planning in the Trino
coordinator, and I've found that the main cause is the column stats
handling in the DeleteFileIndex class - it loads all delete files into
memory.
While rewriting delete files is one option, I'd like to explore reducing
memory u