Hey Edgar,
Thanks for the well-articulated proposal.
I'm a little concerned that the proposed approach only partially addresses
the underlying challenge with equality deletes. Equality deletes are
extremely powerful because you can delete a row anywhere in the dataset
without any read cost. The
Hi Edgar,
Thanks for the well-described proposal!
Knowing the Flink connector, I have the following concerns:
- The Flink connector currently doesn't sort the rows in the data files. It
"chickens" out of this to avoid keeping anything in memory.
- Sorting the equality delete rows would also add memor
CMIW, the spec does not enforce `identifier fields` for equality
delete files. Engines are free to use different `equality_ids` across
commits, though the use case should be rare. Similarly, what sort order
should we use? It is common for a table to set sort order on columns other
than the primary k
Hi all,
I know there have been some conversations regarding optimization of equality
deletes and even their possible deprecation. We have been thinking
internally about a way to optimize merge-on-read with equality deletes to
better balance the read performance while having the benefits of performant