Hey folks,

As discussed during the sync, I've been working on a proposal to improve the handling of position deletes in V3. It builds on lessons learned from deploying the current approach at scale and addresses all unresolved questions from past community discussions and proposals around this topic.
In particular, the proposal attempts to address the following shortcomings we observe today:

- Having to choose between fewer delete files on disk and targeted deletes.
- Dependence on external maintenance for consistent write and read performance.
- Write and read overhead because the in-memory and on-disk representations differ.

Please take a look at the doc [1] and let me know what you think. Any feedback is highly appreciated!

- Anton

[1] - https://docs.google.com/document/d/18Bqhr-vnzFfQk1S4AgRISkA_5_m5m32Nnc2Cw0zn2XM