Hi everyone,

I would like to open a discussion on introducing remote compaction for disaggregated state[1].

Flink state backends rely on LSM-Trees for large-scale storage, and file compaction runs locally in TaskManager background threads. Because compaction is co-located with the job's own processing, the two compete for local resources, which causes latency spikes and resource instability.

Flink 2.0 introduces disaggregated state management through the ForSt StateBackend[2], which uses a shared DFS as primary storage. This allows ForSt to implement compaction as a service (Remote Compaction) on dedicated compaction workers. This approach cleanly separates the responsibilities of compute and storage nodes, and thereby further complements Flink's disaggregated architecture. A compaction service also aligns with the resource-pooling concept prevalent in the cloud-native era, and can significantly improve the resource efficiency and elasticity of stateful Flink jobs.

Looking forward to your comments and feedback.

Best regards,
Han Yin

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-430%3A+Remote+Compaction+For+Disaggregated+State
[2] https://cwiki.apache.org/confluence/x/R4p3EQ