Hi Devs,

This is joint work by Yuan Mei, Zakelly Lan, Jinzhong Li, Hangxiang Yu, Yanfei Lei and Feng Wang. We'd like to start a discussion about introducing Disaggregated State Storage and Management in Flink 2.0.

The past decade has witnessed a dramatic shift in Flink's deployment modes, workload patterns, and underlying hardware. We've moved from the map-reduce era, where workers were nodes with tightly coupled computation and storage, to a cloud-native world where containerized deployments on Kubernetes have become standard. To enable Flink's cloud-native future, we introduce Disaggregated State Storage and Management, which uses DFS as primary storage in Flink 2.0, as promised in the Flink 2.0 Roadmap. Design details can be found in FLIP-423[1].

This new architecture aims to solve the following challenges that the cloud-native era brings to Flink:
1. Local disk constraints in containerized deployments
2. Spiky resource usage caused by compaction in the current state model
3. Fast rescaling for jobs with large state (hundreds of terabytes)
4. Light and fast checkpoints in a native way

More specifically, we want to reach a consensus on the following issues in this discussion:
1. Overall design
2. Proposed changes
3. Design details to achieve Milestone 1 (M1)

In M1, we aim to achieve an end-to-end baseline version that uses DFS as primary storage and completes the core functionality, including:
- Asynchronous State APIs (FLIP-424)[2]: Introduce new APIs for asynchronous state access.
- Asynchronous Execution Model (FLIP-425)[3]: Implement a non-blocking execution model leveraging the asynchronous APIs introduced in FLIP-424.
- Grouping Remote State Access (FLIP-426)[4]: Enable retrieval of remote state data in batches to avoid unnecessary round-trip costs for remote access.
- Disaggregated State Store (FLIP-427)[5]: Introduce the initial version of the ForSt disaggregated state store.
- Fault Tolerance/Rescale Integration (FLIP-428)[6]: Integrate checkpointing mechanisms with the disaggregated state store for fault tolerance and fast rescaling.

We will vote on each FLIP in a separate thread to make sure each FLIP reaches a consensus.
However, we want to keep the discussion within one focused thread (this thread) to make the context easier to track, to avoid duplicated questions and discussions, and to consider the problem and solution as a whole.

Looking forward to your feedback.

Best,
Yuan, Zakelly, Jinzhong, Hangxiang, Yanfei and Feng

[1] https://cwiki.apache.org/confluence/x/R4p3EQ
[2] https://cwiki.apache.org/confluence/x/SYp3EQ
[3] https://cwiki.apache.org/confluence/x/S4p3EQ
[4] https://cwiki.apache.org/confluence/x/TYp3EQ
[5] https://cwiki.apache.org/confluence/x/T4p3EQ
[6] https://cwiki.apache.org/confluence/x/UYp3EQ