[ https://issues.apache.org/jira/browse/FLINK-34975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hangxiang Yu updated FLINK-34975:
---------------------------------
    Fix Version/s: 2.0.0

FLIP-427: ForSt - Disaggregated state Store
-------------------------------------------

                Key: FLINK-34975
                URL: https://issues.apache.org/jira/browse/FLINK-34975
            Project: Flink
         Issue Type: New Feature
         Components: Runtime / State Backends
           Reporter: Hangxiang Yu
           Assignee: Hangxiang Yu
           Priority: Major
            Fix For: 2.0.0

This is a sub-FLIP for disaggregated state management and its related work. Please read [FLIP-423|https://cwiki.apache.org/confluence/x/R4p3EQ] first for the full picture.

As described in FLIP-423, embedded state backends that keep their state on the local file system face several tough issues, especially when dealing with extremely large state:
# *Local disk space constraints make storage requirements hard to predict, potentially leading to job failures:* Especially in cloud-native deployments, pre-allocated local disks typically have strict capacity limits, which makes it hard to forecast how large a job's state will grow. Over-provisioning disk space wastes resources, while under-provisioning risks job failure due to insufficient space.
# *The tight coupling of compute and storage resources leads to underutilization and waste:* Jobs can generally be categorized as either CPU-intensive or IO-intensive. In a coupled architecture, CPU-intensive jobs leave a significant portion of storage resources underutilized, whereas IO-intensive jobs leave computing resources idle.

By treating remote storage as the primary storage, all working state is maintained on the remote file system, which brings several advantages:
# *Remote storage such as S3 or HDFS typically offers elastic scalability, providing theoretically unlimited space.*
# *Remote storage allocation can be optimized by reducing it for CPU-intensive jobs and increasing it for IO-intensive jobs, thus improving overall resource utilization.*
# *This architecture enables highly efficient, lightweight checkpointing, recovery, and rescaling through fast copy or simple move operations.*

This FLIP aims to realize disaggregated state with a new key-value store named *ForSt*, which evolves from RocksDB and supports remote file systems. This frees Flink from the drawbacks of the coupled state architecture and lets it embrace scalable, flexible cloud-native storage.

Please see [FLIP-427|https://cwiki.apache.org/confluence/x/T4p3EQ] for more details.