[ https://issues.apache.org/jira/browse/FLINK-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410474#comment-15410474 ]
Chen Qin commented on FLINK-4266: --------------------------------- https://docs.google.com/document/d/1diHQyOPZVxgmnmYfiTa6glLf-FlFjSHcL8J3YR2xLdk/edit?usp=sharing > Remote Database Statebackend > ---------------------------- > > Key: FLINK-4266 > URL: https://issues.apache.org/jira/browse/FLINK-4266 > Project: Flink > Issue Type: New Feature > Components: State Backends, Checkpointing > Affects Versions: 1.0.3, 1.2.0 > Reporter: Chen Qin > Priority: Minor > > Current FileSystem statebackend limits whole state size to disk space. > Dealing with scale out checkpoint states beyond local disk space such as long > running task that hold window content for long period of time. Pipelines > needs to split out states to durable remote storage even replicated to > different data centers. > We draft a design that leverage checkpoint id as mono incremental logic > timestamp and perform range query to get evicited state k/v. we also > introduce checkpoint time commit and eviction threshold that reduce "hot > states" hitting remote db per every update between adjacent checkpoints by > tracking update logs and merge them, do batch updates only when checkpoint; > lastly, we are looking for eviction policy that can identify "hot keys" in > k/v state and lazy load those "cold keys" from remote storage(e.g Cassandra). > For now, we don't have good story regarding to data retirement. We might have > to keep forever until manually run command and clean per job related state > data. Some of features might related to incremental checkpointing feature, we > hope to align with effort there also. > Welcome comments, I will try to put a draft design doc after gathering some > feedback. -- This message was sent by Atlassian JIRA (v6.3.4#6332)