[ https://issues.apache.org/jira/browse/FLINK-32953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jinzhong Li updated FLINK-32953:
--------------------------------
    Fix Version/s:     (was: 1.20.0)

> [State TTL] Resolve data correctness problem after TTL was changed
> -------------------------------------------------------------------
>
>                 Key: FLINK-32953
>                 URL: https://issues.apache.org/jira/browse/FLINK-32953
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>            Reporter: Jinzhong Li
>            Assignee: Jinzhong Li
>            Priority: Major
>              Labels: pull-request-available
>
> Because expired data is cleaned up in the background on a best-effort basis
> (the hashmap backend uses the INCREMENTAL_CLEANUP strategy, RocksDB uses the
> ROCKSDB_COMPACTION_FILTER strategy), some expired state is often persisted
> into snapshots. (A configuration sketch for these strategies is included at
> the end of this description.)
>
> In some scenarios, a user changes the state TTL of a job and then restores
> the job from the old state. If the user extends the state TTL from a short
> value to a long value (e.g., from 12 hours to 24 hours), expired data that
> was not yet cleaned up becomes alive again after the restore. This is
> clearly unreasonable and may break data regulatory requirements.
>
> In particular, the RocksDB state backend may cause data correctness problems
> in this case because of level compaction. (E.g., one key has two versions,
> at level-1 and level-2, both of which are TTL-expired. The level-1 version
> is then cleaned up by compaction while the level-2 version is not. If we
> extend the state TTL and restart the job, the stale level-2 version becomes
> valid again after the restore.)
>
> To solve this problem, I think we can:
> 1) persist the old state TTL into the snapshot meta info (e.g., in
> RegisteredKeyValueStateBackendMetaInfo or elsewhere);
> 2) during state restore, compare the current TTL with the old TTL;
> 3) if the current TTL is longer than the old TTL, iterate over all data,
> filter out entries that are expired under the old TTL, and write the
> remaining valid entries into the state backend (see the restore-filtering
> sketch below).
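>
> A minimal sketch of how the TTL value and the background cleanup strategies
> mentioned above are configured through Flink's public StateTtlConfig API.
> The 12 h -> 24 h values and the state name "my-state" are illustrative,
> mirroring the example in this description; the builder calls themselves are
> the standard API.
>
> {code:java}
> import org.apache.flink.api.common.state.StateTtlConfig;
> import org.apache.flink.api.common.state.ValueStateDescriptor;
> import org.apache.flink.api.common.time.Time;
>
> public class TtlConfigSketch {
>     public static void main(String[] args) {
>         // Original job: 12-hour TTL with best-effort background cleanup.
>         StateTtlConfig oldTtlConfig = StateTtlConfig
>                 .newBuilder(Time.hours(12))
>                 // Hashmap backend: check 10 entries per state access.
>                 .cleanupIncrementally(10, false)
>                 // RocksDB backend: drop expired entries during compaction;
>                 // refresh the current timestamp every 1000 processed entries.
>                 .cleanupInRocksdbCompactFilter(1000)
>                 .build();
>
>         // Changed job: TTL extended to 24 hours. Entries that expired under
>         // the 12-hour TTL but survived in the snapshot become readable again
>         // after restore -- the correctness problem described above.
>         StateTtlConfig newTtlConfig = StateTtlConfig
>                 .newBuilder(Time.hours(24))
>                 .cleanupIncrementally(10, false)
>                 .cleanupInRocksdbCompactFilter(1000)
>                 .build();
>
>         ValueStateDescriptor<String> descriptor =
>                 new ValueStateDescriptor<>("my-state", String.class);
>         descriptor.enableTimeToLive(newTtlConfig);
>     }
> }
> {code}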
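>
> And a hedged sketch of the proposed restore-time filtering from steps
> 1)-3). RegisteredKeyValueStateBackendMetaInfo is a real Flink class, but
> persisting a TTL in it is only proposed here; TtlValue, StateWriter, and
> restoreWithTtlCheck below are hypothetical names invented for this sketch,
> not Flink APIs.
>
> {code:java}
> import java.util.Iterator;
> import java.util.Map;
>
> public class TtlRestoreFilterSketch {
>
>     /** Drop entries already expired under the TTL recorded in the snapshot. */
>     static <K, V> void restoreWithTtlCheck(
>             long oldTtlMillis,      // step 1: TTL persisted in snapshot meta info
>             long currentTtlMillis,  // TTL configured for the restored job
>             long restoreTimestamp,
>             Iterator<Map.Entry<K, TtlValue<V>>> snapshotEntries,
>             StateWriter<K, V> writer) {
>
>         // Step 2: only an extended TTL can resurrect formerly-expired data.
>         boolean mustFilter = currentTtlMillis > oldTtlMillis;
>
>         while (snapshotEntries.hasNext()) {
>             Map.Entry<K, TtlValue<V>> entry = snapshotEntries.next();
>             long lastAccess = entry.getValue().lastAccessTimestamp();
>
>             // Step 3: filter with the OLD TTL before writing anything back.
>             if (mustFilter && lastAccess + oldTtlMillis <= restoreTimestamp) {
>                 continue; // expired under the old TTL: do not restore it
>             }
>             writer.put(entry.getKey(), entry.getValue().userValue());
>         }
>     }
>
>     /** Illustrative value-plus-timestamp wrapper, as TTL state stores one. */
>     interface TtlValue<V> {
>         long lastAccessTimestamp();
>         V userValue();
>     }
>
>     /** Illustrative sink for restored state entries. */
>     interface StateWriter<K, V> {
>         void put(K key, V value);
>     }
> }
> {code}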