Jinzhong Li created FLINK-34050:
-----------------------------------

Summary: Rocksdb state has space amplification after rescaling with DeleteRange
Key: FLINK-34050
URL: https://issues.apache.org/jira/browse/FLINK-34050
Project: Flink
Issue Type: Bug
Components: Runtime / State Backends
Reporter: Jinzhong Li
Attachments: image-2024-01-10-21-23-48-134.png, image-2024-01-10-21-24-10-983.png, image-2024-01-10-21-28-24-312.png

FLINK-21321 uses deleteRange to speed up RocksDB rescaling; however, it causes space amplification in some cases.

We can reproduce this problem with a WordCount job:

1) Before rescaling, the state operator in the WordCount job has a parallelism of 2 and a full checkpoint size of 4G+;

!image-2024-01-10-21-24-10-983.png|width=266,height=130!

2) Then restart the job with a parallelism of 4 (for the state operator); the full checkpoint size of the new job is 8G+;

3) After many successful checkpoints, the full checkpoint size is still 8G+;

!image-2024-01-10-21-28-24-312.png|width=454,height=111!

The root cause of this issue is that the deleted keyGroupRange does not overlap with the current DB keyGroupRange, so new data written into RocksDB after rescaling almost never goes through LSM compaction together with the deleted data (which belongs to other keyGroupRanges). As a result, the range tombstones written by deleteRange and the data they cover stay in the SST files, and the checkpoint size does not shrink.

This space amplification may affect RocksDB read performance and disk space usage after rescaling. It looks like a regression caused by the introduction of deleteRange as a rescaling optimization.

To solve this problem, I think maybe we can invoke RocksDB.deleteFilesInRanges after deleteRange?

{code:java}
public static void clipDBWithKeyGroupRange() {
    // .......
    List<byte[]> ranges = new ArrayList<>();
    // .......
    deleteRange(db, columnFamilyHandles, beginKeyGroupBytes, endKeyGroupBytes);
    // Remember each deleted range as a (begin, end) pair.
    ranges.add(beginKeyGroupBytes);
    ranges.add(endKeyGroupBytes);
    // ....
    // Also drop the SST files that lie entirely inside the deleted ranges.
    for (ColumnFamilyHandle columnFamilyHandle : columnFamilyHandles) {
        db.deleteFilesInRanges(columnFamilyHandle, ranges, false);
    }
}
{code}
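
To make the non-overlap argument concrete, here is a minimal, self-contained sketch of the key-group arithmetic for the 2 -> 4 rescale described above. It is only an illustration under assumptions: the class KeyGroupClipSketch and its methods are hypothetical helpers (not Flink classes), the max parallelism of 128 is assumed rather than taken from the job above, and the range formula follows the usual contiguous key-group split that I believe Flink uses.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Flink. */
final class KeyGroupClipSketch {

    /** Returns {start, end} (both inclusive) of the key groups owned by one subtask. */
    static int[] rangeFor(int maxParallelism, int parallelism, int subtaskIndex) {
        int start = (subtaskIndex * maxParallelism + parallelism - 1) / parallelism;
        int end = ((subtaskIndex + 1) * maxParallelism - 1) / parallelism;
        return new int[] {start, end};
    }

    /** True if the two inclusive ranges share at least one key group. */
    static boolean overlaps(int[] a, int[] b) {
        return a[0] <= b[1] && b[0] <= a[1];
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // assumption, not stated in this issue

        // Old subtask 0 (parallelism 2) and new subtask 0 (parallelism 4).
        int[] restored = rangeFor(maxParallelism, 2, 0);
        int[] owned = rangeFor(maxParallelism, 4, 0);

        // Key groups that the new subtask removes from the restored DB via deleteRange.
        int[] clipped = {owned[1] + 1, restored[1]};

        System.out.println("restored DB range: " + Arrays.toString(restored));
        System.out.println("owned after rescale: " + Arrays.toString(owned));
        System.out.println("clipped by deleteRange: " + Arrays.toString(clipped));

        // New writes only touch the owned range, so they never share keys with
        // the clipped range and rarely trigger a compaction that rewrites it.
        System.out.println("owned overlaps clipped: " + overlaps(owned, clipped));
    }
}
```

Under these assumptions, every key written after the rescale falls outside the clipped range, which matches the observation that the deleted data is almost never compacted away on its own.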