[ https://issues.apache.org/jira/browse/FLINK-34050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805392#comment-17805392 ]
Yue Ma commented on FLINK-34050:
--------------------------------

Hi [~lijinzhong], thanks for reporting this issue. We have also encountered it before, and I think this is a great suggestion. Overall, this is still a trade-off between time and space:
If recovery time is the most important, we can use deleteRange alone.
If we want both good recovery time and limited space amplification, we can use deleteRange + deleteFilesInRanges.
If space amplification is the primary concern, we can consider deleteRange + deleteFilesInRanges + compactRange.
(Of course, perhaps we can also look into turning the space reclamation into an asynchronous process.)

> Rocksdb state has space amplification after rescaling with DeleteRange
> ----------------------------------------------------------------------
>
>                 Key: FLINK-34050
>                 URL: https://issues.apache.org/jira/browse/FLINK-34050
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>            Reporter: Jinzhong Li
>            Priority: Major
>         Attachments: image-2024-01-10-21-23-48-134.png, image-2024-01-10-21-24-10-983.png, image-2024-01-10-21-28-24-312.png
>
>
> FLINK-21321 uses deleteRange to speed up RocksDB rescaling; however, it can cause space amplification in some cases.
> We can reproduce this problem with a WordCount job:
> 1) Before rescaling, the stateful operator in the WordCount job has parallelism 2 and a 4G+ full checkpoint size;
> !image-2024-01-10-21-24-10-983.png|width=266,height=130!
> 2) Then the job is restarted with parallelism 4 (for the stateful operator), and the full checkpoint size of the new job grows to 8G+;
> 3) After many successful checkpoints, the full checkpoint size is still 8G+;
> !image-2024-01-10-21-28-24-312.png|width=454,height=111!
>
> The root cause of this issue is that the deleted key-group range does not overlap with the current DB key-group range, so new data written into RocksDB after rescaling is almost never compacted together with the deleted data (which belongs to other key-group ranges) in the LSM tree.
>
> This space amplification may hurt RocksDB read performance and disk space usage after rescaling. It looks like a regression introduced by the deleteRange rescaling optimization.
>
> To solve this problem, I think maybe we can invoke RocksDB.deleteFilesInRanges after deleteRange?
> {code:java}
> public static void clipDBWithKeyGroupRange() {
>     //.......
>     List<byte[]> ranges = new ArrayList<>();
>     //.......
>     deleteRange(db, columnFamilyHandles, beginKeyGroupBytes, endKeyGroupBytes);
>     ranges.add(beginKeyGroupBytes);
>     ranges.add(endKeyGroupBytes);
>     //....
>     for (ColumnFamilyHandle columnFamilyHandle : columnFamilyHandles) {
>         db.deleteFilesInRanges(columnFamilyHandle, ranges, false);
>     }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
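
[Editor's note] Below is a minimal, hedged sketch of how the three RocksDB calls discussed in the comment (deleteRange, deleteFilesInRanges, compactRange) could be combined, using the RocksDB Java API. The method name clipKeyGroupRange and the alsoCompact flag are illustrative assumptions, not code from the Flink code base.

{code:java}
import java.util.Arrays;
import java.util.List;

import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public final class RescaleClipSketch {

    /**
     * Illustrative sketch (not Flink's actual implementation): drops the key groups
     * outside the target range and reclaims their space. The compactRange step is
     * optional and matches the trade-off above: it reclaims the most space but
     * lengthens recovery.
     */
    static void clipKeyGroupRange(
            RocksDB db,
            List<ColumnFamilyHandle> columnFamilyHandles,
            byte[] beginKeyGroupBytes,
            byte[] endKeyGroupBytes,
            boolean alsoCompact) throws RocksDBException {

        List<byte[]> ranges = Arrays.asList(beginKeyGroupBytes, endKeyGroupBytes);

        for (ColumnFamilyHandle handle : columnFamilyHandles) {
            // 1) Logically delete the obsolete key-group range (cheap, tombstone-based).
            db.deleteRange(handle, beginKeyGroupBytes, endKeyGroupBytes);

            // 2) Physically drop SST files that lie entirely inside the deleted range.
            db.deleteFilesInRanges(handle, ranges, /* includeEnd */ false);

            // 3) Optionally compact the remaining overlap to reclaim the rest of the space.
            if (alsoCompact) {
                db.compactRange(handle, beginKeyGroupBytes, endKeyGroupBytes);
            }
        }
    }
}
{code}

Skipping step 3 corresponds to the middle option in the trade-off (good recovery time with reduced space amplification); running compactRange eagerly reclaims the most space but slows recovery, which is why moving that step to an asynchronous process, as suggested above, could be attractive.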