[ https://issues.apache.org/jira/browse/FLINK-34050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814312#comment-17814312 ]

Stefan Richter edited comment on FLINK-34050 at 2/5/24 10:41 AM:
-----------------------------------------------------------------

Just one idea: since the current proposal makes rescaling times worse, it can have a significant drawback. How about we call deleteFiles asynchronously before the next checkpoint after a rescaling, thus making sure that the space amplification never makes it into the checkpoint while keeping it off the critical path for restoring. Wdyt?

was (Author: srichter):
Just one idea: since the current proposal makes rescaling times worse, it can have a significant drawback. How about we call deleteFiles in the async part of the next checkpoint after a rescaling, thus making sure that the space amplification never makes it into the checkpoint while keeping it off the critical path for restoring or processing. Wdyt?

> Rocksdb state has space amplification after rescaling with DeleteRange
> ----------------------------------------------------------------------
>
>                 Key: FLINK-34050
>                 URL: https://issues.apache.org/jira/browse/FLINK-34050
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>            Reporter: Jinzhong Li
>            Assignee: Jinzhong Li
>            Priority: Major
>         Attachments: image-2024-01-10-21-23-48-134.png, image-2024-01-10-21-24-10-983.png, image-2024-01-10-21-28-24-312.png
>
> FLINK-21321 uses deleteRange to speed up RocksDB rescaling; however, it can cause space amplification in some cases.
> We can reproduce the problem with a WordCount job:
> 1) Before rescaling, the stateful operator in the WordCount job runs with parallelism 2 and has a 4G+ full checkpoint size;
> !image-2024-01-10-21-24-10-983.png|width=266,height=130!
> 2) After restarting the job with parallelism 4 (for the stateful operator), the full checkpoint size of the new job grows to 8G+;
> 3) Even after many successful checkpoints, the full checkpoint size remains 8G+;
> !image-2024-01-10-21-28-24-312.png|width=454,height=111!
>
> The root cause is that the deleted keyGroupRange does not overlap with the current DB keyGroupRange, so new data written into RocksDB after rescaling almost never takes part in an LSM compaction together with the deleted data (which belongs to other keyGroupRanges). As a result, the SST files holding the deleted data are never picked for compaction and never physically removed.
>
> This space amplification can hurt RocksDB read performance and disk space usage after rescaling. It looks like a regression introduced by the deleteRange rescaling optimization.
>
> To solve this problem, I think we could invoke RocksDB.deleteFilesInRanges after deleteRange:
> {code:java}
> public static void clipDBWithKeyGroupRange() {
>     //.......
>     List<byte[]> ranges = new ArrayList<>();
>     //.......
>     // Logical deletion: write a range tombstone covering the out-of-range key-groups.
>     deleteRange(db, columnFamilyHandles, beginKeyGroupBytes, endKeyGroupBytes);
>     ranges.add(beginKeyGroupBytes);
>     ranges.add(endKeyGroupBytes);
>     //....
>     // Physical deletion: drop SST files that lie entirely inside the deleted ranges.
>     for (ColumnFamilyHandle columnFamilyHandle : columnFamilyHandles) {
>         db.deleteFilesInRanges(columnFamilyHandle, ranges, false);
>     }
> }
> {code}
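For illustration, here is a minimal, self-contained sketch of the proposed deleteRange + deleteFilesInRanges sequence against the plain org.rocksdb Java API (not Flink code; the class name, key layout, and file-size tuning are assumptions made for the demo). It flushes and fully compacts first so the clipped key-groups end up in their own SST files, then shows that deleteRange alone leaves the SST size unchanged while deleteFilesInRanges reclaims it:

{code:java}
import java.nio.file.Files;
import java.util.Arrays;

import org.rocksdb.FlushOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class DeleteFilesInRangesDemo {

    public static void main(String[] args) throws Exception {
        RocksDB.loadLibrary();
        String path = Files.createTempDirectory("delete-files-demo").toString();
        try (Options options = new Options()
                        .setCreateIfMissing(true)
                        // Small target file size so compaction emits many SST files,
                        // most of which fall entirely inside a single key-group.
                        .setTargetFileSizeBase(1024 * 1024)
                        // Keep compaction manual so the size numbers are deterministic.
                        .setDisableAutoCompactions(true);
                RocksDB db = RocksDB.open(options, path);
                FlushOptions flushOptions = new FlushOptions().setWaitForFlush(true)) {

            // Write 4 "key-groups", each key prefixed with a 2-byte big-endian group id,
            // roughly mimicking the key layout Flink uses for keyed state.
            for (int group = 0; group < 4; group++) {
                for (int i = 0; i < 50_000; i++) {
                    db.put(key(group, i), new byte[128]);
                }
            }
            db.flush(flushOptions);
            db.compactRange(); // push everything out of L0 into leveled SST files

            byte[] begin = {0, 2}; // key-groups [2, 4) no longer belong to this instance
            byte[] end = {0, 4};

            // Step 1: logical deletion via a range tombstone. Fast, but the SST files stay.
            db.deleteRange(begin, end);
            printSstSize(db, "after deleteRange");

            // Step 2: physically drop SST files entirely contained in the deleted range.
            // Files straddling a range boundary survive until a later compaction.
            db.deleteFilesInRanges(db.getDefaultColumnFamily(), Arrays.asList(begin, end), false);
            printSstSize(db, "after deleteFilesInRanges");
        }
    }

    // 2-byte big-endian group id followed by a 4-byte big-endian counter.
    private static byte[] key(int group, int i) {
        return new byte[] {
            (byte) (group >>> 8), (byte) group,
            (byte) (i >>> 24), (byte) (i >>> 16), (byte) (i >>> 8), (byte) i
        };
    }

    private static void printSstSize(RocksDB db, String label) throws RocksDBException {
        System.out.println(label + ": " + db.getProperty("rocksdb.total-sst-files-size") + " bytes");
    }
}
{code}

Note that deleteFilesInRanges only drops files entirely contained in a range, so it is an optimization on top of deleteRange rather than a replacement: the tombstone is still needed to hide the clipped keys that remain in surviving boundary files.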
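And one possible shape for the "call deleteFiles async" idea from the comment at the top, as a rough sketch (the class, method, and executor wiring are hypothetical, not actual Flink code): kick the physical deletion off on a background executor right after restore, and have the next checkpoint wait on the future so the dropped files never make it into the snapshot.

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

final class AsyncClipCleanup {

    /**
     * Schedules the physical file deletion on a background executor right after
     * restore, keeping it off the critical path of restoring and processing.
     */
    static CompletableFuture<Void> scheduleFileCleanup(
            RocksDB db,
            List<ColumnFamilyHandle> columnFamilyHandles,
            byte[] beginKeyGroupBytes,
            byte[] endKeyGroupBytes,
            ExecutorService ioExecutor) {

        List<byte[]> ranges = Arrays.asList(beginKeyGroupBytes, endKeyGroupBytes);
        return CompletableFuture.runAsync(
                () -> {
                    try {
                        for (ColumnFamilyHandle handle : columnFamilyHandles) {
                            // The range was already logically deleted by deleteRange,
                            // so concurrent readers cannot observe resurrected data.
                            db.deleteFilesInRanges(handle, ranges, false);
                        }
                    } catch (RocksDBException e) {
                        throw new RuntimeException("Background file cleanup failed", e);
                    }
                },
                ioExecutor);
    }
}
{code}

The snapshot path would then join this future (e.g. cleanupFuture.join()) before, or as part of the async phase of, the next checkpoint, so the space amplification never reaches the checkpoint.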