Jinzhong Li created FLINK-34050:
-----------------------------------

             Summary: Rocksdb state has space amplification after rescaling 
with DeleteRange
                 Key: FLINK-34050
                 URL: https://issues.apache.org/jira/browse/FLINK-34050
             Project: Flink
          Issue Type: Bug
          Components: Runtime / State Backends
            Reporter: Jinzhong Li
         Attachments: image-2024-01-10-21-23-48-134.png, 
image-2024-01-10-21-24-10-983.png, image-2024-01-10-21-28-24-312.png

FLINK-21321 uses deleteRange to speed up RocksDB rescaling; however, it 
causes space amplification in some cases.

We can reproduce this problem with a WordCount job:

1) before rescaling, the state operator in the WordCount job has a parallelism 
of 2 and a full checkpoint size of 4G+;

!image-2024-01-10-21-24-10-983.png|width=266,height=130!

2) then restart the job with a parallelism of 4 (for the state operator); the 
full checkpoint size of the new job is 8G+;

3) after many successful checkpoints, the full checkpoint size is still 8G+;

!image-2024-01-10-21-28-24-312.png|width=454,height=111!

 

The root cause of this issue is that the deleted keyGroupRange does not overlap 
with the current DB's keyGroupRange, so new data written into RocksDB after 
rescaling almost never takes part in LSM compaction with the deleted data 
(which belongs to other keyGroupRanges).
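
The numbers in the reproduction can be checked with a rough back-of-the-envelope 
sketch. It assumes a max parallelism of 128, state spread evenly over key 
groups, and that each new subtask starts from a full copy of one old DB whose 
out-of-range half is only covered by deleteRange tombstones. The class and 
method names are hypothetical, and the range formula follows my understanding 
of Flink's key-group assignment rather than being copied from the code base.

```java
final class RescaleSpaceSketch {

    /** Inclusive key-group range [start, end] owned by one subtask. */
    static int[] keyGroupRange(int maxParallelism, int parallelism, int operatorIndex) {
        int start = (operatorIndex * maxParallelism + parallelism - 1) / parallelism;
        int end = ((operatorIndex + 1) * maxParallelism - 1) / parallelism;
        return new int[] {start, end};
    }

    /** Number of key groups shared by two inclusive ranges. */
    static int overlap(int[] a, int[] b) {
        return Math.max(0, Math.min(a[1], b[1]) - Math.max(a[0], b[0]) + 1);
    }

    /**
     * Returns {onDiskGb, liveGb} summed over all new subtasks.
     * Assumption: newParallelism is a multiple of oldParallelism, so every
     * new range is nested inside exactly one old range.
     */
    static double[] totals(int maxParallelism, int oldParallelism,
                           int newParallelism, double oldDbGb) {
        double onDisk = 0.0;
        double live = 0.0;
        for (int i = 0; i < newParallelism; i++) {
            int[] target = keyGroupRange(maxParallelism, newParallelism, i);
            for (int j = 0; j < oldParallelism; j++) {
                int[] source = keyGroupRange(maxParallelism, oldParallelism, j);
                int shared = overlap(target, source);
                if (shared == 0) {
                    continue;
                }
                int width = source[1] - source[0] + 1;
                // The whole old DB is kept on disk: tombstones alone free nothing.
                onDisk += oldDbGb;
                // Only the overlapping key groups are still live data.
                live += oldDbGb * shared / width;
            }
        }
        return new double[] {onDisk, live};
    }

    public static void main(String[] args) {
        // 4G full checkpoint at parallelism 2 means roughly 2G per old DB.
        double[] t = totals(128, 2, 4, 2.0);
        System.out.println("on disk: " + t[0] + " GB, live: " + t[1] + " GB");
    }
}
```

Under these assumptions the four new DBs hold 8 GB on disk for 4 GB of live 
data, which matches the 4G+ to 8G+ jump observed above. The extra 4 GB sits in 
key groups that no subtask writes to any more, so compaction never reaches it.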

 

This space amplification may hurt RocksDB read performance and disk space 
usage after rescaling. It looks like a regression introduced by using 
deleteRange as a rescaling optimization.

 

To solve this problem, I think maybe we can invoke RocksDB.deleteFilesInRanges 
after deleteRange?
{code:java}
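public static void clipDBWithKeyGroupRange() throws RocksDBException {
  //.......
  // Flat list of begin/end key pairs, as expected by deleteFilesInRanges.
  List<byte[]> ranges = new ArrayList<>();
  //.......
  deleteRange(db, columnFamilyHandles, beginKeyGroupBytes, endKeyGroupBytes);
  ranges.add(beginKeyGroupBytes);
  ranges.add(endKeyGroupBytes);
  //....

  // Drop SST files that lie entirely inside the deleted ranges
  // (includeEnd = false, so each end key is exclusive).
  for (ColumnFamilyHandle columnFamilyHandle : columnFamilyHandles) {
    db.deleteFilesInRanges(columnFamilyHandle, ranges, false);
  }
}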
{code}
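
To make the shape of the ranges list concrete, here is a minimal standalone 
sketch of how the begin/end pairs could be built for one restored DB. It has no 
RocksDB dependency; the class name, method names and the big-endian prefix 
encoding are assumptions for illustration, not the existing Flink code.

```java
import java.util.ArrayList;
import java.util.List;

final class ClipRangesSketch {

    /** Big-endian encoding of a key-group id into prefixBytes bytes (assumed layout). */
    static byte[] keyGroupPrefix(int keyGroup, int prefixBytes) {
        byte[] out = new byte[prefixBytes];
        for (int i = prefixBytes - 1; i >= 0; i--) {
            out[i] = (byte) (keyGroup & 0xFF);
            keyGroup >>>= 8;
        }
        return out;
    }

    /**
     * Builds a flat list of begin/end pairs covering the key groups that the
     * restored DB holds but the target subtask does not own. All bounds passed
     * in are inclusive key-group ids; each emitted pair is [begin, end).
     */
    static List<byte[]> rangesToDrop(int currentStart, int currentEnd,
                                     int targetStart, int targetEnd,
                                     int prefixBytes) {
        List<byte[]> ranges = new ArrayList<>();
        if (currentStart < targetStart) {
            // Key groups below the target range.
            ranges.add(keyGroupPrefix(currentStart, prefixBytes));
            ranges.add(keyGroupPrefix(targetStart, prefixBytes));
        }
        if (currentEnd > targetEnd) {
            // Key groups above the target range.
            ranges.add(keyGroupPrefix(targetEnd + 1, prefixBytes));
            ranges.add(keyGroupPrefix(currentEnd + 1, prefixBytes));
        }
        return ranges;
    }
}
```

For example, a DB restored with key groups 0..63 and a target range of 32..63 
yields the single pair [0, 32). Note that deleteFilesInRanges only removes 
files that fall completely inside a range, so files straddling a boundary 
would still rely on the deleteRange tombstones and later compaction.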
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
