Yanfei Lei created FLINK-28614:
----------------------------------

             Summary: Empty local state folders are not cleaned up when 
retrieving local state
                 Key: FLINK-28614
                 URL: https://issues.apache.org/jira/browse/FLINK-28614
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.15.1, 1.15.0, 1.16.0
            Reporter: Yanfei Lei
             Fix For: 1.16.0


Trying to load a {{TaskStateSnapshot}} from disk creates the local checkpoint 
directory if it does not exist yet. The directory is not deleted when 
{{tryLoadTaskStateSnapshotFromDisk()}} exits, even if no 
{{TaskStateSnapshot}} was found, so an empty directory is left behind. 

 
{code:java}
File getTaskStateSnapshotFile(long checkpointId) {
    final File checkpointDirectory =
            localRecoveryConfig
                    .getLocalStateDirectoryProvider()
                    .orElseThrow(
                            () -> new IllegalStateException("Local recovery must be enabled."))
                    .subtaskSpecificCheckpointDirectory(checkpointId);

    if (!checkpointDirectory.exists() && !checkpointDirectory.mkdirs()) {
        throw new FlinkRuntimeException(
                String.format(
                        "Could not create the checkpoint directory '%s'", checkpointDirectory));
    }

    return new File(checkpointDirectory, TASK_STATE_SNAPSHOT_FILENAME);
} {code}
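
One possible direction, as a minimal sketch only: keep the read path free of side effects and create the directory only when writing. The class {{LocalStateDirectorySketch}} and its method names are hypothetical and not part of Flink; the file name is taken from the directory listing in this report. The sketch also shows a guarded cleanup that removes a checkpoint directory only when it is empty.

```java
import java.io.File;

/** Illustrative sketch only; this class and its method names are hypothetical, not Flink API. */
final class LocalStateDirectorySketch {

    // File name as it appears in the directory listing in this report.
    static final String TASK_STATE_SNAPSHOT_FILENAME = "_task_state_snapshot";

    /** Read path: resolves the snapshot file without creating the checkpoint directory. */
    static File snapshotFileForRead(File checkpointDirectory) {
        return new File(checkpointDirectory, TASK_STATE_SNAPSHOT_FILENAME);
    }

    /** Write path: creates the checkpoint directory if needed, then resolves the file. */
    static File snapshotFileForWrite(File checkpointDirectory) {
        if (!checkpointDirectory.exists() && !checkpointDirectory.mkdirs()) {
            throw new IllegalStateException(
                    String.format(
                            "Could not create the checkpoint directory '%s'", checkpointDirectory));
        }
        return new File(checkpointDirectory, TASK_STATE_SNAPSHOT_FILENAME);
    }

    /** Removes the directory only if it exists and is empty; returns true if it was deleted. */
    static boolean deleteIfEmpty(File directory) {
        String[] children = directory.list(); // null if not a directory or on I/O error
        return children != null && children.length == 0 && directory.delete();
    }
}
```

With this split, a failed load never creates {{chk_N}}, and {{deleteIfEmpty}} could be called for directories that earlier runs already left behind. Whether the actual fix should separate the two paths or clean up after loading is left open here.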
 

 

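As a result, the empty folder under {{localState}} remains after failover. 
Here is an example, where the job is restored from checkpoint 2 and an empty 
{{chk_2}} folder is left next to {{chk_5}}: 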
{code:java}
41854 [flink-akka.actor.default-dispatcher-8] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Restoring job 35644df535ca04613d6a6116dcfcfd59 from Checkpoint 2 @ 1658292943408 for 35644df535ca04613d6a6116dcfcfd59 located at file:/var/folders/4n/q3r37vws2f910rt_f469kwg00000gn/T/junit1426665332205293555/junit63847204117629783/35644df535ca04613d6a6116dcfcfd59/chk-2.

_______________________________________
directory of localState
_______________________________________ 
tm_2
    │   ├── blobStorage
    │   ├── localState
    │   │   └── aid_6df21e53ca06ea69ee0643d25d27dbee
    │   │       └── jid_35644df535ca04613d6a6116dcfcfd59
    │   │           └── vtx_0a448493b4782967b150582570326227_sti_1
    │   │               ├── chk_2
    │   │               └── chk_5
    │   │                   ├── _task_state_snapshot
    │   │                   ├── edab98058083464a9ca29b6d7a950c68
    │   │                   │   ├── 000014.sst
    │   │                   │   ├── 000015.sst
    │   │                   │   ├── 000022.sst
    │   │                   │   ├── 000023.sst
    │   │                   │   ├── CURRENT
    │   │                   │   ├── MANIFEST-000018
    │   │                   │   └── OPTIONS-000021
    │   │                   └── f3724ae6-fd24-4e9a-80a8-02aa34bca0f0 {code}
cc: [~trohrmann] , [~masteryhx] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
