Marc Aurel Fritz created FLINK-35217:
----------------------------------------

             Summary: Missing fsync in FileSystemCheckpointStorage
                 Key: FLINK-35217
                 URL: https://issues.apache.org/jira/browse/FLINK-35217
             Project: Flink
          Issue Type: Bug
          Components: FileSystems, Runtime / Checkpointing
    Affects Versions: 1.19.0, 1.18.0, 1.17.0
            Reporter: Marc Aurel Fritz


While running Flink on a system with unstable power supply checkpoints were 
regularly corrupted in the form of "_metadata" files with a file size of 0 
bytes. In all cases the previous checkpoint data had already been deleted, 
causing progress to be lost completely.

Further investigation revealed that the "FileSystemCheckpointStorage" doesn't 
perform "fsync" when writing a new checkpoint to disk. This means the old 
checkpoint gets removed without making sure that the new one is durably 
persisted on disk. "strace" on the jobmanager's process confirms this behavior:
 # The checkpoint chk-60's in-progress metadata is written at "openat"
 # The checkpoint chk-60's in-progress metadata is atomically renamed at 
"rename"
 # The old checkpoint chk-59 is deleted at "unlink"

For durable persistence an "fsync" call is missing before step 3.
Full "strace" log:
{code:java}
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 
0x7fd2ad5fc970) = -1 ENOENT (No such file or directory)
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 
0x7fd2ad5fca00) = -1 ENOENT (No such file or directory)
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc", 
{st_mode=S_IFDIR|0755, st_size=42, ...}) = 0
[pid 51618] 11:44:30 
mkdir("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 0777) = 0
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata", 
0x7fd2ad5fc860) = -1 ENOENT (No such file or directory)
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata", 
0x7fd2ad5fc740) = -1 ENOENT (No such file or directory)
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
 0x7fd2ad5fc7d0) = -1 ENOENT (No such file or directory)
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 
{st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 
{st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 51618] 11:44:30 openat(AT_FDCWD, 
"/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
 O_WRONLY|O_CREAT|O_EXCL, 0666) = 168
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
 {st_mode=S_IFREG|0644, st_size=23378, ...}) = 0
[pid 51618] 11:44:30 
rename("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
 "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata") = 0
[pid 51644] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59/_metadata", 
{st_mode=S_IFREG|0644, st_size=23378, ...}) = 0
[pid 51644] 11:44:30 
unlink("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59/_metadata")
 = 0
[pid 51644] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
{st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 51644] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
{st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 51644] 11:44:30 openat(AT_FDCWD, 
"/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 168
[pid 51644] 11:44:30 newfstatat(168, "", {st_mode=S_IFDIR|0755, st_size=0, 
...}, AT_EMPTY_PATH) = 0
[pid 51644] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
{st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 51644] 11:44:30 openat(AT_FDCWD, 
"/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 168
[pid 51644] 11:44:30 newfstatat(168, "", {st_mode=S_IFDIR|0755, st_size=0, 
...}, AT_EMPTY_PATH) = 0
[pid 51644] 11:44:30 
unlink("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59") = -1 
EISDIR (Is a directory)
[pid 51644] 11:44:30 
rmdir("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59") = 0 
{code}
To fix this I'm currently testing the following commit: 
[https://github.com/Planet-X/flink/commit/24196cc897533b654f44e2b612543ff023cdb123]

"strace" can confirm that "fsync" is now called before the previous checkpoint 
is removed at "unlink":
{code:java}
[pid 40393] 11:30:17 
stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50",  
<unfinished ...>
[pid 40393] 11:30:17 <... stat resumed>0x7fc887efc970) = -1 ENOENT (No such 
file or directory)
[pid 40393] 11:30:17 
stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50", 
0x7fc887efca00) = -1 ENOENT (No such file or directory)
[pid 40393] 11:30:17 
stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e", 
{st_mode=S_IFDIR|0755, st_size=42, ...}) = 0
[pid 40393] 11:30:17 
mkdir("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50", 0777) = 0
[pid 40393] 11:30:17 
stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/_metadata", 
0x7fc887efc870) = -1 ENOENT (No such file or directory)
[pid 40393] 11:30:17 
stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/_metadata", 
0x7fc887efc750) = -1 ENOENT (No such file or directory)
[pid 40393] 11:30:17 
stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/._metadata.inprogress.24b0ea02-a05c-4297-89ff-08340e8cfa90",
 0x7fc887efc7e0) = -1 ENOENT (No such file or directory)
[pid 40393] 11:30:17 
stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50", 
{st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 40393] 11:30:17 
stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50", 
{st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 40393] 11:30:17 openat(AT_FDCWD, 
"/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/._metadata.inprogress.24b0ea02-a05c-4297-89ff-08340e8cfa90",
 O_WRONLY|O_CREAT|O_EXCL, 0666) = 194
[pid 40393] 11:30:17 fsync(194)         = 0
[pid 40393] 11:30:17 
stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/._metadata.inprogress.24b0ea02-a05c-4297-89ff-08340e8cfa90",
 {st_mode=S_IFREG|0644, st_size=23366, ...}) = 0
[pid 40393] 11:30:17 
rename("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/._metadata.inprogress.24b0ea02-a05c-4297-89ff-08340e8cfa90",
 "/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/_metadata") = 0
[pid 39230] 11:30:17 
stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49/_metadata", 
{st_mode=S_IFREG|0644, st_size=23366, ...}) = 0
[pid 39230] 11:30:17 
unlink("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49/_metadata")
 = 0
[pid 39230] 11:30:17 
stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49", 
{st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 39230] 11:30:17 
stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49", 
{st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 39230] 11:30:17 openat(AT_FDCWD, 
"/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49", 
O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 194
[pid 39230] 11:30:17 newfstatat(194, "", {st_mode=S_IFDIR|0755, st_size=0, 
...}, AT_EMPTY_PATH) = 0
[pid 39230] 11:30:17 
stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49", 
{st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 39230] 11:30:17 openat(AT_FDCWD, 
"/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49", 
O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 194
[pid 39230] 11:30:17 newfstatat(194, "", {st_mode=S_IFDIR|0755, st_size=0, 
...}, AT_EMPTY_PATH) = 0
[pid 39230] 11:30:17 
unlink("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49") = -1 
EISDIR (Is a directory)
[pid 39230] 11:30:17 
rmdir("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49") = 0 
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to