Hello Kuroda-san,
23.11.2023 15:15, Hayato Kuroda (Fujitsu) wrote:
I agree with your analysis and would like to propose a PoC fix (see
attached). With this patch applied, 20 iterations succeeded for me.
Thanks, here are comments. I'm quite not sure for the windows, so I may say
something wrong.
* I'm not sure why the file/directory name was changed before doing a unlink.
Could you add descriptions?
Please look at the simple test program attached. It demonstrates the
failure for me when running in two sessions as follows:
unlink-open test 150 1000
unlink-open test2 150 1000
...
iteration 60
iteration 61
fopen() after unlink() failed (13)
Process Monitor shows:
...
9:16:55.9249792 AM unlink-open.exe 3232 WriteFile C:\src\test2 SUCCESS Offset: 138,412,032,
Length: 1,048,576
### unlink() performed for the file "test":
9:16:55.9852903 AM unlink-open.exe 4968 CreateFile C:\src\test SUCCESS Desired Access: Read Attributes,
Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write,
Delete, AllocationSize: n/a, OpenResult: Opened
9:16:55.9853637 AM unlink-open.exe 4968 QueryAttributeTagFile C:\src\test SUCCESS Attributes: A,
ReparseTag: 0x0
### file "test" gets into state DELETE PENDING:
9:16:55.9853756 AM unlink-open.exe 4968 SetDispositionInformationFile
C:\src\test SUCCESS Delete: True
9:16:55.9853888 AM unlink-open.exe 4968 CloseFile C:\src\test
SUCCESS
### concurrent operations with file "test2":
9:16:55.9866362 AM unlink-open.exe 3232 WriteFile C:\src\test2 SUCCESS Offset: 139,460,608,
Length: 1,048,576
...
9:16:55.9972373 AM unlink-open.exe 3232 WriteFile C:\src\test2 SUCCESS Offset: 157,286,400,
Length: 1,048,576
9:16:55.9979040 AM unlink-open.exe 3232 CloseFile C:\src\test2
SUCCESS
### unlink() for "test2":
9:16:56.1029981 AM unlink-open.exe 3232 CreateFile C:\src\test2 SUCCESS Desired Access: Read
Attributes, Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode:
Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
9:16:56.1030432 AM unlink-open.exe 3232 QueryAttributeTagFile C:\src\test2 SUCCESS Attributes:
A, ReparseTag: 0x0
### file "test2" gets into state DELETE PENDING:
9:16:56.1030517 AM unlink-open.exe 3232 SetDispositionInformationFile C:\src\test2 SUCCESS
Delete: True
9:16:56.1030625 AM unlink-open.exe 3232 CloseFile C:\src\test2
SUCCESS
### and then it opened successfully:
9:16:56.1189503 AM unlink-open.exe 3232 CreateFile C:\src\test2 SUCCESS Desired Access: Generic
Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N,
ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created
9:16:56.1192016 AM unlink-open.exe 3232 CloseFile C:\src\test2
SUCCESS
### operations with file "test2" continued:
9:16:56.1193394 AM unlink-open.exe 3232 CreateFile C:\src\test2 SUCCESS Desired Access: Read
Attributes, Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode:
Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
9:16:56.1193895 AM unlink-open.exe 3232 QueryAttributeTagFile C:\src\test2 SUCCESS Attributes:
A, ReparseTag: 0x0
9:16:56.1194042 AM unlink-open.exe 3232 SetDispositionInformationFile C:\src\test2 SUCCESS
Delete: True
9:16:56.1194188 AM unlink-open.exe 3232 CloseFile C:\src\test2
SUCCESS
9:16:56.1198459 AM unlink-open.exe 3232 CreateFile C:\src\test2 SUCCESS Desired Access: Generic
Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N,
ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created
9:16:56.1200302 AM unlink-open.exe 3232 WriteFile C:\src\test2 SUCCESS Offset: 0, Length:
1,048,576, Priority: Normal
...
9:16:56.1275871 AM unlink-open.exe 3232 WriteFile C:\src\test2 SUCCESS Offset: 10,485,760,
Length: 1,048,576
### at the same time, CreateFile() for file "test" failed:
9:16:56.1276453 AM unlink-open.exe 4968 CreateFile C:\src\test DELETE PENDING Desired Access: Generic
Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N,
ShareMode: Read, Write, AllocationSize: 0
9:16:56.1279359 AM unlink-open.exe 3232 WriteFile C:\src\test2
SUCCESS Offset: 11,534,336, Length: 1,048,576
9:16:56.1283452 AM unlink-open.exe 3232 WriteFile C:\src\test2
SUCCESS Offset: 12,582,912, Length: 1,048,576
...
But with rename(MoveFileEx), I see:
unlink-open test 150 1000 rename
...
9:38:01.7035286 AM unlink-open.exe 10208 WriteFile C:\src\test
SUCCESS Offset: 156,237,824, Length: 1,048,576
9:38:01.7075621 AM unlink-open.exe 10208 WriteFile C:\src\test
SUCCESS Offset: 157,286,400, Length: 1,048,576
9:38:01.7101299 AM unlink-open.exe 10208 CloseFile C:\src\test
SUCCESS
9:38:01.7130802 AM unlink-open.exe 10208 CreateFile C:\src\test SUCCESS Desired Access: Read Attributes,
Delete, Synchronize, Disposition: Open, Options: Synchronous IO Non-Alert, Open Reparse Point, Attributes: n/a,
ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
9:38:01.7132844 AM unlink-open.exe 10208 QueryAttributeTagFile C:\src\test SUCCESS Attributes: A,
ReparseTag: 0x0
9:38:01.7133420 AM unlink-open.exe 10208 QueryBasicInformationFile C:\src\test SUCCESS CreationTime:
11/24/2023 9:38:01 AM, LastAccessTime: 11/24/2023 9:38:01 AM, LastWriteTime: 11/24/2023 9:38:01 AM, ChangeTime:
11/24/2023 9:38:01 AM, FileAttributes: A
9:38:01.7135191 AM unlink-open.exe 10208 CreateFile C:\src SUCCESS Desired Access: Write Data/Add File,
Synchronize, Disposition: Open, Options: , Attributes: n/a, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened
### file "test" renamed to "test.tmp", it doesn't get into state DELETE PENDING
9:38:01.7136221 AM unlink-open.exe 10208 SetRenameInformationFile C:\src\test SUCCESS ReplaceIfExists: True,
FileName: C:\src\test.tmp
9:38:01.8384110 AM unlink-open.exe 10208 CloseFile C:\src SUCCESS
9:38:01.8388203 AM unlink-open.exe 10208 CloseFile C:\src\test.tmp
SUCCESS
### then file "test.tmp" deleted as usual:
9:38:01.8394278 AM unlink-open.exe 10208 CreateFile C:\src\test.tmp SUCCESS Desired Access: Read
Attributes, Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode:
Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
9:38:01.8396534 AM unlink-open.exe 10208 QueryAttributeTagFile C:\src\test.tmp SUCCESS Attributes: A,
ReparseTag: 0x0
9:38:01.8396885 AM unlink-open.exe 10208 SetDispositionInformationFile
C:\src\test.tmp SUCCESS Delete: True
9:38:01.8397312 AM unlink-open.exe 10208 CloseFile C:\src\test.tmp
SUCCESS
9:38:01.9162566 AM unlink-open.exe 10208 CreateFile C:\src\test SUCCESS Desired Access: Generic Write,
Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N,
ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created
9:38:01.9167628 AM unlink-open.exe 10208 CloseFile C:\src\test
SUCCESS
So the same test run with MoveFileEx():
unlink-open test 150 1000 rename
unlink-open test2 150 1000 rename
successfully passes for me in the same environment (Windows VM slowed down to
50%).
That is, my idea was to try removing a file through renaming it as a fast
path (thus avoiding that troublesome state DELETE PENDING), and if that
fails, to perform removal as before. May be the whole function might be
simplified, but I'm not sure about special cases yet.
* IIUC, the important points is the latter part, which waits until the status is
changed. Based on that, can we remove a double rmtree() from
cleanup_output_dirs()?
They seems to be add for the similar motivation.
I couldn't yet reproduce a failure, which motivated that doubling (IIUC, it
was observed in [1]), with c28911750 reverted, so I need more time to
research that issue to answer this question.
[1] https://www.postgresql.org/message-id/20230131172806.GM22427%40telsasoft.com
Best regards,
Alexander
#include <stdio.h>
#include <string.h>
#include <windows.h>
char buf[1048576] = {0};
int main(int argc, char *argv[])
{
int nmb;
int res;
int num_iterations;
char *filename, *curfilename;
char tmpfilename[MAX_PATH];
FILE *fh;
char do_rename;
if (argc < 4)
{ printf("Usage: %s {filename} {size_mb} {num_iterations}", argv[0]); return 1; }
filename = argv[1];
nmb = atoi(argv[2]);
if (nmb < 1) nmb = 1;
num_iterations = atoi(argv[3]);
if (num_iterations < 1) num_iterations = 1;
do_rename = (strcmp(argv[argc - 1], "rename") == 0);
for (int i = 1; i <= num_iterations; i++) {
printf("iteration %d\n", i);
fh = fopen(filename, "wb");
if (fh == NULL)
{ printf("first fopen() failed (%d)\n", errno); return 1; }
for (int n = 0; n <= nmb; n++) {
if (fwrite(buf, 1, sizeof(buf), fh) != sizeof(buf))
{ printf("fwrite() failed (%d)\n", errno); return 1; }
}
res = fclose(fh);
if (res < 0)
{ printf("first fclose() failed (%d)\n", errno); return 1; }
curfilename = filename;
if (do_rename)
{
sprintf(tmpfilename, "%s.tmp", filename);
res = MoveFileEx(filename, tmpfilename, MOVEFILE_REPLACE_EXISTING); // from dirmod.c
if (res == 0)
{ printf("MoveFileEx() failed (%d)\n", errno); return 1; }
curfilename = tmpfilename;
}
res = unlink(curfilename);
if (res < 0)
{ printf("first unlink() failed (%d)\n", errno); return 1; }
fh = fopen(filename, "w");
if (fh == NULL)
{ printf("fopen() after unlink() failed (%d)\n", errno); return 2; }
res = fclose(fh);
if (res < 0)
{ printf("fclose() failed (%d)\n", errno); return 1; }
res = unlink(filename);
if (res < 0)
{ printf("unlink() failed (%d)\n", errno); return 1; }
}
return 0;
}