Hi, Thomas!
Thanks for your rapid answer and sorry for my delay with reply.
On 01.02.2023 09:45, Thomas Munro wrote:
Might add a custom error message for EDEADLK
since it absent in errcode_for_file_access()?
Ah, good thought. I think it shouldn't happen™, so it's OK that
errcode_for_file_access() would classify it as ERRCODE_INTERNAL_ERROR.
Yes, i also think that is impossible since the lock is taken on
the entire file, so ERRCODE_INTERNAL_ERROR will be right here.
Other interesting errors are:
ENOLCK: system limits exceeded; PANIC seems reasonable
EOPNOTSUPP: this file doesn't support locking (seen on FreeBSD man
pages, not on POSIX)
Agreed that ENOLCK is a PANIC or at least FATAL. Maybe it's even better
to do it FATAL to allow other backends to survive?
As for EOPNOTSUPP, maybe make a fallback to the workaround from the
first variant of the patch? (In my previous letter i forgot the pause
after break;, of cause)
I don't know if Windows suffers from this type of problem.
Unfortunately its equivalent functionality LockFile() looks non-ideal
for this purpose: if your program crashes, the documentation is very
vague on when exactly it is released by the OS, but it's not
immediately on process exit. That seems non-ideal for a control file
you might want to access again very soon after a crash, to be able to
recover.
Unfortunately i've not had time to reproduce the problem and apply this patch on
Windows yet but i'm going to do it soon on windows 10. If a crc error
will occur there, then we might use the workaround from the first
variant of the patch.
Thank you for investigating. I am afraid to read your results.
First of all it seemed to me that is not a problem at all since msdn
guarantees sector-by-sector atomicity.
"Physical Sector: The unit for which read and write operations to the device
are completed in a single operation. This is the unit of atomic write..."
https://learn.microsoft.com/en-us/windows/win32/fileio/file-buffering
https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile
(Of course, only if the 512 bytes lays from the beginning of the file with a
zero
offset, but this is our case. The current size of ControlFileData is
296 bytes at offset = 0.)
I tried to verify this fact experimentally and was very surprised.
Unfortunately it crashed in an hour during torture test:
2023-02-13 15:10:52.675 MSK [5704] LOG: starting PostgreSQL 16devel, compiled
by Visual C++ build 1929, 64-bit
2023-02-13 15:10:52.768 MSK [5704] LOG: database system is ready to accept
connections
@@@@@@ sizeof(ControlFileData) = 296
.........
2023-02-13 16:10:41.997 MSK [9380] ERROR: calculated CRC checksum does not
match value stored in file
But fortunately, this only happens when fsync=off.
Also i did several experiments with fsync=on and found more appropriate
behavior:
The stress test with sizeof(ControlFileData) = 512+8 = 520 bytes failed in a
4,5 hours,
but the other one with ordinary sizeof(ControlFileData) = 296 not crashed in
more than 12 hours.
Seems in that case the behavior corresponds to msdn. So if it is possible
to use fsync() under windows when the GUC fsync is off it maybe a solution
for this problem. If so there is no need to lock the pg_control file under
windows at all.
May be choose it in accordance with GUC wal_sync_method?
Here's a patch like that. I don't know if it's a good idea for
wal_sync_method to affect other kinds of files or not, but, then, it
already does (fsync_writethough changes this behaviour).
+1. Looks like it needs a little fix:
+++ b/src/common/controldata_utils.c
@@ -316,7 +316,7 @@ update_controlfile(const char *DataDir,
if (pg_fsync(fd) != 0)
ereport(PANIC,
(errcode_for_file_access(),
- errmsg("could not fdatasync file
\"%s\": %m",
+ errmsg("could not fsync file
\"%s\": %m",
ControlFilePath)));
And it may be combined with 0001-Lock-pg_control-while-reading-or-writing.patch
I would
only want to consider this if we also stop choosing "open_datasync" as
a default on at least Windows.
I didn't quite understand this point. Could you clarify it for me, please? If
the performance
of UpdateMinRecoveryPoint() wasn't a problem we could just use fsync in all
platforms.
Sincerely yours,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company