Hello,
Running PostgreSQL 13.1 on Windows Server 2019, I am occasionally getting
the following log entries:
2021-02-11 12:34:10.149 NZDT [6072] LOG: could not rename file "pg_wal/0000000100000099000000D3": Permission denied
2021-02-11 12:40:31.377 NZDT [6072] LOG: could not rename file "pg_wal/0000000100000099000000D3": Permission denied
2021-02-11 12:46:06.294 NZDT [6072] LOG: could not rename file "pg_wal/0000000100000099000000D3": Permission denied
2021-02-11 12:46:16.502 NZDT [6072] LOG: could not rename file "pg_wal/0000000100000099000000DA": Permission denied
2021-02-11 12:50:20.917 NZDT [6072] LOG: could not rename file "pg_wal/0000000100000099000000D3": Permission denied
2021-02-11 12:50:31.098 NZDT [6072] LOG: could not rename file "pg_wal/0000000100000099000000DA": Permission denied
What appears to be happening is that the affected WAL files (usually
only 2 or 3 at a time) are somehow "losing" their NTFS permissions, so
the PG process can't rename them, even though the PG process created
them in the first place. Even running icacls as admin gives "Access is
denied" on those files. A further oddity is that the affected files do
eventually disappear after a while.
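To see which segments are affected and how often, a quick tally of the
log is enough. This is a minimal sketch, assuming the log prefix format
shown above (date, time, time zone, then [pid]) and one entry per line;
the function name tally_rename_failures and the SAMPLE_LINES list are
made up for illustration.

```python
import re
from collections import Counter

# Matches one "could not rename file" entry in the format shown above.
# Assumption: the prefix is "<date> <time> <tz> [<pid>] " and each entry
# is on a single line in the actual log file.
RENAME_FAIL = re.compile(
    r'^(?P<ts>\S+ \S+) \S+ \[(?P<pid>\d+)\] LOG:\s+could not rename file '
    r'"(?P<path>pg_wal/[0-9A-F]{24})": Permission denied'
)


def tally_rename_failures(lines):
    """Count how many times each WAL segment failed to be renamed."""
    counts = Counter()
    for line in lines:
        match = RENAME_FAIL.match(line.strip())
        if match:
            counts[match.group("path")] += 1
    return counts


# The six entries quoted above (hypothetical test data for this sketch).
SAMPLE_LINES = [
    '2021-02-11 12:34:10.149 NZDT [6072] LOG: could not rename file '
    '"pg_wal/0000000100000099000000D3": Permission denied',
    '2021-02-11 12:40:31.377 NZDT [6072] LOG: could not rename file '
    '"pg_wal/0000000100000099000000D3": Permission denied',
    '2021-02-11 12:46:06.294 NZDT [6072] LOG: could not rename file '
    '"pg_wal/0000000100000099000000D3": Permission denied',
    '2021-02-11 12:46:16.502 NZDT [6072] LOG: could not rename file '
    '"pg_wal/0000000100000099000000DA": Permission denied',
    '2021-02-11 12:50:20.917 NZDT [6072] LOG: could not rename file '
    '"pg_wal/0000000100000099000000D3": Permission denied',
    '2021-02-11 12:50:31.098 NZDT [6072] LOG: could not rename file '
    '"pg_wal/0000000100000099000000DA": Permission denied',
]

if __name__ == "__main__":
    # Most frequently affected segment first.
    for path, n in tally_rename_failures(SAMPLE_LINES).most_common():
        print(f"{path}: {n}")
```

On the six entries above it reports D3 four times and DA twice. To run
it over a real log, pass an open file object instead of SAMPLE_LINES.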
The NTFS permissions on the pg_wal directory are correct, and most WAL
files are unaffected. Chkdsk reports no problems, and the database is
working fine otherwise. I have tried disabling the antivirus software in
case it was interfering, but that made no difference.
I found another recent report of similar behaviour here:
https://stackoverflow.com/questions/65405479/postgresql-13-log-could-not-rename-file-pg-wal-0000000100000001000000c6
WAL config as follows:
wal_level = replica
fsync = on
synchronous_commit = on
wal_sync_method = fsync
full_page_writes = on
wal_compression = off
wal_log_hints = off
wal_init_zero = on
wal_recycle = on
wal_buffers = -1
wal_writer_delay = 200ms
wal_writer_flush_after = 1MB
wal_skip_threshold = 2MB
commit_delay = 0
commit_siblings = 5
checkpoint_timeout = 5min
max_wal_size = 2GB
min_wal_size = 256MB
checkpoint_completion_target = 0.7
checkpoint_flush_after = 0
checkpoint_warning = 30s
archive_mode = off
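For a sense of scale: the PostgreSQL docs say that as long as WAL disk
usage stays below min_wal_size, old WAL files are recycled for future
use at a checkpoint rather than removed. The sketch below converts the
two size settings above into a count of segment files, assuming the
default 16MB wal_segment_size (SHOW wal_segment_size will confirm it,
since it can be changed at initdb time). The helper names are
hypothetical.

```python
# Assumption: default 16MB WAL segments; verify with SHOW wal_segment_size.
WAL_SEGMENT_MB = 16

# Only the units used in the config above are handled.
UNIT_TO_MB = {"MB": 1, "GB": 1024}


def size_to_mb(setting: str) -> int:
    """Convert a size setting such as '256MB' or '2GB' to megabytes."""
    for suffix, factor in UNIT_TO_MB.items():
        if setting.endswith(suffix):
            return int(setting[: -len(suffix)]) * factor
    raise ValueError(f"unsupported unit in {setting!r}")


def segment_count(setting: str) -> int:
    """Number of whole WAL segment files that fit in the given size."""
    return size_to_mb(setting) // WAL_SEGMENT_MB


if __name__ == "__main__":
    print("min_wal_size = 256MB ->", segment_count("256MB"), "segments")  # 16
    print("max_wal_size = 2GB ->", segment_count("2GB"), "segments")  # 128
```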
I'm thinking of disabling wal_recycle as a first step to see if that
makes any difference, but thought I'd seek some comments first.
I'm not sure how much of a problem this is, given that everything else
is working, but any thoughts would be appreciated.
Thanks & regards,
Guy