On Wed, Nov 16, 2022 at 1:24 AM <klaus.mailingli...@pernau.at> wrote:
> Filesystem is ext4. VM technology is mixed: VMware, KVM and XEN PV.
> Kernel is 5.15.0-52-generic.
>
> We have not seen this with Ubuntu 18.04 and 20.04 (although we might not
> have noticed it).
>
> I guess upgrading to postgresql 13/14/15 does not help as the problem
> happens in the kernel.
>
> Do you have any advice how to go further? Shall I look out for certain
> kernel changes? In the kernel itself or in ext4 changelog?

It'd be good to figure out what is up with Linux or tuning. I'll go write a patch to reduce that error level for non-EIO errors, to discuss for the next point release.

In the meantime, you could experiment with setting checkpoint_flush_after to 0, so the checkpointer/bgwriter/other backends don't call sync_file_range() all day long. That would have performance consequences for checkpoints which might be unacceptable though. The checkpointer will fsync relations one after another, with less I/O concurrency. Linux is generally quite lazy at writing back dirty data, and doesn't know about our checkpointer's plans to fsync files on a certain schedule, which is why we ask it to get started on multiple files concurrently using sync_file_range().

https://www.postgresql.org/docs/15/runtime-config-wal.html#RUNTIME-CONFIG-WAL-CHECKPOINTS
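
For reference, the checkpoint_flush_after experiment suggested above could look roughly like the fragment below. This is only a sketch: the two commented-out lines are an assumption, not something stated in this thread. They are the separate flush-after settings for the background writer and ordinary backends, which may also need to be zeroed if sync_file_range() calls from those processes are the ones failing.

```
# postgresql.conf (sketch; try on a test system first)

# 0 disables forced writeback during checkpoints, so the checkpointer
# no longer asks the kernel to start writeback early.
checkpoint_flush_after = 0

# Assumption: only if the errors also come from the bgwriter or from
# regular backends. These are separate settings with their own defaults.
#bgwriter_flush_after = 0
#backend_flush_after = 0
```

The change takes effect after a configuration reload (for example "pg_ctl reload" or "SELECT pg_reload_conf();"); a restart should not be needed. Checkpoint duration and I/O latency are worth watching afterwards, given the performance caveat above.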