On Wed, Jun 29, 2022 at 4:00 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> I suppose this could indicate that the machine and/or RAM disk is
> overloaded/swapping and one of those open() or unlink() calls is
> taking a really long time, and that could be fixed with some system
> tuning.
Hmm, I take that bit back. Every backend that starts up is trying to attach to the same segment, the one with the new pgstats stuff in it (once the small space in the main shmem segment is used up and we create a DSM segment). There's no fairness/queue, random back-off or guarantee of progress in that librt lock code, so you can get into lock-step with other backends retrying, and although some waiter always gets to make progress, any given backend can lose every round and run out of retries. Even when you're lucky and don't fail with an undocumented, incomprehensible error, it's very slow, and I'd consider filing a bug report about that.

A work-around on PostgreSQL would be to set dynamic_shared_memory_type to mmap (= we just open our own files and map them directly), and to make pg_dynshmem a symlink to something under /tmp (or some other RAM disk) to avoid touching regular disk file systems.
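To illustrate what "random back-off" would add, here is a minimal Python sketch of a lock-file acquisition loop with a bounded retry count and randomized exponential back-off. It is not librt's code. The function names, the O_CREAT|O_EXCL lock-file approach, and the delay parameters are all hypothetical. The random jitter is what stops waiters that failed in the same round from all retrying together in lock-step. It still provides no fairness queue or per-waiter progress guarantee, so a sufficiently unlucky caller can still run out of attempts and get None back.

```python
import os
import random
import time


def acquire_lock_file(path, max_attempts=20, base_delay=0.001, max_delay=0.1, rng=random):
    """Hypothetical helper: create `path` exclusively, retrying with jittered back-off.

    Returns an open file descriptor on success, or None if every attempt failed.
    Release the lock with release_lock_file().
    """
    delay = base_delay
    for _ in range(max_attempts):
        try:
            # O_EXCL makes creation atomic: only one contender can create the file.
            return os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
        except FileExistsError:
            pass  # someone else holds the lock; back off and retry
        # "Full jitter": sleep a random fraction of the current delay so that
        # waiters that lost the same round spread out instead of retrying in step.
        time.sleep(rng.uniform(0, delay))
        # Exponential growth, capped, bounds the total time spent waiting.
        delay = min(delay * 2, max_delay)
    # Still no fairness: an unlucky waiter can lose every round.
    return None


def release_lock_file(fd, path):
    """Hypothetical helper: drop the lock by closing and removing the lock file."""
    os.close(fd)
    os.unlink(path)
```

Without the `rng.uniform` jitter (for example, sleeping a fixed interval), contenders that collide once tend to keep colliding, which matches the lock-step behaviour described above.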