dsm code

Thomas Munro Wed, 29 Jun 2022 03:18:20 -0700

On Wed, Jun 29, 2022 at 4:00 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> I suppose this could indicate that the machine and/or RAM disk is
> overloaded/swapping and one of those open() or unlink() calls is
> taking a really long time, and that could be fixed with some system
> tuning.


Hmm, I take that bit back.  Every backend that starts up is trying to
attach to the same segment, the one with the new pgstats stuff in it
(once the small space in the main shmem segment is used up and we
create a DSM segment).  There's no fairness/queue, random back-off or
guarantee of progress in that librt lock code, so you can get into
lock-step with other backends retrying, and although some waiter
always gets to make progress, any given backend can lose every round
and run out of retries.  Even when you're lucky and don't fail with an
undocumented incomprehensible error, it's very slow, and I'd
considering filing a bug report about that.  A work-around on
PostgreSQL would be to set dynamic_shared_memory_type to mmap (= we
just open our own files and map them directly), and making pg_dynshmem
a symlink to something under /tmp (or some other RAM disk) to avoid
touch regular disk file systems.

Re: margay fails assertion in stats/dsa/dsm code

Reply via email to