We recently had a follower database go down as a result of maxed-out
connection slots.

The logs showed that a lock held by PID 7 was blocking a request for an
AccessShareLock:

[9-1] sql_error_code = 00000 LOG: process 5148 still waiting for
AccessShareLock on relation 2840 of database 16402 after 1000.103 ms
[9-2] sql_error_code = 00000 DETAIL: Process holding the lock: 7. Wait
queue: ...
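
For anyone who wants to pull the same fields out of their own logs, here is
a minimal sketch of a parser for these two messages. It assumes the default
English message wording and the wrapped layout shown above; the function
name parse_lock_wait and the regular expressions are my own illustration,
not part of any PostgreSQL tool.

```python
import re

# Matches the "still waiting" message; assumes default English wording.
WAIT_RE = re.compile(
    r"process (?P<waiter>\d+) still waiting for (?P<mode>\w+) "
    r"on relation (?P<relation>\d+) of database (?P<database>\d+) "
    r"after (?P<waited_ms>[\d.]+) ms"
)
# Matches the DETAIL line naming the holder(s) of the conflicting lock.
HOLDER_RE = re.compile(r"Process(?:es)? holding the lock: (?P<holders>[\d, ]+)\.")


def parse_lock_wait(text):
    """Return the lock-wait fields found in a log excerpt, or None."""
    flat = " ".join(text.split())  # undo the line wrapping
    wait = WAIT_RE.search(flat)
    if wait is None:
        return None
    info = wait.groupdict()
    holder = HOLDER_RE.search(flat)
    info["holders"] = (
        [int(pid) for pid in holder.group("holders").split(",")] if holder else []
    )
    return info


sample = """[9-1] sql_error_code = 00000 LOG: process 5148 still waiting for
AccessShareLock on relation 2840 of database 16402 after 1000.103 ms
[9-2] sql_error_code = 00000 DETAIL: Process holding the lock: 7. Wait
queue: ..."""

print(parse_lock_wait(sample))
```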

Given that I did not catch the locking in progress, I am having to rely on
these logs to determine what occurred. The wait for an AccessShareLock
leads me to believe that the lock being held must have been an
AccessExclusiveLock.
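
My reasoning comes from the table-level lock conflict matrix in the
PostgreSQL documentation. Below is a small sketch that encodes that matrix
as I understand it, using the mode names as they appear in the logs. The
helper name conflicting_modes is hypothetical.

```python
# Table-level lock modes, weakest to strongest, as spelled in server logs.
CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "RowShareLock": {"ExclusiveLock", "AccessExclusiveLock"},
    "RowExclusiveLock": {
        "ShareLock", "ShareRowExclusiveLock",
        "ExclusiveLock", "AccessExclusiveLock",
    },
    "ShareUpdateExclusiveLock": {
        "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
        "ExclusiveLock", "AccessExclusiveLock",
    },
    "ShareLock": {
        "RowExclusiveLock", "ShareUpdateExclusiveLock",
        "ShareRowExclusiveLock", "ExclusiveLock", "AccessExclusiveLock",
    },
    "ShareRowExclusiveLock": {
        "RowExclusiveLock", "ShareUpdateExclusiveLock", "ShareLock",
        "ShareRowExclusiveLock", "ExclusiveLock", "AccessExclusiveLock",
    },
    "ExclusiveLock": {
        "RowShareLock", "RowExclusiveLock", "ShareUpdateExclusiveLock",
        "ShareLock", "ShareRowExclusiveLock", "ExclusiveLock",
        "AccessExclusiveLock",
    },
}
# AccessExclusiveLock conflicts with every mode, including itself.
CONFLICTS["AccessExclusiveLock"] = set(CONFLICTS) | {"AccessExclusiveLock"}


def conflicting_modes(requested):
    """Return the held lock modes that would block the requested mode."""
    return sorted(CONFLICTS[requested])


print(conflicting_modes("AccessShareLock"))
```

If the matrix is right, the only held mode that can block an
AccessShareLock request is AccessExclusiveLock.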

Relation 2840 is a TOAST table whose primary table is `pg_statistic`.

One other potential piece of evidence is that autovacuums were running on
the primary around this time.

Outside of the lock logs, there is no logging of what PID 7 was doing.
Additionally, the other followers in the formation did not suffer from the
same locking.

Any ideas on what this might be or how we could further troubleshoot this
issue?

-- 
Andy Cooper
