On Mon, Jun 19, 2023 at 5:13 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Sun, Jun 18, 2023 at 12:18 AM Peter Geoghegan <p...@bowt.ie> wrote: > > > > On Sat, Jun 17, 2023 at 11:29 AM Jaime Casanova > > <jcasa...@systemguards.com.ec> wrote: > > > I have been testing 16beta1, last commit > > > a14e75eb0b6a73821e0d66c0d407372ec8376105 > > > I just let sqlsmith do its magic before trying something else, and > > > today I found a core with the attached backtrace. > > > > The assertion that fails is the IsPageLockHeld assertion from commit > > 72e78d831a. > > > > I'll look into this and share my analysis. >
This failure mode appears to be introduced in commit 7d71d3dd08 (in PG16) where we started to process the config file after acquiring page lock during autovacuum. The problem here is that after acquiring page lock (a heavy-weight lock), while processing the config file, we tried to access the catalog cache which in turn attempts to acquire a lock on the catalog relation, and that leads to the assertion failure. This is because of an existing rule that we don't acquire any other heavyweight lock while holding the page lock except for relation extension. I think normally we should be careful about the lock ordering for heavy-weight locks to avoid deadlocks but here there may not be any existing hazard in acquiring a lock on the catalog table after acquiring page lock on the gin index's metapage as I am not aware of a scenario where we can acquire them in reverse order. One naive idea is to have a parameter like vacuum_config_reload_safe to allow config reload during autovacuum and make it false for the gin index cleanup code path. The reason for the existing rule for page lock and relation extension locks was to not allow them to participate in group locking which will allow other parallel operations like a parallel vacuum where multiple workers can work on the same index, or parallel inserts, parallel copy, etc. The related commits are 15ef6ff4b9, 72e78d831ab, 85f6b49c2c, and 3ba59ccc89. See 3ba59ccc89 for more details (To allow parallel inserts and parallel copy, we have ensured that relation extension and page locks don't participate in group locking which means such locks can conflict among the same group members. This is required as it is no safer for two related processes to extend the same relation or perform clean up in gin indexes at a time than for unrelated processes to do the same....). -- With Regards, Amit Kapila.