On 6/14/21 9:39 AM, Tom Lane wrote:
> Andrew Dunstan <and...@dunslane.net> writes:
>> I've been looking at the recent spate of intermittent failures on my
>> Cygwin animal lorikeet. Most of them look something like this, where
>> there's 'VACUUM FULL pg_class' and an almost simultaneous 'CREATE TABLE'
>> which fails.
>
> Do you have any idea what "exit code 127" signifies on that platform?
> (BTW, not all of them look like that; many are reported as plain
> segfaults.) I hadn't spotted the association with a concurrent "VACUUM
> FULL pg_class" before, that does seem interesting.
>
>> Getting stack traces in this platform can be very difficult. I'm going
>> to try forcing complete serialization of the regression tests
>> (MAX_CONNECTIONS=1) to see if the problem goes away. Any other
>> suggestions might be useful. Note that we're not getting the same issue
>> on REL_13_STABLE, where the same group of tests run together (inherit
>> typed_table, vacuum).
>
> If it does go away, that'd be interesting, but I don't see how it gets
> us any closer to a fix. Seems like a stack trace is a necessity to
> narrow it down.
>

Some have given stack traces and some not, not sure why. The one from
June 13 has this:

---- backtrace ----
??  ??:0
WaitOnLock  src/backend/storage/lmgr/lock.c:1831
LockAcquireExtended  src/backend/storage/lmgr/lock.c:1119
LockRelationOid  src/backend/storage/lmgr/lmgr.c:135
relation_open  src/backend/access/common/relation.c:59
table_open  src/backend/access/table/table.c:43
ScanPgRelation  src/backend/utils/cache/relcache.c:322
RelationBuildDesc  src/backend/utils/cache/relcache.c:1039
RelationIdGetRelation  src/backend/utils/cache/relcache.c:2045
relation_open  src/backend/access/common/relation.c:59
table_open  src/backend/access/table/table.c:43
ExecInitPartitionInfo  src/backend/executor/execPartition.c:510
ExecPrepareTupleRouting  src/backend/executor/nodeModifyTable.c:2311
ExecModifyTable  src/backend/executor/nodeModifyTable.c:2559
ExecutePlan  src/backend/executor/execMain.c:1557

The line in lock.c (WaitOnLock, the topmost resolved frame) is where the
process title gets changed to "waiting". I recently stopped setting the
process title on this animal for REL_13_STABLE, and the similar errors
there have largely gone away. I can do the same on HEAD. But it does make
me wonder what the heck has changed to make this code fragile.


cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com