On 2021-Apr-08, Tom Lane wrote:

> Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
> > autovacuum: handle analyze for partitioned tables
>
> Looks like this has issues under EXEC_BACKEND:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2021-04-08%2005%3A50%3A08

Hmm, I couldn't reproduce this under EXEC_BACKEND or otherwise, but I
think it is unrelated to EXEC_BACKEND and is rather a race condition.
The backtrace saved by the buildfarm is:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  relation_needs_vacanalyze (relid=relid@entry=43057, relopts=relopts@entry=0x0, classForm=classForm@entry=0x7e000501eef0, tabentry=0x5611ec71b030, effective_multixact_freeze_max_age=effective_multixact_freeze_max_age@entry=400000000, dovacuum=dovacuum@entry=0x7ffd78cc4ee0, doanalyze=0x7ffd78cc4ee1, wraparound=0x7ffd78cc4ee2)
    at /mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:3237
3237				childclass = (Form_pg_class) GETSTRUCT(childtuple);
#0  relation_needs_vacanalyze (relid=relid@entry=43057, relopts=relopts@entry=0x0, classForm=classForm@entry=0x7e000501eef0, tabentry=0x5611ec71b030, effective_multixact_freeze_max_age=effective_multixact_freeze_max_age@entry=400000000, dovacuum=dovacuum@entry=0x7ffd78cc4ee0, doanalyze=0x7ffd78cc4ee1, wraparound=0x7ffd78cc4ee2)
    at /mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:3237
#1  0x00005611eb09fc91 in do_autovacuum ()
    at /mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:2168
#2  0x00005611eb0a0f8b in AutoVacWorkerMain (argc=argc@entry=1, argv=argv@entry=0x5611ec61f1e0)
    at /mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:1715

The code in question is:

		children = find_all_inheritors(relid, AccessShareLock, NULL);

		foreach(lc, children)
		{
			Oid			childOID = lfirst_oid(lc);
			HeapTuple	childtuple;
			Form_pg_class childclass;

			childtuple = SearchSysCache1(RELOID, ObjectIdGetDatum(childOID));
			childclass = (Form_pg_class) GETSTRUCT(childtuple);

Evidently SearchSysCache must be returning NULL, but how can that
happen when we have acquired a lock on the rel during
find_all_inheritors?
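
To make the failure mode concrete, here is a standalone model of the
pattern. It is not PostgreSQL code: ClassEntry, catalog, lookup_class,
sum_child_tuples and run_demo are all made-up names, and the "drop" is
simulated by flipping a flag. It only shows that a list of OIDs
collected earlier can contain an entry whose lookup later returns NULL,
so the result has to be checked before it is dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a pg_class row; not a PostgreSQL type. */
typedef struct ClassEntry
{
	unsigned int oid;
	int			has_storage;	/* models RELKIND_HAS_STORAGE() */
	double		reltuples;
	int			live;			/* 0 once the relation is dropped */
} ClassEntry;

/* Hypothetical catalog contents, invented for this sketch. */
static ClassEntry catalog[] = {
	{1001, 1, 100.0, 1},
	{1002, 1, 250.0, 1},
	{1003, 0, 0.0, 1},
};

/* Hypothetical lookup: returns NULL when the entry no longer exists. */
static ClassEntry *
lookup_class(unsigned int oid)
{
	size_t		i;

	for (i = 0; i < sizeof(catalog) / sizeof(catalog[0]); i++)
	{
		if (catalog[i].live && catalog[i].oid == oid)
			return &catalog[i];
	}
	return NULL;
}

/* Sum reltuples over children, skipping any that have vanished. */
static double
sum_child_tuples(const unsigned int *children, size_t nchildren)
{
	double		total = 0;
	size_t		i;

	for (i = 0; i < nchildren; i++)
	{
		ClassEntry *child = lookup_class(children[i]);

		/* Without this check, child->has_storage would crash. */
		if (child == NULL)
			continue;

		if (child->has_storage)
			total += child->reltuples;
	}
	return total;
}

/* Collect the OID list first, then "drop" one child before the scan. */
static double
run_demo(void)
{
	unsigned int children[] = {1001, 1002, 1003};
	double		total;

	catalog[1].live = 0;		/* simulated concurrent drop of 1002 */
	assert(lookup_class(1002) == NULL);

	total = sum_child_tuples(children, 3);
	printf("total reltuples: %.0f\n", total);
	return total;
}
```

Calling run_demo() from a main() sums only the children that still
exist. Removing the NULL check in sum_child_tuples reproduces the same
kind of NULL dereference as in the backtrace above.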

I would suggest that we not take a lock here at all, and just skip the
rel if SearchSysCache returns NULL, as in the attached.  Still, I am
baffled by this crash.

-- 
Álvaro Herrera                Valdivia, Chile
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)
From 2bb3e54862c37ee2a20fed21513a3df309381919 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Thu, 8 Apr 2021 11:10:44 -0400
Subject: [PATCH] Fix race condition in relation_needs_vacanalyze

---
 src/backend/postmaster/autovacuum.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index aef9ac4dd2..96073d4597 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -3223,18 +3223,23 @@ relation_needs_vacanalyze(Oid relid,
 			ListCell   *lc;
 
 			reltuples = 0;
 
-			/* Find all members of inheritance set taking AccessShareLock */
-			children = find_all_inheritors(relid, AccessShareLock, NULL);
+			/*
+			 * Find all members of inheritance set.  Beware that they may
+			 * disappear from under us, since we don't acquire any locks.
+			 */
+			children = find_all_inheritors(relid, NoLock, NULL);
 
 			foreach(lc, children)
 			{
 				Oid			childOID = lfirst_oid(lc);
 				HeapTuple	childtuple;
 				Form_pg_class childclass;
 
 				childtuple = SearchSysCache1(RELOID, ObjectIdGetDatum(childOID));
+				if (childtuple == NULL)
+					continue;
 				childclass = (Form_pg_class) GETSTRUCT(childtuple);
 
 				/* Skip a partitioned table and foreign partitions */
 				if (RELKIND_HAS_STORAGE(childclass->relkind))
-- 
2.20.1