On 2021-Apr-08, Tom Lane wrote:

> Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
> > autovacuum: handle analyze for partitioned tables
> 
> Looks like this has issues under EXEC_BACKEND:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2021-04-08%2005%3A50%3A08

Hmm, I couldn't reproduce this, under EXEC_BACKEND or otherwise.  I
think it is unrelated to EXEC_BACKEND and is rather a race condition.

The backtrace saved by buildfarm is:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  relation_needs_vacanalyze (relid=relid@entry=43057, relopts=relopts@entry=0x0, classForm=classForm@entry=0x7e000501eef0, tabentry=0x5611ec71b030, effective_multixact_freeze_max_age=effective_multixact_freeze_max_age@entry=400000000, dovacuum=dovacuum@entry=0x7ffd78cc4ee0, doanalyze=0x7ffd78cc4ee1, wraparound=0x7ffd78cc4ee2) at /mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:3237
3237                                    childclass = (Form_pg_class) GETSTRUCT(childtuple);
#0  relation_needs_vacanalyze (relid=relid@entry=43057, relopts=relopts@entry=0x0, classForm=classForm@entry=0x7e000501eef0, tabentry=0x5611ec71b030, effective_multixact_freeze_max_age=effective_multixact_freeze_max_age@entry=400000000, dovacuum=dovacuum@entry=0x7ffd78cc4ee0, doanalyze=0x7ffd78cc4ee1, wraparound=0x7ffd78cc4ee2) at /mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:3237
#1  0x00005611eb09fc91 in do_autovacuum () at /mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:2168
#2  0x00005611eb0a0f8b in AutoVacWorkerMain (argc=argc@entry=1, argv=argv@entry=0x5611ec61f1e0) at /mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:1715

The code in question is:

                        children = find_all_inheritors(relid, AccessShareLock, NULL);

                        foreach(lc, children)
                        {
                                Oid             childOID = lfirst_oid(lc);
                                HeapTuple       childtuple;
                                Form_pg_class   childclass;

                                childtuple = SearchSysCache1(RELOID, ObjectIdGetDatum(childOID));
                                childclass = (Form_pg_class) GETSTRUCT(childtuple);

Evidently SearchSysCache1 must be returning NULL, but how can that
happen when we have acquired a lock on the rel during
find_all_inheritors?

I would suggest that we do not take a lock here at all, and just skip the
rel if SearchSysCache1 returns NULL, as in the attached.  Still, I am
baffled about this crash.
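
To illustrate the "skip if the lookup comes back NULL" idea outside the
backend, here is a minimal standalone sketch.  Everything in it is a
hypothetical stand-in: MockClassRow is not Form_pg_class, mock_lookup is
not SearchSysCache1, and treating only relkind 'r' as having storage is a
simplification of RELKIND_HAS_STORAGE.  It only shows the shape of the
loop, not the actual autovacuum code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pg_class row (not the real Form_pg_class). */
typedef struct MockClassRow
{
    unsigned int oid;
    char         relkind;    /* 'r' plain table, 'p' partitioned, 'f' foreign */
    double       reltuples;
} MockClassRow;

/*
 * Hypothetical stand-in for a catalog cache lookup: returns the row with
 * the given OID, or NULL if no such row exists (e.g. it was dropped).
 */
static const MockClassRow *
mock_lookup(const MockClassRow *catalog, size_t ncatalog, unsigned int oid)
{
    assert(catalog != NULL || ncatalog == 0);

    for (size_t i = 0; i < ncatalog; i++)
    {
        if (catalog[i].oid == oid)
            return &catalog[i];
    }
    return NULL;
}

/*
 * Sum reltuples over a list of child OIDs that was collected without
 * holding any lock, so any child may have vanished in the meantime.
 */
static double
sum_child_reltuples(const MockClassRow *catalog, size_t ncatalog,
                    const unsigned int *children, size_t nchildren)
{
    double      total = 0;

    for (size_t i = 0; i < nchildren; i++)
    {
        const MockClassRow *row = mock_lookup(catalog, ncatalog, children[i]);

        /* The child disappeared from under us: skip it, don't dereference. */
        if (row == NULL)
            continue;

        /* Simplified assumption: only plain tables contribute tuples. */
        if (row->relkind == 'r')
            total += row->reltuples;
    }
    return total;
}
```

With a catalog holding OIDs 100 ('r', 10 tuples), 101 ('p') and 103 ('r',
5 tuples), and a child list of 100, 101, 102, 103, the missing OID 102 is
skipped and the sum comes out as 15 rather than crashing.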

-- 
Álvaro Herrera       Valdivia, Chile
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)
From 2bb3e54862c37ee2a20fed21513a3df309381919 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Thu, 8 Apr 2021 11:10:44 -0400
Subject: [PATCH] Fix race condition in relation_needs_vacanalyze

---
 src/backend/postmaster/autovacuum.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index aef9ac4dd2..96073d4597 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -3223,18 +3223,23 @@ relation_needs_vacanalyze(Oid relid,
 			ListCell   *lc;
 
 			reltuples = 0;
 
-			/* Find all members of inheritance set taking AccessShareLock */
-			children = find_all_inheritors(relid, AccessShareLock, NULL);
+			/*
+			 * Find all members of inheritance set.  Beware that they may
+			 * disappear from under us, since we don't acquire any locks.
+			 */
+			children = find_all_inheritors(relid, NoLock, NULL);
 
 			foreach(lc, children)
 			{
 				Oid			childOID = lfirst_oid(lc);
 				HeapTuple	childtuple;
 				Form_pg_class childclass;
 
 				childtuple = SearchSysCache1(RELOID, ObjectIdGetDatum(childOID));
+				if (childtuple == NULL)
+					continue;
 				childclass = (Form_pg_class) GETSTRUCT(childtuple);
 
 				/* Skip a partitioned table and foreign partitions */
 				if (RELKIND_HAS_STORAGE(childclass->relkind))
-- 
2.20.1
