Tom Lane wrote: > What I'm suspicious of as the actual bug cause is the comment in > perform_work_item about how we need to be sure that we're allocating these > strings in a long-lived context. If, in fact, they were allocated in some > context that could get reset during the PG_TRY (particularly in the > error-cleanup path, which I bet we don't test), then the observed symptom > that the pointers seem to be pointing at garbage could be explained. > > So what I'm thinking is that you need an error during perform_work_item, > and/or more than one work_item picked up in the calling loop, to make this > bug manifest. You would need to enter perform_work_item in a > non-long-lived context, so the first time through the loop is probably > safe anyway.
I created a reproducer for this bug, by 1) adding some sleeps to make the summarization process take longer, 2) injecting errors randomly during the summarization run, 3) inserting lots of rows using the attached pgbench script running with 8 clients (creation script below). Takes less than a second to crash. What is happening is that we're freeing the strings (allocated in TopTransactionContext) even in the case where the PG_CATCH block aborts the transaction, which is obviously bogus. I moved the frees to inside the PG_TRY block and no crash occurs (I didn't like a 'goto' from outside to inside the PG_TRY block, so I duplicated those lines instead). I propose the attached patch. Before pushing, I'll give a look to the regular autovacuum path to see if it needs a similar fix. initialization: CREATE TABLE t (a integer); CREATE INDEX t_a_idx ON t USING brin (a) WITH (autosummarize='true', pages_per_range='16'); -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 2543c9ee25245f63653bf342a0240eaa8a1dcd6f Mon Sep 17 00:00:00 2001 From: Alvaro Herrera <alvhe...@alvh.no-ip.org> Date: Mon, 23 Oct 2017 18:55:12 +0200 Subject: [PATCH] Fix autovacuum work items --- src/backend/postmaster/autovacuum.c | 39 +++++++++++++++++++++---------------- 1 file changed, 22 insertions(+), 17 deletions(-) diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index 776b1c0a9d..f5aa520d39 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -2525,12 +2525,15 @@ deleted: continue; if (workitem->avw_active) continue; + if (workitem->avw_database != MyDatabaseId) + continue; /* claim this one, and release lock while performing it */ workitem->avw_active = true; LWLockRelease(AutovacuumLock); perform_work_item(workitem); + /* we intentially do not set did_vacuum here */ /* * Check for config changes before acquiring lock for further @@ -2601,16 +2604,22 @@ perform_work_item(AutoVacuumWorkItem *workitem) /* * Save the relation name for a possible error message, to avoid a catalog * lookup in case of an error. If any of these return NULL, then the - * relation has been dropped since last we checked; skip it. Note: they - * must live in a long-lived memory context because we call vacuum and - * analyze in different transactions. + * relation has been dropped since last we checked; skip it. */ - + MemoryContextSwitchTo(TopTransactionContext); cur_relname = get_rel_name(workitem->avw_relation); cur_nspname = get_namespace_name(get_rel_namespace(workitem->avw_relation)); cur_datname = get_database_name(MyDatabaseId); if (!cur_relname || !cur_nspname || !cur_datname) - goto deleted2; + { + if (cur_datname) + pfree(cur_datname); + if (cur_nspname) + pfree(cur_nspname); + if (cur_relname) + pfree(cur_relname); + return; + } autovac_report_workitem(workitem, cur_nspname, cur_datname); @@ -2623,7 +2632,6 @@ perform_work_item(AutoVacuumWorkItem *workitem) PG_TRY(); { /* have at it */ - MemoryContextSwitchTo(TopTransactionContext); switch (workitem->avw_type) { @@ -2645,6 +2653,14 @@ perform_work_item(AutoVacuumWorkItem *workitem) * point.) */ QueryCancelPending = false; + + /* be tidy */ + if (cur_datname) + pfree(cur_datname); + if (cur_nspname) + pfree(cur_nspname); + if (cur_relname) + pfree(cur_relname); } PG_CATCH(); { @@ -2667,17 +2683,6 @@ perform_work_item(AutoVacuumWorkItem *workitem) RESUME_INTERRUPTS(); } PG_END_TRY(); - - /* We intentionally do not set did_vacuum here */ - - /* be tidy */ -deleted2: - if (cur_datname) - pfree(cur_datname); - if (cur_nspname) - pfree(cur_nspname); - if (cur_relname) - pfree(cur_relname); } /* -- 2.11.0
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c index efebeb035a..6dad516a5c 100644 --- a/src/backend/access/brin/brin.c +++ b/src/backend/access/brin/brin.c @@ -1286,6 +1286,8 @@ brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange, BrinTuple *tup; OffsetNumber off; + pg_usleep(1000 * 10); /* 10 ms */ + CHECK_FOR_INTERRUPTS(); tup = brinGetTupleForHeapBlock(revmap, heapBlk, &buf, &off, NULL, diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index f5aa520d39..57b488d7a5 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -2633,6 +2633,10 @@ perform_work_item(AutoVacuumWorkItem *workitem) { /* have at it */ + if (random() > RAND_MAX * 0.9) + elog(ERROR, "some error in block %u", + workitem->avw_blockNumber); + switch (workitem->avw_type) { case AVW_BRINSummarizeRange:
\set c1 random(1, 100) do $$ begin for i in 1 .. :c1 loop insert into t values (ceil(random() * 131072)); end loop; end; $$;
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers