On 08/10/2018 11:13 PM, Andres Freund wrote:
> On 2018-08-10 22:57:57 +0200, Tomas Vondra wrote:
>>
>> On 08/09/2018 07:47 PM, Alvaro Herrera wrote:
>>> On 2018-Aug-09, Tomas Vondra wrote:
>>>
>>>> I suppose there are reasons why it's done this way, and admittedly
>>>> the test that happens to trigger this is a bit extreme (essentially
>>>> running pgbench concurrently with 'vacuum full pg_class' in a loop).
>>>> I'm not sure it's extreme enough to deem it not an issue, because
>>>> people using many temporary tables often deal with bloat by doing
>>>> frequent vacuum full on catalogs.
>>>
>>> Actually, it seems to me that ApplyLogicalMappingFile is just leaking
>>> the file descriptor for no good reason. There's a different
>>> OpenTransientFile call in ReorderBufferRestoreChanges that is not
>>> intended to be closed immediately, but the other one seems a plain
>>> bug, easy enough to fix.
>>
>> Indeed. Adding a CloseTransientFile to ApplyLogicalMappingFile solves
>> the issue with hitting maxAllocatedDescs. Barring objections I'll
>> commit this shortly.
>
> Yea, that's clearly a bug. I've not seen a patch, so I can't quite
> formally sign off, but it seems fairly obvious.
>
>> But while running the tests on this machine, I repeatedly got pgbench
>> failures like this:
>>
>> client 2 aborted in command 0 of script 0; ERROR: could not read
>> block 3 in file "base/16384/24573": read only 0 of 8192 bytes
>>
>> That kinda reminds me of the issues we're observing on some buildfarm
>> machines, I wonder if it's the same thing.
>
> Oooh, that's interesting! What's the precise recipe that gets you
> there?
I don't have an exact reproducer - it's kinda rare and unpredictable,
and I'm not sure how much it depends on the environment etc. But I'm
doing this:

1) one cluster with publication (wal_level=logical)

2) one cluster with subscription to (1)

3) simple table, replicated from (1) to (2)

   -- publisher
   create table t (a serial primary key, b int, c int);
   create publication p for table t;

   -- subscriber
   create table t (a serial primary key, b int, c int);
   create subscription s connection '...' publication p;

4) pgbench inserting rows into the replicated table

   pgbench -n -c 4 -T 300 -p 5433 -f insert.sql test

5) pgbench doing vacuum full on pg_class

   pgbench -n -f vacuum.sql -T 300 -p 5433 test

And once in a while I see failures like this:

client 0 aborted in command 0 of script 0; ERROR: could not read block
3 in file "base/16384/86242": read only 0 of 8192 bytes
client 3 aborted in command 0 of script 0; ERROR: could not read block
3 in file "base/16384/86242": read only 0 of 8192 bytes
client 2 aborted in command 0 of script 0; ERROR: could not read block
3 in file "base/16384/86242": read only 0 of 8192 bytes

or this:

client 2 aborted in command 0 of script 0; ERROR: could not read block
3 in file "base/16384/89369": read only 0 of 8192 bytes
client 1 aborted in command 0 of script 0; ERROR: could not read block
3 in file "base/16384/89369": read only 0 of 8192 bytes

I suspect there's some other ingredient, e.g. some manipulation with
the subscription. Or maybe it's not needed at all and I'm just
imagining things.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment: vacuum.sql (application/sql)
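The attachment itself isn't inlined in the archive, but per step 5)
above it presumably boils down to a single statement - a sketch of the
likely contents, not the actual file:

  -- vacuum.sql (sketch): rewrite pg_class over and over, forcing the
  -- catalog to get a new relfilenode while the inserts are running
  vacuum full pg_class;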
Attachment: insert.sql (application/sql)
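Also not inlined; a minimal script consistent with step 4) could look
like this - the column values are a guess, and the serial key a fills
itself in:

  -- insert.sql (sketch): hypothetical single-row insert into the
  -- replicated table; random() * 1000 is cast to int on assignment
  insert into t (b, c) values (random() * 1000, random() * 1000);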