On 08/09/2018 07:47 PM, Alvaro Herrera wrote:
> On 2018-Aug-09, Tomas Vondra wrote:
>
>> I suppose there are reasons why it's done this way, and admittedly the test
>> that happens to trigger this is a bit extreme (essentially running pgbench
>> concurrently with 'vacuum full pg_class' in a loop). I'm not sure it's
>> extreme enough to deem it not an issue, because people using many temporary
>> tables often deal with bloat by doing frequent vacuum full on catalogs.
>
> Actually, it seems to me that ApplyLogicalMappingFile is just leaking
> the file descriptor for no good reason.  There's a different
> OpenTransientFile call in ReorderBufferRestoreChanges whose descriptor
> is deliberately not closed right away, but the other one seems a plain
> bug, easy enough to fix.
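
Agreed. For anyone following along, here's a simplified sketch of the
function (paraphrased from reorderbuffer.c, not the exact code - the
read loop is elided), with the missing close added at the bottom:

    static void
    ApplyLogicalMappingFile(HTAB *tuplecid_data, Oid relid,
                            const char *fname)
    {
        char    path[MAXPGPATH];
        int     fd;

        snprintf(path, sizeof(path), "pg_logical/mappings/%s", fname);
        fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
        if (fd < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not open file \"%s\": %m", path)));

        /* ... read and apply each rewrite mapping record ... */

        /*
         * This is what the current code never does: without it the
         * transient fd stays registered in fd.c until end of
         * transaction, so replaying many mapping files in a single
         * decoding run eventually exceeds maxAllocatedDescs.
         */
        CloseTransientFile(fd);
    }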

>> But wait, there's more - what happens after hitting the limit? We restart
>> the decoding process, and end up getting this:
>>
>> ERROR:  53000: exceeded maxAllocatedDescs (492) while trying to open
>> file "pg_logical/mappings/map-4000-4eb-1_60DE1E08-5376b5-537c6b"
>> LOCATION:  OpenTransientFile, fd.c:2161
>> LOG:  00000: starting logical decoding for slot "s"
>> DETAIL:  streaming transactions committing after 1/6097DD48, reading
>> WAL from 1/60275848
>> LOCATION:  CreateDecodingContext, logical.c:414
>> LOG:  00000: logical decoding found consistent point at 1/60275848
>> DETAIL:  Logical decoding will begin using saved snapshot.
>> LOCATION:  SnapBuildRestore, snapbuild.c:1872
>> ERROR:  XX000: subtransaction logged without previous top-level txn
>> record
>
> Hmm, I wonder if we introduced some bug in f49a80c481f7.

That's possible. I'm running on f85537a88d, i.e. with f49a80c481f7.
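
FWIW, to make the descriptor-limit mechanics concrete, here is a
stand-alone toy model of the fd.c bookkeeping. This is not PostgreSQL
code - all the names (MAX_ALLOCATED_DESCS, open_transient,
apply_mapping_file) are invented for illustration:

    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_ALLOCATED_DESCS 492     /* the limit from the log above */

    static int numAllocatedDescs = 0;

    /* Models OpenTransientFile: every open occupies a table slot. */
    static int
    open_transient(const char *path)
    {
        if (numAllocatedDescs >= MAX_ALLOCATED_DESCS)
        {
            fprintf(stderr, "exceeded maxAllocatedDescs (%d) while "
                    "trying to open file \"%s\"\n",
                    MAX_ALLOCATED_DESCS, path);
            exit(1);
        }
        return numAllocatedDescs++;     /* stand-in for a real fd */
    }

    /* Models CloseTransientFile: releases the slot. */
    static void
    close_transient(void)
    {
        numAllocatedDescs--;
    }

    /* Models ApplyLogicalMappingFile; leak = the current bug. */
    static void
    apply_mapping_file(const char *fname, int leak)
    {
        (void) open_transient(fname);
        /* ... apply the rewrite mappings ... */
        if (!leak)
            close_transient();
    }

    int
    main(void)
    {
        char    fname[64];
        long    i;

        /* With leak=1 this dies after 492 files, as in the report. */
        for (i = 0;; i++)
        {
            snprintf(fname, sizeof(fname), "map-%ld", i);
            apply_mapping_file(fname, 1);
        }
    }

The slots are only released by an explicit close (or, in the real code,
at end of transaction), which is why a single long decoding run that
replays hundreds of mapping files hits the cap.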

>> I'd say this implies the "can't open file" failure is handled in a way
>> that corrupts the mapping files, making it impossible to restart the
>> decoding. AFAICS the only option at this point is to drop the
>> subscription and start over.
>
> Wow, that seems pretty serious.  Clearly that's a different bug that
> must be fixed.

Yep.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services