While poking at pg_dump for some work I'll show later, I grew quite unhappy with the extent to which people have ignored this advice given near the head of getTables():
 * Note: in this phase we should collect only a minimal amount of
 * information about each table, basically just enough to decide if it is
 * interesting.  We must fetch all tables in this phase because otherwise
 * we cannot correctly identify inherited columns, owned sequences, etc.

Far from collecting "a minimal amount of information", we have repeatedly
stuffed extra joins and expensive sub-selects into this query.  That's
fairly inefficient if we aren't interested in dumping every table, but
it's much worse than that: we are collecting all this info *before we
have lock* on the tables.  That means we are at serious risk of data skew
in any place where we consult server-side functions, eg
pg_get_partkeydef().  For example:

regression=# create table foo(f1 int) partition by range(f1);
CREATE TABLE
regression=# begin; drop table foo;
BEGIN
DROP TABLE

Now start a pg_dump in another shell session, wait a second or two, and

regression=*# commit;
COMMIT

and the pg_dump blows up:

$ pg_dump -s regression
pg_dump: error: query failed: ERROR:  could not open relation with OID 37796

(Note: it was not so easy to provoke this failure manually before
e2ff7d9a8 guaranteed that pg_dump would wait around for you, but it's
certainly possible.)

To add insult to injury, all that work being done in this query makes the
time window for trouble wider.  Testing on the regression database, which
isn't all that big, I observe getTables()'s query taking 140 ms,
substantially longer than the next slowest thing done by "pg_dump -s
regression".  It seems that the largest chunk of this can be blamed on
the sub-selects added by the pg_init_privs patch: EXPLAIN ANALYZE puts
their total runtime at ~100ms.

I believe that the pg_init_privs sub-selects can probably be nuked, or at
least their cost can be paid later.  The other work I mentioned has
proven that we do not actually need to know the ACL situation to
determine whether a table is "interesting", so we don't have to do that
work before acquiring lock.  However, that still leaves us doing several
inessential joins, not to mention those unsafe partition-examination
functions, before acquiring lock.

I am thinking that the best fix is to make getTables() perform two
queries (or, possibly, split it into two functions).  The first pass
would acquire only the absolutely minimal amount of data needed to decide
whether a table is "interesting", and then lock such tables.  Then we'd
go back to fill in the remaining data.  (A very rough sketch of the two
queries is appended below.)  While it's fairly annoying to read pg_class
twice, it looks like the extra query might only take a handful of msec.
Also, if we don't do it like that, it seems that we'd have to add
entirely new queries in which to call pg_get_partkeydef() and
pg_get_expr(relpartbound).  So I'm not really seeing a better-performing
alternative.

Thoughts?

			regards, tom lane
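
A very rough sketch of the two-pass idea, for illustration only: the
column lists, relkind filter, and parameter passing here are placeholders
rather than a worked-out patch.

-- Pass 1: grab just enough from pg_class to decide whether each table
-- is interesting, then lock the interesting ones the way pg_dump
-- already does today:
--   LOCK TABLE <table> IN ACCESS SHARE MODE
SELECT c.tableoid, c.oid, c.relname, c.relnamespace,
       c.relkind, c.relpersistence
FROM pg_catalog.pg_class c
WHERE c.relkind IN ('r', 'S', 'v', 'm', 'f', 'p');

-- Pass 2: with locks held, go back and fill in the remaining per-table
-- data, including the server-side functions that are unsafe to call
-- before we have lock.
SELECT c.oid,
       CASE WHEN c.relkind = 'p'
            THEN pg_catalog.pg_get_partkeydef(c.oid) END AS partkeydef,
       pg_catalog.pg_get_expr(c.relpartbound, c.oid) AS partbound
FROM pg_catalog.pg_class c
WHERE c.oid = ANY ($1);  -- array of OIDs of the locked, interesting tables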