Possible performance regression with pg_dump of a large number of relations
I've been troubleshooting an issue with slow pg_dump times on postgres 9.6.6. I believe something changed between 9.5.10 and 9.6.6 that has made dumps significantly slower for databases with a large number of relations. I posted this in IRC and someone suggested that I should post it here; I'm sorry if this isn't the right place.

To simulate the issue I generated 150,000 relations spread across 1000 schemas (this roughly reflects my production setup):

```ruby
File.write "many_relations.sql", (150_000 / 150).times.flat_map {|n|
  [
    "create schema s_#{n};",
    150.times.map do |t|
      "create table s_#{n}.test_#{t} (id int);"
    end
  ]
}.join("\n")
```

I have 2 identical pieces of hardware. I've installed 9.5 on one and 9.6 on the other, and I've run the same generated SQL in a fresh database on both systems.

On my 9.5.10 system:

```
$ time pg_dump -n s_10 testing > /dev/null
real    0m5.492s
user    0m1.424s
sys     0m0.184s
```

On my 9.6.6 system:

```
$ time pg_dump -n s_10 testing > /dev/null
real    0m27.342s
user    0m1.748s
sys     0m0.248s
```

If I call that same pg_dump command with the verbose option, the delay is at the `pg_dump: reading user-defined tables` step.

I don't have identical hardware to compare against, so I can't say for sure, but I believe this issue is still present in 10.1.

Is this a legitimate issue? Is there more information I can provide to help better assess the situation?

Thanks in advance everyone!

Luke
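P.S. In case it helps anyone reproduce the numbers, here is roughly how the per-schema timings can be collected in Ruby. This is just a sketch: it assumes the `s_<n>` schemas from the generator above and a database named `testing`, and it shells out to whatever `pg_dump` is on your PATH.

```ruby
require "benchmark"

# Helper: run a shell command and return the wall-clock seconds it took.
time_cmd = ->(cmd) { Benchmark.realtime { system(cmd) } }

# Time pg_dump for a few of the generated schemas (assumes the s_<n>
# schemas created by the generator above and a database named "testing").
%w[s_10 s_20 s_30].each do |schema|
  secs = time_cmd.call("pg_dump -n #{schema} testing > /dev/null 2>&1")
  puts format("%s: %.3fs", schema, secs)
end
```

Running it against both servers should show the gap concentrated in the 9.6 timings.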
Re: Possible performance regression with pg_dump of a large number of relations
> On Jan 12, 2018, at 8:01 AM, Jeff Janes wrote:
>
>> On Thu, Jan 11, 2018 at 5:26 PM, Luke Cowell <lcow...@gmail.com> wrote:
>>
>> I've been troubleshooting an issue with slow pg_dump times on postgres 9.6.6. I believe something changed between 9.5.10 and 9.6.6 that has made dumps significantly slower for databases with a large number of relations. I posted this in irc and someone suggested that I should post this here. I'm sorry if this isn't the right place.
>>
>> To simulate the issue I generated 150,000 relations spread across 1000 schemas (this roughly reflects my production setup).
>>
>> ```ruby
>> File.write "many_relations.sql", (150_000 / 150).times.flat_map {|n|
>>   [
>>     "create schema s_#{n};",
>>     150.times.map do |t|
>>       "create table s_#{n}.test_#{t} (id int);"
>>     end
>>   ]
>> }.join("\n")
>> ```
>>
>> I have 2 identical pieces of hardware. I've installed 9.5 on one and 9.6 on the other. I've run the same generated piece of sql in a fresh database on both systems.
>>
>> On my 9.5.10 system:
>>
>> ```
>> $ time pg_dump -n s_10 testing > /dev/null
>> real    0m5.492s
>> user    0m1.424s
>> sys     0m0.184s
>> ```
>>
>> On my 9.6.6 system:
>>
>> ```
>> $ time pg_dump -n s_10 testing > /dev/null
>> real    0m27.342s
>> user    0m1.748s
>> sys     0m0.248s
>> ```
>
> I don't get quite as large a regression as you do, from 6s to 19s. It looks like there are multiple regressions, but the biggest is caused by:
>
> ```
> commit 5d589993cad212f7d556d52cc1e42fe18f65b057
> Author: Stephen Frost <sfr...@snowman.net>
> Date:   Fri May 6 14:06:50 2016 -0400
>
>     pg_dump performance and other fixes
> ```
>
> That commit covered a few different things, and I don't know which of the improvements it mentions is the one that motivated this, but the key change was to add this query:
>
> ```sql
> EXISTS (SELECT 1 FROM pg_attribute at
>         LEFT JOIN pg_init_privs pip
>           ON (c.oid = pip.objoid
>               AND pip.classoid = (SELECT oid FROM pg_class WHERE relname = 'pg_class')
>               AND pip.objsubid = at.attnum)
>         WHERE at.attrelid = c.oid AND at.attnum > 0
>           AND ((SELECT count(acl)
>                 FROM (SELECT unnest(coalesce(at.attacl,acldefault('c',c.relowner))) AS acl
>                       EXCEPT
>                       SELECT unnest(coalesce(pip.initprivs,acldefault('c',c.relowner)))) as foo) > 1
>             OR (SELECT count(acl)
>                 FROM (SELECT unnest(coalesce(pip.initprivs,acldefault('c',c.relowner))) AS acl
>                       EXCEPT
>                       SELECT unnest(coalesce(at.attacl,acldefault('c',c.relowner)))) as foo) > 0)) AS changed_acl
> ```
>
> Considering it runs 2 subqueries for every column (including the 6 hidden system columns) of every table, even ones that don't end up getting dumped out, it is no wonder it is slow.
>
> If you were just dumping the database with 150,000 objects, I wouldn't worry about a 20 second regression. But I assume you intend to loop over every schema and dump each individually?
>
> Cheers,
>
> Jeff

Hi Jeff, thanks for your attention on this. Yes, that is exactly our use case. We dump each schema individually, so we would be paying that 20-second penalty each time. As a workaround I've been dumping the schemas in batches of 20, but this isn't really ideal, as we'll lose access to a number of our existing workflows.

Luke
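P.S. For anyone curious, the batching workaround looks roughly like this. It's only a sketch: schema names follow the `s_<n>` pattern from my test setup, `testing` is the example database name, and the commands are built but not executed here. The idea is that grouping 1000 schemas into batches of 20 means the slow per-invocation catalog query runs about 50 times instead of about 1000.

```ruby
# Build one pg_dump command per batch of 20 schemas, using repeated -n
# flags so a single invocation dumps the whole batch. Schema names use
# the s_<n> pattern from the test setup; "testing" is the database name.
schemas = (0...1000).map { |n| "s_#{n}" }

commands = schemas.each_slice(20).map do |batch|
  flags = batch.map { |s| "-n #{s}" }.join(" ")
  "pg_dump #{flags} testing"
end

puts commands.first
# commands.each { |cmd| system(cmd) } # uncomment to actually run the dumps
```

The downside, as mentioned above, is that the batched dump files no longer map one-to-one to schemas, which is what breaks our existing workflows.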
Re: Possible performance regression with pg_dump of a large number of relations
Hi Stephen, thank you for putting this together.

> If folks get a chance to take a look at the query and/or test, that'd be great. I'll try to work up an actual patch to pg_dump this weekend to run it through the regression tests and see if anything breaks.

I'm not sure how I can help other than testing that this runs. I can confirm that it runs on 10.1. It does not run on 9.5 or 9.6 and gives this error:

```
ERROR:  relation "pg_init_privs" does not exist
LINE 139: LEFT JOIN pg_init_privs pip
```

I'm guessing that the error is not surprising and that the query is intended for an upcoming release of postgres and wouldn't be backported to 9.6.x.

Luke
Re: Possible performance regression with pg_dump of a large number of relations
> On Jan 24, 2018, at 2:56 PM, Stephen Frost wrote:
>
> Hi there!
>
>>> ERROR:  relation "pg_init_privs" does not exist
>>> LINE 139: LEFT JOIN pg_init_privs pip
>
> I certainly hope that works on 9.6, since that's when pg_init_privs was added..

My mistake. That error is from my 9.5 server. It does fail on 9.6, but with a different error:

```
$ psql --version
psql (PostgreSQL) 9.6.6
$ psql postgres < query.sql
ERROR:  function pg_get_partkeydef(oid) does not exist
LINE 126:        pg_get_partkeydef(c.oid) AS partkeydef,
                 ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
```

> Presuming I can make it work, the idea would be to back-port it to 9.6 and 10, since pg_init_privs and this code was added in 9.6.

A 9.6 backport would be excellent. Thanks again!

Luke