On Sun, Jul 12, 2020 at 4:29 PM Justin Pryzby <pry...@telsasoft.com> wrote: > > On Sun, Jul 12, 2020 at 07:48:50AM +0200, Julien Rouhaud wrote: > > Currently, getTableAttrs() always retrieves info about columns defaults and > > check constraints, while this will never be used if --data-only option if > > used. > > This seems like a waste of resources, so here's a patch to skip those parts > > when the DDL won't be generated. > > Note that the speed of default and constraint handling has come up before: > https://www.postgresql.org/message-id/flat/CAMkU%3D1xPqHP%3D7YPeChq6n1v_qd4WGf%2BZvtnR-b%2BgyzFqtJqMMQ%40mail.gmail.com > https://www.postgresql.org/message-id/CAMkU=1x-e+maqefhm1ymesij8j9z+sjhgw7c9bqo3e3jmg4...@mail.gmail.com
Oh, I wasn't aware of that. > I'd pointed out that a significant fraction of our pg_upgrade time was in > pg_dump, due to having wide tables with many child tables, and "default 0" on > every column. (I've since dropped our defaults so this is no longer an issue > here). > > It appears your patch would avoid doing unnecessary work in the --data-only > case, but it wouldn't help the pg_upgrade case. Indeed. Making the schema part faster would probably require a bigger refactoring. I'm wondering if we could introduce some facility to temporarily deny any DDL change, so that the initial pg_dump -s done by pg_upgrade can be performed before shutting down the instance. Note that those extraneous queries were found while trying to dump data out of a corrupted database. The issue wasn't an excessive runtime but corrupted catalog entries, so bypassing this code (since I was only interested in the data anyway) was simpler than trying to recover yet other corrupted rows.