Andres Freund <and...@anarazel.de> writes:
> And in contrast to analyzing the database in parallel, the pg_dump/restore
> work to restore stats afaict happens single-threaded for each database.
In principle we should be able to do stats dump/restore parallelized
just as we do for data.  There are some stumbling blocks in the way
of that:

1. pg_upgrade has made a policy judgement to apply parallelism across
databases not within a database, ie it will launch concurrent dump/
restore tasks in different DBs but not authorize any one of them to
eat multiple CPUs.  That needs to be re-thought probably, as I think
that decision dates to before we had useful parallelism in pg_dump
and pg_restore.  I wonder if we could just rip out pg_upgrade's
support for DB-level parallelism, which is not terribly pretty anyway,
and simply pass the -j switch straight to pg_dump and pg_restore.

2. pg_restore should already be able to perform stats restores in
parallel (if authorized to use multiple threads), but I'm less clear
on whether that works right now for pg_dump.

3. Also, parallel restore depends critically on the TOC entries'
dependencies being sane, and right now I do not think they are.
I looked at "pg_restore -l -v" output for the regression DB, and it
seems like it's not taking care to ensure that table/MV data is
loaded before the table/MV's stats.  (Maybe that accounts for some
of the complaints we've seen about stats getting mangled??)

> I think the stats need to be handled much more like we handle the actual table
> data, which are obviously *not* stored in memory for the whole run of pg_dump.

+1

			regards, tom lane