> Can you expand on some of those cases?

Certainly. I think one of the problems is that because this patch is solving a pg_upgrade issue, the focus is on the "dump and restore" scenarios. But pg_dump is used for much more than that, especially "dump and examine".
Although pg_dump is meant to be a canonical, logical representation of your schema and data, the stats add a non-deterministic element to that. Statistical sampling is random, so pg_dump output changes with each run. (Yes, COPY can also change, but much less so, as I argue later.)

One use case is a program that simply uses pg_dump to verify that nothing has modified your table data (I'll use a single table for these examples, but obviously this applies to a whole database as well). So let's say we create a table and populate it at time X, then check back at a later time to verify things are still exactly as we left them:

dropdb gregtest
createdb gregtest
pgbench gregtest -i 2>/dev/null
pg_dump gregtest -t pgbench_accounts > a1
sleep 10
pg_dump gregtest -t pgbench_accounts > a2

diff a1 a2 | cut -c1-50
100078c100078
< 'histogram_bounds', '{2,964,1921,2917,3892,4935
---
> 'histogram_bounds', '{7,989,1990,2969,3973,4977

While COPY is not going to promise a particular output order, the order should not change except through manual actions: insert, update, delete, truncate, vacuum full, cluster (off the top of my head). What should not change the output is a background process gathering some metadata, or someone running a database-wide ANALYZE.

Another use case is someone rolling out their schema to a QA box. All the table definitions and data are checked into a git repository, with a checksum. They want to roll it out, and then verify that everything is exactly as they expect it to be. Or the program is part of a test suite that does a sanity check that the database is in an exact known state before starting. (Our system catalogs make reverse engineering objects very difficult, so many programs rely on pg_dump to do the heavy lifting for them. Parsing the text file generated by pg_dump is much easier than trying to manipulate the system catalogs.) So let's say the process is to create a new database, load things into it, and then checksum the result.
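To make that workflow concrete, here is a minimal sketch of such a "has anything changed?" check. The helper names are mine, it assumes pg_dump and md5sum are on the PATH, and it uses the --no-statistics flag discussed below to keep the output deterministic:

```shell
#!/bin/sh
# Sketch of a "verify nothing changed" check built on pg_dump output.
# Function names here are illustrative, not from the patch itself.

# Reduce any stream to its md5 checksum string.
checksum_of() {
  md5sum | cut -d' ' -f1
}

# Compare a table's current dump against a previously recorded checksum.
# With stats included in the dump, this comparison can fail spuriously;
# --no-statistics keeps the dump stable across ANALYZE runs.
verify_table() {
  db="$1"; table="$2"; expected="$3"
  actual=$(pg_dump "$db" -t "$table" --no-statistics | checksum_of)
  [ "$actual" = "$expected" ]
}

# Example (requires a running PostgreSQL, so not shown executing here):
#   verify_table gregtest pgbench_accounts "$saved_sum" || echo "table changed!"
```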
We can simulate that with pgbench:

dropdb qa1; dropdb qa2
createdb qa1; createdb qa2
pgbench qa1 -i 2>/dev/null
pgbench qa2 -i 2>/dev/null
pg_dump qa1 > dump1; pg_dump qa2 > dump2

$ md5sum dump1
39a2da5e51e8541e9a2c025c918bf463  dump1

This md5sum does not match our repo! It doesn't even match the other one:

$ md5sum dump2
4a977657dfdf910cb66c875d29cfebf2  dump2

It's the stats, of course, which have added a dose of randomness that was not there before, and make our checksums useless:

$ diff dump1 dump2 | cut -c1-50
100172c100172
< 'histogram_bounds', '{1,979,1974,2952,3973,4900
---
> 'histogram_bounds', '{8,1017,2054,3034,4045,513

With --no-statistics, the diff shows no difference, and the md5sum is always the same.

Just to be clear, I love this patch, and I love the fact that one of our major upgrade warts is finally getting fixed. I've tried fixing it myself a few times over the last decade or so, but lacked the skills to do so. :) So I am thrilled to have this finally done. I just don't think it should be enabled by default for everything using pg_dump. For the record, I would not strongly object to having stats on by default for binary dumps, although I would prefer them off.

So why not just expect people to modify their programs to use --no-statistics for cases like this? That's certainly an option, but it's going to break a lot of existing things, and create branching code:

old code:
    pg_dump mydb -f pg.dump

new code:
    if pg_dump.version >= 18
        pg_dump --no-statistics mydb -f pg.dump
    else
        pg_dump mydb -f pg.dump

Also, anything trained to parse pg_dump output will have to learn about the new SELECT pg_restore_ calls with their multi-line formats. (I'm not 100% sure we don't already have anything like that, as things like "SELECT setval" and "SELECT set_config" are single-line, but there may be existing things.)

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
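That version branching could look something like this in shell. This is a minimal sketch under my own assumptions: the helper names are mine, and it assumes pg_dump --version prints a "pg_dump (PostgreSQL) 18.1" style string:

```shell
#!/bin/sh
# Sketch of version-dependent pg_dump invocation; names are illustrative.

# Pull the major version number out of a "pg_dump (PostgreSQL) 18.1" string.
pg_dump_major() {
  printf '%s\n' "$1" | sed -E 's/.*PostgreSQL[^0-9]*([0-9]+).*/\1/'
}

# Choose flags: only versions >= 18 know about --no-statistics,
# so older versions must get no extra flag at all.
stats_flag_for() {
  if [ "$(pg_dump_major "$1")" -ge 18 ]; then
    echo "--no-statistics"
  else
    echo ""
  fi
}

# Usage against a live pg_dump (not executed here):
#   flag=$(stats_flag_for "$(pg_dump --version)")
#   pg_dump $flag mydb -f pg.dump
```

Every program wrapping pg_dump would need to carry some variant of this branch, which is the maintenance cost being argued against above.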