Greetings, * Corey Huinker (corey.huin...@gmail.com) wrote: > On Thu, Feb 15, 2024 at 4:09 AM Corey Huinker <corey.huin...@gmail.com> > wrote: > > Posting v5 updates of pg_import_rel_stats() and pg_import_ext_stats(), > > which address many of the concerns listed earlier. > > > > Leaving the export/import scripts off for the time being, as they haven't > > changed and the next likely change is to fold them into pg_dump.
> v6 posted below. > > Changes: > > - Additional documentation about the overall process. > - Rewording of SGML docs. > - removed a fair number of columns from the transformation queries. > - enabled require_match_oids in extended statistics, but I'm having my > doubts about the value of that. > - moved stats extraction functions to an fe_utils file stats_export.c that > will be used by both pg_export_stats and pg_dump. > - pg_export_stats now generates SQL statements rather than a tsv, and has > boolean flags to set the validate and require_match_oids parameters in the > calls to pg_import_(rel|ext)_stats. > - pg_import_stats is gone, as importing can now be done with psql. Having looked through this thread and discussed a bit with Corey off-line, the approach that Tom laid out up-thread seems like it would make the most sense overall- that is, eliminate the JSON bits and the SPI and instead export the stats data by running queries from the new version of pg_dump/server (in the FDW case) against the old server with the intelligence of how to transform the data into the format needed for the current pg_dump/server to accept, through function calls where the function calls generally map up to the rows/information being updated- a call to update the information in pg_class for each relation and then a call for each attribute to update the information in pg_statistic. Part of this process would include mapping from OIDs/attrnum's to names on the source side and then from those names to the appropriate OIDs/attrnum's on the destination side. As this code would be used by both pg_dump and the postgres_fdw, it seems logical that it would go into the common library. Further, it would make sense to have this code be able to handle multiple major versions for the foreign side, such as how postgres_fdw and pg_dump already do. In terms of working to ensure that newer versions support loading from older dumps (that is, that v18 would be able to load a dump file created by a v17 pg_dump against a v17 server in the face of changes having been made to the statistics system in v18), we could have the functions take a version parameter (to handle cases where the data structure is the same but the contents have to be handled differently), use overloaded functions, or have version-specific names for the functions. I'm also generally supportive of the idea that we, perhaps initially, only support dumping/loading stats with pg_dump when in binary-upgrade mode, which removes our need to be concerned with this (perhaps that would be a good v1 of this feature?) as the version of pg_dump needs to match that of pg_upgrade and the destination server for various other reasons. Including a switch to exclude stats on restore might also be an acceptable answer, or even simply excluding them by default when going between major versions except in binary-upgrade mode. Along those same lines when it comes to a 'v1', I'd say that we may wish to consider excluding extended statistics, which I am fairly confident Corey's heard a number of times previously already but thought I would add my own support for that. To the extent that we do want to make extended stats work down the road, we should probably have some pre-patches to flush out the missing _in/_recv functions for those types which don't have them today- and that would include modifying the _out of those types to use names instead of OIDs/attrnums. In thinking about this, I was reviewing specifically pg_dependencies. To the extent that there are people who depend on the current output, I would think that they'd actually appreciate this change. I don't generally feel like we need to be checking that the OIDs between the old server and the new server match- I appreciate that that should be the case in a binary-upgrade situation but it still feels unnecessary and complicated and clutters up the output and the function calls. Overall, I definitely think this is a good project to work on as it's an often, rightfully, complained about issue when it comes to pg_upgrade and the amount of downtime required for it before the upgraded system can be reasonably used again. Thanks, Stephen
signature.asc
Description: PGP signature