On Fri, Mar 8, 2024 at 12:06 PM Corey Huinker <corey.huin...@gmail.com> wrote: > > Anyway, here's v7. Eagerly awaiting feedback.
Thanks for working on this. It looks useful to have the ability to restore the stats after upgrade and restore. But, the exported stats are valid only until the next ANALYZE is run on the table. IIUC, postgres collects stats during VACUUM, autovacuum and ANALYZE, right? Perhaps there are other ways to collect stats. I'm thinking what problems does the user face if they are just asked to run ANALYZE on the tables (I'm assuming ANALYZE doesn't block concurrent access to the tables) instead of automatically exporting stats. Here are some comments on the v7 patch. I've not dived into the whole thread, so some comments may be of type repeated or need clarification. Please bear with me. 1. The following two are unnecessary changes in pg_proc.dat, please remove them. --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -8818,7 +8818,6 @@ { oid => '3813', descr => 'generate XML text node', proname => 'xmltext', proisstrict => 't', prorettype => 'xml', proargtypes => 'text', prosrc => 'xmltext' }, - { oid => '2923', descr => 'map table contents to XML', proname => 'table_to_xml', procost => '100', provolatile => 's', proparallel => 'r', prorettype => 'xml', @@ -12163,8 +12162,24 @@ # GiST stratnum implementations { oid => '8047', descr => 'GiST support', - proname => 'gist_stratnum_identity', prorettype => 'int2', + proname => 'gist_stratnum_identity',prorettype => 'int2', proargtypes => 'int2', prosrc => 'gist_stratnum_identity' }, 2. + they are replaced by the next auto-analyze. This function is used by + <command>pg_upgrade</command> and <command>pg_restore</command> to + convey the statistics from the old system version into the new one. + </para> Is there any demonstration of pg_set_relation_stats and pg_set_attribute_stats being used either in pg_upgrade or in pg_restore? Perhaps, having them as 0002, 0003 and so on patches might show real need for functions like this. It also clarifies how these functions pull stats from tables on the old cluster to the tables on the new cluster. 3. pg_set_relation_stats and pg_set_attribute_stats seem to be writing to pg_class and might affect the plans as stats can get tampered. Can we REVOKE the execute permissions from the public out of the box in src/backend/catalog/system_functions.sql? This way one can decide who to give permissions to. 4. +SELECT pg_set_relation_stats('stats_export_import.test'::regclass, 3.6::float4, 15000); + pg_set_relation_stats +----------------------- + t +(1 row) + +SELECT reltuples, relpages FROM pg_class WHERE oid = 'stats_export_import.test'::regclass; + reltuples | relpages +-----------+---------- + 3.6 | 15000 Isn't this test case showing a misuse of these functions? Table actually has no rows, but we are lying to the postgres optimizer on stats. I think altering stats of a table mustn't be that easy for the end user. As mentioned in comment #3, permissions need to be tightened. In addition, we can also mark the functions pg_upgrade only with CHECK_IS_BINARY_UPGRADE, but that might not work for pg_restore (or I don't know if we have a way to know within the server that the server is running for pg_restore). 5. In continuation to the comment #2, is pg_dump supposed to generate pg_set_relation_stats and pg_set_attribute_stats statements for each table? When pg_dump does that , pg_restore can automatically load the stats. 6. +/*------------------------------------------------------------------------- * * statistics.c * + * IDENTIFICATION + * src/backend/statistics/statistics.c + * + *------------------------------------------------------------------------- A description of what the new file statistics.c does is missing. 7. pgindent isn't happy with new file statistics.c, please check. 8. +/* + * Import statistics for a given relation attribute + * + * pg_set_attribute_stats(relation Oid, attname name, stainherit bool, + * stanullfrac float4, stawidth int, stadistinct float4, Having function definition in the function comment isn't necessary - it's hard to keep it consistent with pg_proc.dat in future. If required, one can either look at pg_proc.dat or docs. 9. Isn't it good to add a test case where the plan of a query on table after exporting the stats would remain same as that of the original table from which the stats are exported? IMO, this is a more realistic than just comparing pg_statistic of the tables because this is what an end-user wants eventually. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com