On Tue, Aug 12, 2025 at 10:33 PM Corey Huinker <corey.huin...@gmail.com> wrote: > > > Attached is my current work on adding remote fetching of statistics to > postgres_fdw, and opening the possibility of doing so to other foreign data > wrappers. > > This involves adding two new options to postgres_fdw at the server and table > level. > > The first option, fetch_stats, defaults to true at both levels. If enabled, > it will cause an ANALYZE of a postgres_fdw foreign table to first attempt to > fetch relation and attribute statistics from the remote table. If this > succeeds, then those statistics are imported into the local foreign table, > allowing us to skip a potentially expensive sampling operation. > > The second option, remote_analyze, defaults to false at both levels, and only > comes into play if the first fetch succeeds but no attribute statistics (i.e. > the stats from pg_stats) are found. If enabled then the function will attempt > to ANALYZE the remote table, and if that is successful then a second attempt > at fetching remote statistics will be made. > > If no statistics were fetched, then the operation will fall back to the > normal sampling operation per settings. > > Note patches 0001 and 0002 are already a part of a separate thread > https://www.postgresql.org/message-id/flat/CADkLM%3DcpUiJ3QF7aUthTvaVMmgQcm7QqZBRMDLhBRTR%2BgJX-Og%40mail.gmail.com > regarding a bug (0001) and a nitpick (0002) that came about as a side-effect > to this effort, and but I expect those to be resolved one way or another > soon. Any feedback on those two can be handled there.
I think this is very useful to avoid fetching rows from foreign server and analyzing them locally. This isn't a full review. I looked at the patches mainly to find out how does it fit into the current method of analysing a foreign table. Right now, do_analyze_rel() is called with FDW specific acquireFunc, which collects a sample of rows. The sample is passed to attribute specific compute_stats which fills VacAttrStats for that attribute. VacAttrStats for all the attributes is passed to update_attstats(), which updates pg_statistics. The patch changes that to fetch the statistics from the foreign server and call pg_restore_attribute_stats for each attribute. Instead I was expecting that after fetching the stats from the foreign server, it would construct VacAttrStats and call update_attstats(). That might be marginally faster since it avoids SPI call and updates stats for all the attributes. Did you consider this alternate approach and why it was discarded? If a foreign table points to an inheritance parent on the foreign server, we will receive two rows for that table - one with inherited = false and other with true in that order. I think the stats with inherited=true are relevant to the local server since querying the parent will fetch rows from children. Since that stats is applied last, the pg_statistics will retain the intended statistics. But why to fetch two rows in the first place and waste computing cycles? -- Best Wishes, Ashutosh Bapat