On Tue, Aug 12, 2025 at 10:33 PM Corey Huinker <corey.huin...@gmail.com> wrote:
>
>
> Attached is my current work on adding remote fetching of statistics to 
> postgres_fdw, and opening the possibility of doing so to other foreign data 
> wrappers.
>
> This involves adding two new options to postgres_fdw at the server and table 
> level.
>
> The first option, fetch_stats, defaults to true at both levels. If enabled, 
> it will cause an ANALYZE of a postgres_fdw foreign table to first attempt to 
> fetch relation and attribute statistics from the remote table. If this 
> succeeds, then those statistics are imported into the local foreign table, 
> allowing us to skip a potentially expensive sampling operation.
>
> The second option, remote_analyze, defaults to false at both levels, and only 
> comes into play if the first fetch succeeds but no attribute statistics (i.e. 
> the stats from pg_stats) are found. If enabled then the function will attempt 
> to ANALYZE the remote table, and if that is successful then a second attempt 
> at fetching remote statistics will be made.
>
> If no statistics were fetched, then the operation will fall back to the 
> normal sampling operation per settings.
>
> Note patches 0001 and 0002 are already a part of a separate thread 
> https://www.postgresql.org/message-id/flat/CADkLM%3DcpUiJ3QF7aUthTvaVMmgQcm7QqZBRMDLhBRTR%2BgJX-Og%40mail.gmail.com
>  regarding a bug (0001) and a nitpick (0002) that came about as a side-effect 
> to this effort, and but I expect those to be resolved one way or another 
> soon. Any feedback on those two can be handled there.

I think this is very useful to avoid fetching rows from foreign server
and analyzing them locally.

This isn't a full review. I looked at the patches mainly to find out
how does it fit into the current method of analysing a foreign table.
Right now, do_analyze_rel() is called with FDW specific acquireFunc,
which collects a sample of rows. The sample is passed to attribute
specific compute_stats which fills VacAttrStats for that attribute.
VacAttrStats for all the attributes is passed to update_attstats(),
which updates pg_statistics. The patch changes that to fetch the
statistics from the foreign server and call pg_restore_attribute_stats
for each attribute. Instead I was expecting that after fetching the
stats from the foreign server, it would construct VacAttrStats and
call update_attstats(). That might be marginally faster since it
avoids SPI call and updates stats for all the attributes. Did you
consider this alternate approach and why it was discarded?

If a foreign table points to an inheritance parent on the foreign
server, we will receive two rows for that table - one with inherited =
false and other with true in that order. I think the stats with
inherited=true are relevant to the local server since querying the
parent will fetch rows from children. Since that stats is applied
last, the pg_statistics will retain the intended statistics. But why
to fetch two rows in the first place and waste computing cycles?

-- 
Best Wishes,
Ashutosh Bapat


Reply via email to