On Thu, Jul 2, 2020 at 11:46 PM Tom Lane <t...@sss.pgh.pa.us> wrote: > Etsuro Fujita <etsuro.fuj...@gmail.com> writes: > > On Wed, Jul 1, 2020 at 11:40 PM Tom Lane <t...@sss.pgh.pa.us> wrote: > >> Short of sending a whole second query to the remote server, it's > >> not clear to me how we could get the full table size (or equivalently > >> the target query's selectivity for that table). The best we realistically > >> can do is to adopt pg_class.reltuples if there's been an ANALYZE of > >> the foreign table. That case already works (and this proposal doesn't > >> break it). The problem is what to do when pg_class.reltuples is zero > >> or otherwise badly out-of-date. > > > In estimate_path_cost_size(), if use_remote_estimate is true, we > > adjust the rows estimate returned from the remote server, by factoring > > in the selectivity of the locally-checked quals. I thought what I > > proposed above would be more consistent with that. > > No, I don't think that would be very helpful. There are really three > different numbers of interest here: > > 1. The actual total rowcount of the remote table. > > 2. The number of rows returned by the remote query (which is #1 times > the selectivity of the shippable quals). > > 3. The number of rows returned by the foreign scan (which is #2 times > the selectivity of the non-shippable quals)). > > Clearly, rel->rows should be set to #3. However, what we really want > for rel->tuples is #1. That's because, to the extent that the planner > inspects rel->tuples at all, it's to adjust whole-table stats such as > we might have from ANALYZE. What you're suggesting is that we use #2, > but I doubt that that's a big improvement. In a decently tuned query > it's going to be a lot closer to #3 than to #1. > > We could perhaps try to make our own estimate of the selectivity of the > shippable quals and then back into #1 from the value we got for #2 from > the remote server.
Actually, that is what I suggested: + /* + * plancat.c copied baserel->pages and baserel->tuples from pg_class. + * If the foreign table has never been ANALYZEd, or if its stats are + * out of date, baserel->tuples might now be less than baserel->rows, + * which will confuse assorted logic. Hack it to appear minimally + * sensible. (Do we need to hack baserel->pages too?) + */ + baserel->tuples = Max(baserel->tuples, baserel->rows); for consistency, this should be baserel->tuples = clamp_row_est(baserel->rows / sel); where sel is the selectivity of the baserestrictinfo clauses? By "the baserestrictinfo clauses", I mean the shippable clauses as well as the non-shippable clauses. Since baserel->rows stores the rows estimate returned by estimate_path_cost_size(), which is #3, this estimates #1. > But that sounds mighty error-prone, so I doubt it'd > make for much of an improvement. I have to admit the error-proneness. > In any case, the proposal I'm making is just to add a sanity-check > clamp to prevent the worst effects of not setting rel->tuples sanely. > It doesn't foreclose future improvements inside the FDW. Agreed. Best regards, Etsuro Fujita