On Tue, Oct 5, 2010 at 10:41 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Tue, Oct 5, 2010 at 10:25 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> This whole discussion seems to me to be about trying to do things outside
>>> the FDW that should properly be left inside the FDW. Who's to say that
>>> the remote side even *has* statistics of the sort that PG creates?
>>>
>>> We should provide an API that lets the FDW return a cost estimate for a
>>> proposed access path. Where it gets the cost estimate from is not
>>> something that should be presupposed.
>
>> Unless there's some way for the FDW to have local tables for caching
>> its statistics, the chances of this having decent performance seem to
>> be near-zero.
>
> Perhaps, but that would be the FDW's problem to implement. Trying to
> design such tables in advance of actually writing an FDW seems like a
> completely backwards design process.

Oh, I agree. I don't want to dictate the structure of those tables; I
just think it's inevitable that an FDW is going to need the ability to
be bound to some local tables which the admin should set up before
installing it. That is, we need a general capability, not a specific
set of tables.

> (I'd also say that your performance estimate is miles in advance of any
> facts; but even if it's true, the caching ought to be inside the FDW,
> because we have no clear idea of what it will need to cache.)

I can't imagine how an FDW could possibly be expected to perform well
without some persistent local data storage. Even assume the remote
end is PG. To return a cost, it's going to need the contents of
pg_statistic cached locally, for each remote table. Do you really
think it's going to work to incur that overhead once per table per
backend startup? Or else every time we try to plan against a foreign
table we can fire off an SQL query to the remote side instead of
trying to compute the cost locally. That's got to be two orders of
magnitude slower than planning based off local stats.

We could punt the issue of stats altogether for the first version and
simply say, hey, this is only intended for things like reading from
CSV files. But if we're going to have it at all then I can't see how
we're going to get by without persistent local storage.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company