postgres_fdw has values called fdw_startup_cost and fdw_tuple_cost that you can adjust in order to control the perceived cost of starting a remote query and schlepping a tuple over the network. Parallel query, similarly, has parallel_startup_cost and parallel_tuple_cost. Today, I noticed that on a relative basis, the defaults are insane.
parallel_startup_cost = 1000 fdw_startup_cost = 100 parallel_tuple_cost = 0.1 fdw_tuple_cost = 0.01 So, we think that starting a query on a remote node is 10x cheaper than starting a worker within the same cluster, and similarly we think that the cost of moving a tuple from the worker to the leader is 10x more than the cost of moving a tuple from a remote node to the local node. I think that's gotta be wrong. Communication within the same database should be considerably *cheaper*, because we don't have to convert tuples to next or fit them through the network pipe. Another useful point of comparison is the cost of an Append node, which is currently defined to be one-half of cpu_tuple_cost, or by default 0.005 - or in other words half the cost of sending a tuple over the network. It's good that it's less, but I think the actual cost of sending a tuple over the network is considerably more than twice the cost of passing it through an Append node. I suspect it's at least 10x more expensive. I don't think all of this mattered very much before we had join pushdown, aggregate pushdown, and partition-wise join and aggregate, but it does matter now. I'm seeing the planner fail to push down aggregates in cases that are obvious wins unless I crank up fdw_tuple_cost. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company