On Fri, Sep 15, 2017 at 5:29 PM, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote:
>
>> Apart from these there is a regression case on a custom table, on head
>> query completes in 20s and with this patch it takes 27s. Please find
>> the attached .out and .sql file for the output and schema for the test
>> case respectively. I have reported this case before (sometime around
>> March this year) as well, but I am not sure if it was overlooked or is
>> an unimportant and expected behaviour for some reason.
>>
>
> Are you talking about [1]? I have explained about the regression in
> [2] and [3]. This looks like an issue with the existing costing model.
>
I debugged this case further. There are two partitioned tables being joined: prt (with partitions prt_p1, prt_p2, and so on) and prt2 (with partitions prt2_p1, prt2_p2, and so on). When the join is executed without partition-wise join, prt2 is used to build the hash table and prt is used to probe it; prt2 has fewer rows than prt. But when partition-wise join is used, the individual partitions are joined in the reverse join order, i.e. partitions of prt are used to build the hash table and partitions of prt2 are used to probe. This happens because the path for the other join order (partition of prt2 used to build the hash table and partition of prt used to probe) has a much higher cost than the first one (313109 vs. 74459), and a portion worth 259094 comes from lines 3226/7 of final_cost_hashjoin():

3215     /*
3216      * The number of tuple comparisons needed is the number of outer
3217      * tuples times the typical number of tuples in a hash bucket, which
3218      * is the inner relation size times its bucketsize fraction. At each
3219      * one, we need to evaluate the hashjoin quals. But actually,
3220      * charging the full qual eval cost at each tuple is pessimistic,
3221      * since we don't evaluate the quals unless the hash values match
3222      * exactly. For lack of a better idea, halve the cost estimate to
3223      * allow for that.
3224      */
3225     startup_cost += hash_qual_cost.startup;
3226     run_cost += hash_qual_cost.per_tuple * outer_path_rows *
3227         clamp_row_est(inner_path_rows * innerbucketsize) * 0.5;

That's because, for some reason, the innerbucketsize for a partition of prt is 22 times larger than that for a partition of prt2. Looks like we have some estimation error in estimating bucket sizes. If I force the partitions to be joined in the same order as the partitioned tables (i.e. the order chosen without partition-wise join), the child-joins execute faster, and in turn partition-wise join performs better than the non-partition-wise join. So this is clearly an estimation and costing problem with regular joins.
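To make the effect concrete, here is a simplified Python sketch of that run_cost term (clamp_row_est is approximated, and all input numbers are hypothetical; only the 22x ratio comes from the observed plans):

```python
def clamp_row_est(rows):
    # Simplified version of PostgreSQL's clamp_row_est(): round to an
    # integer and force the estimate to be at least one row.
    return max(int(rows + 0.5), 1)

def hash_qual_run_cost(per_tuple_cost, outer_rows, inner_rows, innerbucketsize):
    # Mirrors lines 3226/7 above: qual cost times outer tuples times the
    # expected bucket occupancy, halved because the quals are evaluated
    # only when the hash values match.
    return (per_tuple_cost * outer_rows *
            clamp_row_est(inner_rows * innerbucketsize) * 0.5)

# Identical hypothetical inputs except innerbucketsize, which is 22x larger
# for the second join order -- the cost term inflates by the same factor.
first  = hash_qual_run_cost(0.0025, 100000, 100000, 0.001)
second = hash_qual_run_cost(0.0025, 100000, 100000, 0.022)
print(second / first)  # prints 22.0
```

So even with row counts and qual costs held equal, a 22x overestimate of innerbucketsize alone is enough to push the planner to the slower build/probe order.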
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers