Thank you Amit and Ashutosh for your reply! On Wed, Oct 7, 2020 at 8:41 PM Amit Langote <amitlangot...@gmail.com> wrote:
> Hi Ashutosh, > > On Wed, Oct 7, 2020 at 8:40 PM Ashutosh Bapat > <ashutosh.bapat....@gmail.com> wrote: > > On Wed, Oct 7, 2020 at 11:20 AM Andy Fan <zhihui.fan1...@gmail.com> > wrote: > > > On Mon, Oct 5, 2020 at 9:27 PM Ashutosh Bapat < > ashutosh.bapat....@gmail.com> wrote: > > >> What happens when we > > >> execute plans with values that have estimates similar to the generic > > >> plan later when we moderate generic plan costs based on the custom > > >> plans? > > >> > > > > > > The example at the beginning of this thread, I used the exact same > values > > > every time, the custom plan will be chosen all the time, which is bad, > > > The main reason is the custom plan knows the exact value in Execute > > > message, so it run plan time partition prune, then the total cost is > low, however > > > for the generic plan the partition prune happens at Runtime > initial_partition prune > > > stage, so the cost of the partitions which can be pruned at that > stage is still > > > included the total cost, so generic plans can't be chosen. that would > be the > > > thing I want to fix. > > > > Something is wrong in the generic plan costing then. > > That's right. > > > IIUC, the > > selectivity estimate for only a single partition should come out >= 1. > > For all the other partitions, it should be 1 and that too because we > > clamp the row counts. If the planner knows the exact parameter, the planner can estimate other partitions as 1 row (actually 0 row). However if it doesn't know that, it will be rows / ndistinctVals for "partkey = $1" case. > So the final costs for generic and custom plans > > shouldn't be far off unless there's large deviation in the selectivity > > of a partition key. I am assuming that there's an equality condition > > on a partition key. That's what I meant by the paragraph below. > > > > > >> If the table has good distribution of a partition key, which also > > >> results in good distribution of data across partitions, generic plan > > >> cost will be similar to the custom plan costs. If not that's something > > >> we want to fix. > > > > Can you please investigate on these lines? > > The planner currently costs a generic plan assuming that *all* > partitions will be scanned, which is not the case, because run-time > pruning will ensure that scan nodes of all but one partition will be > discarded, assuming a simple query like `select * from parted_table > where a = $1`. With N partitions, a generic plan for such a query is > N times as expensive as a custom plan, so will never win. > > For example: > > create table bar (a int, b int, c int) partition by range (a); > create table bar_1 partition of bar for values from (1) to (100001); > create table bar_2 partition of bar for values from (100001) to (200001); > create table bar_3 partition of bar for values from (200001) to (300001); > create table bar_4 partition of bar for values from (300001) to (400001); > create table bar_5 partition of bar for values from (400001) to (500001); > create table bar_6 partition of bar for values from (500001) to (600001); > create table bar_7 partition of bar for values from (600001) to (700001); > create table bar_8 partition of bar for values from (700001) to (800001); > create table bar_9 partition of bar for values from (800001) to (900001); > create table bar_10 partition of bar for values from (900001) to (1000001); > insert into bar select generate_series(1, 1000000); > create index on bar (a); > analyze bar; > > set plan_cache_mode to force_custom_plan ; > prepare q as select * from bar where a = $1; > explain execute q (1); > QUERY PLAN > > ------------------------------------------------------------------------------ > Index Scan using bar_1_a_idx on bar_1 bar (cost=0.29..8.31 rows=1 > width=12) > Index Cond: (a = 1) > (2 rows) > > deallocate q; > set plan_cache_mode to force_generic_plan ; > prepare q as select * from bar where a = $1; > explain execute q (1); > QUERY PLAN > > -------------------------------------------------------------------------------- > Append (cost=0.29..83.15 rows=10 width=12) > Subplans Removed: 9 > -> Index Scan using bar_1_a_idx on bar_1 (cost=0.29..8.31 rows=1 > width=12) > Index Cond: (a = $1) > (4 rows) > > As you can see, the cost of the generic plan is 83.13 which is 8.31 * > 10, even though only one partition is actually scanned, so the actual > cost is the same as a custom plan. The planner fails to consider that > the those 9 subplans will be removed during execution. > > -- > Amit Langote > EDB: http://www.enterprisedb.com > I also had some replies at the original runtime partition prune thread [1], I am very sorry that I mess up this topic with 2 threads, I have some new understanding about this (might be old ideas for others), I'd like to continue this discussion in that thread and add you to the cc list there. Thanks! [1] https://www.postgresql.org/message-id/CAKU4AWrzi7f1Y1J8q0xWO8fXOwf-BtdG%2BM9_7bVPnQyd5cLS0Q%40mail.gmail.com -- Best Regards Andy Fan