On Mon, Jan 25, 2021 at 10:21 AM Andy Fan <zhihui.fan1...@gmail.com> wrote:
> > > On Sun, Jan 24, 2021 at 6:34 PM Andy Fan <zhihui.fan1...@gmail.com> wrote: > >> Hi: >> >> I recently found a use case like this. SELECT * FROM p, q WHERE >> p.partkey = >> q.colx AND (q.colx = $1 OR q.colx = $2); Then we can't do either >> planning time >> partition prune or init partition prune. Even though we have run-time >> partition pruning work at last, it is too late in some cases since we >> have >> to init all the plan nodes in advance. In my case, there are 10+ >> partitioned relation in one query and the execution time is short, so the >> init plan a lot of plan nodes cares a lot. >> >> The attached patches fix this issue. It just get the "p.partkey = q.colx" >> case in root->eq_classes or rel->joinlist (outer join), and then check if >> there >> is some baserestrictinfo in another relation which can be used for >> partition >> pruning. To make the things easier, both partkey and colx must be Var >> expression in implementation. >> >> - v1-0001-Make-some-static-functions-as-extern-and-extend-C.patch >> >> Just some existing refactoring and extending ChangeVarNodes to be able >> to change var->attno. >> >> - v1-0002-Build-some-implied-pruning-quals-to-extend-the-us.patch >> >> Do the real job. >> >> Thought? >> >> >> >> -- >> Best Regards >> Andy Fan (https://www.aliyun.com/) >> > > > Some results from this patch. > > create table p (a int, b int, c character varying(8)) partition by list(c); > create table p1 partition of p for values in ('000001'); > create table p2 partition of p for values in ('000002'); > create table p3 partition of p for values in ('000003'); > create table q (a int, c character varying(8), b int) partition by list(c); > create table q1 partition of q for values in ('000001'); > create table q2 partition of q for values in ('000002'); > create table q3 partition of q for values in ('000003'); > > Before the patch: > postgres=# explain (costs off) select * from p inner join q on p.c = q.c > and q.c > '000002'; > QUERY PLAN > ---------------------------------------------------- > Hash Join > Hash Cond: ((p.c)::text = (q.c)::text) > -> Append > -> Seq Scan on p1 p_1 > -> Seq Scan on p2 p_2 > -> Seq Scan on p3 p_3 > -> Hash > -> Seq Scan on q3 q > Filter: ((c)::text > '000002'::text) > (9 rows) > > After the patch: > > QUERY PLAN > ---------------------------------------------------- > Hash Join > Hash Cond: ((p.c)::text = (q.c)::text) > -> Seq Scan on p3 p > -> Hash > -> Seq Scan on q3 q > Filter: ((c)::text > '000002'::text) > (6 rows) > > > Before the patch: > postgres=# explain (costs off) select * from p inner join q on p.c = q.c > and (q.c = '000002' or q.c = '000001'); > QUERY PLAN > > -------------------------------------------------------------------------------------------- > Hash Join > Hash Cond: ((p.c)::text = (q.c)::text) > -> Append > -> Seq Scan on p1 p_1 > -> Seq Scan on p2 p_2 > -> Seq Scan on p3 p_3 > -> Hash > -> Append > -> Seq Scan on q1 q_1 > Filter: (((c)::text = '000002'::text) OR ((c)::text = > '000001'::text)) > -> Seq Scan on q2 q_2 > Filter: (((c)::text = '000002'::text) OR ((c)::text = > '000001'::text)) > (12 rows) > > After the patch: > QUERY PLAN > > -------------------------------------------------------------------------------------------- > Hash Join > Hash Cond: ((p.c)::text = (q.c)::text) > -> Append > -> Seq Scan on p1 p_1 > -> Seq Scan on p2 p_2 > -> Hash > -> Append > -> Seq Scan on q1 q_1 > Filter: (((c)::text = '000002'::text) OR ((c)::text = > '000001'::text)) > -> Seq Scan on q2 q_2 > Filter: (((c)::text = '000002'::text) OR ((c)::text = > '000001'::text)) > (11 rows) > > Before the patch: > postgres=# explain (costs off) select * from p left join q on p.c = q.c > where (q.c = '000002' or q.c = '000001'); > QUERY PLAN > > -------------------------------------------------------------------------------------------- > Hash Join > Hash Cond: ((p.c)::text = (q.c)::text) > -> Append > -> Seq Scan on p1 p_1 > -> Seq Scan on p2 p_2 > -> Seq Scan on p3 p_3 > -> Hash > -> Append > -> Seq Scan on q1 q_1 > Filter: (((c)::text = '000002'::text) OR ((c)::text = > '000001'::text)) > -> Seq Scan on q2 q_2 > Filter: (((c)::text = '000002'::text) OR ((c)::text = > '000001'::text)) > (12 rows) > > After the patch: > QUERY PLAN > > -------------------------------------------------------------------------------------------- > Hash Join > Hash Cond: ((p.c)::text = (q.c)::text) > -> Append > -> Seq Scan on p1 p_1 > -> Seq Scan on p2 p_2 > -> Hash > -> Append > -> Seq Scan on q1 q_1 > Filter: (((c)::text = '000002'::text) OR ((c)::text = > '000001'::text)) > -> Seq Scan on q2 q_2 > Filter: (((c)::text = '000002'::text) OR ((c)::text = > '000001'::text)) > (11 rows) > > -- > Best Regards > Andy Fan (https://www.aliyun.com/) > Here is a performance test regarding this patch. In the following simple case, we can get 3x faster than before. create table p (a int, b int, c int) partition by list(c); select 'create table p_'||i||' partition of p for values in (' || i || ');' from generate_series(1, 100)i; \gexec insert into p select i, i, i from generate_series(1, 100)i; create table m as select * from p; analyze m; analyze p; test sql: select * from m, p where m.c = p.c and m.c in (3, 10); With this patch: 1.1ms Without this patch: 3.4ms I'm happy with the result and the implementation, I have add this into commitfest https://commitfest.postgresql.org/32/2975/ Thanks. -- Best Regards Andy Fan (https://www.aliyun.com/)