After we pull up this sublink as semi join , when make join rel for semi join, the optimizer will take hash join method into account if a unique path can be created with the RHS, for detail please check make_join_rel in src/backend/optimizer/path/joinrels.c. For this case, the cost of hash join is cheaper than semi join, so you can see the planner chose the hash join rather than semi join.
------------------ Jerry Yu https://github.com/scarbrofair ------------------ Original ------------------ From: "Robert Haas";<robertmh...@gmail.com>; Date: Fri, Jul 22, 2016 00:23 AM To: "Armor"<yupengst...@qq.com>; Cc: "pgsql-hackers"<pgsql-hackers@postgresql.org>; Subject: Re: [HACKERS] One question about transformation ANY Sublinks into joins On Sun, Jul 17, 2016 at 5:33 AM, Armor <yupengst...@qq.com> wrote: > Hi > I run a simple SQL with latest PG?? > postgres=# explain select * from t1 where id1 in (select id2 from t2 where > c1=c2); > QUERY PLAN > ------------------------------------------------------------ > Seq Scan on t1 (cost=0.00..43291.83 rows=1130 width=8) > Filter: (SubPlan 1) > SubPlan 1 > -> Seq Scan on t2 (cost=0.00..38.25 rows=11 width=4) > Filter: (t1.c1 = c2) > (5 rows) > > and the table schema are as following: > > postgres=# \d t1 > Table "public.t1" > Column | Type | Modifiers > --------+---------+----------- > id1 | integer | > c1 | integer | > > postgres=# \d t2 > Table "public.t2" > Column | Type | Modifiers > --------+---------+----------- > id2 | integer | > c2 | integer | > > I find PG decide not to pull up this sublink because the whereClauses > in this sublink refer to the Vars of parent query, for detail please check > the function named convert_ANY_sublink_to_join in > src/backend/optimizer/plan/subselect.c. > However, for such simple sublink which has no agg, no window function, > no limit, may be we can carefully pull up the predicates in whereCluase > which refers to the Vars of parent query, then pull up this sublink and > produce a query plan as following: > > postgres=# explain select * from t1 where id1 in (select id2 from t2 where > c1=c2); > QUERY PLAN > ------------------------------------------------------------------------ > Hash Join (cost=49.55..99.23 rows=565 width=8) > Hash Cond: ((t1.id1 = t2.id2) AND (t1.c1 = t2.c2)) > -> Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8) > -> Hash (cost=46.16..46.16 rows=226 width=8) > -> HashAggregate (cost=43.90..46.16 rows=226 width=8) > Group Key: t2.id2, t2.c2 > -> Seq Scan on t2 (cost=0.00..32.60 rows=2260 width=8) It would need to be a Hash Semi Join rather than a Hash Join, wouldn't it? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company