On Fri, Nov 20, 2020 at 12:06 PM Robert Haas <robertmh...@gmail.com> wrote: > > On Wed, Oct 7, 2020 at 6:22 PM Tomas Vondra > <tomas.von...@2ndquadrant.com> wrote: > > I'm still not entirely sure I understand what's happening, or what the > > exact rule is. Consider this query: > > > > explain (verbose) select distinct i, t, md5(t) from ref_0; > > > > which on PG12 (i.e. before incremental sort) is planned like this: > > > > QUERY PLAN > > ---------------------------------------------------------------------------------- > > Unique (cost=78120.92..83120.92 rows=500000 width=65) > > Output: i, t, (md5(t)) > > -> Sort (cost=78120.92..79370.92 rows=500000 width=65) > > Output: i, t, (md5(t)) > > Sort Key: ref_0.i, ref_0.t, (md5(ref_0.t)) > > -> Seq Scan on public.ref_0 (cost=0.00..10282.00 rows=500000 > > width=65) > > Output: i, t, md5(t) > > (7 rows) > > > > i.e. the (stable) function is pushed all the way to the scan node. And > > even if we replace it with a volatile expression it gets pushed down: > > > > explain (verbose) select distinct i, t, md5(random()::text || t) from ref_0; > > > > QUERY PLAN > > ---------------------------------------------------------------------------------- > > Unique (cost=83120.92..88120.92 rows=500000 width=65) > > Output: i, t, (md5(((random())::text || t))) > > -> Sort (cost=83120.92..84370.92 rows=500000 width=65) > > Output: i, t, (md5(((random())::text || t))) > > Sort Key: ref_0.i, ref_0.t, (md5(((random())::text || ref_0.t))) > > -> Seq Scan on public.ref_0 (cost=0.00..15282.00 rows=500000 > > width=65) > > Output: i, t, md5(((random())::text || t)) > > (7 rows) > > > > > > But perhaps I just don't understand the assumption correctly? > > This isn't a counterexample, because there's no join tree here -- or, > well, there is, but it's trivial, because there's only one relation > involved. You can't have a non-Var expression computed before you > finish all the joins, because there are no joins. > > What I said was: "target lists for any nodes below the top of the join > tree were previously always just Var nodes." The topmost join allowed > non-Var nodes before, but not lower levels.
As I understand what you're saying, the attached (from the repro case in [1]'s discussion about parallel safety here) is a counterexample. Specifically we have a plan like: Merge Right Join -> Unique -> Gather Merge -> Sort -> Nested Loop The pathtarget of the nested loop contains non-var expressions (in this case a CASE expression). Am I misunderstanding what you're saying? I've attached verbose output (and the query). James
test=# explain (verbose) select m.id2, m.id, s.description FROM main m LEFT JOIN ( SELECT DISTINCT m.id, CASE WHEN m.id2 = 15 AND (SELECT name FROM secondary x WHERE x.id = s2.id AND x.id2 = 10) = md5(123::text) THEN 'description' WHEN m.id2 = 15 THEN (SELECT name FROM secondary x WHERE x.id = s2.id AND x.id2 = 5) END AS description FROM main m JOIN secondary s2 ON m.id = s2.id WHERE m.id2 = 15 and type = 0 ) s ON s.id = m.id WHERE m.id2 IN (15) and type = 0 ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Merge Right Join (cost=126496.41..9205100.33 rows=34236 width=44) Output: m.id2, m.id, (CASE WHEN ((m_1.id2 = 15) AND ((SubPlan 1) = '202cb962ac59075b964b07152d234b70'::text)) THEN 'description'::text WHEN (m_1.id2 = 15) THEN (SubPlan 2) ELSE NULL::text END) Inner Unique: true Merge Cond: (m_1.id = m.id) -> Unique (cost=125495.95..9092259.53 rows=34236 width=40) Output: m_1.id, (CASE WHEN ((m_1.id2 = 15) AND ((SubPlan 1) = '202cb962ac59075b964b07152d234b70'::text)) THEN 'description'::text WHEN (m_1.id2 = 15) THEN (SubPlan 2) ELSE NULL::text END) -> Gather Merge (cost=125495.95..9089667.83 rows=518341 width=40) Output: m_1.id, (CASE WHEN ((m_1.id2 = 15) AND ((SubPlan 1) = '202cb962ac59075b964b07152d234b70'::text)) THEN 'description'::text WHEN (m_1.id2 = 15) THEN (SubPlan 2) ELSE NULL::text END) Workers Planned: 2 -> Sort (cost=124495.93..125035.87 rows=215975 width=20) Output: m_1.id, m_1.id2, s2.id, (CASE WHEN ((m_1.id2 = 15) AND ((SubPlan 1) = '202cb962ac59075b964b07152d234b70'::text)) THEN 'description'::text WHEN (m_1.id2 = 15) THEN (SubPlan 2) ELSE NULL::text END) Sort Key: m_1.id, (CASE WHEN ((m_1.id2 = 15) AND ((SubPlan 1) = '202cb962ac59075b964b07152d234b70'::text)) THEN 'description'::text WHEN (m_1.id2 = 15) THEN (SubPlan 2) ELSE NULL::text END) -> Nested Loop (cost=388.90..100929.00 rows=215975 width=20) Output: m_1.id, m_1.id2, s2.id, CASE WHEN ((m_1.id2 = 15) AND ((SubPlan 1) = '202cb962ac59075b964b07152d234b70'::text)) THEN 'description'::text WHEN (m_1.id2 = 15) THEN (SubPlan 2) ELSE NULL::text END -> Parallel Bitmap Heap Scan on public.main m_1 (cost=388.34..33793.29 rows=14265 width=12) Output: m_1.id, m_1.id2, m_1.type, m_1.name Recheck Cond: (m_1.id2 = 15) Filter: (m_1.type = 0) -> Bitmap Index Scan on main_id2_idx (cost=0.00..379.78 rows=34580 width=0) Index Cond: (m_1.id2 = 15) -> Index Only Scan using secondary_pkey on public.secondary s2 (cost=0.56..4.56 rows=15 width=8) Output: s2.id, s2.id2 Index Cond: (s2.id = m_1.id) SubPlan 1 -> Index Scan using secondary_pkey on public.secondary x (cost=0.56..8.58 rows=1 width=33) Output: x.name Index Cond: ((x.id = s2.id) AND (x.id2 = 10)) SubPlan 2 -> Index Scan using secondary_pkey on public.secondary x_1 (cost=0.56..8.58 rows=1 width=33) Output: x_1.name Index Cond: ((x_1.id = s2.id) AND (x_1.id2 = 5)) -> Gather Merge (cost=1000.45..112323.81 rows=34236 width=12) Output: m.id2, m.id Workers Planned: 2 -> Parallel Index Scan using main_pkey on public.main m (cost=0.43..107372.10 rows=14265 width=12) Output: m.id2, m.id Filter: ((m.id2 = 15) AND (m.type = 0))