Re: enable_incremental_sort changes query behavior

James Coleman Fri, 20 Nov 2020 10:52:14 -0800

On Fri, Nov 20, 2020 at 12:06 PM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Wed, Oct 7, 2020 at 6:22 PM Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
> > I'm still not entirely sure I understand what's happening, or what the
> > exact rule is. Consider this query:
> >
> >    explain (verbose) select distinct i, t, md5(t) from ref_0;
> >
> > which on PG12 (i.e. before incremental sort) is planned like this:
> >
> >                                   QUERY PLAN
> > ----------------------------------------------------------------------------------
> >   Unique  (cost=78120.92..83120.92 rows=500000 width=65)
> >     Output: i, t, (md5(t))
> >     ->  Sort  (cost=78120.92..79370.92 rows=500000 width=65)
> >           Output: i, t, (md5(t))
> >           Sort Key: ref_0.i, ref_0.t, (md5(ref_0.t))
> >           ->  Seq Scan on public.ref_0  (cost=0.00..10282.00 rows=500000 
> > width=65)
> >                 Output: i, t, md5(t)
> > (7 rows)
> >
> > i.e. the (stable) function is pushed all the way to the scan node. And
> > even if we replace it with a volatile expression it gets pushed down:
> >
> > explain (verbose) select distinct i, t, md5(random()::text || t) from ref_0;
> >
> >                                   QUERY PLAN
> > ----------------------------------------------------------------------------------
> >   Unique  (cost=83120.92..88120.92 rows=500000 width=65)
> >     Output: i, t, (md5(((random())::text || t)))
> >     ->  Sort  (cost=83120.92..84370.92 rows=500000 width=65)
> >           Output: i, t, (md5(((random())::text || t)))
> >           Sort Key: ref_0.i, ref_0.t, (md5(((random())::text || ref_0.t)))
> >           ->  Seq Scan on public.ref_0  (cost=0.00..15282.00 rows=500000 
> > width=65)
> >                 Output: i, t, md5(((random())::text || t))
> > (7 rows)
> >
> >
> > But perhaps I just don't understand the assumption correctly?
>
> This isn't a counterexample, because there's no join tree here -- or,
> well, there is, but it's trivial, because there's only one relation
> involved. You can't have a non-Var expression computed before you
> finish all the joins, because there are no joins.
>
> What I said was: "target lists for any nodes below the top of the join
> tree were previously always just Var nodes." The topmost join allowed
> non-Var nodes before, but not lower levels.


As I understand what you're saying, the attached (from the repro case
in [1]'s discussion about parallel safety here) is a counterexample.

Specifically we have a plan like:

Merge Right Join
  -> Unique
    -> Gather Merge
      -> Sort
        -> Nested Loop

The pathtarget of the nested loop contains non-var expressions (in
this case a CASE expression).

Am I misunderstanding what you're saying?

I've attached verbose output (and the query).

James

test=# explain (verbose)
select m.id2, m.id, s.description
FROM main m
LEFT JOIN (
  SELECT DISTINCT
    m.id,
    CASE
    WHEN m.id2 = 15 AND (SELECT name FROM secondary x WHERE x.id = s2.id AND 
x.id2 = 10) = md5(123::text) THEN 'description'
    WHEN m.id2 = 15 THEN (SELECT name FROM secondary x WHERE x.id = s2.id AND 
x.id2 = 5)
    END AS description
  FROM main m
  JOIN secondary s2 ON m.id = s2.id
  WHERE m.id2 = 15 and type = 0
) s ON s.id = m.id
WHERE m.id2 IN (15) and type = 0 ;
                                                                                
                             QUERY PLAN                                         
                                                                                
                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                                
  
 Merge Right Join  (cost=126496.41..9205100.33 rows=34236 width=44)
   Output: m.id2, m.id, (CASE WHEN ((m_1.id2 = 15) AND ((SubPlan 1) = 
'202cb962ac59075b964b07152d234b70'::text)) THEN 'description'::text WHEN 
(m_1.id2 = 15) THEN (SubPlan 2) ELSE NULL::text END)                            
                                                                                
        
   Inner Unique: true
   Merge Cond: (m_1.id = m.id)
   ->  Unique  (cost=125495.95..9092259.53 rows=34236 width=40)
         Output: m_1.id, (CASE WHEN ((m_1.id2 = 15) AND ((SubPlan 1) = 
'202cb962ac59075b964b07152d234b70'::text)) THEN 'description'::text WHEN 
(m_1.id2 = 15) THEN (SubPlan 2) ELSE NULL::text END)                            
                                                                                
       
         ->  Gather Merge  (cost=125495.95..9089667.83 rows=518341 width=40)
               Output: m_1.id, (CASE WHEN ((m_1.id2 = 15) AND ((SubPlan 1) = 
'202cb962ac59075b964b07152d234b70'::text)) THEN 'description'::text WHEN 
(m_1.id2 = 15) THEN (SubPlan 2) ELSE NULL::text END)                            
                                                                                
 
               Workers Planned: 2
               ->  Sort  (cost=124495.93..125035.87 rows=215975 width=20)
                     Output: m_1.id, m_1.id2, s2.id, (CASE WHEN ((m_1.id2 = 15) 
AND ((SubPlan 1) = '202cb962ac59075b964b07152d234b70'::text)) THEN 
'description'::text WHEN (m_1.id2 = 15) THEN (SubPlan 2) ELSE NULL::text END)   
                                                                                
    
                     Sort Key: m_1.id, (CASE WHEN ((m_1.id2 = 15) AND ((SubPlan 
1) = '202cb962ac59075b964b07152d234b70'::text)) THEN 'description'::text WHEN 
(m_1.id2 = 15) THEN (SubPlan 2) ELSE NULL::text END)                            
                                                                         
                     ->  Nested Loop  (cost=388.90..100929.00 rows=215975 
width=20)
                           Output: m_1.id, m_1.id2, s2.id, CASE WHEN ((m_1.id2 
= 15) AND ((SubPlan 1) = '202cb962ac59075b964b07152d234b70'::text)) THEN 
'description'::text WHEN (m_1.id2 = 15) THEN (SubPlan 2) ELSE NULL::text END    
                                                                               
                           ->  Parallel Bitmap Heap Scan on public.main m_1  
(cost=388.34..33793.29 rows=14265 width=12)
                                 Output: m_1.id, m_1.id2, m_1.type, m_1.name
                                 Recheck Cond: (m_1.id2 = 15)
                                 Filter: (m_1.type = 0)
                                 ->  Bitmap Index Scan on main_id2_idx  
(cost=0.00..379.78 rows=34580 width=0)
                                       Index Cond: (m_1.id2 = 15)
                           ->  Index Only Scan using secondary_pkey on 
public.secondary s2  (cost=0.56..4.56 rows=15 width=8)
                                 Output: s2.id, s2.id2
                                 Index Cond: (s2.id = m_1.id)
                           SubPlan 1
                             ->  Index Scan using secondary_pkey on 
public.secondary x  (cost=0.56..8.58 rows=1 width=33)
                                   Output: x.name
                                   Index Cond: ((x.id = s2.id) AND (x.id2 = 10))
                           SubPlan 2
                             ->  Index Scan using secondary_pkey on 
public.secondary x_1  (cost=0.56..8.58 rows=1 width=33)
                                   Output: x_1.name
                                   Index Cond: ((x_1.id = s2.id) AND (x_1.id2 = 
5))
   ->  Gather Merge  (cost=1000.45..112323.81 rows=34236 width=12)
         Output: m.id2, m.id
         Workers Planned: 2
         ->  Parallel Index Scan using main_pkey on public.main m  
(cost=0.43..107372.10 rows=14265 width=12)
               Output: m.id2, m.id
               Filter: ((m.id2 = 15) AND (m.type = 0))

Re: enable_incremental_sort changes query behavior

Reply via email to