Hi,

Attached is a query and its corresponding plan, where sorting of the CTE
acts seems to be the bottle neck. It is a real execution plan captured
with the auto_explain module.

The query is recursive. In each iteration CTE acts is sorted again,
which is obviously quite expensive for about 24000 rows and the same
number of iterations.

So I tried to put the ordering over the keys (d_id, activation_count)
into the CTE definition itself. This is honoured, when evaluating the
CTE but not for the iteration, where the CTE acts is still sorted again.
I cannot see a reason for this. A simple CTE scan with filter condition
should be enough.

Removing the order by from the definition of the CTE has absolutely no
impact on the performance, which is quite obvious regarding the number
of iterations. Further it has no impact on the query plan at all. It
only removes the sort node from the CTE acts node.

Do I miss something which would make the plan incorrect or is the
planner just not intelligent enough to recognize that a table is sorted
by the desired keys?

I hope the attachments prevent outlook from destroying any text
formatting.

Thanks in advance
--
Regards,
Robert

        CTE Scan on he_per_dispenser  (cost=75008.64..75720.96 rows=35616 
width=16) (actual time=141.806..205579.272 rows=53152 loops=1)        
          Output: he_per_dispenser.he_id, he_per_dispenser.activation_id        
          CTE acts      
            ->  Sort  (cost=8298.16..8431.04 rows=53152 width=24) (actual 
time=113.128..117.709 rows=53152 loops=1)     
                  Output: a.id, e.dispenser_id, a.activation_count, 
e."timestamp"       
                  Sort Key: e.dispenser_id, a.activation_count  
                  Sort Method:  quicksort  Memory: 5689kB       
                  ->  Hash Join  (cost=1531.92..4126.30 rows=53152 width=24) 
(actual time=25.653..66.205 rows=53152 loops=1)    
                        Output: a.id, e.dispenser_id, a.activation_count, 
e."timestamp" 
                        Hash Cond: (e.id = a.id)        
                        ->  Seq Scan on events e  (cost=0.00..1280.93 
rows=78193 width=20) (actual time=0.006..8.615 rows=78193 loops=1)        
                              Output: e.id, e."timestamp", e.dispenser_id, 
e.relocation_id, e.synthetic 
                        ->  Hash  (cost=867.52..867.52 rows=53152 width=12) 
(actual time=18.737..18.737 rows=53152 loops=1)     
                              Output: a.id, a.activation_count  
                              ->  Seq Scan on activations a  (cost=0.00..867.52 
rows=53152 width=12) (actual time=0.005..8.938 rows=53152 loops=1)      
                                    Output: a.id, a.activation_count    
          CTE he_per_dispenser  
            ->  Recursive Union  (cost=1369.30..66577.60 rows=35616 width=40) 
(actual time=141.803..205530.157 rows=53152 loops=1)      
                  ->  Hash Join  (cost=1369.30..5359.69 rows=266 width=40) 
(actual time=141.803..155.424 rows=11 loops=1)       
                        Output: a.activation_id, acts.dispenser_id, 
(min(acts.activation_count)), a."timestamp", CASE WHEN (a."timestamp" <= 
(p."timestamp" + $1)) THEN p.last_hygiene_event ELSE a.activation_id END   
                        Hash Cond: ((a.dispenser_id = acts.dispenser_id) AND 
(a.activation_count = (min(acts.activation_count))))       
                        ->  CTE Scan on acts a  (cost=0.00..1063.04 rows=53152 
width=24) (actual time=113.130..118.725 rows=53152 loops=1)      
                              Output: a.activation_id, a.dispenser_id, 
a.activation_count, a."timestamp"        
                        ->  Hash  (cost=1366.30..1366.30 rows=200 width=24) 
(actual time=28.644..28.644 rows=11 loops=1)        
                              Output: acts.dispenser_id, 
(min(acts.activation_count)), p."timestamp", p.last_hygiene_event      
                              ->  Hash Left Join  (cost=1359.05..1366.30 
rows=200 width=24) (actual time=28.634..28.639 rows=11 loops=1)        
                                    Output: acts.dispenser_id, 
(min(acts.activation_count)), p."timestamp", p.last_hygiene_event        
                                    Hash Cond: (acts.dispenser_id = 
p.dispenser_id)     
                                    ->  HashAggregate  (cost=1328.80..1331.30 
rows=200 width=8) (actual time=28.617..28.620 rows=11 loops=1)    
                                          Output: acts.dispenser_id, 
min(acts.activation_count) 
                                          ->  CTE Scan on acts  
(cost=0.00..1063.04 rows=53152 width=8) (actual time=0.001..15.560 rows=53152 
loops=1)  
                                                Output: acts.activation_id, 
acts.dispenser_id, acts.activation_count, acts."timestamp"  
                                    ->  Hash  (cost=19.00..19.00 rows=900 
width=20) (actual time=0.002..0.002 rows=0 loops=1)   
                                          Output: p."timestamp", 
p.last_hygiene_event, p.dispenser_id   
                                          ->  Seq Scan on 
processed_activations_cache p  (cost=0.00..19.00 rows=900 width=20) (actual 
time=0.002..0.002 rows=0 loops=1) 
                                                Output: p."timestamp", 
p.last_hygiene_event, p.dispenser_id     
                  ->  Merge Join  (cost=5439.41..6050.56 rows=3535 width=40) 
(actual time=3.172..8.274 rows=2 loops=24813)      
                        Output: a.activation_id, a.dispenser_id, 
a.activation_count, a."timestamp", CASE WHEN ((a."timestamp" - s."timestamp") 
<= $1) THEN s.he_id ELSE a.activation_id END     
                        Merge Cond: ((s.dispenser_id = a.dispenser_id) AND 
(((s.activation_count + 1)) = a.activation_count))   
                        ->  Sort  (cost=204.52..211.17 rows=2660 width=24) 
(actual time=0.005..0.006 rows=2 loops=24813)        
                              Output: s."timestamp", s.he_id, s.dispenser_id, 
s.activation_count        
                              Sort Key: s.dispenser_id, ((s.activation_count + 
1))      
                              Sort Method:  quicksort  Memory: 25kB     
                              ->  WorkTable Scan on he_per_dispenser s  
(cost=0.00..53.20 rows=2660 width=24) (actual time=0.001..0.001 rows=2 
loops=24813)     
                                    Output: s."timestamp", s.he_id, 
s.dispenser_id, s.activation_count  
                        ->  Sort  (cost=5234.90..5367.78 rows=53152 width=24) 
(actual time=0.001..2.540 rows=40748 loops=24813) 
                              Output: a.activation_id, a.dispenser_id, 
a.activation_count, a."timestamp"        
                              Sort Key: a.dispenser_id, a.activation_count      
                              Sort Method:  quicksort  Memory: 5689kB   
                              ->  CTE Scan on acts a  (cost=0.00..1063.04 
rows=53152 width=24) (actual time=0.000..5.727 rows=53152 loops=1)    
                                    Output: a.activation_id, a.dispenser_id, 
a.activation_count, a."timestamp"  
-- $1 = INTERVAL'1.5s' in the execution plan
INSERT INTO transformation.high_level_events(
        id,
        activation_id)
WITH RECURSIVE
        acts AS (
                /* This CTE should prevent duplicate physical tablescans.
                 * An order by as the plan suggests, seems worthless. The
                 * executed plan will will sort this CTE again in each 
iteration 
                 * of he_per_day.
                 */
                SELECT a.id AS activation_id,
                           e.day_id,
                           a.activation_count,
                           e.timestamp
                FROM transformation.activations a JOIN transformation.events e
                         USING(id)
                ORDER BY day_id, activatoin_count),
        he_per_day AS (
                SELECT a.activation_id,
                        min_acts.day_id,
                        min_acts.activation_count,
                        a.timestamp,
                        CASE
                                WHEN a.timestamp <= p.timestamp + $1 THEN
                                        p.last_high_level_event
                                ELSE a.activation_id
                        END AS he_id -- identifies a specific high level event
                FROM (SELECT day_id,
                                        MIN(activation_count) AS 
activation_count
                                FROM acts
                                GROUP BY day_id) min_acts
                        JOIN acts a
                                USING (day_id, activation_count)
                        -- check whether it belongs to the last high_level 
event processed
                        LEFT OUTER JOIN 
transformation.processed_activations_cache p
                                 USING (day_id)
                UNION ALL
                SELECT a.activation_id,
                        a.day_id,
                        a.activation_count,
                        a.timestamp,
                        CASE
                                WHEN a.timestamp - s.timestamp <= $1 THEN
                                        s.he_id
                                ELSE a.activation_id
                        END AS he_id
                FROM acts a     JOIN he_per_day s
                                ON a.day_id = s.day_id
                                        AND a.activation_count = 
s.activation_count + 1)
SELECT he_id AS id, activation_id
FROM he_per_day;

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to