We recently took a copy of our production data (running on 8.4.2), scrubbed 
many data fields, and then loaded it onto a QA server (running 8.4.8). We're 
seeing some odd planner performance that I think might be a bug, though I'm 
hoping it's just idiocy on my part. I've run ANALYZE and looked into pg_stats, 
and the relevant columns seem to have about the same statistics on both 
servers.

I've managed to simplify the query, but if I make it any simpler, then the two 
servers end up with the same good plan. The query is down to:

SELECT machines.quota_purchased
        FROM machines
        WHERE NOT deleted AND machines.user_id IN (
            SELECT id FROM users WHERE user_group_id IN (
                SELECT 607547 OFFSET 0
            ) OFFSET 0
        );


(Those "offset 0" are in there to protect us from planner regressions we saw 
when moving to 8.4. When we move to 9, they can hopefully go away.)
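
As I understand it, OFFSET 0 works because the planner won't flatten a 
subquery that has a LIMIT or OFFSET into the outer query, so the subselect 
gets planned on its own; OFFSET 0 is just a do-nothing fence. A minimal 
illustration of the difference, using the same tables as above:

    -- Without the fence, the planner is free to pull the subquery up and
    -- plan it as an ordinary join clause; with OFFSET 0 it stays a subplan.
    EXPLAIN SELECT id FROM users
        WHERE user_group_id IN (SELECT 607547);
    EXPLAIN SELECT id FROM users
        WHERE user_group_id IN (SELECT 607547 OFFSET 0);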

On the production server, this returns a fairly accurate plan:

                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=843.59..1447.90 rows=243 width=8) (actual time=0.044..0.045 rows=1 loops=1)
   ->  HashAggregate  (cost=843.59..845.59 rows=200 width=4) (actual time=0.027..0.027 rows=1 loops=1)
         ->  Limit  (cost=0.02..823.90 rows=1575 width=4) (actual time=0.024..0.025 rows=1 loops=1)
               ->  Nested Loop  (cost=0.02..823.90 rows=1575 width=4) (actual time=0.023..0.024 rows=1 loops=1)
                     ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=1)
                           ->  Limit  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)
                                 ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)
                     ->  Index Scan using users_user_groups_idx on users  (cost=0.00..804.18 rows=1575 width=8) (actual time=0.017..0.018 rows=1 loops=1)
                           Index Cond: (users.user_group_id = (607547))
   ->  Index Scan using machines_sid_un on machines  (cost=0.00..3.00 rows=1 width=12) (actual time=0.015..0.015 rows=1 loops=1)
         Index Cond: (machines.user_id = users.id)
 Total runtime: 0.121 ms
(12 rows)


On the QA server, things are not so accurate. It doesn't hurt the timing of 
this simplified query much, but in the full query, a row estimate that is off 
by six orders of magnitude throws the planning badly in the wrong direction. 
The plan on the QA server is:

                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1887.16..3671.20 rows=1192462 width=8) (actual time=0.049..0.051 rows=1 loops=1)
   ->  HashAggregate  (cost=1887.16..1889.16 rows=200 width=4) (actual time=0.032..0.033 rows=1 loops=1)
         ->  Limit  (cost=0.02..1868.20 rows=1517 width=4) (actual time=0.027..0.029 rows=1 loops=1)
               ->  Nested Loop  (cost=0.02..1868.20 rows=1517 width=4) (actual time=0.027..0.028 rows=1 loops=1)
                     ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=1)
                           ->  Limit  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
                                 ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
                     ->  Index Scan using users_user_groups_idx on users  (cost=0.00..1849.20 rows=1517 width=8) (actual time=0.015..0.016 rows=1 loops=1)
                           Index Cond: (users.user_group_id = (607547))
   ->  Index Scan using machines_sid_un on machines  (cost=0.00..8.90 rows=1 width=12) (actual time=0.013..0.013 rows=1 loops=1)
         Index Cond: (machines.user_id = users.id)
 Total runtime: 0.148 ms
(12 rows)



The problem here (I think) is that the QA server believes that running a 
nested loop over 200 users.id values and joining them against 
machines.user_id will produce more than a million rows. The production server 
sees this more accurately as the nearly 1:1 relationship that it is.
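
For what it's worth, here is the sort of comparison I've been doing between 
the two servers. As I understand it, the join row estimate leans heavily on 
n_distinct for the join columns, so a low (or negative, i.e. fractional) 
n_distinct for machines.user_id on the QA box could explain the inflated 
estimate:

    SELECT tablename, attname, n_distinct, null_frac
      FROM pg_stats
     WHERE (tablename = 'machines' AND attname = 'user_id')
        OR (tablename = 'users' AND attname = 'id');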

The reason I wonder if this might be a bug is that if I change the obtuse 
clause "WHERE user_group_id IN (select 607547 offset 0)" to simply "WHERE 
user_group_id IN (607547)", then both servers collapse to the same good 
plan:

EXPLAIN ANALYZE SELECT machines.quota_purchased
        FROM machines
        WHERE NOT deleted AND machines.user_id IN (
            SELECT id FROM users WHERE user_group_id IN (
                607547
            ) OFFSET 0
        );


                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=82.27..636.70 rows=62 width=8) (actual time=0.042..0.043 rows=1 loops=1)
   ->  HashAggregate  (cost=82.27..82.88 rows=61 width=4) (actual time=0.024..0.025 rows=1 loops=1)
         ->  Limit  (cost=0.00..81.51 rows=61 width=4) (actual time=0.017..0.018 rows=1 loops=1)
               ->  Index Scan using users_user_groups_idx on users  (cost=0.00..81.51 rows=61 width=4) (actual time=0.015..0.016 rows=1 loops=1)
                     Index Cond: (user_group_id = 607547)
   ->  Index Scan using machines_sid_un on machines  (cost=0.00..9.07 rows=1 width=12) (actual time=0.013..0.013 rows=1 loops=1)
         Index Cond: (machines.user_id = users.id)
 Total runtime: 0.106 ms
(8 rows)


But, as I understand it, that remaining OFFSET 0 should keep the planner from 
caring whether the IN list holds a single literal value or the result of a 
subselect.


Anyway, is there anything I can check? The hardware is different, so the 
configuration files differ, but not in much beyond effective_cache_size, and 
certainly not in any way I can think of that would affect this kind of 
planner mis-estimation.
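
In case it's useful, these are the (entirely standard) knobs I've compared so 
far on both boxes; attstattarget = -1 just means the column falls back to 
default_statistics_target:

    SHOW default_statistics_target;
    SELECT name, setting
      FROM pg_settings
     WHERE name IN ('effective_cache_size', 'random_page_cost', 'seq_page_cost');
    -- per-column statistics target override, if any
    SELECT attname, attstattarget
      FROM pg_attribute
     WHERE attrelid = 'machines'::regclass AND attname = 'user_id';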
