On Nov 7, 2024, at 9:54 PM, Andrei Lepikhov <lepi...@gmail.com> wrote:
> On 11/8/24 09:45, Ed Sabol wrote:
>> On Nov 7, 2024, at 9:27 PM, Andrei Lepikhov <lepi...@gmail.com> wrote:
>>> Postgres didn't want Materialize in this example because of the low 
>>> estimation on its outer subquery. AFAIC, by increasing the *_page_cost's 
>>> value, you added extra weight to the inner subquery 
>> What kind of extended statistics do you suggest for this? ndistinct, 
>> dependencies, or mcv?
>> CREATE STATISTICS tablename_stats (<statistics type?>) ON relation, type 
>> FROM tablename;
>> ANALYZE tablename;
> I'd recommend to use all of them - MCV is helpful in most of the cases (and 
> relatively cheap), distinct is actually used in Postgres now to calculate 
> number of groups (GROUP-BY, Sort, Memoize, etc.); dependencies - to find 
> correlations between columns - usually in scan filters.

OK, I've executed the following:

CREATE STATISTICS tablename_stats_rt_nd  (ndistinct) ON relation, type FROM 
tablename;
CREATE STATISTICS tablename_stats_rt_mcv (mcv) ON relation, type FROM tablename;
CREATE STATISTICS tablename_stats_rt_dep (dependencies) ON relation, type FROM 
tablename;

CREATE STATISTICS tablename_stats_rv_nd  (ndistinct) ON relation, value FROM 
tablename;
CREATE STATISTICS tablename_stats_rv_mcv (mcv) ON relation, value FROM 
tablename;
CREATE STATISTICS tablename_stats_rv_dep (dependencies) ON relation, value FROM 
tablename;

CREATE STATISTICS tablename_stats_nr_nd  (ndistinct) ON name, relation FROM 
tablename;
CREATE STATISTICS tablename_stats_nr_mcv (mcv) ON name, relation FROM tablename;
CREATE STATISTICS tablename_stats_nr_dep (dependencies) ON name, relation FROM 
tablename;

CREATE STATISTICS tablename_stats_nt_nd  (ndistinct) ON name, type FROM 
tablename;
CREATE STATISTICS tablename_stats_nt_mcv (mcv) ON name, type FROM tablename;
CREATE STATISTICS tablename_stats_nt_dep (dependencies) ON name, type FROM 
tablename;

CREATE STATISTICS tablename_stats_nv_nd  (ndistinct) ON name, value FROM 
tablename;
CREATE STATISTICS tablename_stats_nv_mcv (mcv) ON name, value FROM tablename;
CREATE STATISTICS tablename_stats_nv_dep (dependencies) ON name, value FROM 
tablename;

ANALYZE tablename;

Now with random_page_cost = 4.0, the optimizer materializes, and it's fast 
again:

 Nested Loop  (cost=1226.12..11129.87 rows=1 width=112) (actual 
time=30.965..31.333 rows=1 loops=1)
   Join Filter: (a.name = d.name)
   Buffers: shared hit=7447
   ->  Nested Loop  (cost=1225.70..11112.51 rows=1 width=108) (actual 
time=30.921..31.208 rows=1 loops=1)
         Buffers: shared hit=7418
         ->  Hash Join  (cost=1225.27..11093.62 rows=1 width=86) (actual 
time=30.862..31.078 rows=1 loops=1)
               Hash Cond: ((a.name || '.doc'::text) = b_1.name)
               Buffers: shared hit=7389
               ->  Nested Loop  (cost=1167.53..11019.89 rows=11 width=70) 
(actual time=27.143..27.347 rows=1 loops=1)
                     Join Filter: (CASE WHEN ("position"(a.name, 'zz'::text) = 
1) THEN a.name ELSE ('h_'::text || a.name) END = "*SELECT* 1".table_name)
                     Rows Removed by Join Filter: 1021
                     Buffers: shared hit=6268
                     ->  Bitmap Heap Scan on tablename a  (cost=456.55..5407.28 
rows=1077 width=38) (actual time=2.986..15.865 rows=1022 loops=1)
                           Recheck Cond: (relation = 'description'::text)
                           Filter: (type = 'table'::text)
                           Rows Removed by Filter: 37044
                           Heap Blocks: exact=4024
                           Buffers: shared hit=4065
                           ->  Bitmap Index Scan on tablename_r  
(cost=0.00..456.29 rows=38915 width=0) (actual time=2.336..2.336 rows=44980 
loops=1)
                                 Index Cond: (relation = 'description'::text)
                                 Buffers: shared hit=41
                     ->  Materialize  (cost=710.98..5564.15 rows=2 width=64) 
(actual time=0.008..0.009 rows=1 loops=1022)
                           Buffers: shared hit=2203
                           ->  Append  (cost=710.98..5564.14 rows=2 width=64) 
(actual time=7.519..7.548 rows=1 loops=1)
                                 Buffers: shared hit=2203
                                 ->  Subquery Scan on "*SELECT* 1"  
(cost=710.98..3537.89 rows=1 width=64) (actual time=6.629..6.636 rows=0 loops=1)
                                       Buffers: shared hit=1380
                                       ->  Bitmap Heap Scan on tablename  
(cost=710.98..3537.88 rows=1 width=96) (actual time=6.628..6.633 rows=0 loops=1)
                                             Recheck Cond: ((relation = ANY 
('{start_time,end_time,dataset_id_column,dataset_id_prefix,original_mission_name,defaultSearchRadius,author,tableType,bibcode,priority,regime,author,includesTypes,Mission,subject}'::text[]))
 AND (type = 'table'::text))
                                             Filter: ((CASE relation WHEN 
'Mission'::text THEN upper(value) ELSE value END = 'foo'::text) AND (CASE 
relation WHEN 'defaultSearchRadius'::text THEN 'default_search_radius'::text 
WHEN 'Mission'::text THEN 'o_name'::text WHEN 'priority'::text THEN 
'table_priority'::text WHEN 'bibcode'::text THEN 'catalog_bibcode'::text WHEN 
'regime'::text THEN 'frequency_regime'::text WHEN 'author'::text THEN 
'table_author'::text WHEN 'tableType'::text THEN 'table_type'::text WHEN 
'subject'::text THEN 'row_type'::text ELSE relation END = 'o_name'::text))
                                             Rows Removed by Filter: 8253
                                             Heap Blocks: exact=1276
                                             Buffers: shared hit=1380
                                             ->  BitmapAnd  
(cost=710.94..710.94 rows=1275 width=0) (actual time=3.346..3.350 rows=0 
loops=1)
                                                   Buffers: shared hit=104
                                                   ->  Bitmap Index Scan on 
tablename_r  (cost=0.00..134.96 rows=9145 width=0) (actual time=0.573..0.574 
rows=9998 loops=1)
                                                         Index Cond: (relation 
= ANY 
('{start_time,end_time,dataset_id_column,dataset_id_prefix,original_mission_name,defaultSearchRadius,author,tableType,bibcode,priority,regime,author,includesTypes,Mission,subject}'::text[]))
                                                         Buffers: shared hit=49
                                                   ->  Bitmap Index Scan on 
tablename_t  (cost=0.00..575.73 rows=49507 width=0) (actual time=2.693..2.693 
rows=59373 loops=1)
                                                         Index Cond: (type = 
'table'::text)
                                                         Buffers: shared hit=55
                                 ->  Subquery Scan on "*SELECT* 5"  
(cost=10.28..2026.24 rows=1 width=64) (actual time=0.886..0.904 rows=1 loops=1)
                                       Buffers: shared hit=823
                                       ->  Bitmap Heap Scan on tablename 
tablename_1  (cost=10.28..2026.23 rows=1 width=96) (actual time=0.884..0.899 
rows=1 loops=1)
                                             Recheck Cond: (relation = 
'containedBy'::text)
                                             Filter: ((substr(value, 1, 8) = 
'mission:'::text) AND (upper("substring"(value, 9)) = 'foo'::text))
                                             Rows Removed by Filter: 721
                                             Heap Blocks: exact=820
                                             Buffers: shared hit=823
                                             ->  Bitmap Index Scan on 
tablename_r  (cost=0.00..10.28 rows=781 width=0) (actual time=0.085..0.085 
rows=905 loops=1)
                                                   Index Cond: (relation = 
'containedBy'::text)
                                                   Index Cond: (relation = 
'containedBy'::text)
                                                   Buffers: shared hit=3
               ->  Hash  (cost=44.87..44.87 rows=1030 width=38) (actual 
time=5.334..5.342 rows=1025 loops=1)
                     Buckets: 2048  Batches: 1  Memory Usage: 124kB
                     Buffers: shared hit=1121
                     ->  Bitmap Heap Scan on tablename b_1  (cost=33.06..44.87 
rows=1030 width=38) (actual time=1.157..4.018 rows=1025 loops=1)
                           Recheck Cond: ((relation = 'located'::text) AND 
(type = 'document'::text))
                           Heap Blocks: exact=1113
                           Buffers: shared hit=1121
                           ->  BitmapAnd  (cost=33.06..33.06 rows=3 width=0) 
(actual time=0.765..0.769 rows=0 loops=1)
                                 Buffers: shared hit=8
                                 ->  Bitmap Index Scan on tablename_r  
(cost=0.00..16.15 rows=1030 width=0) (actual time=0.347..0.347 rows=1227 
loops=1)
                                       Index Cond: (relation = 'located'::text)
                                       Buffers: shared hit=4
                                 ->  Bitmap Index Scan on tablename_t  
(cost=0.00..16.15 rows=1030 width=0) (actual time=0.314..0.315 rows=1227 
loops=1)
                                       Index Cond: (type = 'document'::text)
                                       Buffers: shared hit=4
         ->  Index Scan using tablename_n on tablename c  (cost=0.42..18.88 
rows=1 width=22) (actual time=0.048..0.115 rows=1 loops=1)
               Index Cond: (name = a.name)
               Filter: (relation = 'lastUpdated'::text)
               Rows Removed by Filter: 58
               Buffers: shared hit=29
   ->  Index Scan using tablename_n on tablename d  (cost=0.42..17.33 rows=1 
width=22) (actual time=0.034..0.104 rows=1 loops=1)
         Index Cond: (name = c.name)
         Filter: (relation = 'rowcount'::text)
         Rows Removed by Filter: 58
         Buffers: shared hit=29
 Planning:
   Buffers: shared hit=64
 Planning Time: 5.086 ms
 Execution Time: 32.226 ms
(81 rows)

This was a nice learning experience and I hope it will help with performance 
going forward, but I still think I'm going to keep random_page_cost = 2.0.

None of this really explains why this became a problem after ~10 years of it 
not being one, but I think the only likely reason is that the table just grew 
gradually over time and reached some threshold that changed the optimizer's 
plan very adversely.

Thanks,
Ed



Reply via email to