Hi All,

I'm still trying to get a better understanding of the autovacuum process. 
This is a different postgres installation from my previous posts, and it is 
confusing me in new ways.
Still 11.4 running on CentOS 7, with 8 NVMe drives in software RAID.

This issue started with postgres "...not accepting commands to avoid 
On this server I was able to stop all access to DB and dedicate resources to 
only postgres. I thought I could allow autovacuum to do its thing with a ton of 

I think everything boils down to 2 questions:
1. Can autovacuum or manual vacuum be coerced into dealing with oldest first?
    1a. Where might I find advice on configuring postgres resources for maximum 
CPU & memory use during maintenance? In other words, the quickest path out of 
"not accepting commands" land, besides increasing autovacuum_freeze_max_age.
2. What can cause autovacuum to stall? Could associated toast or index be the 

It appeared that autovacuum was not choosing the tables with the oldest xmin so 
I produced an ordered list of oldest tables with:
SELECT oid::regclass, age(relfrozenxid)
FROM pg_class
WHERE relkind IN ('r', 't', 'm')
AND age(relfrozenxid) > 2000000000
ORDER BY age(relfrozenxid) DESC;

The list contained over 6000 tables from pg_toast. They all belonged to daily 
reports tables. The reports are created daily and not touched again.
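To tie each of those pg_toast entries back to the report table that owns it, a join on pg_class.reltoastrelid should work. This is only a sketch: the 2000000000 cutoff is copied from the query above, and the column aliases are made up for illustration.

```sql
-- Map each over-age TOAST table back to the table that owns it.
-- Aliases (toast_table, owning_table, ...) are illustrative only.
SELECT t.oid::regclass     AS toast_table,
       c.oid::regclass     AS owning_table,
       age(t.relfrozenxid) AS toast_xid_age,
       age(c.relfrozenxid) AS owner_xid_age
FROM pg_class AS t
JOIN pg_class AS c ON c.reltoastrelid = t.oid   -- owner points at its TOAST table
WHERE t.relkind = 't'
  AND age(t.relfrozenxid) > 2000000000
ORDER BY age(t.relfrozenxid) DESC;
```

Comparing the two age columns shows whether the owning table itself is also over the limit or only its TOAST relation.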

Most of the autovacuums that did start seem to be hung, never completing even 
on the simplest tables. 
The newest 2 autovacuums in the list are completing about one every couple 
CPU and disk IO are nearly idle.
An example table is shown here:

phantom=# select
phantom-#       pg_size_pretty(pg_total_relation_size(relid)) as total_size,
phantom-#       pg_size_pretty(pg_relation_size(relid, 'main')) as relation_size_main,
phantom-#       pg_size_pretty(pg_relation_size(relid, 'fsm')) as relation_size_fsm,
phantom-#       pg_size_pretty(pg_relation_size(relid, 'vm')) as relation_size_vm,
phantom-#       pg_size_pretty(pg_relation_size(relid, 'init')) as relation_size_init,
phantom-#       pg_size_pretty(pg_table_size(relid)) as table_size,
phantom-#       pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as external_size
phantom-#  from
phantom-#       pg_catalog.pg_statio_user_tables
phantom-# where
phantom-#   relname like 'report_user_439';
 total_size | relation_size_main | relation_size_fsm | relation_size_vm | relation_size_init | table_size | external_size
------------+--------------------+-------------------+------------------+--------------------+------------+---------------
 80 kB      | 8192 bytes         | 24 kB             | 8192 bytes       | 0 bytes            | 48 kB      | 72 kB
(1 row)

I scripted a vacuum loop using the oldest table list. It's extremely slow but 
it was making better progress than autovacuum was.
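For reference, a loop of this kind can be sketched directly in psql with \gexec, which runs each row of a query result as its own statement. This is an illustration, not the exact script used; the VERBOSE option and the 2000000000 cutoff are assumptions carried over for the example, and vacuuming pg_toast relations directly generally needs superuser rights.

```sql
-- Run from psql (9.6 or later). Builds one VACUUM statement per over-age
-- relation, oldest first, and \gexec executes them one at a time.
-- oid::regclass yields a properly quoted, schema-qualified name.
SELECT format('VACUUM (VERBOSE) %s', oid::regclass)
FROM pg_class
WHERE relkind IN ('r', 't', 'm')
  AND age(relfrozenxid) > 2000000000
ORDER BY age(relfrozenxid) DESC
\gexec
```

Each VACUUM runs outside a transaction block because psql sends the generated statements separately in autocommit mode.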

Using ps I see that there were as many worker processes as defined with 
autovacuum_max_workers, but pg_stat_activity consistently showed 19. I killed 
the script thinking there might be a conflict. I saw no difference after 30 
minutes, so I restarted the script. I never saw anything in pg_stat_progress_vacuum.
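The checks described here can be sketched as the queries below, assuming PostgreSQL 10 or later so that pg_stat_activity has the backend_type column. The first shows what each autovacuum worker is waiting on; the others list the usual things that can hold back the xmin horizon (old transactions, prepared transactions, replication slots). Whether any of them applies on this server is not known; the aliases are illustrative.

```sql
-- Autovacuum workers as pg_stat_activity sees them, and what they wait on.
SELECT pid, datname, xact_start, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker'
ORDER BY xact_start;

-- Sessions holding an old snapshot (oldest first).
SELECT pid, state, xact_start, age(backend_xmin) AS xmin_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

-- Forgotten prepared transactions.
SELECT gid, prepared, age(transaction) AS xid_age
FROM pg_prepared_xacts;

-- Replication slots pinning xmin or catalog_xmin.
SELECT slot_name, active,
       age(xmin)         AS xmin_age,
       age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots;
```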

vacuum settings:
                name                 |  setting  
 autovacuum                          | on        
 autovacuum_analyze_scale_factor     | 0.1       
 autovacuum_analyze_threshold        | 50        
 autovacuum_freeze_max_age           | 200000000 
 autovacuum_max_workers              | 40        
 autovacuum_multixact_freeze_max_age | 400000000 
 autovacuum_naptime                  | 4         
 autovacuum_vacuum_cost_delay        | 0         
 autovacuum_vacuum_cost_limit        | 5000      
 autovacuum_vacuum_scale_factor      | 0.2       
 autovacuum_vacuum_threshold         | 50        
 autovacuum_work_mem                 | -1        
 log_autovacuum_min_duration         | 0         
 vacuum_cleanup_index_scale_factor   | 0.1       
 vacuum_cost_delay                   | 0         
 vacuum_cost_limit                   | 200       
 vacuum_cost_page_dirty              | 20        
 vacuum_cost_page_hit                | 1         
 vacuum_cost_page_miss               | 10        
 vacuum_defer_cleanup_age            | 0         
 vacuum_freeze_min_age               | 50000000  
 vacuum_freeze_table_age             | 150000000 
 vacuum_multixact_freeze_min_age     | 5000000   
 vacuum_multixact_freeze_table_age   | 150000000 
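A listing like the one above can be reproduced from pg_settings. This is a minimal sketch; the LIKE pattern is an assumption that happens to match every name shown.

```sql
-- List every vacuum-related setting with its current value.
SELECT name, setting
FROM pg_settings
WHERE name LIKE '%vacuum%'
ORDER BY name;
```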

I'm now thinking that autovacuum getting hung up is what caused the issue to 
begin with. I see nothing but the successful vacuums from the script and my own 
fat-fingering commands in the postgres logs (set at info).

Any hints are appreciated.
