Hi

 

It is not your question but for such situations, you should consider using 
partitioning.

And more closely to your question: I would not disable autovacuum but it must 
not work with default values.

 

Best regards

 

Michel SALAIS

De : Leo <leo1...@gmail.com> 
Envoyé : vendredi 2 mai 2025 16:23
À : pgsql-performance@lists.postgresql.org
Objet : Vacuum Questions

 

I have been working on AWS PostgreSQL RDS for a few years, but still not very 
experienced when it comes to performance issues.  Plus RDS is slightly 
different from the pure PostgreSQL.

 

I am trying to comprehend exactly how vacuum works.

 

Here is what I am trying to do.

 

I am purging old records from a table (500 million rows, but I am doing it in 
sets of  50,000,000 with a smaller loop of 100,000).  That works just fine.

 

Because of the amount of data/rows deleted, I disabled the autovacuum for this 
table (I want to have control over vacuum, autovacuum does not complete anyway 
due to the timeout, sizing, etc settings that I do not want to change system 
wide).  I will put the autovacuum back once I am done of course.

 

The issue is when I start vacuuming.  This table has 4 indexes and a PK that I 
worry about.  The PK takes about 30 minutes to vacuum and two of the indexes 
take about an hour each.  The problem comes in for the other 2 indexes - they 
take 12+ hours each to vacuum:

 

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  vacuuming "public.pc_workflowlog"

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  launched 4 parallel vacuum 
workers for index vacuuming (planned: 4)

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  scanned index 
"pc_workflowlog_pk" to remove 50000000 row versions

DETAIL:  CPU: user: 191.03 s, system: 12.43 s, elapsed: 1711.22 s

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  scanned index 
"workflowlo_u_publicid_g6uqp9lkn6e8" to remove 50000000 row versions

DETAIL:  CPU: user: 325.75 s, system: 19.75 s, elapsed: 2674.24 s

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  scanned index 
"workflowlo_n_workflow_2tc9k2hdtry9v" to remove 50000000 row versions

DETAIL:  CPU: user: 312.17 s, system: 16.94 s, elapsed: 3097.88 s

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  scanned index 
"workflowlo_n_frozenseti_2kjkbjgf3c6ro" to remove 50000000 row versions

DETAIL:  CPU: user: 41187.70 s, system: 216.14 s, elapsed: 42749.36 s

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  scanned index 
"workflowlo_n_userid_14kqw6qdsnndw" to remove 50000000 row versions 

DETAIL:  CPU: user: 41280.66 s, system: 216.67 s, elapsed: 42832.16 s

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  "pc_workflowlog": removed 
50000000 row versions in 1129870 pages

 

I've increased max_parallel_maintenance_workers to 8 for the session and it 
used parallel 4 (one for each index I assume) to handle it and the two indexes 
were done in ~ an hour.  What I am trying to figure out is how to force the 
other two large indexes to be vacuumed in parallel - a few workers going 
against an index.  It seems it is possible to do, the index size is large 
enough to kick in, but I have not been able to figure it out yet.  Most of the 
parameters are at default values.

 

What am I missing?

 

I have a few other questions.  Does vacuum time depend on the number of dead 
rows only and the size of the table, or does the entire storage allocation 
(including dead tuples) also affect it?

 

Would it be more beneficial to drop the two large indexes, purge, vacuum, and 
recreate the indexes after make more sense (I know it needs to be tested)?  The 
reason I am doing it in stages is to make sure I have enough time to vacuum, 
but maybe it would not take much longer to vacuum after the complete purge?

 

Lastly, is it better to delete all the rows (500 mil) instead of doing it in 
smaller batches, and vacuum only once?

 

The current size of the table is about 1T and the indexes add another 1.5T to 
it.

 

Truncate is not an option as I am only deleting rows older than 6 months.  
Client was not doing purging for years, but will do it after the clean up.

 

P.S. This is my very first post here, please advise if it is the wrong channel. 
 Thank you in advance.

Reply via email to