Also, is there a way to estimate how a vacuum will be executed? Something like an explain plan - without actually running the vacuum, just to see how it would be performed, e.g. what degree of parallelism it would use?
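From what I can tell, the planned worker count simply comes from how many of
the table's indexes are larger than min_parallel_index_scan_size, capped by
max_parallel_maintenance_workers, so the closest thing I have found is to
check the catalogs myself before running it. Something like this sketch
(using the table from the log below; the second query only helps once a
vacuum is already running):

  -- Which indexes are large enough to be considered for a parallel
  -- vacuum worker?
  SELECT i.indexrelid::regclass                         AS index_name,
         pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
         pg_relation_size(i.indexrelid) >=
           pg_size_bytes(current_setting('min_parallel_index_scan_size'))
                                                        AS parallel_eligible
  FROM pg_index i
  WHERE i.indrelid = 'public.pc_workflowlog'::regclass
  ORDER BY pg_relation_size(i.indexrelid) DESC;

  -- Progress of a vacuum that is already running:
  SELECT pid, relid::regclass AS table_name, phase,
         heap_blks_scanned, heap_blks_total
  FROM pg_stat_progress_vacuum;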
On Fri, May 2, 2025 at 10:23 AM Leo <leo1...@gmail.com> wrote:
> I have been working on AWS PostgreSQL RDS for a few years, but I am still
> not very experienced when it comes to performance issues. Plus, RDS is
> slightly different from pure PostgreSQL.
>
> I am trying to understand exactly how vacuum works.
>
> Here is what I am trying to do.
>
> I am purging old records from a table (500 million rows in total, but I am
> doing it in sets of 50,000,000 with a smaller inner loop of 100,000). That
> part works just fine.
>
> Because of the amount of data/rows deleted, I disabled autovacuum for this
> table (I want to have control over vacuum; autovacuum does not complete
> anyway due to the timeout, sizing, etc. settings that I do not want to
> change system-wide). I will turn autovacuum back on once I am done, of
> course.
>
> The issue is when I start vacuuming. This table has 4 indexes and a PK that
> I worry about. The PK takes about 30 minutes to vacuum and two of the
> indexes take about an hour each. The problem comes in for the other two
> indexes - they take 12+ hours each to vacuum:
>
> psql:/home/backup/leo/fws_vacuum.sql:6: INFO: vacuuming "public.pc_workflowlog"
> psql:/home/backup/leo/fws_vacuum.sql:6: INFO: launched 4 parallel vacuum workers for index vacuuming (planned: 4)
> psql:/home/backup/leo/fws_vacuum.sql:6: INFO: scanned index "pc_workflowlog_pk" to remove 50000000 row versions
> DETAIL: CPU: user: 191.03 s, system: 12.43 s, elapsed: 1711.22 s
> psql:/home/backup/leo/fws_vacuum.sql:6: INFO: scanned index "workflowlo_u_publicid_g6uqp9lkn6e8" to remove 50000000 row versions
> DETAIL: CPU: user: 325.75 s, system: 19.75 s, elapsed: 2674.24 s
> psql:/home/backup/leo/fws_vacuum.sql:6: INFO: scanned index "workflowlo_n_workflow_2tc9k2hdtry9v" to remove 50000000 row versions
> DETAIL: CPU: user: 312.17 s, system: 16.94 s, elapsed: 3097.88 s
> *psql:/home/backup/leo/fws_vacuum.sql:6: INFO: scanned index "workflowlo_n_frozenseti_2kjkbjgf3c6ro" to remove 50000000 row versions*
> *DETAIL: CPU: user: 41187.70 s, system: 216.14 s, elapsed: 42749.36 s*
> *psql:/home/backup/leo/fws_vacuum.sql:6: INFO: scanned index "workflowlo_n_userid_14kqw6qdsnndw" to remove 50000000 row versions*
> *DETAIL: CPU: user: 41280.66 s, system: 216.67 s, elapsed: 42832.16 s*
> psql:/home/backup/leo/fws_vacuum.sql:6: INFO: "pc_workflowlog": removed 50000000 row versions in 1129870 pages
>
> I've increased max_parallel_maintenance_workers to 8 for the session, and
> vacuum used 4 parallel workers (one for each index, I assume); those two
> indexes were done in about an hour. What I am trying to figure out is how
> to force the other two large indexes to be vacuumed in parallel - several
> workers going against a single index. It seems it should be possible, since
> the index size is large enough to qualify, but I have not been able to
> figure it out yet. Most of the parameters are at default values.
>
> What am I missing?
>
> I have a few other questions. Does vacuum time depend only on the number of
> dead rows and the size of the table, or does the entire storage allocation
> (including dead tuples) also affect it?
>
> Would it make more sense to drop the two large indexes, purge, vacuum, and
> recreate the indexes afterwards (I know it needs to be tested)? The reason
> I am doing it in stages is to make sure I have enough time to vacuum, but
> maybe it would not take much longer to vacuum after the complete purge?
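For concreteness, the setup I described above boils down to roughly the
following (a sketch, not the exact script I ran; the table name is the one
from the log):

  -- Disable autovacuum for this one table while the purge runs.
  ALTER TABLE public.pc_workflowlog SET (autovacuum_enabled = false);

  -- Allow more parallel workers for maintenance commands in this session.
  SET max_parallel_maintenance_workers = 8;

  -- Manual vacuum with per-index timings (this is what produced the INFO
  -- lines quoted above).
  VACUUM (VERBOSE) public.pc_workflowlog;

  -- Put autovacuum back once the purge and the final vacuum are done.
  ALTER TABLE public.pc_workflowlog RESET (autovacuum_enabled);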
> Lastly, is it better to delete all the rows (500 million) at once instead
> of doing it in smaller batches, and vacuum only once?
>
> The current size of the table is about 1 TB, and the indexes add another
> 1.5 TB to it.
>
> Truncate is not an option, as I am only deleting rows older than 6 months.
> The client was not doing any purging for years, but will do it regularly
> after this clean-up.
>
> P.S. This is my very first post here, please advise if it is the wrong
> channel. Thank you in advance.
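And to make the batching question concrete, each inner-loop batch is
essentially the following (the timestamp column name here is a placeholder,
not the real one):

  -- One inner-loop batch of the purge: delete up to 100,000 of the rows
  -- older than six months; "createtime" is a placeholder for the real
  -- timestamp column. Repeat until ~50,000,000 rows are gone, then vacuum.
  DELETE FROM public.pc_workflowlog
  WHERE ctid IN (
      SELECT ctid
      FROM public.pc_workflowlog
      WHERE createtime < now() - interval '6 months'
      LIMIT 100000
  );

What I cannot tell is whether one big delete of all 500 million rows followed
by a single vacuum would end up cheaper than many of these batches, each
followed by its own vacuum.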