Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

Adrian Klaver Thu, 14 May 2020 12:39:26 -0700

On 5/14/20 12:03 PM, Eduard Rozenberg wrote:

@Adrian thanks again.


I read the postgres docs the same way - that previously used space is marked as 
available and therefore no need for vacuum full. Previously used = now 
available space, which gets re-used, in theory.

And yet this same DB is growing at 30-50 GB weekly, despite the fact that 2.4 TB of 
previously used space is clearly available ("clearly available" as proven by 
vacuum full shrinking the DB space usage by 2.4 TB). I did verify postgresql.conf has 
always been properly configured re: autovacuum:  'autovacuum = on'and 'track_counts = on'

Well if I'm counting zeros right 50GB on 4.4TB database is ~1.14%. Doesthat sound right for new data being added?

One place to look to see how aggressively the autovacuum is being donehere:


https://www.postgresql.org/docs/12/monitoring-stats.html#PG-STAT-ALL-TABLES-VIEW

What you find there may mean tweaking the values as explained here:

https://www.postgresql.org/docs/12/routine-vacuuming.html#AUTOVACUUM

Could be there are just a few tables that account for most of the churnand a manual VACUUM on them is needed. Say if there are regularlyscheduled large UPDATEs to tables, incorporate a VACUUM after.


I'm not planning on running VACUUM FULL regularly, just "this one time". And I was trying to to 
parallelize VACUUM FULL and minimize downtime of the production DB caused by table locking. And then I found 
the option of using "vacuumdb --jobs" which sounded like the perfect solution except for "well 
you can't actually use --jobs because you'll run into a deadlock and everybody knows that and nobody has a 
(good) solution for it" :).

--Ed

On May 14, 2020, at 11:46, Adrian Klaver <adrian.kla...@aklaver.com> wrote:

On 5/14/20 11:27 AM, Eduard Rozenberg wrote:

@Adrian thanks.
I did a test vacuum full the "normal" non-parallel way (VACUUM FULL sql) 
previously on a test db copy and saw the DB size (postgres 9.6) shrink from 6.8 TB to 4.4 
TB of actual, real disk space usage ($ df -d 1).
I don't know the reason so much space was "locked up" (other than there is a lot of data 
"churn" from daily updates). But I definitely do need to do the vac full on the production db to 
get down to the smaller size - cannot afford the 2.4 TB of "wasted" space on an ongoing basis.


It may not be wasted space. A regular VACUUM marks space within a table 
available for reuse(and not returned to OS) when it removes unneeded tuples. It 
then fills that space up with new tuples, roughly speaking. So if the vacuuming 
is working properly you will reach a steady state where space is reused and the 
database on disk size grows slowly as reusable space is occupied. I would 
monitor the database size on a regular basis. My guess is that the VACUUM FULL 
is dropping the OS used space and then it fills up again as the database does 
those updates.

Based on your email it sounds like the vacuuming of those pg_catalog.* tables 
is done regardless, as a normal part of doing vac full on my own database.
Unfortunately I still don't see an ideal way to run vacuum full in parallel via vacuumdb without 
running into the expected and documented deadlock. Only method I'm aware of is to list each table 
individually with "-t table1 -t table2..." to "vacuum db --jobs" which is not 
pleasant and not exceedingly beautiful.
Thanks.



--
Adrian Klaver
adrian.kla...@aklaver.com



--
Adrian Klaver
adrian.kla...@aklaver.com

Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

Reply via email to