When I run:
vacuumdb --full --all --analyze --quiet
on my database cluster it will complete in < 2 minutes (this cluster 
is a few million total rows and ~2GB).

After testing, I set this up as an off-hours cron job and it worked 
fine for several days then hung the whole database. After my pager 
pulled me from bed I found the vacuumdb process still running but the 
vacuum process on the first database (alphabetically) was showing it 
was waiting:
postgres: vacuumdb --full --all --analyze --quiet
postgres: postgres firstdb [local] VACUUM waiting

A couple hundred processes were showing as "startup waiting" and one 
was "idle in transaction". The process in the "VACUUM waiting" state 
was the only one connected to that database - all other connections 
were to other databases.

CPU and disk utilization were essentially zero. Suspecting a lock 
problem I attempted to use a pre-existing connection to view pg_locks 
but it would not respond.

I killed the vacuum process and all the processes in the "waiting" 
states cleared within a second or two and system returned to normal. 
The pg_locks query also returned but showed no useful info.

I tracked down the process that was "idle in transaction" and it was a 
pg_dump process running on another machine. This process does a 
periodic dump of one very small table and should complete in a 
fraction of a second but was still waiting since the previous day - 
apparently without deleterious effects.

There was no useful info in the log.

I've stopped running the vacuum full job via cron till I can trust it. 
Any ideas on how to track/prevent this behavior? Server is version 
7.4.1 and my web searches have proved futile.

Cheers,
Steve


---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Reply via email to