On 08/03/11 13:03, Mark Kirkwood wrote:
On 08/03/11 12:55, Mark Kirkwood wrote:
On 23/02/11 10:18, Mark Kirkwood wrote:
On 23/02/11 00:26, Greg Stark wrote:
It's also possible there's a bug of course. If someone was using that
buffer and somehow failed to notify the vacuum that they were done it
would wait for a very long time (forever?). However if vacuum
eventually continued when the query was canceled then it seems likely
it was working as intended.
Greg, thanks for clarifying this.
Unfortunately this time around I canceled the vacuum and then the
query. However, *next* time I'll get rid of the query first and see
what happens.
I have another one here:
postgres=# select datname,procpid, query_start, current_query
from pg_stat_activity
where now() - query_start > '2 hours'::interval
order by query_start;
datname | procpid | query_start | current_query
---------+---------+-------------------------------+-----------------
stuff | 23957 | 2011-03-08 06:02:02.659159+13 | VACUUM ANALYZE;
postgres=# select datname,procpid, query_start, current_query
from pg_stat_activity
where now() - backend_start > '2 hours'::interval
order by query_start;
datname | procpid | query_start | current_query
---------+---------+-------------------------------+-----------------
stuff | 23957 | 2011-03-08 06:02:02.659159+13 | VACUUM ANALYZE;
:
stuff | 30018 | 2011-03-08 11:57:03.214177+13 | <IDLE>
stuff | 11390 | 2011-03-08 12:27:20.390122+13 | <IDLE>
stuff | 10178 | 2011-03-08 12:28:59.328265+13 | <IDLE>
stuff | 5100 | 2011-03-08 12:30:10.318204+13 | <IDLE>
stuff | 10255 | 2011-03-08 12:32:21.520814+13 | <IDLE>
stuff | 29993 | 2011-03-08 12:32:26.562835+13 | <IDLE>
stuff | 10204 | 2011-03-08 12:40:57.059958+13 | <IDLE>
stuff | 10206 | 2011-03-08 12:41:11.708924+13 | <IDLE>
(9 rows)
It looks to me like whichever queries held the buffer pins the vacuum
*was* waiting on should be well and truly finished by now. The strace
for the vacuum looks the same as before:
$ strace -p 23957
Process 23957 attached - interrupt to quit
semop(39747613, 0x7fff53d3dec0, 1
It certainly looks like the vacuum is not able to resume after
setting itself to sleep for some reason.
Immediately after I sent this, I saw the vacuum unlock itself and
continue... so there *are* some queries still blocking it. I'm seeing
some highish (1-2 minute) lock waits for a certain table:
postgres=# select datname,procpid, query_start, current_query
from pg_stat_activity where now() - query_start > '1 minute'
and procpid in (select distinct pid from pg_locks where not
granted) order by query_start;
datname | procpid | query_start | current_query
---------+---------+-------------------------------+------------------------------
stuff    |     390 | 2011-03-08 12:57:01.752234+13 | UPDATE content.twitter_queue
         |         |                               :   SET error = $1
         |         |                               :   WHERE twitter_queue_id = $2
(30 or so)
which are probably not helping.
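For the record, a rough self-join on pg_locks (along the lines of the
lock-monitoring queries on the PostgreSQL wiki) shows which backend each
of those waiters is queued behind - a sketch only, matching lock requests
on the same target:

-- Pair each ungranted lock request with a granted lock on the same
-- target (relation, tuple, or transactionid).
SELECT w.pid AS waiting_pid,
       h.pid AS holding_pid,
       w.locktype,
       w.relation::regclass AS relation,
       h.mode AS held_mode
FROM pg_locks w
JOIN pg_locks h
  ON  h.locktype      IS NOT DISTINCT FROM w.locktype
  AND h.database      IS NOT DISTINCT FROM w.database
  AND h.relation      IS NOT DISTINCT FROM w.relation
  AND h.page          IS NOT DISTINCT FROM w.page
  AND h.tuple         IS NOT DISTINCT FROM w.tuple
  AND h.transactionid IS NOT DISTINCT FROM w.transactionid
  AND h.pid <> w.pid
WHERE NOT w.granted
  AND h.granted;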
Make that definitely not helping:
stuff=# select relname, last_vacuum
from pg_stat_all_tables
where last_vacuum is not null
and last_vacuum > '2011-03-08 06:03:06'::timestamp
order by last_vacuum limit 3;
relname | last_vacuum
----------------+-------------------------------
headline_alert | 2011-03-08 06:03:06.182866+13
stock_static | 2011-03-08 06:03:06.610739+13
twitter_queue | 2011-03-08 12:58:03.02451+13
(3 rows)
So we were trying to vacuum 'twitter_queue' (a small 20MB table) for
approx 6 hrs.
Note that the vacuum itself was not waiting on any locks according to
pg_locks, so this looks to be the pinning issue. Clearly we need to amend
the application design that results in this quite long (1-2 minute!) wait
for the lock on this table.
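(For the record, that was checked with something along these lines - a
minimal sketch, using the vacuum's pid from the pg_stat_activity output
above:)

-- Any ungranted heavyweight-lock requests for the vacuum backend (23957)?
-- No rows means the wait is not a lock visible in pg_locks - in this case
-- it was the buffer-pin sleep.
SELECT locktype, relation::regclass AS relation, mode, granted
FROM pg_locks
WHERE pid = 23957
  AND NOT granted;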
regards
Mark
P.S: for the interested, the process(es) that the above UPDATEs are
waiting for are sitting '<IDLE> in transaction' - i.e. the application end
is doing some *non-database* blocking call whilst holding a lock on the
'twitter_queue' table :-/
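(A minimal sketch of how to spot those sessions, assuming the pre-9.2
pg_stat_activity column names used above, run in the 'stuff' database:)

-- Backends sitting '<IDLE> in transaction' while still holding a lock
-- on content.twitter_queue.
SELECT a.procpid, a.query_start, l.mode
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.procpid AND l.granted
WHERE a.current_query = '<IDLE> in transaction'
  AND l.relation = 'content.twitter_queue'::regclass;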