On 08/03/11 13:03, Mark Kirkwood wrote:
On 08/03/11 12:55, Mark Kirkwood wrote:
On 23/02/11 10:18, Mark Kirkwood wrote:
On 23/02/11 00:26, Greg Stark wrote:

It's also possible there's a bug, of course. If someone was using that
buffer and somehow failed to notify the vacuum that they were done, it
would wait for a very long time (forever?). However, if the vacuum
eventually continued when the query was canceled, then it seems likely
it was working as intended.


Greg, thanks for clarifying this.

Unfortunately, this time around I canceled the vacuum and then the query. However, *next* time I'll get rid of the query first and see what happens.


I have another one here:

postgres=# select datname,procpid, query_start, current_query
                  from pg_stat_activity
where now() - query_start > '2 hours'::interval order by query_start;
 datname | procpid |          query_start          |  current_query
---------+---------+-------------------------------+-----------------
 stuff   |   23957 | 2011-03-08 06:02:02.659159+13 | VACUUM ANALYZE;

postgres=# select datname,procpid, query_start, current_query
                  from pg_stat_activity
where now() - backend_start > '2 hours'::interval order by query_start;
 datname | procpid |          query_start          |  current_query
---------+---------+-------------------------------+-----------------
 stuff   |   23957 | 2011-03-08 06:02:02.659159+13 | VACUUM ANALYZE;
                                                   :
 stuff   |   30018 | 2011-03-08 11:57:03.214177+13 | <IDLE>
 stuff   |   11390 | 2011-03-08 12:27:20.390122+13 | <IDLE>
 stuff   |   10178 | 2011-03-08 12:28:59.328265+13 | <IDLE>
 stuff   |    5100 | 2011-03-08 12:30:10.318204+13 | <IDLE>
 stuff   |   10255 | 2011-03-08 12:32:21.520814+13 | <IDLE>
 stuff   |   29993 | 2011-03-08 12:32:26.562835+13 | <IDLE>
 stuff   |   10204 | 2011-03-08 12:40:57.059958+13 | <IDLE>
 stuff   |   10206 | 2011-03-08 12:41:11.708924+13 | <IDLE>
(9 rows)


It looks to me like whatever queries held the buffer pins the vacuum *was* waiting on should be well and truly finished by now. The strace for the vacuum looks the same as before:

$ strace -p 23957
Process 23957 attached - interrupt to quit
semop(39747613, 0x7fff53d3dec0, 1


It certainly looks like the vacuum is unable to resume after putting itself to sleep, for some reason.
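
(For anyone wanting to repeat the check: something along these lines - just a rough sketch, using the vacuum's pid from the strace above - should show whether the backend is queued on a heavyweight lock. An empty result points at a buffer pin instead:)

select locktype, relation::regclass as relation, mode, granted
from pg_locks
where pid = 23957   -- the stuck vacuum backend
and not granted;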




Immediately after I sent this, I saw the vacuum unblock itself and continue... so there *are* some queries still blocking it. I'm seeing some fairly long (1-2 minute) lock waits for a certain table:

postgres=# select datname,procpid, query_start, current_query
           from pg_stat_activity where now() - query_start > '1 minute'
and procpid in (select distinct pid from pg_locks where not granted) order by query_start;
 datname | procpid |          query_start          |        current_query
---------+---------+-------------------------------+-------------------------------
 stuff   |     390 | 2011-03-08 12:57:01.752234+13 | UPDATE content.twitter_queue
                                                    :     SET error = $1
                                                    :     WHERE twitter_queue_id = $2
(30 or so)

which are probably not helping.
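
For reference, a rough way to see which sessions those UPDATEs are queued behind is a self-join on pg_locks (a sketch only - it ignores lock-mode compatibility, so it can over-report holders):

select waiting.pid as waiting_pid,
       holder.pid as holding_pid,
       holder.relation::regclass as relation,
       act.current_query as holder_query
from pg_locks waiting
join pg_locks holder
  on holder.relation = waiting.relation
 and holder.granted
 and holder.pid <> waiting.pid
join pg_stat_activity act on act.procpid = holder.pid
where not waiting.granted;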



Make that definitely not helping:

stuff=# select relname, last_vacuum
        from pg_stat_all_tables
        where last_vacuum is not null
        and last_vacuum > '2011-03-08 06:03:06'::timestamp
        order by last_vacuum limit 3;
    relname     |          last_vacuum
----------------+-------------------------------
 headline_alert | 2011-03-08 06:03:06.182866+13
 stock_static   | 2011-03-08 06:03:06.610739+13
 twitter_queue  | 2011-03-08 12:58:03.02451+13
(3 rows)



So we were trying to vacuum 'twitter_queue' (a small 20MB table) for approximately 6 hours.

Note that the vacuum itself was not waiting on any locks according to pg_locks, so this looks to be the buffer-pinning issue. Clearly we need to amend the application design that results in this quite long (1-2 minute!) wait for the lock on this table.

regards

Mark

P.S.: for the interested, the process(es) that the above UPDATEs are waiting on are sitting '<IDLE> in transaction' - i.e. the application end is making some *non-database* blocking call while holding a lock on the 'twitter_queue' table :-/
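
(Something like the following - again just a sketch, assuming the table is content.twitter_queue as in the UPDATE above - lists the idle-in-transaction sessions still holding a lock on it:)

select act.procpid, act.xact_start, act.query_start
from pg_locks l
join pg_stat_activity act on act.procpid = l.pid
where l.relation = 'content.twitter_queue'::regclass
  and l.granted
  and act.current_query = '<IDLE> in transaction';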

