The situation seems pretty bad!! Here are the steps to reproduce in 'PostgreSQL 8.3beta2 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 3.3.3 (SuSE Linux)':
session 1: begin; session 1: update test set a = 112 where a = 112; session 2: update test set a = 113 where a = 112; --waits session 1: select * from pg_stat_activity; -- executed this a few times before executing 'select version()' and then following: session 1: <stat query1> -- see end of mail for this query procpid | current_query | waiting | duration | backend_start ---------+----------------------------------------+---------+------------------+------------------------------- 12577 | update test set a = 113 where a = 112; | t | -00:01: 35.782881 | 2008-02-01 13:36:15.31027-08 11975 | select * from pg_stat_activity ; | f | -00:01: 52.554697 | 2008-02-01 13:30:40.396392-08 (2 rows) session 1: select * from pg_locks locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted ---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------- -----+-------+------------------+--------- transactionid | | | | | | 390 | | | | 2/14 | 12577 | ShareLock | f transactionid | | | | | | 390 | | | | 1/9 | 11975 | ExclusiveLock | t <snip irrelevant> Then, session 2: ^C Cancel request sent ERROR: canceling statement due to user request session1: <stat query1> procpid | current_query | waiting | duration | backend_start ---------+----------------------------------------+---------+------------------+------------------------------- 12577 | update test set a = 113 where a = 112; | t | -00:01: 35.782881 | 2008-02-01 13:36:15.31027-08 11975 | select * from pg_stat_activity ; | f | -00:01: 52.554697 | 2008-02-01 13:30:40.396392-08 (2 rows) session 1: select * from pg_locks ; <no traces of pid 12577> session 1: select pg_backend_pid(); pg_backend_pid ---------------- 11975 The last mentioned output of <stat query> shows session 1 executing 'select * from p_s_a', whereas the <stat query> _is_ being executed in session 1!!! This result is consistently returned for a while, and later... session 2: select pg_backend_pid(); pg_backend_pid ---------------- 12577 session 1: <stat query1> procpid | current_query | waiting | duration | backend_start ---------+-----------------------+---------+-----------------+------------------------------- 11975 | <IDLE> in transaction | f | 00:06:08.671029 | 2008-02-01 13:30:40.396392-08 (1 row) After a while again: session 1: <stat query2> -- notice 2 not 1; 'select *' comes back to haunt!!! procpid | current_query | waiting | duration | backend_start ---------+----------------------------------------+---------+------------------+------------------------------- 12577 | update test set a = 113 where a = 112; | t | -00:01: 35.782881 | 2008-02-01 13:36:15.31027-08 11975 | select * from pg_stat_activity ; | f | -00:01: 52.554697 | 2008-02-01 13:30:40.396392-08 (2 rows) session 1: <stat query1> -- 1 back in action procpid | current_query | waiting | duration | backend_start ---------+----------------------------------------+---------+------------------+------------------------------- 12577 | update test set a = 113 where a = 112; | t | -00:01: 35.782881 | 2008-02-01 13:36:15.31027-08 11975 | select * from pg_stat_activity ; | f | -00:01: 52.554697 | 2008-02-01 13:30:40.396392-08 (2 rows) The <stat query1> is: select procpid, current_query::varchar(50), waiting, now() - query_start as duration, backend_start from pg_stat_activity where current_query <> '<IDLE>' and current_query not like '%DONT COUNT ME1 %' order by duration desc limit 10; The <stat query2> is: select procpid, current_query::varchar(50), waiting, now() - query_start as duration, backend_start from pg_stat_activity where current_query not like '%DONT COUNT ME1 %' order by duration desc limit 10; Found more bugs than I was looking for, to reproduce!!! The reporter also made an observation (on 8.2.4) that there were deadlocks detected at around the same time. Looked at WaitOnLock(), and clearly there's a problem, but is it at the same/only place we are suspecting it to be? Best regards, PS: Ran the <stat query>ies 1 and 2 again, just before hitting 'send', and the result is the same: procpid | current_query | waiting | duration | backend_start ---------+----------------------------------------+---------+------------------+------------------------------- 12577 | update test set a = 113 where a = 112; | t | -00:01: 35.782881 | 2008-02-01 13:36:15.31027-08 11975 | select * from pg_stat_activity ; | f | -00:01: 52.554697 | 2008-02-01 13:30:40.396392-08 (2 rows) Clearly, there's something wrong On Feb 1, 2008 8:16 AM, Tom Lane <[EMAIL PROTECTED]> wrote: > "Gurjeet Singh" <[EMAIL PROTECTED]> writes: > > I saw a strange behaviour on one of the production boxes. The > > pg_stat_activity shows a process as <IDLE> and yet 'waiting' !!! On top > of > > it (understandably, since its IDLE), there are no entries for this pid > in > > pg_locks! > > Hmm, I can reproduce something like this by aborting a wait for lock. > It seems the problem is that WaitOnLock() is ignoring its own good > advice, assuming that it can do cleanup work after waiting. > > regards, tom lane > -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com EnterpriseDB http://www.enterprisedb.com 17° 29' 34.37"N, 78° 30' 59.76"E - Hyderabad 18° 32' 57.25"N, 73° 56' 25.42"E - Pune 37° 47' 19.72"N, 122° 24' 1.69" W - San Francisco * http://gurjeet.frihost.net Mail sent from my BlackLaptop device