Re: Query on pg_stat_activity table got stuck

2019-05-11 Thread Tom Lane
I wrote: > This patch is against HEAD --- I've not looked at how much adjustment > it'll need for the back branches, but I'm sure there's some. I've back-patched this now. If you want to test the patch, the v10-branch version is at https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff

Re: Query on pg_stat_activity table got stuck

2019-05-10 Thread Tom Lane
neeraj kumar writes: > --> But the right answer is to fix it on the writing side. > Yes I agree with this part. Even though there is very low probability, a > process can still be killed in middle when writing. So what is your > suggestion on how to recover from this automatically? Here's a draf

Re: Query on pg_stat_activity table got stuck

2019-05-10 Thread neeraj kumar
--> But the right answer is to fix it on the writing side. Yes I agree with this part. Even though there is very low probability, a process can still be killed in middle when writing. So what is your suggestion on how to recover from this automatically? On Fri, May 10, 2019 at 9:14 AM Tom Lane wr

Re: Query on pg_stat_activity table got stuck

2019-05-10 Thread Tom Lane
neeraj kumar writes: > Also curious why query on pg_stat_activity is considering terminated > process ? The short answer to that is that this bug leaves shared memory in a corrupt state. It's not really useful to worry about whether readers should react differently to that --- you could spend en

Re: Query on pg_stat_activity table got stuck

2019-05-10 Thread neeraj kumar
Also curious why query on pg_stat_activity is considering terminated process ? Irrespective of corrupted state or not, ideally query on pg_stat_activity should ignore already terminated process. My 2 cents. On Fri, May 10, 2019 at 8:01 AM neeraj kumar wrote: > There are multiple ways see this p

Re: Query on pg_stat_activity table got stuck

2019-05-10 Thread neeraj kumar
There are multiple ways see this problem. One way I am seeing is : how system will auto-recover from this particular state. So ideally if st_procpid is set to zero it means this process is already terminated, however it might be have left some corrupted information in memory. So when other compone

Re: Query on pg_stat_activity table got stuck

2019-05-10 Thread neeraj kumar
We got more information about this issue. There is one backend process still present into beentry which has changecount as odd value. However this process is long gone/terminated. It means when this process was killed/terminated its entry was not cleaned from beentry list. There seems to be some sh

Re: Query on pg_stat_activity table got stuck

2019-05-10 Thread neeraj kumar
Tom, may be I didn't make my point clear. There are two issues : 1) Why this value was left as odd 2) Why backend entry is still pending in beentry for backend process even after it was killed/terminated. I am talking about 2nd issue. My understanding is query on pg_stat_activity goes via all back

Re: Query on pg_stat_activity table got stuck

2019-05-09 Thread Tom Lane
neeraj kumar writes: > Tom, may be I didn't make my point clear. > There are two issues : > 1) Why this value was left as odd Because a function called by pgstat_bestart threw an error, is what I'm guessing. > 2) Why backend entry is still pending in beentry for backend process even > after it w

Re: Query on pg_stat_activity table got stuck

2019-05-09 Thread Tom Lane
neeraj kumar writes: > We got more information about this issue. There is one backend process > still present into beentry which has changecount as odd value. However this > process is long gone/terminated. It means when this process was > killed/terminated its entry was not cleaned from beentry l

Re: Query on pg_stat_activity table got stuck

2019-05-09 Thread Tom Lane
Jeremy Schneider writes: > Seems to me that at a minimum, this loop shouldn't go on forever. Even > having an arbitrary, crazy high, hard-coded number of attempts before > failure (like a million) would be better than spinning on the CPU > forever - which is what we are seeing. I don't think it's

Re: Query on pg_stat_activity table got stuck

2019-05-09 Thread neeraj kumar
Yes we use SSL to connect to DB. Looked into code related to st_changecount : https://github.com/postgres/postgres/blob/659e53498c3c04e4f400323c02bef98fe8d13ec8/src/include/pgstat.h#L1015-L1044 >From comment seems like each backend should have its own copy of PgBackendStatus, it means st_changeco

Re: Query on pg_stat_activity table got stuck

2019-05-08 Thread Tom Lane
neeraj kumar writes: > Yes we use SSL to connect to DB. Hm. I'm suspicious that one of the functions that fetch data for an SSL connection threw an error. In particular, it doesn't look to be hard at all to make X509_NAME_to_cstring fall over --- an encoding conversion failure would do it, even

Re: Query on pg_stat_activity table got stuck

2019-05-08 Thread neeraj kumar
Took some time to get stack trace as we didn't had root permission. Attaching stack trace of two process (out of many) stuck for same query below[1][2] Seems like call is unable to come out of this loop : https://github.com/postgres/postgres/blob/master/src/backend/postmaster/pgstat.c#L3361-L3400

Re: Query on pg_stat_activity table got stuck

2019-05-08 Thread Tom Lane
neeraj kumar writes: > Took some time to get stack trace as we didn't had root permission. > Attaching stack trace of two process (out of many) stuck for same query > below[1][2] Hmm, the line numbers in your stack traces don't agree with either v10 or HEAD branches for me. But assuming that you

Re: Query on pg_stat_activity table got stuck

2019-05-06 Thread Tom Lane
neeraj kumar writes: > We are using PG 10.6. We have one cron job that queries pg_stat_activity > table to find out how many queries are running longer than X minutes and > generate metrics. > Query look like this : > SELECT * FROM pg_stat_activity WHERE state='active' > After some days, this qu