Re: Query on pg_stat_activity table got stuck

neeraj kumar Fri, 10 May 2019 08:41:33 -0700

Also curious why query on pg_stat_activity is considering terminated
process ? Irrespective of corrupted state or not, ideally query on
pg_stat_activity
should ignore already terminated process.
My 2 cents.



On Fri, May 10, 2019 at 8:01 AM neeraj kumar <neeru....@gmail.com> wrote:

> There are multiple ways see this problem. One way I am seeing is : how
> system will auto-recover from this particular state.
>
> So ideally if st_procpid is set to zero it means this process is already
> terminated, however it might be have left some corrupted information in
> memory. So when other components tries to read beentry, they should also
> check if st_procpid is already set to zero, if yes it means this process
> is gone and no need to consider this process any more.
>
> Agree this is solving particular issue about pg_stat_activity however
> don't see any harm in adding that check.
>
>
> On Thu, May 9, 2019 at 9:29 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>
>> neeraj kumar <neeru....@gmail.com> writes:
>> > Tom, may be I didn't make my point clear.
>> > There are two issues :
>> > 1) Why this value was left as odd
>>
>> Because a function called by pgstat_bestart threw an error, is what
>> I'm guessing.
>>
>> > 2) Why backend entry is still pending in beentry for backend process
>> even
>> > after it was killed/terminated.
>> > I am talking about 2nd issue. My understanding is query on
>> pg_stat_activity
>> > goes via all backend entries via beentry and it finds this
>> wrong/corrupted
>> > entry. When a process terminates, ideally this backend entry into
>> beentery
>> > should have also been cleaned. But why this still there? Whose
>> > responsibility it is to remove entry from beentry when process
>> terminates ?
>>
>> If you're imagining that something takes an electronic hacksaw to shared
>> memory and physically removes that array entry, you're mistaken.  It's
>> still going to be there.  Now, there *is* cleanup code --- this bit of
>> pgstat_beshutdown_hook is supposed to mark the process's entry as
>> not-in-use:
>>
>>         /*
>>          * Clear my status entry, following the protocol of bumping
>> st_changecount
>>          * before and after.  We use a volatile pointer here to ensure the
>>          * compiler doesn't try to get cute.
>>          */
>>         pgstat_increment_changecount_before(beentry);
>>
>>         beentry->st_procpid = 0;        /* mark invalid */
>>
>>         pgstat_increment_changecount_after(beentry);
>>
>> However, if something had left st_changecount odd before we got to this,
>> it'd still be odd afterwards.  Sure, st_procpid is now zero, but that
>> doesn't help readers because they're not supposed to believe st_procpid
>> is valid unless the changecount is even.
>>
>> You could maybe argue that this cleanup code should be trying to ensure
>> that the changecount is left even, but I don't think that's an appropriate
>> response.  If it's not even when we get here, we already screwed up very
>> badly, because we were breaking the protocol for (potentially) as long as
>> the process has been in existence.  Moreover, if you're worried about
>> corner cases where we did mess that up, what of corner cases where process
>> exit fails before getting here?  I think the right answer is to bring the
>> hammer down as soon as we mess up, which is what the critical-section
>> mechanism is for.
>>
>>                         regards, tom lane
>>
>
>
> --
> -------------------------------------
> Thanks
> Neeraj Kumar,
> +1  (206) 427-7267
>


-- 
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267

Re: Query on pg_stat_activity table got stuck

Reply via email to