Re: [HACKERS] Stats collection on Windows

Tom Lane Wed, 05 Apr 2006 07:49:32 -0700

"Peter Brant" <[EMAIL PROTECTED]> writes:
> I added some strategic printfs to pgstat.c.  Attached is the output when
> a little program is run which, in a loop, makes 10 connections, sleeps 3
> seconds, closes them, sleeps another 3 seconds.  My workstation (Windows
> XP) was otherwise idle.


> Search for "is known to be dead, ignoring" to find the re-used process
> IDs.  Things start out clean, but after a few cycles anywhere between 1
> and 5 backends are being missed.

Looking at the pgstats code, I notice that once it makes an entry in the
dead-backends hashtable, it keeps that entry (rejecting any messages
with the same PID) for 10 seconds.  That seems like approximately
forever on modern machines, certainly much more than any plausible
out-of-order condition in the UDP packet stream.  It could easily be
enough to get us in trouble on Unix machines, never mind Windows.
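
To make the failure mode concrete, here is a minimal sketch of a dead-PID hold table. It is not the pgstat.c code: the names DeadTable, dead_table_add and accept_message, and the fixed-size array, are hypothetical, and time is passed in as plain seconds. It only illustrates how a 10-second hold rejects messages from a new backend that happens to get a recycled PID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DESTROY_DELAY_SECS 10  /* hold time described in the text */
#define MAX_DEAD 64            /* hypothetical capacity, for the sketch only */

typedef struct DeadEntry {
    int  pid;
    long died_at;   /* time the backend was declared dead, in seconds */
    bool in_use;
} DeadEntry;

typedef struct DeadTable {
    DeadEntry entries[MAX_DEAD];
} DeadTable;

/* Remember that the backend with this PID has terminated. */
static bool dead_table_add(DeadTable *t, int pid, long now)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_DEAD; i++) {
        if (!t->entries[i].in_use) {
            t->entries[i].pid = pid;
            t->entries[i].died_at = now;
            t->entries[i].in_use = true;
            return true;
        }
    }
    return false;   /* table full */
}

/* Decide whether a message from `pid` arriving at `now` is accepted. */
static bool accept_message(DeadTable *t, int pid, long now, long delay)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_DEAD; i++) {
        DeadEntry *e = &t->entries[i];
        if (!e->in_use || e->pid != pid)
            continue;
        if (now - e->died_at >= delay) {
            e->in_use = false;   /* hold expired: forget the dead PID */
            return true;
        }
        return false;            /* still "known to be dead": message ignored */
    }
    return true;                 /* PID not in the dead table */
}
```

With the test loop from the report (close, sleep 3 seconds, reconnect), a backend that receives a recycled PID 3 seconds after the old one died falls well inside the 10-second window, so its messages would be dropped in this model.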

A conservative suggestion would be to trim down the destroy interval.
A more radical one is to question whether we need the destroy delay
mechanism at all.  What if we got rid of all that logic and simply let
the collector delete stuff when it's told to?  Out-of-order messages
could cause entries to be re-created after they've been deleted, but
I'm not sure that I see any harm in that.  Bogus DB and table entries
are already ignored in the pgstats views (because they won't join to
anything in the system catalogs) and we also have a filter for bogus
backend entries.  There are also mechanisms that ensure these entries
will go away eventually: pgstat_vacuum_tabstat for DB and table
entries, and eventual re-use of a BackendId slot for backends.
So I'm sort of thinking that the destroy delay has outlived its
usefulness.
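
A rough sketch of the no-delay alternative, under stated assumptions: the Collector and StatsSlot types and the on_activity, on_terminate and visible functions are invented for illustration, and the view-side filter is modelled as a simple check against a list of live PIDs, which is only a guess at how such a filter might behave. The point is that a late packet can resurrect a deleted entry without harm, because the entry is hidden from readers and is overwritten once the slot is reused.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 8   /* hypothetical number of backend slots */

typedef struct StatsSlot {
    int  pid;      /* 0 means the slot is empty */
    long n_msgs;   /* messages counted for the current occupant */
} StatsSlot;

typedef struct Collector {
    StatsSlot slots[MAX_SLOTS];
} Collector;

/* Activity message: create or update the entry, with no dead-PID check. */
static void on_activity(Collector *c, size_t slot, int pid)
{
    if (slot >= MAX_SLOTS)
        return;
    if (c->slots[slot].pid != pid) {
        /* New occupant (or a resurrected one): start from scratch. */
        c->slots[slot].pid = pid;
        c->slots[slot].n_msgs = 0;
    }
    c->slots[slot].n_msgs++;
}

/* Termination message: delete the entry immediately, no hold period. */
static void on_terminate(Collector *c, size_t slot)
{
    if (slot >= MAX_SLOTS)
        return;
    c->slots[slot].pid = 0;
    c->slots[slot].n_msgs = 0;
}

/* View-side filter: show an entry only if its PID belongs to a live backend. */
static bool visible(const Collector *c, size_t slot,
                    const int *live_pids, size_t n_live)
{
    if (slot >= MAX_SLOTS || c->slots[slot].pid == 0)
        return false;
    for (size_t i = 0; i < n_live; i++) {
        if (live_pids[i] == c->slots[slot].pid)
            return true;
    }
    return false;
}
```

In this model an out-of-order activity packet arriving after the termination message re-creates the entry, but it stays invisible until a new backend takes over the slot and replaces it.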

                        regards, tom lane

