"Peter Brant" <[EMAIL PROTECTED]> writes:
> I added some strategic printfs to pgstat.c. Attached is the output when
> a little program is run which, in a loop, makes 10 connections, sleeps 3
> seconds, closes them, sleeps another 3 seconds. My workstation (Windows
> XP) was otherwise idle.
>
> Search for "is known to be dead, ignoring" to find the re-used process
> IDs. Things start out clean, but after a few cycles anywhere between 1
> and 5 backends are being missed.

Looking at the pgstats code, I notice that once it makes an entry in the dead-backends hashtable, it keeps that entry (rejecting any messages with the same PID) for 10 seconds. That seems like approximately forever on modern machines, certainly much more than any plausible out-of-order condition in the UDP packet stream. It could easily be enough to get us in trouble on Unix machines, never mind Windows.

A conservative suggestion would be to trim down the destroy interval. A more radical one is to question whether we need the destroy delay mechanism at all. What if we got rid of all that logic and simply let the collector delete stuff when it's told to? Out-of-order messages could cause entries to be re-created after they've been deleted, but I'm not sure that I see any harm in that. Bogus DB and table entries are already ignored in the pgstats views (because they won't join to anything in the system catalogs), and we also have a filter for bogus backend entries. There are also mechanisms that ensure these entries will go away eventually: pgstat_vacuum_tabstat for DB and table entries, and eventual re-use of a BackendId slot for backends.

So I'm sort of thinking that the destroy delay has outlived its usefulness.

			regards, tom lane
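To make the PID-reuse problem concrete, here is a minimal C sketch of a dead-backends table with a destroy delay. It is a simplified model, not the actual pgstat.c code: the names dead_table, mark_dead, accept_message and destroy_delay are hypothetical, and time is simulated in whole seconds. It shows that when a PID is reused 3 seconds after its previous owner exited (as in the test loop quoted above), a 10-second delay causes the new backend's messages to be rejected, while a delay of 0 lets them through.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the collector's dead-backends table.
 * Names and layout are illustrative only; they are not the real pgstat.c
 * data structures. Time is simulated in whole seconds.
 */
#define MAX_DEAD 64

typedef struct
{
    int  pid;       /* PID of a backend the collector was told has exited */
    long died_at;   /* simulated time at which it was marked dead */
    bool in_use;
} DeadBackend;

static DeadBackend dead_table[MAX_DEAD];

/* Forget all recorded deaths, so each scenario starts fresh. */
static void reset_table(void)
{
    for (int i = 0; i < MAX_DEAD; i++)
        dead_table[i].in_use = false;
}

/* Drop entries that have been held for at least destroy_delay seconds. */
static void expire_entries(long now, long destroy_delay)
{
    for (int i = 0; i < MAX_DEAD; i++)
        if (dead_table[i].in_use && now - dead_table[i].died_at >= destroy_delay)
            dead_table[i].in_use = false;
}

/* Record that the backend with this PID has exited. */
static void mark_dead(int pid, long now)
{
    for (int i = 0; i < MAX_DEAD; i++)
    {
        if (!dead_table[i].in_use)
        {
            dead_table[i].pid = pid;
            dead_table[i].died_at = now;
            dead_table[i].in_use = true;
            return;
        }
    }
    assert(!"dead-backend table is full");
}

/*
 * Would a stats message from this PID be accepted at time "now"?
 * A PID still held in the table is rejected, which corresponds to the
 * "is known to be dead, ignoring" case in the trace.
 */
static bool accept_message(int pid, long now, long destroy_delay)
{
    expire_entries(now, destroy_delay);
    for (int i = 0; i < MAX_DEAD; i++)
        if (dead_table[i].in_use && dead_table[i].pid == pid)
            return false;
    return true;
}
```

For example, after mark_dead(1234, 3), calling accept_message(1234, 6, 10) returns false even though PID 1234 now belongs to a brand-new backend; the same call with destroy_delay = 0 returns true. Setting the delay to 0 approximates dropping the mechanism entirely, with the trade-off that a late, out-of-order message from the old backend would also be accepted and could re-create an entry.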