On Wed, Sep 21, 2016 at 5:49 PM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote: >> Moreover, it's pretty confusing that we have this general concept of >> wait events in pg_stat_activity, and then here the specific type of >> wait event we're waiting for is the ... wait event kind. Uh, what? > > Yeah, that is confusing. It comes about because of the coincidence > that pg_stat_activity finished up with a wait_event column, and our IO > multiplexing abstraction finished up with the name WaitEventSet. > <stuck-record-mode>We could instead call these new things "wait > points", because, well, erm, they name points in the code at which we > wait. They don't name events (they just happen to use the > WaitEventSet mechanism, which itself does involve events: but those > events are things like "hey, this socket is now ready for > writing").</>
Sure, we could do that, but it means breaking backward compatibility for pg_stat_activity *again*. I'd be willing to do it but I don't think I'd win any popularity contests. >> I have to admit that I like the individual event names quite a bit, >> and I think the detail will be useful to users. But I wonder if >> there's a better way to describe the class of events that we're >> talking about that's not so dependent on internal data structures. >> Maybe we could divide these waits into a couple of categories - e.g. >> "Socket", "Timeout", "Process" - and then divide these detailed wait >> events among those classes. > > Well we already discussed a couple of different ways to get "Socket", > "Latch", "Socket|Latch", ... or something like that into the > wait_event_type column or new columns. Wouldn't that be better, and > automatically fall out of the code rather than needing human curated > categories? Although Michael suggested that that should be done as a > separate patch on top of the basic feature. I think making that a separate patch is just punting the decision down the field to a day that may never come. Let's try to agree on something that we can all live with and implement it now. In terms of avoiding human-curated categories, I basically see two options: 1. Find a name that is sufficiently generic that it covers everything that might reach WaitEventSetWait either now or in the future when it might wait for even more kinds of things than it does now. For example, we could call it "Stuff" or "Thing". Less tongue-in-cheek suggestions are welcome, but it's hard to come up with something that is sufficiently-generic without being tautological. "Event" is an example of a name which is general enough to encompass everything but also stupid: the column is called "wait_event" so everything that appears in it is an event by definition. 2. Classify the events that fall into this category by some rigid principle based on the types of things being awaited. For example, we could decide that if any sockets are awaited, the event class will be "Client" if it is connected to a user and "IPC" for auxiliary processes. For myself, I don't see any real problem with using humans to classify things; that's pretty much the one thing humans are much better at than computers, so we might as well take advantage of it. I think that it's useful to try to group together types of waits which the user will see as logically related to each other, even if that involves applying some human judgement that might someday lead to some discussion about what the best categorization for some new thing is. PostgreSQL is intended to be used by humans, and avoiding discussions (or even arguments) on pgsql-hackers shouldn't outrank usability on the list of concerns. So, I tried to classify these. Here's what I came up with. Activity: ArchiverMain, AutoVacuumMain, BgWriterMain, BgWriterHibernate, CheckpointerMain, PgStatMain, RecoveryWalAll, RecoveryWalStream, SysLoggerMain, WalReceiverMain, WalWriterMain Client: SecureRead, SecureWrite, SSLOpenServer, WalSenderMain, WalSenderWaitForWAL, WalSenderWriteData, WalReceiverWaitStart Timeout: BaseBackupThrottle, PgSleep, RecoveryApplyDelay IPC: BgWorkerShutdown, BgWorkerStartup, ExecuteGather, MessageQueueInternal, MessageQueuePutMessage, MessageQueueReceive, MessageQueueSend, ParallelFinish, ProcSignal, ProcSleep, SyncRep Extension: Extension I classified all of the main loop waits as waiting for activity; all of those are background processes that are waiting for something to happen and are more or less happy to sleep forever until it does. I also included the RecoveryWalAll and RecoveryWalStream events in there; those don't have the sort of "main loop" flavor of the others but they are happy to wait more or less indefinitely for something to occur. Likewise, it was pretty easy to find all of the events that were waiting for client I/O, and I grouped those all under "Client". A few of the remaining wait events seemed like they were clearly waiting for a particular timeout to expire, so I gave those their own "Timeout" category. I believe these categorizations are actually useful for users. For example, you might want to see all of the waits in the system but exclude the "Client", "Activity", and "Timeout" categories because those are things that aren't signs of a problem. A "Timeout" wait is one that you explicitly requested, a "Client" wait isn't the server's fault, and an "Activity" wait just means nothing is happening. In contrast, a "Lock" or "LWLock" or "IPC" wait shows that something is actually delaying work that we'd ideally prefer to have get done sooner. I grouped the rest of this stuff as "IPC" because all of these events are cases where one server process is waiting for another server processes . That could be further subdivided, of course: most of those events are only going to occur in relation to parallel query, but I didn't want to group it that way explicitly because both background workers and shm_mq have other potential uses. ProcSignal and ProcSleep are related to heavyweight locks and SyncRep is of course related to synchronous replication. But they're all related in that one server process is waiting for another server process to tell it that a certain state has been reached, so IPC seems like a good categorization. Finally, extensions got their own category in this taxonomy, though I wonder if it would be better to instead have Activity/ExtensionActivity, Client/ExtensionClient, Timeout/ExtensionTimeout, and IPC/ExtensionIPC instead of making it a separate toplevel category. To me, this seems like a pretty solid toplevel categorization and a lot more useful than just throwing all of these in one bucket and saying "good luck". I'm not super-attached to the details but, again, I think it's worth trying to bin things in a way that will be useful. >> The "SecureRead" and "SecureWrite" wait events are going to be >> confusing, because the connection isn't necessarily secure. I think >> those should be called "ClientRead" and "ClientWrite". >> Comprehensibility is more important than absolute consistency with the >> C function names. > > Devil's advocate mode: Then why not improve those function names? That'd be fine. > Obviously this gets complicated by the existence of static functions > whose names are ambiguous and lack context, and multiple wait points > in a single function. Previously I've suggested a hierarchical > arrangement for these names which might help with that. Imagine names > like: executor.Hash.<function> (reported by a background worker > executing a hypothetical parallel hash join), > executor.Hash.<function>.<something> to disambiguate multiple wait > points in one function, walsender.<function> etc. That way we could > have a tidy curated meaningful naming scheme based on modules, but a > no-brainer systematic way to name the most numerous leaf bits of that > hierarchy. Just an idea... Considering we only have a few dozen of those, this feels like it might be overkill to me, and I suspect we'll end up finding that it's a bit harder to make it consistent and useful than one might hope. I am basically happy with the way Michael named them, but that's not to say I could never be happy with anything else. >> Another thing to think about is that there's no way to actually see >> wait event information for a number of the processes which this patch >> instruments, because they don't appear in pg_stat_activity. > > Good point. Perhaps there could be another extended view, or system > process view, or some other mechanism. That could be material for a > separate patch? I agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers