On Tue, Sep 10, 2024 at 4:58 PM Noah Misch <n...@leadboat.com> wrote:
> ... a rule of "each wait event appears in one
> pgstat_report_wait_start()" would be a rule I don't want.
As the original committer of the wait event stuff, I intended for the rule that you do not want to be the actual rule. However, I see that I didn't spell that out anywhere, either in the commit message or in the commit itself.

> I see this level of fine-grained naming
> as making the event name a sort of stable proxy for FILE:LINE. I'd value
> exposing such a proxy, all else being equal, but I don't think wait event
> names like AuthLdapBindLdapbinddn/AuthLdapBindUser are the right way. Wait
> event names should be more independent of today's code-level details.

I don't agree with that. In my experience, one of the most difficult parts of supporting PostgreSQL is that it's often very hard to find out what has gone wrong when a system starts behaving badly. It is often necessary to ask customers to install a debugger and do stuff with it, or to give them an instrumented build, in order to determine the root cause of a problem that in some cases is not even particularly complicated. Needing to refer to specific source code details may not be a common experience for the typical end user, but it is extremely common for me.

This problem commonly arises with error messages, because we have lots of error messages that are exactly the same, although thankfully it has become less common now that "could not find tuple for THINGY %u" no longer typically reaches users. But even when someone has a complaint about an error message and there are multiple instances of that message, I know that:

(1) I can ask them to set the error verbosity to verbose. I don't have that option for wait events.

(2) The primary function of the error message is to be understandable to the user, which means that it needs to be written in plain English.
The primary function of a wait event, by contrast, is to make it possible to understand the behavior of the system and troubleshoot problems, and it becomes much less effective as soon as it starts asserting that thing A and thing B are so similar that nobody will ever care about the distinction. It is very hard to be certain of that. When somebody reports that they've got a whole bunch of waits on some wait event that nobody has ever complained about before, I want to go look at the code in that specific place and try to figure out what's happening. If I have to start imagining possible scenarios based on two or more call sites, or if I have to start by getting them to install a modified build with those call sites properly split apart and trying to reproduce the problem, it's a lot harder.

In my experience, the number of distinct wait events that a particular installation experiences is rarely very large; it is probably measured in dozens. A user who wishes to disregard the distinction between similarly-named wait events won't find it prohibitively difficult to look over the list of all the wait events they ever see and decide which ones they'd like to merge for reporting purposes. But a user who really needs things separated out and finds that they aren't is simply out of luck.

--
Robert Haas
EDB: http://www.enterprisedb.com