On 2014-10-07 10:30:54 -0400, Robert Haas wrote:
> On Tue, Oct 7, 2014 at 10:12 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> > Have you tried/considered putting the counters into a per-backend array
> > somewhere in shared memory? That way they don't blow up the size of
> > frequently ping-ponged cachelines. Then you can summarize those values
> > whenever querying the results.
>
> The problem with that is that you need O(N*M) memory instead of O(N),
> where N is the number of lwlocks and M is the number of backends.
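
(For concreteness, the scheme being discussed would look roughly like the
sketch below -- invented names and sizes, not an actual patch. Each backend
writes only into its own row, so the hot cachelines aren't shared and no
atomics are needed; a monitoring query sums the rows at read time. The price
is one full row of counters per backend, which is where the O(N*M) figure
comes from.)

#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only -- placeholder names and sizes.  One row of counters per
 * backend lives in shared memory; each backend bumps only its own row,
 * and the reporting side sums the rows at query time.
 */
#define MAX_BACKENDS         512    /* placeholder for MaxBackends */
#define NUM_COUNTED_LWLOCKS  128    /* placeholder; per-buffer lwlocks excluded */

typedef struct LWLockCounters
{
    uint64_t    sh_acquire;         /* shared acquisitions */
    uint64_t    ex_acquire;         /* exclusive acquisitions */
    uint64_t    block;              /* had to sleep on the lock */
    uint64_t    spin_delay;         /* spinlock delay events */
} LWLockCounters;

/* one contiguous array in shared memory, indexed [backend][lwlock] */
static LWLockCounters *lwlock_counters;

static inline void
count_lwlock_acquire(int backend_id, int lwlockid, bool exclusive)
{
    LWLockCounters *c;

    c = &lwlock_counters[backend_id * NUM_COUNTED_LWLOCKS + lwlockid];
    if (exclusive)
        c->ex_acquire++;            /* plain increment, row is backend-private */
    else
        c->sh_acquire++;
}

/* the monitoring view sums over all backends when queried */
static uint64_t
sum_sh_acquires(int lwlockid)
{
    uint64_t    total = 0;
    int         b;

    for (b = 0; b < MAX_BACKENDS; b++)
        total += lwlock_counters[b * NUM_COUNTED_LWLOCKS + lwlockid].sh_acquire;
    return total;
}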
Right.

> That gets painful in a hurry. We just got rid of something like that
> with your patch to get rid of all the backend-local buffer pin arrays;
> I'm not keen to add another such thing right back.

I think it might be ok if we excluded buffer locks and made it depend on
a GUC.

> It might be viable
> if we excluded the buffer locks, but that also detracts from the
> value. Plus, no matter how you slice it, you're now touching cache
> lines completely unrelated to the ones you need for the foreground
> work. That's got a distributed overhead that's hard to measure, but
> it is certainly going to knock other stuff out of the CPU caches to
> some degree.

Yea, it's hard to guess ;(

> >> As a further point, when I study the LWLOCK_STATS output, that stuff
> >> is typically what I'm looking for anyway. The first few times I ran
> >> with that enabled, I was kind of interested by the total lock counts
> >> ... but that quickly got uninteresting. The blocking and spindelays
> >> show you where the problems are, so that's the interesting part.
> >
> > I don't really agree with this. Especially with shared locks (even more
> > so if/when the LW_SHARED stuff gets in), there's simply no relevant
> > blocking and spindelay.
>
> If your patch to implement lwlocks using atomics goes in, then we may
> have to reassess what instrumentation is actually useful here. I can
> only comment on the usefulness of various bits of instrumentation I
> have used in the past on the code bases we had at that time, or my
> patches thereupon. Nobody here can reasonably be expected to know
> whether the same stuff will still be useful after possible future
> patches that are not even in a reviewable state at present have been
> committed.

It's not like it'd be significantly different today - in a read-mostly
workload that's bottlenecked on ProcArrayLock you'll not see many waits.
There you'd have to count the total number of spinlock cycles to measure
anything interesting.

> Having said that, if there's no blocking or spindelay any more, to me
> that doesn't mean we should look for some other measure of contention
> instead. It just means that the whole area is a solved problem, we
> don't need to measure contention any more because there isn't any, and
> we can move on to other issues once we finish partying. But I'm mildly
> skeptical that the outcome will be as good as all that.

It's not. Just because we're not waiting in a spinlock loop doesn't mean
there can't be contention... It's just moved one level down, into the
CPU.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
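
(A standalone illustration of that last point, unrelated to any PostgreSQL
code: in the sketch below no thread ever blocks or spins in a way a wait or
spindelay counter could see, yet incrementing one shared counter is typically
several times slower than incrementing padded per-thread counters, simply
because the cacheline keeps bouncing between cores. Assumed build:
gcc -O2 -pthread; 64-byte cachelines assumed.)

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 4
#define NITER    10000000L

/* one counter shared by all threads: every increment drags its cacheline
 * from core to core, even though nothing ever blocks or spins */
static _Atomic unsigned long shared_counter;

/* per-thread counters, padded so each one owns its own cacheline */
static struct
{
    _Atomic unsigned long val;
    char        pad[64 - sizeof(unsigned long)];
} private_counter[NTHREADS];

static void *
work_shared(void *arg)
{
    (void) arg;
    for (long i = 0; i < NITER; i++)
        atomic_fetch_add(&shared_counter, 1);
    return NULL;
}

static void *
work_private(void *arg)
{
    int         id = (int) (long) arg;

    for (long i = 0; i < NITER; i++)
        atomic_fetch_add(&private_counter[id].val, 1);
    return NULL;
}

static double
run(void *(*fn) (void *))
{
    pthread_t   tid[NTHREADS];
    struct timespec t0, t1;
    long        t;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, fn, (void *) t);
    for (t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int
main(void)
{
    /* expect the shared counter to be noticeably slower, despite both
     * versions being "wait-free" as far as any wait counter could tell */
    printf("shared counter:      %.2fs\n", run(work_shared));
    printf("per-thread counters: %.2fs\n", run(work_private));
    return 0;
}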