Em sex., 27 de mai. de 2022 às 18:22, Andres Freund <and...@anarazel.de> escreveu:
> Hi, > > On 2022-05-27 10:35:08 -0300, Ranier Vilela wrote: > > Em qui., 26 de mai. de 2022 às 22:30, Tomas Vondra < > > tomas.von...@enterprisedb.com> escreveu: > > > > > On 5/27/22 02:11, Ranier Vilela wrote: > > > > > > > > ... > > > > > > > > Here the results with -T 60: > > > > > > Might be a good idea to share your analysis / interpretation of the > > > results, not just the raw data. After all, the change is being proposed > > > by you, so do you think this shows the change is beneficial? > > > > > I think so, but the expectation has diminished. > > I expected that the more connections, the better the performance. > > And for both patch and head, this doesn't happen in tests. > > Performance degrades with a greater number of connections. > > Your system has four CPUs. Once they're all busy, adding more connections > won't improve performance. It'll just add more and more context switching, > cache misses, and make the OS scheduler do more work. > conns tps head 10 82365.634750 50 74593.714180 80 69219.756038 90 67419.574189 100 66613.771701 Yes it is quite disappointing that with 100 connections, tps loses to 10 connections. > > > > > GetSnapShowData() isn't a bottleneck? > > I'd be surprised if it showed up in a profile on your machine with that > workload in any sort of meaningful way. The snapshot reuse logic will > always > work - because there are no writes - and thus the only work that needs to > be > done is to acquire the ProcArrayLock briefly. And because there is only a > small number of cores, contention on the cacheline for that isn't a > problem. > Thanks for sharing this. > > > > > These results look much saner, but IMHO it also does not show any clear > > > benefit of the patch. Or are you still claiming there is a benefit? > > > > > We agree that they are micro-optimizations. However, I think they > should be > > considered micro-optimizations in inner loops, because all in > procarray.c is > > a hotpath. > > As explained earlier, I don't agree that they optimize anything - you're > making some of the scalability behaviour *worse*, if it's changed at all. > > > > The first objective, I believe, was achieved, with no performance > > regression. > > I agree, the gains are small, by the tests done. > > There are no gains. > IMHO, I must disagree. > > > > But, IMHO, this is a good way, small gains turn into big gains in the > end, > > when applied to all code. > > > > Consider GetSnapShotData() > > 1. Most of the time the snapshot is not null, so: > > if (snaphost == NULL), will fail most of the time. > > > > With the patch: > > if (snapshot->xip != NULL) > > { > > if (GetSnapshotDataReuse(snapshot)) > > return snapshot; > > } > > > > Most of the time the test is true and GetSnapshotDataReuse is not called > > for new > > snapshots. > > count, subcount and suboverflowed, will not be initialized, for all > > snapshots. > > But that's irrelevant. There's only a few "new" snapshots in the life of a > connection. You're optimizing something irrelevant. > IMHO, when GetSnapShotData() is the bottleneck, all is relevant. > > > > 2. If snapshot is taken during recoverys > > The pgprocnos and ProcGlobal->subxidStates are not touched unnecessarily. > > That code isn't reached when in recovery? > Currently it is reached *even* when not in recovery. With the patch, *only* is reached when in recovery. > > > 3. Calling GetSnapshotDataReuse() without first acquiring ProcArrayLock. > > There's an agreement that this would be fine, for now. > > There's no such agreement at all. It's not correct. > Ok, but there is a chance it will work correctly. > > > Consider ComputeXidHorizons() > > 1. ProcGlobal->statusFlags is touched before the lock. > > Hard to believe that'd have a measurable effect. > IMHO, anything you take out of the lock is a benefit. > > > > 2. allStatusFlags[index] is not touched for all numProcs. > > I'd be surprised if the compiler couldn't defer that load on its own. > Better be sure of that, no? regards, Ranier Vilela