On Sat, May 28, 2022 at 09:35, Tomas Vondra <
tomas.von...@enterprisedb.com> wrote:

> On 5/28/22 02:36, Ranier Vilela wrote:
> > On Fri, May 27, 2022 at 18:22, Andres Freund <and...@anarazel.de>
> > wrote:
> >
> >     Hi,
> >
> >     On 2022-05-27 10:35:08 -0300, Ranier Vilela wrote:
> >     > On Thu, May 26, 2022 at 22:30, Tomas Vondra <
> >     > tomas.von...@enterprisedb.com> wrote:
> >     >
> >     > > On 5/27/22 02:11, Ranier Vilela wrote:
> >     > > >
> >     > > > ...
> >     > > >
> >     > > > Here are the results with -T 60:
> >     > >
> >     > > Might be a good idea to share your analysis / interpretation
> >     > > of the results, not just the raw data. After all, the change
> >     > > is being proposed by you, so do you think this shows the
> >     > > change is beneficial?
> >     > >
> >     > I think so, but my expectations have diminished.
> >     > I expected that more connections would mean better performance,
> >     > but for both patch and head that doesn't happen in these tests:
> >     > performance degrades as the number of connections grows.
> >
> >     Your system has four CPUs. Once they're all busy, adding more
> >     connections
> >     won't improve performance. It'll just add more and more context
> >     switching,
> >     cache misses, and make the OS scheduler do more work.
> >
> > conns   tps (head)
> >    10   82365.634750
> >    50   74593.714180
> >    80   69219.756038
> >    90   67419.574189
> >   100   66613.771701
> > Yes, it is quite disappointing that tps with 100 connections loses
> > to tps with 10 connections.
> >
>
> IMO that's entirely expected on a system with only 4 cores. Increasing
> the number of connections inevitably means more overhead (you have to
> track/manage more stuff). And at some point the backends start competing
> for L2/L3 caches, context switches are not free either, etc. So once you
> cross ~2-3x the number of cores, you should expect this.
>
> This behavior is natural/inherent, it's unlikely to go away, and it's
> one of the reasons why we recommend not using too many connections. If
> you try to maximize throughput, just don't do that. Or just use a
> machine with more cores.
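
For reference, a client-count sweep like the one discussed above can be
scripted. A minimal sketch follows; the client counts, thread count,
60 s duration, and "postgres" database name are illustrative
assumptions, not the exact settings used in this thread:

    #!/usr/bin/env python3
    # Hypothetical sketch: run a read-only pgbench sweep over client
    # counts and extract the tps figure from each run's output.
    import re
    import subprocess

    for clients in (4, 8, 16, 32, 64, 100):
        out = subprocess.run(
            ["pgbench", "-S", "-c", str(clients), "-j", "4",
             "-T", "60", "postgres"],
            capture_output=True, text=True, check=True,
        ).stdout
        # pgbench reports a line of the form "tps = 82365.634750 (...)"
        m = re.search(r"tps = ([0-9.]+)", out)
        print(f"{clients:>3} clients: {m.group(1) if m else 'n/a'} tps")

On a 4-core machine, the curve from such a sweep would be expected to
peak near the core count and then decline, matching the numbers above.
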
>
> >
> >     > GetSnapshotData() isn't a bottleneck?
> >
> >     I'd be surprised if it showed up in a profile on your machine
> >     with that workload in any sort of meaningful way. The snapshot
> >     reuse logic will always work - because there are no writes - and
> >     thus the only work that needs to be done is to acquire the
> >     ProcArrayLock briefly. And because there is only a small number
> >     of cores, contention on the cacheline for that isn't a problem.
> >
> > Thanks for sharing this.
> >
> >     > > These results look much saner, but IMHO it also does not
> >     > > show any clear benefit of the patch. Or are you still
> >     > > claiming there is a benefit?
> >     > >
> >     > We agree that they are micro-optimizations. However, I think
> >     > they should be considered micro-optimizations in inner loops,
> >     > because everything in procarray.c is a hot path.
> >
> >     As explained earlier, I don't agree that they optimize anything -
> >     you're making some of the scalability behaviour *worse*, if it's
> >     changed at all.
> >
> >
> >     > The first objective, I believe, was achieved: no performance
> >     > regression.
> >     > I agree the gains are small, according to the tests done.
> >
> >     There are no gains.
> >
> > IMHO, I must disagree.
> >
>
> You don't have to, really. What you should do is show results
> demonstrating the claimed gains, and so far you have not done that.
>
> I don't want to be rude, but so far you've shown results from a
> benchmark that effectively tested fork(), due to only running 10
> transactions per client, and then results from a single run for each
> client count (which doesn't really show any gains either, and is quite
> noisy).
>
> As mentioned, GetSnapshotData() is not even in the perf profile, so
> why would the patch even make a difference?
>
> You've also claimed it helps generate better code on older compilers,
> but you've never supported that with any evidence.
>
>
> Maybe there is an improvement - show us. Do a benchmark with more runs,
> to average out the noise. Calculate VAR/STDEV to show how variable the
> results are. Use that to compare results and decide if there is an
> improvement. Also, keep in mind binary layout matters [1].
>
I redid the benchmark on a better machine:
Intel i7-10510U
8 GB RAM
512 GB SSD
Ubuntu Linux, 64-bit

All files are attached, including the raw data of the results.
I did the calculations as requested.
A quick average over the 10 benchmark runs shows roughly 10,000 tps more.
Not bad for a simple patch made entirely of micro-optimizations.
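
For anyone redoing the math, the calculation is of this general shape.
A minimal sketch, where the "tps_runs.txt" file name and the
one-tps-value-per-line format are assumptions rather than the actual
layout of the attached raw data:

    #!/usr/bin/env python3
    # Hypothetical sketch of the averaging step over per-run tps figures.
    import statistics

    with open("tps_runs.txt") as f:  # assumed name/format of raw data
        tps = [float(line) for line in f if line.strip()]

    print(f"runs  = {len(tps)}")
    print(f"mean  = {statistics.mean(tps):.2f} tps")
    print(f"var   = {statistics.variance(tps):.2f}")   # sample variance
    print(f"stdev = {statistics.stdev(tps):.2f} tps")  # sample stddev

Note that statistics.variance/stdev use the sample (n-1) form, which is
the usual choice for a small number of runs.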

Results attached.

regards,
Ranier Vilela

Attachment: procarray_bench.tar.xz
Description: application/xz
