Wondering if we can take this to completion (any idea what more we could
do?).
On Thu, 10 Dec 2020 at 14:48, Krunal Bauskar
wrote:
>
> On Tue, 8 Dec 2020 at 14:33, Krunal Bauskar
> wrote:
>
>>
>>
>> On Thu, 3 Dec 2020 at 21:32, Tom Lane wrote:
>>
>>> Krunal Bauskar writes:
>>> > Any updates o
On Tue, 8 Dec 2020 at 14:33, Krunal Bauskar wrote:
>
>
> On Thu, 3 Dec 2020 at 21:32, Tom Lane wrote:
>
>> Krunal Bauskar writes:
>> > Any updates or further inputs on this.
>>
>> As far as LSE goes: my take is that tampering with the
>> compiler/platform's default optimization options requires
On Thu, 3 Dec 2020 at 21:32, Tom Lane wrote:
> Krunal Bauskar writes:
> > Any updates or further inputs on this.
>
> As far as LSE goes: my take is that tampering with the
> compiler/platform's default optimization options requires *very*
> strong evidence, which we have not got and likely won't
On Sat, 5 Dec 2020 at 02:55, Alexander Korotkov wrote:
>
> On Wed, Dec 2, 2020 at 6:58 AM Krunal Bauskar wrote:
> > Let me know what you think about this analysis and any specific
> > direction that we should consider to help move forward.
>
> BTW, it would also be nice to benchmark my lwlock
On Wed, Dec 2, 2020 at 6:58 AM Krunal Bauskar wrote:
> Let me know what you think about this analysis and any specific direction
> that we should consider to help move forward.
BTW, it would also be nice to benchmark my lwlock patch on the
Kunpeng. I'm very optimistic about this patch, but i
On Wed, Dec 2, 2020 at 6:58 AM Krunal Bauskar wrote:
> 1. CAS patch (applied on the baseline)
>- Kunpeng: 10-45% improvement observed [1]
>- Graviton2: 30-50% improvement observed [2]
What does the lower bound of the improvement mean? Does it mean the minimal
improvement observed? Obviously not,
On Thu, Dec 3, 2020 at 7:02 PM Tom Lane wrote:
> From a system structural standpoint, I seriously dislike that lwlock.c
> patch: putting machine-specific variant implementations into that file
> seems like a disaster for maintainability. So it would need to show a
> very significant gain across a
Krunal Bauskar writes:
> Any updates or further inputs on this.
As far as LSE goes: my take is that tampering with the
compiler/platform's default optimization options requires *very*
strong evidence, which we have not got and likely won't get. Users
who are building for specific hardware can ch
Any updates or further inputs on this.
On Wed, 2 Dec 2020 at 09:27, Krunal Bauskar wrote:
>
>
> On Tue, 1 Dec 2020 at 22:19, Tom Lane wrote:
>
>> Alexander Korotkov writes:
>> > On Tue, Dec 1, 2020 at 6:19 PM Krunal Bauskar
>> wrote:
>> >> I would request you guys to re-think it from this per
> On 01/12/2020, 19:08, "Alexander Korotkov" wrote:
>BTW, what number of clients did you use? I can't find it in your message.
Sure. Important params seem to be:
Pgbench:
Clients: 256
pgbench_jobs : 32
Scale: 1000
fill_factor: 90
Postgresql:
shared buffers: 31GB
max_connections: 1024
T
On Tue, 1 Dec 2020 at 22:19, Tom Lane wrote:
> Alexander Korotkov writes:
> > On Tue, Dec 1, 2020 at 6:19 PM Krunal Bauskar
> wrote:
> >> I would request you guys to re-think it from this perspective to help
> ensure that PGSQL can scale well on ARM.
> >> s_lock becomes a top-most function and
> On 01/12/2020, 16:59, "Alexander Korotkov" wrote:
> On Tue, Dec 1, 2020 at 1:10 PM Amit Khandekar wrote:
> > FWIW, here is an earlier discussion on the same (also added the
> >> proposal author here):
Thanks for looping me in!
>>
>>
> https://www.postgresql.org/message-id/flat/
On Tue, Dec 1, 2020 at 7:57 PM Zidenberg, Tsahi wrote:
> > On 01/12/2020, 16:59, "Alexander Korotkov" wrote:
> > On Tue, Dec 1, 2020 at 1:10 PM Amit Khandekar
> > wrote:
> > > FWIW, here is an earlier discussion on the same (also added the
> > >> proposal author here):
>
> Thanks for loop
Alexander Korotkov writes:
> On Tue, Dec 1, 2020 at 6:19 PM Krunal Bauskar wrote:
>> I would request you guys to re-think it from this perspective to help ensure
>> that PGSQL can scale well on ARM.
>> s_lock becomes a top-most function and LSE is not a universal solution but
>> CAS surely help
On Tue, Dec 1, 2020 at 6:19 PM Krunal Bauskar wrote:
> I would request you guys to re-think it from this perspective to help ensure
> that PGSQL can scale well on ARM.
> s_lock becomes a top-most function and LSE is not a universal solution but
> CAS surely helps ease the main bottleneck.
CAS p
On Tue, 1 Dec 2020 at 20:25, Alexander Korotkov
wrote:
> On Tue, Dec 1, 2020 at 3:44 PM Krunal Bauskar
> wrote:
> > I have completed benchmarking with lse.
> >
> > Graph attached.
>
> Thank you for benchmarking.
>
> Now I agree with this comment by Tom Lane
>
> > In general, I'm pretty skeptical
On Tue, Dec 1, 2020 at 1:10 PM Amit Khandekar wrote:
> On Tue, 1 Dec 2020 at 15:33, Krunal Bauskar wrote:
> > What I meant was outline-atomics support was added in GCC-9.4 and was made
> > default in gcc-10.
> > LSE support has been present for quite some time.
>
> FWIW, here is an earlier discussion
On Tue, Dec 1, 2020 at 3:44 PM Krunal Bauskar wrote:
> I have completed benchmarking with lse.
>
> Graph attached.
Thank you for benchmarking.
Now I agree with this comment by Tom Lane
> In general, I'm pretty skeptical of *all* the results posted so far on
> this thread, because everybody seem
On Tue, 1 Dec 2020 at 02:16, Alexander Korotkov
wrote:
> On Mon, Nov 30, 2020 at 9:21 PM Tom Lane wrote:
> > Alexander Korotkov writes:
> > > I tend to think that LSE is enabled by default in Apple's clang based
> > > on your previous message[1]. In order to dispel the doubts could you
> > > p
On Tue, 1 Dec 2020 at 15:33, Krunal Bauskar wrote:
> What I meant was outline-atomics support was added in GCC-9.4 and was made
> default in gcc-10.
> LSE support has been present for quite some time.
FWIW, here is an earlier discussion on the same (also added the
proposal author here):
https://www.
On Tue, 1 Dec 2020 at 15:16, Alexander Korotkov
wrote:
> On Tue, Dec 1, 2020 at 6:26 AM Krunal Bauskar
> wrote:
> > On Tue, 1 Dec 2020 at 02:31, Alexander Korotkov
> wrote:
> >> BTW, how do you get that required gcc version is 9.4? I've managed to
> >> use LSE with gcc 9.3.
> >
> > Did they ba
On Tue, Dec 1, 2020 at 9:01 AM Tom Lane wrote:
> I did what I could in this department. It's late and I'm not going to
> have time to run read/write benchmarks before bed, but here are some
> results for the "pgbench -S" cases. I tried to match your testing
> choices, but could not entirely:
>
>
On Tue, Dec 1, 2020 at 6:26 AM Krunal Bauskar wrote:
> On Tue, 1 Dec 2020 at 02:31, Alexander Korotkov wrote:
>> BTW, how do you get that required gcc version is 9.4? I've managed to
>> use LSE with gcc 9.3.
>
> Did they backport it to 9.3?
> I am just looking at the gcc guide.
> https://gcc.g
Alexander Korotkov writes:
> 2) None of the patches considered in this thread give a clear
> advantage for PostgreSQL built with LSE.
Yeah, I think so.
> To further confirm this let's wait for Kunpeng 920 tests by Krunal
> Bauskar and Amit Khandekar. Also it would be nice if someone would run
>
On Tue, 1 Dec 2020 at 02:31, Alexander Korotkov
wrote:
> On Mon, Nov 30, 2020 at 7:00 AM Krunal Bauskar
> wrote:
> > 3. The problem with the GCC approach is that a lot of distros still don't
> > support gcc 9.4 as default.
> > To use this approach:
> > * PGSQL will have to roll out its packages using gc
On Mon, Nov 30, 2020 at 7:00 AM Krunal Bauskar wrote:
> 3. The problem with the GCC approach is that a lot of distros still don't
> support gcc 9.4 as default.
> To use this approach:
> * PGSQL will have to roll out its packages using gcc-9.4+ only so that
> they are compatible with all aarch64 machin
On Mon, Nov 30, 2020 at 9:21 PM Tom Lane wrote:
> Alexander Korotkov writes:
> > I tend to think that LSE is enabled by default in Apple's clang based
> > on your previous message[1]. In order to dispel the doubts could you
> > please provide assembly of SpinLockAcquire for following clang
> > o
Alexander Korotkov writes:
> I tend to think that LSE is enabled by default in Apple's clang based
> on your previous message[1]. In order to dispel the doubts could you
> please provide assembly of SpinLockAcquire for following clang
> options.
> "-O2"
> "-O2 -march=armv8-a+lse"
> "-O2 -march=ar
On Mon, Nov 30, 2020 at 9:08 AM Tom Lane wrote:
> Krunal Bauskar writes:
> > On Mon, 30 Nov 2020 at 10:14, Tom Lane wrote:
> >> The results I posted at [1] seem to contradict this for Apple's new
> >> machines.
>
> > For the results you saw on the Mac-Mini, was LSE enabled by default?
>
> Hmm, I don'
On Mon, Nov 30, 2020 at 9:20 AM Krunal Bauskar wrote:
> Some of us may be surprised by the fact that enabling LSE causes a
> regression (1816 -> 892 or 714 -> 610) with HEAD itself.
> LSE is meant to improve performance, but this, unfortunately, is not
> always the case, at least based on
On Mon, 30 Nov 2020 at 11:38, Tom Lane wrote:
> Krunal Bauskar writes:
> > On Mon, 30 Nov 2020 at 10:14, Tom Lane wrote:
> >> The results I posted at [1] seem to contradict this for Apple's new
> >> machines.
>
> > For the results you saw on the Mac-Mini, was LSE enabled by default?
>
> Hmm, I don't
Krunal Bauskar writes:
> On Mon, 30 Nov 2020 at 10:14, Tom Lane wrote:
>> The results I posted at [1] seem to contradict this for Apple's new
>> machines.
> For the results you saw on the Mac-Mini, was LSE enabled by default?
Hmm, I don't know how to get Apple's clang to admit what its default
setti
On Mon, 30 Nov 2020 at 10:14, Tom Lane wrote:
> Krunal Bauskar writes:
> > So given all the permutations and combinations, I think we could approach
> > the problem as follows:
>
> > * Enable use of CAS as it is known to have optimal performance (vs TAS)
>
> The results I posted at [1] seem to c
Krunal Bauskar writes:
> So given all the permutations and combinations, I think we could approach
> the problem as follows:
> * Enable use of CAS as it is known to have optimal performance (vs TAS)
The results I posted at [1] seem to contradict this for Apple's new
machines.
In general, I'm pr
On Sun, 29 Nov 2020 at 22:23, Alexander Korotkov
wrote:
> On Sat, Nov 28, 2020 at 1:31 PM Alexander Korotkov
> wrote:
> > I guess that might depend on the implementation of CAS and TAS. I bet
> > usage of CAS in spinlock gives advantage when ldxr/stxr are used, but
> > not when swpal/casa are u
On Thu, Nov 26, 2020 at 7:35 AM Krunal Bauskar wrote:
> * x86 uses optimized xchg operation.
> ARM also started supporting it (using the Large System Extensions) with
> ARM-v8.1, but since it is not supported on ARM-v8, GCC by default tends
> to emit more generic load-store assembly code.
>
> * gcc-9.4
On Sat, Nov 28, 2020 at 5:36 AM Tom Lane wrote:
> So at least on Apple's hardware, it seems like the CAS
> implementation might be a shade faster when uncontended,
> but it's very clearly worse when there is contention for
> the spinlock. That's interesting, because the argument
> that CAS should
I wrote:
> It might be that this hardware is capable of showing a difference with a
> better-tuned pgbench test, but with an untuned pgbench run, we just aren't
> sufficiently sensitive to the spinlock properties. (Which I guess is good
> news, really.)
It occurred to me that if we don't insist o
Peter Eisentraut writes:
> I tried this on a M1 MacBook Air. I cannot reproduce these results.
> The unpatched numbers are about in the neighborhood of what you showed,
> but the patched numbers are only about a few percent better, not the
> 1.5x or 2x change that you showed.
After redoing th
On 2020-11-26 23:55, Tom Lane wrote:
... and, after retrieving my jaw from the floor, I present the
attached. Apple's chips evidently like this style of spinlock a LOT
better. The difference is so remarkable that I wonder if I made a
mistake somewhere. Can anyone else replicate these results?
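For readers following the thread, the two spinlock acquisition styles being
compared can be sketched roughly as below. This is a minimal illustration
and not the patch under discussion: the demo_* names and the int lock word
are hypothetical, and the only real APIs used are the GCC/clang __sync
builtins. The TAS variant always writes the lock word, while the CAS
variant writes only when it observes the lock as free.

```c
#include <assert.h>

/* Hypothetical lock word for illustration: 0 = free, 1 = held. */
typedef int demo_slock_t;

/* TAS style: unconditionally swap in 1.
 * Returns nonzero if the lock was already held. */
static inline int demo_tas(volatile demo_slock_t *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

/* CAS style: store 1 only if the lock currently reads 0.
 * Returns nonzero if the lock was already held. */
static inline int demo_cas(volatile demo_slock_t *lock)
{
    return !__sync_bool_compare_and_swap(lock, 0, 1);
}

/* Release the lock by writing 0 with release semantics. */
static inline void demo_unlock(volatile demo_slock_t *lock)
{
    __sync_lock_release(lock);
}

/* Single-threaded sanity check: both styles report "free" on the first
 * attempt and "held" on the second. Returns 1 on success. */
int demo_selfcheck(void)
{
    volatile demo_slock_t lock = 0;

    assert(demo_tas(&lock) == 0);   /* acquired */
    assert(demo_tas(&lock) != 0);   /* already held */
    demo_unlock(&lock);

    assert(demo_cas(&lock) == 0);   /* acquired */
    assert(demo_cas(&lock) != 0);   /* already held */
    demo_unlock(&lock);
    return 1;
}
```

Which variant is faster appears to depend on the hardware and on whether
the compiler emits ldxr/stxr loops or LSE instructions, which is what the
benchmarks in this thread try to establish.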
On Fri, Nov 27, 2020 at 11:55 AM Michael Paquier wrote:
> Not planning to buy one here, but anything I have read about it says that
> it is worth a performance study.
Another interesting area for experiments is AWS graviton2 instances.
Specification says it supports arm v8.2, so it should have swpal/
On Fri, Nov 27, 2020 at 02:50:30AM -0500, Tom Lane wrote:
> Yeah, that wasn't making sense to me either. The most likely explanation
> seems to be that I messed up the test somehow ... but I don't see where.
> So, again, I'm wondering if anyone else can replicate or refute this.
I do find your re
Alexander Korotkov writes:
> On Fri, Nov 27, 2020 at 1:55 AM Tom Lane wrote:
>> ... and, after retrieving my jaw from the floor, I present the
>> attached. Apple's chips evidently like this style of spinlock a LOT
>> better. The difference is so remarkable that I wonder if I made a
>> mistake s
On Fri, Nov 27, 2020 at 2:20 AM Tom Lane wrote:
> Alexander Korotkov writes:
> > On Thu, Nov 26, 2020 at 1:32 PM Heikki Linnakangas wrote:
> >> Is there some official ARM documentation, like a programmer's reference
> >> manual or something like that, that would show a reference
> >> implementat
On Fri, Nov 27, 2020 at 1:55 AM Tom Lane wrote:
>
> Krunal Bauskar writes:
> > On Thu, 26 Nov 2020 at 10:50, Tom Lane wrote:
> >> Also, exactly what hardware/software platform were these curves
> >> obtained on?
>
> > Hardware: ARM Kunpeng 920 BareMetal Server 2.6 GHz. 64 cores (56 cores for
> >
Alexander Korotkov writes:
> On Thu, Nov 26, 2020 at 1:32 PM Heikki Linnakangas wrote:
>> Is there some official ARM documentation, like a programmer's reference
>> manual or something like that, that would show a reference
>> implementation of a spinlock on ARM? It would be good to refer to an
>
Krunal Bauskar writes:
> On Thu, 26 Nov 2020 at 10:50, Tom Lane wrote:
>> Also, exactly what hardware/software platform were these curves
>> obtained on?
> Hardware: ARM Kunpeng 920 BareMetal Server 2.6 GHz. 64 cores (56 cores for
> server and 8 for client) [2 numa nodes]
> Storage: 3.2 TB NVMe
On Thu, 26 Nov 2020 at 16:02, Heikki Linnakangas wrote:
> On 26/11/2020 06:30, Krunal Bauskar wrote:
> > Improving spin-lock implementation on ARM.
> >
> >
> > * Spin-Lock is known to have a signif
On Thu, Nov 26, 2020 at 1:32 PM Heikki Linnakangas wrote:
> On 26/11/2020 06:30, Krunal Bauskar wrote:
> > Improving spin-lock implementation on ARM.
> >
> >
> > * Spin-Lock is known to have a signif
On 26/11/2020 06:30, Krunal Bauskar wrote:
Improving spin-lock implementation on ARM.
* Spin-Lock is known to have a significant effect on performance
with increasing scalability.
* Existing Spin-Lock implementation for ARM is sub
On Thu, 26 Nov 2020 at 10:55, Krunal Bauskar wrote:
> Hardware: ARM Kunpeng 920 BareMetal Server 2.6 GHz. 64 cores (56 cores for
> server and 8 for client) [2 numa nodes]
> Storage: 3.2 TB NVMe SSD
> OS: CentOS Linux release 7.6
> PGSQL: baseline = Release Tag 13.1
> Invocation suite:
> https://
On Thu, 26 Nov 2020 at 10:50, Tom Lane wrote:
> Michael Paquier writes:
> > On Thu, Nov 26, 2020 at 10:00:50AM +0530, Krunal Bauskar wrote:
> >> (Thanks to Amit Khandekar for rigorously performance testing this patch
> >> with different combinations).
>
> > For the simple-update and tpcb-like gr
Michael Paquier writes:
> On Thu, Nov 26, 2020 at 10:00:50AM +0530, Krunal Bauskar wrote:
>> (Thanks to Amit Khandekar for rigorously performance testing this patch
>> with different combinations).
> For the simple-update and tpcb-like graphs, do you have any actual
> numbers to share between 128
scalability      baseline            patched
              update    tpcb      update    tpcb
-----------   ------   -----      ------   -----
128           107932   78554      108081   78569
2568
On Thu, Nov 26, 2020 at 10:00:50AM +0530, Krunal Bauskar wrote:
> (Thanks to Amit Khandekar for rigorously performance testing this patch
> with different combinations).
For the simple-update and tpcb-like graphs, do you have any actual
numbers to share between 128 and 1024 connections? The blue
Improving spin-lock implementation on ARM.
* Spin-Lock is known to have a significant effect on performance
with increasing scalability.
* Existing Spin-Lock implementation for ARM is sub-optimal due to
use of TAS (test-and-set