On Wed, Apr 22, 2020 at 05:48:27PM +0100, Steve McIntyre wrote:
> Hi folks!

Hiya,

>
> I'm adding a CC to Steve Capper, a colleague in Arm who's our expert
> here for this kind of question. He's also a DM in Debian... :-)

Now I feel guilty about not doing enough Debian :-).

>
> On Tue, Apr 21, 2020 at 06:37:07PM -0400, Noah Meyerhans wrote:
> >On Sun, Apr 12, 2020 at 12:18:35PM +0200, Aurelien Jarno wrote:
> >
> >> It would also be nice to have numbers to see the impact on non-ARMv8.1
> >> CPU on real workloads. As pointed out by Florian, and if the impact is
> >> negligible, it might be a good idea to enable -moutline-atomics
> >> globally at the GCC level so that all software can benefit from it, and
> >> instead of only glibc. That could be either upstream or only in Debian,
> >> that's probably a separate discussion. Otherwise we will likely end up
> >> using this non-default GCC option on all packages that runs faster with
> >> it.
> >
> >Agreed.
>
> I think the -moutline-atomics is probably good to enable by default
> once we've got it (gcc 10). that's the suggestion I've heard from gcc
> folks in Arm.
>
> >> Also note that the mechanism allowing a safe upgrade *does* incur a
> >> runtime overhead as every binary now has to test for the presence of
> >> /etc/ld.so.nohwcap to detect a possible upgrade of the glibc in
> >> progress. That's why we have disabled it on architecture not providing
> >> an optimized library [1].
>
> Oh, ick. :-/
>
> >Thanks for the pointer, it's interesting to see data on that. This also
> >suggests that it might be worthwhile to investigate a better mechanism
> >for identifying the availability of hardware features.
> >
> >> > I've tested both options and found them to be acceptable on v8.1a
> >> > (Neoverse N1) and v8a (Cortex A72) CPUs. I can provide bulk test run
> >> > data of the various different configuration permutations if you'd
> >> > like to see additional data.

That's good to hear!

> >>
> >> As said above I think we would need more numbers on real workload to
> >> take a decision.
> >> Don't get me wrong I do not oppose on improving atomics
> >> on ARMv8.1, but I would like that we chose the best option. Also if we
> >> go with the -moutline-atomics option, I believe it rather has to be a
> >> ARM porters decision than a glibc maintainers decision (hence the Cc:).
> >
> >I'll see what I can come up with.
> >
> >Do the arm porters have any opinions on this matter?
>
> It's a good question, and thanks for asking! I definitely think it's
> worth doing -moutline-atomics, and I'm hoping Steve can share some
> performance numbers to help convince. :-)
>

We ran -moutline-atomics on a mixture of development hardware, running
(IIRC) some DPDK lock tests that employed C11-style atomics. As expected
there was a performance penalty, but it was on the order of 1%. The perf
boost from moving to LSE was a lot larger (and we noticed the variance
dropping a lot with LSE too).

FWIW, I'd recommend -moutline-atomics for the general case.

(I used to be a fan of the multi-lib approach, but the way the runtime
selection is implemented in gcc, with a direct branch, changed my mind
:-) ).

Cheers,
--
Steve
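
P.S. For anyone who hasn't looked at what the option actually touches,
here is a minimal sketch of the kind of C11-style lock code under
discussion. The sketch_* names are hypothetical (this is not DPDK's API
and not the tests mentioned above). The assumption is an AArch64 build
with GCC 10 or later: with -moutline-atomics, the compare-and-swap and
fetch-add below should become calls into small libgcc helpers that pick
LSE or the older exclusive load/store loop at run time, instead of being
expanded inline.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock; names are illustrative only. */
typedef struct {
    atomic_int locked; /* 0 = free, 1 = held */
} sketch_spinlock;

void sketch_spin_init(sketch_spinlock *l)
{
    atomic_init(&l->locked, 0);
}

/* Try once to take the lock; returns true on success. */
bool sketch_spin_trylock(sketch_spinlock *l)
{
    int expected = 0;
    /* Compare-and-swap with acquire ordering on success. */
    return atomic_compare_exchange_strong_explicit(
        &l->locked, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

void sketch_spin_lock(sketch_spinlock *l)
{
    while (!sketch_spin_trylock(l)) {
        /* Wait on a plain load until the lock looks free, then retry. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            ;
    }
}

void sketch_spin_unlock(sketch_spinlock *l)
{
    /* Releasing a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Atomic counter bump; returns the value after the addition. */
int sketch_counter_add(atomic_int *c, int n)
{
    return atomic_fetch_add_explicit(c, n, memory_order_relaxed) + n;
}
```

On an AArch64 box, something like "gcc -O2 -moutline-atomics -S" on this
file versus "gcc -O2 -mno-outline-atomics -S" should make the difference
visible in the generated assembly. The source itself is plain C11 and
builds unchanged on other architectures, where the flag does not apply.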