Re: [RFC] Improve memset

2019-09-19 Thread Borislav Petkov
On Tue, Sep 17, 2019 at 01:45:20PM -0700, Linus Torvalds wrote: > That sounds better, but I'm a bit nervous about the whole thing > because who knows when the alternatives code itself internally uses > memset() and then we have a nasty little chicken-and-egg problem. You mean memcpy()...? > Also,

Re: [RFC] Improve memset

2019-09-19 Thread Borislav Petkov
On Tue, Sep 17, 2019 at 03:10:21PM -0500, Josh Poimboeuf wrote: > Then the "reverse alternatives" feature wouldn't be needed anyway. The intent was to have the default, most-used version be there at build-time, obviating the need to patch. Therefore on those old !ERMS CPUs we'll end up doing rep;s

Re: [RFC] Improve memset

2019-09-17 Thread Linus Torvalds
On Tue, Sep 17, 2019 at 1:10 PM Josh Poimboeuf wrote: > > Could it instead do this? > > ALTERNATIVE_2("call memset_orig", > "call memset_rep", X86_FEATURE_REP_GOOD, > "rep; stosb", X86_FEATURE_ERMS) > > Then the "reverse altern
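
The selection order implied by the ALTERNATIVE_2() suggestion quoted above can be sketched in plain C. This is only an illustration and assumes the usual two-alternative semantics, where the later entry wins when both feature bits are set. It picks a variant at run time instead of patching instructions at boot, so it is not the kernel mechanism itself. The names FEAT_REP_GOOD, FEAT_ERMS, memset_variant and select_memset_variant are hypothetical stand-ins, not kernel identifiers.

```c
/* Hypothetical feature bits standing in for X86_FEATURE_REP_GOOD and
 * X86_FEATURE_ERMS; the real kernel flags are not plain bit masks like this. */
enum {
    FEAT_REP_GOOD = 1u << 0,
    FEAT_ERMS     = 1u << 1
};

/* Hypothetical labels for the three code sequences named in the quote. */
enum memset_variant {
    MEMSET_ORIG,   /* "call memset_orig": the build-time default */
    MEMSET_REP,    /* "call memset_rep":  chosen when REP_GOOD is set */
    MEMSET_STOSB   /* "rep; stosb":       chosen when ERMS is set */
};

/* Mirror the assumed priority: start from the default, then let each
 * alternative override it in order, so the last matching entry wins. */
static enum memset_variant select_memset_variant(unsigned int features)
{
    enum memset_variant v = MEMSET_ORIG;

    if (features & FEAT_REP_GOOD)
        v = MEMSET_REP;
    if (features & FEAT_ERMS)
        v = MEMSET_STOSB;
    return v;
}
```

With this ordering a CPU reporting neither bit keeps the default call, which is why no "reversed" patching is needed in this scheme.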

Re: [RFC] Improve memset

2019-09-17 Thread Josh Poimboeuf
On Fri, Sep 13, 2019 at 09:22:37AM +0200, Borislav Petkov wrote: > In order to patch on machines which don't set X86_FEATURE_ERMS, I need > to do a "reversed" patching of sorts, i.e., patch when the x86 feature > flag is NOT set. See the below changes in alternative.c which basically > add a flags

RE: [RFC] Improve memset

2019-09-17 Thread David Laight
From: Linus Torvalds > Sent: 16 September 2019 18:25 ... > You can basically always beat "rep movs/stos" with hand-tuned AVX2/512 > code for specific cases if you don't look at I$ footprint and the cost > of the AVX setup (and the cost of frequency changes, which often go > hand-in-hand with the AV

Re: [RFC] Improve memset

2019-09-17 Thread Borislav Petkov
On Mon, Sep 16, 2019 at 10:25:25AM -0700, Linus Torvalds wrote: > So the "inline constant sizes" case has advantages over and beyond the > obvious ones. I suspect that a reasonable cut-off point is something > like "8*sizeof(long)". But look at things like "struct kstat" uses > etc, the limit migh

Re: [RFC] Improve memset

2019-09-16 Thread Linus Torvalds
On Mon, Sep 16, 2019 at 4:14 PM Andy Lutomirski wrote: > > Well, when I wrote this email, I *thought* it was inlining the > 'memset' function, but maybe I just can't read gcc's output today. Not having your compiler, it's also possible that it works for you, but just doesn't work for me. > It se

Re: [RFC] Improve memset

2019-09-16 Thread Andy Lutomirski
On Mon, Sep 16, 2019 at 2:30 PM Linus Torvalds wrote: > > On Mon, Sep 16, 2019 at 10:41 AM Andy Lutomirski wrote: > > > > After some experimentation, I think y'all are just doing it wrong. > > GCC is very clever about this as long as it's given the chance. This > > test, for example, generates e

Re: [RFC] Improve memset

2019-09-16 Thread Linus Torvalds
On Mon, Sep 16, 2019 at 10:41 AM Andy Lutomirski wrote: > > After some experimentation, I think y'all are just doing it wrong. > GCC is very clever about this as long as it's given the chance. This > test, for example, generates excellent code: > > #include > > __THROW __nonnull ((1)) __attribut

Re: [RFC] Improve memset

2019-09-16 Thread Andy Lutomirski
On Mon, Sep 16, 2019 at 10:25 AM Linus Torvalds wrote: > > On Mon, Sep 16, 2019 at 2:18 AM Rasmus Villemoes > wrote: > > > > Eh, this benchmark doesn't seem to provide any hints on where to set the > > cut-off for a compile-time constant n, i.e. the 32 in > > Yes, you'd need to use proper fixed-s

Re: [RFC] Improve memset

2019-09-16 Thread Linus Torvalds
On Mon, Sep 16, 2019 at 2:18 AM Rasmus Villemoes wrote: > > Eh, this benchmark doesn't seem to provide any hints on where to set the > cut-off for a compile-time constant n, i.e. the 32 in Yes, you'd need to use proper fixed-size memset's with __builtin_memset() to test that case. Probably easy e

Re: [RFC] Improve memset

2019-09-16 Thread Rasmus Villemoes
On 13/09/2019 18.36, Borislav Petkov wrote: > On Fri, Sep 13, 2019 at 12:42:32PM +0200, Borislav Petkov wrote: >> Or should we talk to Intel hw folks about it... > > Or, I can do something like this, while waiting. Benchmark at the end. > > The numbers are from a KBL box: > > model : 1

Re: [RFC] Improve memset

2019-09-14 Thread Borislav Petkov
On Sat, Sep 14, 2019 at 12:29:15PM +0300, Alexey Dobriyan wrote: > eh... I'd just drop it. These registers screw up everything. The intent is to not touch memset_orig and let it die with its users. It is irrelevant now anyway. If it can be shown that the extended list of clobbered registers hurt

Re: [RFC] Improve memset

2019-09-14 Thread Alexey Dobriyan
> Instead of calling memset: > > 8100cd8d: e8 0e 15 7a 00 callq 817ae2a0 > <__memset> > > and having a JMP inside it depending on the feature supported, let's simply > have the REP; STOSB directly in the code: > > ... > 81000442: 4c 89 d7
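
For readers unfamiliar with the idea quoted above, emitting REP; STOSB at the call site instead of calling __memset, here is a minimal user-space sketch using GCC/Clang extended inline assembly. It is not the patch from this thread: the function name memset_rep_stosb is hypothetical, the constraints are one reasonable choice, and the byte loop for non-x86 targets exists only so that the example builds everywhere.

```c
#include <stddef.h>

/* Hypothetical helper: fill n bytes at dest with the low byte of c. */
static inline void *memset_rep_stosb(void *dest, int c, size_t n)
{
    void *ret = dest;

#if defined(__x86_64__) || defined(__i386__)
    /* REP STOSB stores AL to [(R|E)DI] and repeats (R|E)CX times.
     * Both registers are modified, hence the "+D" and "+c" constraints.
     * The ABI guarantees the direction flag is clear on entry. */
    __asm__ volatile("rep stosb"
                     : "+D" (dest), "+c" (n)
                     : "a" (c)
                     : "memory");
#else
    /* Portable fallback so the sketch compiles on other architectures. */
    unsigned char *p = dest;

    while (n--)
        *p++ = (unsigned char)c;
#endif
    return ret;
}
```

The clobbered registers (destination, count, accumulator) are the cost the thread discusses: the compiler has to place its values in those registers around every inlined site.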

Re: [RFC] Improve memset

2019-09-13 Thread Borislav Petkov
On Fri, Sep 13, 2019 at 12:42:32PM +0200, Borislav Petkov wrote: > Or should we talk to Intel hw folks about it... Or, I can do something like this, while waiting. Benchmark at the end. The numbers are from a KBL box: model : 158 model name : Intel(R) Core(TM) i5-9600K CPU @ 3.70G

Re: [RFC] Improve memset

2019-09-13 Thread Borislav Petkov
On Fri, Sep 13, 2019 at 11:18:00AM +0200, Rasmus Villemoes wrote: > Something like > > if (__builtin_constant_p(c) && __builtin_constant_p(n) && n <= 32) > return __builtin_memset(dest, c, n); > > might be enough? Of course it would be sad if 32 was so high that this > turned
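
The check quoted above can be fleshed out into a small self-contained sketch. It assumes GCC or Clang for __builtin_constant_p and __builtin_memset. The names memset_dispatch, memset_out_of_line and INLINE_MEMSET_MAX are hypothetical, and the cut-off of 32 is simply the value from the quote, not a measured result.

```c
#include <stddef.h>

/* Hypothetical cut-off taken from the quoted suggestion; the thread
 * leaves the right value open. */
#define INLINE_MEMSET_MAX 32

/* Hypothetical stand-in for the out-of-line memset implementation. */
static void *memset_out_of_line(void *dest, int c, size_t n)
{
    unsigned char *p = dest;

    while (n--)
        *p++ = (unsigned char)c;
    return dest;
}

/* Let the compiler expand small fills whose value and size are known at
 * compile time; send everything else to the out-of-line routine.
 * Without optimization __builtin_constant_p() typically evaluates to 0
 * here, so every call takes the fallback path. The result is the same
 * either way. */
static inline __attribute__((always_inline))
void *memset_dispatch(void *dest, int c, size_t n)
{
    if (__builtin_constant_p(c) && __builtin_constant_p(n) &&
        n <= INLINE_MEMSET_MAX)
        return __builtin_memset(dest, c, n);
    return memset_out_of_line(dest, c, n);
}
```

A call such as memset_dispatch(buf, 0, 16) is a candidate for a few plain stores under optimization, while a run-time size always reaches the fallback.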

Re: [RFC] Improve memset

2019-09-13 Thread Rasmus Villemoes
On 13/09/2019 11.00, Linus Torvalds wrote: > On Fri, Sep 13, 2019 at 8:22 AM Borislav Petkov wrote: >> >> since the merge window is closing in and y'all are on a conference, I >> thought I should take another stab at it. It being something which Ingo, >> Linus and Peter have suggested in the past

Re: [RFC] Improve memset

2019-09-13 Thread Linus Torvalds
On Fri, Sep 13, 2019 at 8:22 AM Borislav Petkov wrote: > > since the merge window is closing in and y'all are on a conference, I > thought I should take another stab at it. It being something which Ingo, > Linus and Peter have suggested in the past at least once. > > Instead of calling memset: > >

Re: [RFC] Improve memset

2019-09-13 Thread Rasmus Villemoes
On 13/09/2019 09.22, Borislav Petkov wrote: > > Instead of calling memset: > > 8100cd8d: e8 0e 15 7a 00 callq 817ae2a0 > <__memset> > > and having a JMP inside it depending on the feature supported, let's simply > have the REP; STOSB directly in the code: > > ..

Re: [RFC] Improve memset

2019-09-13 Thread Borislav Petkov
On Fri, Sep 13, 2019 at 09:35:30AM +0200, Ingo Molnar wrote: > That looks exciting - I'm wondering what effects this has on code > footprint - for example defconfig vmlinux code size, and what the average > per call site footprint impact is? > > If the footprint effect is acceptable, then I'd ex

Re: [RFC] Improve memset

2019-09-13 Thread Ingo Molnar
* Borislav Petkov wrote: > Hi, > > since the merge window is closing in and y'all are on a conference, I > thought I should take another stab at it. It being something which Ingo, > Linus and Peter have suggested in the past at least once. > > Instead of calling memset: > > 8100cd8d: