On Tue, Sep 17, 2019 at 01:45:20PM -0700, Linus Torvalds wrote:
> That sounds better, but I'm a bit nervous about the whole thing
> because who knows when the alternatives code itself internally uses
> memset() and then we have a nasty little chicken-and-egg problem.
You mean memcpy()...?
> Also,
On Tue, Sep 17, 2019 at 03:10:21PM -0500, Josh Poimboeuf wrote:
> Then the "reverse alternatives" feature wouldn't be needed anyway.
The intent was to have the default, most-used version be there at
build-time, obviating the need to patch. Therefore on those old !ERMS
CPUs we'll end up doing rep;s
On Tue, Sep 17, 2019 at 1:10 PM Josh Poimboeuf wrote:
>
> Could it instead do this?
>
> ALTERNATIVE_2("call memset_orig",
> "call memset_rep", X86_FEATURE_REP_GOOD,
> "rep; stosb", X86_FEATURE_ERMS)
>
> Then the "reverse altern
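[Editor's sketch, not part of the thread.] The selection order in the ALTERNATIVE_2 proposal quoted above can be modelled in plain userspace C. This assumes the usual ALTERNATIVE_2 rule that the later entry takes precedence when both feature bits are set. The FEAT_* bits, the enum, and pick_memset() are hypothetical names made up for illustration; the kernel patches call sites at boot instead of branching like this.

```c
#include <assert.h>

/* Hypothetical stand-ins for X86_FEATURE_REP_GOOD and X86_FEATURE_ERMS. */
enum { FEAT_REP_GOOD = 1u << 0, FEAT_ERMS = 1u << 1 };

/* The three variants named in the quoted ALTERNATIVE_2() proposal. */
enum memset_variant { MEMSET_ORIG, MEMSET_REP, MEMSET_ERMS };

/*
 * Model of the selection order (assumed): start from the build-time
 * default, then let each alternative override it if its feature bit is
 * set. Because the ERMS entry comes last, it wins when both bits are set.
 */
static enum memset_variant pick_memset(unsigned int cpu_features)
{
	enum memset_variant v = MEMSET_ORIG;	/* "call memset_orig" */

	if (cpu_features & FEAT_REP_GOOD)
		v = MEMSET_REP;			/* "call memset_rep" */
	if (cpu_features & FEAT_ERMS)
		v = MEMSET_ERMS;		/* "rep; stosb" */
	return v;
}

/* Small self-check of the precedence rule described above. */
static void check_precedence(void)
{
	assert(pick_memset(0) == MEMSET_ORIG);
	assert(pick_memset(FEAT_REP_GOOD | FEAT_ERMS) == MEMSET_ERMS);
}
```

With this ordering no "patch when the flag is NOT set" case is needed: a CPU with neither bit simply keeps the default.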
On Fri, Sep 13, 2019 at 09:22:37AM +0200, Borislav Petkov wrote:
> In order to patch on machines which don't set X86_FEATURE_ERMS, I need
> to do a "reversed" patching of sorts, i.e., patch when the x86 feature
> flag is NOT set. See the below changes in alternative.c which basically
> add a flags
From: Linus Torvalds
> Sent: 16 September 2019 18:25
...
> You can basically always beat "rep movs/stos" with hand-tuned AVX2/512
> code for specific cases if you don't look at I$ footprint and the cost
> of the AVX setup (and the cost of frequency changes, which often go
> hand-in-hand with the AV
On Mon, Sep 16, 2019 at 10:25:25AM -0700, Linus Torvalds wrote:
> So the "inline constant sizes" case has advantages over and beyond the
> obvious ones. I suspect that a reasonable cut-off point is something
> like "8*sizeof(long)". But look at things like "struct kstat" uses
> etc, the limit migh
On Mon, Sep 16, 2019 at 4:14 PM Andy Lutomirski wrote:
>
> Well, when I wrote this email, I *thought* it was inlining the
> 'memset' function, but maybe I just can't read gcc's output today.
Not having your compiler, it's also possible that it works for you,
but just doesn't work for me.
> It se
On Mon, Sep 16, 2019 at 2:30 PM Linus Torvalds wrote:
>
> On Mon, Sep 16, 2019 at 10:41 AM Andy Lutomirski wrote:
> >
> > After some experimentation, I think y'all are just doing it wrong.
> > GCC is very clever about this as long as it's given the chance. This
> > test, for example, generates e
On Mon, Sep 16, 2019 at 10:41 AM Andy Lutomirski wrote:
>
> After some experimentation, I think y'all are just doing it wrong.
> GCC is very clever about this as long as it's given the chance. This
> test, for example, generates excellent code:
>
> #include
>
> __THROW __nonnull ((1)) __attribut
On Mon, Sep 16, 2019 at 10:25 AM Linus Torvalds wrote:
>
> On Mon, Sep 16, 2019 at 2:18 AM Rasmus Villemoes wrote:
> >
> > Eh, this benchmark doesn't seem to provide any hints on where to set the
> > cut-off for a compile-time constant n, i.e. the 32 in
>
> Yes, you'd need to use proper fixed-s
On Mon, Sep 16, 2019 at 2:18 AM Rasmus Villemoes wrote:
>
> Eh, this benchmark doesn't seem to provide any hints on where to set the
> cut-off for a compile-time constant n, i.e. the 32 in
Yes, you'd need to use proper fixed-size memset's with
__builtin_memset() to test that case. Probably easy e
On 13/09/2019 18.36, Borislav Petkov wrote:
> On Fri, Sep 13, 2019 at 12:42:32PM +0200, Borislav Petkov wrote:
>> Or should we talk to Intel hw folks about it...
>
> Or, I can do something like this, while waiting. Benchmark at the end.
>
> The numbers are from a KBL box:
>
> model : 1
On Sat, Sep 14, 2019 at 12:29:15PM +0300, Alexey Dobriyan wrote:
> eh... I'd just drop it. These registers screw up everything.
The intent is to not touch memset_orig and let it die with its users. It
is irrelevant now anyway.
If it can be shown that the extended list of clobbered registers hurt
> Instead of calling memset:
>
> 8100cd8d: e8 0e 15 7a 00 callq 817ae2a0 <__memset>
>
> and having a JMP inside it depending on the feature supported, let's simply
> have the REP; STOSB directly in the code:
>
> ...
> 81000442: 4c 89 d7
On Fri, Sep 13, 2019 at 12:42:32PM +0200, Borislav Petkov wrote:
> Or should we talk to Intel hw folks about it...
Or, I can do something like this, while waiting. Benchmark at the end.
The numbers are from a KBL box:
model : 158
model name : Intel(R) Core(TM) i5-9600K CPU @ 3.70G
On Fri, Sep 13, 2019 at 11:18:00AM +0200, Rasmus Villemoes wrote:
> Something like
>
> if (__builtin_constant_p(c) && __builtin_constant_p(n) && n <= 32)
> return __builtin_memset(dest, c, n);
>
> might be enough? Of course it would be sad if 32 was so high that this
> turned
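[Editor's sketch, not part of the thread.] The constant-size check suggested above can be tried in userspace roughly as follows. The cut-off of 32 is the value from the quoted snippet. The names SMALL_MEMSET_MAX, memset_out_of_line(), my_memset() and the call counter are hypothetical, and the out-of-line routine just forwards to libc here. It needs GCC or Clang for the builtins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cut-off from the quoted suggestion; the thread debates the right value. */
#define SMALL_MEMSET_MAX 32

/* Counts how often the out-of-line path is taken (illustration only). */
static unsigned long out_of_line_calls;

/* Hypothetical stand-in for the real out-of-line memset. */
static void *memset_out_of_line(void *dest, int c, size_t n)
{
	assert(dest != NULL);
	out_of_line_calls++;
	return memset(dest, c, n);
}

/*
 * If both the fill byte and the length are compile-time constants and the
 * length is small, hand the call to __builtin_memset() so the compiler can
 * expand it inline; otherwise call the out-of-line routine.
 * Note: being a macro, this evaluates its arguments more than once.
 */
#define my_memset(dest, c, n)						\
	((__builtin_constant_p(c) && __builtin_constant_p(n) &&	\
	  (n) <= SMALL_MEMSET_MAX)					\
		? __builtin_memset((dest), (c), (n))			\
		: memset_out_of_line((dest), (c), (n)))
```

Whether the builtin really becomes a few inline stores depends on the compiler and optimization level; comparing the disassembly at different cut-offs is how the value would have to be chosen.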
On 13/09/2019 11.00, Linus Torvalds wrote:
> On Fri, Sep 13, 2019 at 8:22 AM Borislav Petkov wrote:
>>
>> since the merge window is closing in and y'all are on a conference, I
>> thought I should take another stab at it. It being something which Ingo,
>> Linus and Peter have suggested in the past
On Fri, Sep 13, 2019 at 8:22 AM Borislav Petkov wrote:
>
> since the merge window is closing in and y'all are on a conference, I
> thought I should take another stab at it. It being something which Ingo,
> Linus and Peter have suggested in the past at least once.
>
> Instead of calling memset:
>
>
On 13/09/2019 09.22, Borislav Petkov wrote:
>
> Instead of calling memset:
>
> 8100cd8d: e8 0e 15 7a 00 callq 817ae2a0 <__memset>
>
> and having a JMP inside it depending on the feature supported, let's simply
> have the REP; STOSB directly in the code:
>
> ..
On Fri, Sep 13, 2019 at 09:35:30AM +0200, Ingo Molnar wrote:
> That looks exciting - I'm wondering what effects this has on code
> footprint - for example defconfig vmlinux code size, and what the average
> per call site footprint impact is?
>
> If the footprint effect is acceptable, then I'd ex
* Borislav Petkov wrote:
> Hi,
>
> since the merge window is closing in and y'all are on a conference, I
> thought I should take another stab at it. It being something which Ingo,
> Linus and Peter have suggested in the past at least once.
>
> Instead of calling memset:
>
> 8100cd8d: