On Sat, Oct 31, 2020 at 12:11:42PM +0000, David Laight wrote:
> The gcc 7.5.0 I have handy probably generates the best code for:
> 
> unsigned char q_2(unsigned int pc)
> {
>         unsigned char rctx = 0;
> 
>         rctx += !!(pc & (NMI_MASK));
>         rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK));
>         rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));
> 
>         return rctx;
> }
> 
> 0000000000000000 <q_2>:
>    0:   f7 c7 00 00 f0 00       test   $0xf00000,%edi     # clock 0
>    6:   0f 95 c0                setne  %al                # clock 1
>    9:   f7 c7 00 00 ff 00       test   $0xff0000,%edi     # clock 0
>    f:   0f 95 c2                setne  %dl                # clock 1
>   12:   01 c2                   add    %eax,%edx          # clock 2
>   14:   81 e7 00 01 ff 00       and    $0xff0100,%edi
>   1a:   0f 95 c0                setne  %al
>   1d:   01 d0                   add    %edx,%eax          # clock 3
>   1f:   c3                      retq
> 
> I doubt that is beatable.
> 
> I've annotated the register dependency chain.
> Likely to be 3 (or maybe 4) clocks.
> The other versions are a lot worse (7 or 8) without allowing
> for 'sbb' taking 2 clocks on a lot of Intel cpus.

https://godbolt.org/z/EfnG8E

Recent GCC just doesn't want to emit that test/setne sequence. Still,
using u8 makes sense, so I've kept that.
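
For anyone who wants to feed this to a compiler outside the kernel tree,
here is a minimal standalone sketch of the quoted function. The mask
values are assumptions: they are read off the immediates in the
disassembly above (0xf00000, 0xff0000, 0xff0100) and may not match every
kernel configuration. q_2_selftest() is a hypothetical helper added only
for illustration.

```c
#include <assert.h>

/* Assumed values, inferred from the immediates in the quoted disassembly. */
#define SOFTIRQ_OFFSET	0x00000100u
#define HARDIRQ_MASK	0x000f0000u
#define NMI_MASK	0x00f00000u

/*
 * Counts how many of the three nested masks match pc:
 * 0 = none, 1 = softirq bit only, 2 = hardirq bits, 3 = NMI bits.
 */
unsigned char q_2(unsigned int pc)
{
	unsigned char rctx = 0;

	rctx += !!(pc & (NMI_MASK));
	rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK));
	rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));

	return rctx;
}

/* Hypothetical self-check, not part of the original mail. */
void q_2_selftest(void)
{
	assert(q_2(0) == 0);			/* no context bits set */
	assert(q_2(SOFTIRQ_OFFSET) == 1);	/* third mask only */
	assert(q_2(0x00010000u) == 2);		/* one hardirq bit */
	assert(q_2(0x00100000u) == 3);		/* one NMI bit */
}
```

Because each mask is a superset of the previous one, a set NMI bit
matches all three tests, a hardirq bit matches two, and the softirq bit
matches one, which is why three independent test/setne pairs are enough.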
