> On 18 Aug 2017, at 10:56 PM, Michael Clark <michaeljcl...@mac.com> wrote:
> 
>> 
>> On 18 Aug 2017, at 10:41 PM, Gabriel Paubert <paub...@iram.es> wrote:
>> 
>> On Fri, Aug 18, 2017 at 10:29:04AM +1200, Michael Clark wrote:
>>> Sorry I had to send again as my Apple mailer is munging emails. I’ve 
>>> disabled RTF.
>>> 
>>> 
>>> This one is quite interesting:
>>> 
>>> - https://cx.rv8.io/g/WXWMTG
>>> 
>>> It’s another target independent bug. x86 is using some LEA followed by SAR 
>>> trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice. In any case 
>>> RISC-V seems like a nice target to try to fix this codegen for, as its less 
>>> risk than attempting a fix in x86 ;-)
>>> 
>>> - https://github.com/riscv/riscv-gcc/issues/89
>>> 
>>> code:
>>> 
>>>     template <typename T, unsigned B>
>>>     inline T signextend(const T x)
>>>     {
>>>     struct {T x:B;} s;
>>>     return s.x = x;
>>>     }
>>> 
>>>     int sx5(int x) {
>>>             return signextend<signed int,5>(x);
>>>     }
>>> 
>>> riscv asm:
>>> 
>>>     sx5(int):
>>>       slliw a0,a0,3
>>>       slliw a0,a0,24
>>>       sraiw a0,a0,24
>>>       sraiw a0,a0,3
>>>       ret
>>> 
>>> hand coded riscv asm
>>> 
>>>     sx5(int):
>>>       slliw a0,a0,27
>>>       sraiw a0,a0,27
>>>       ret
>>> 
>>> x86 asm:
>>> 
>>>     sx5(int):
>>>       lea eax, [0+rdi*8]
>>>       sar al, 3
>>>       movsx eax, al
>>>       ret
>>> 
>>> hand coded x86 asm (no worse because the sar depends on the lea)
>>> 
>>>     sx5(int):
>>>       shl edi, 27
>>>       sar edi, 27
>>>       movsx eax, dl
>> 
>> Huh? dl is not a subreg of edi!
>> 
>> s/edi/edx/ and it may work.
>> 
>> dil can also be used, but only on 64 bit.

In any case it is more straightforward given we are just dealing with an int 
bitfield member in this case so a regular move would suffice. The mov gets 
retired in rename (movsx and movzx can both be retired in rename I think). The 
same pattern could be used for all 5 function examples, with predictable 
performance, versus 5 different solutions for the 5 different bitfield 
positions (that’s actually very amazing that the pattern matching on x86 finds 
5 unique ways to lower this code pattern).  Padon my Intel syntax. e.g.

sx5(int):
 shl edi, 27
 sar edi, 27
 mov eax, edi

However I’m more interested in what the fix would be for RISC-V and Aarch64. 
It’s less obvious how to fix it on x86-64 because there are too many different 
ways. On RISC-V there is only one obvious way and that is to not artificially 
narrow the type when we are extracting from an int aligned struct bit field 
member to an int. Likewise on Aarch64, one sbfiz instruction.

> Sorry I meant dil on x86-64. I was sure that it was possible to extend into 
> another register. I have not done much i386 asm so I am unaware of the 
> constraints. Can the source and dest registers for movsx not differ on i386? 
> I thought they could.
> 
> In any case, the plot thickens…
> 
> I believe we have bugs on both RISC-V and Aarch64.
> 
> I found that it at least appears like it is transitioning to a char or short 
> as the break is at 24 and 16 depending on the width, and values over 16 work 
> as one would expect.
> 
> Here is an updated test program: https://cx.rv8.io/g/M9ewNf
> 
>       template <typename T, unsigned B>
>       inline T signextend(const T x)
>       {
>         struct {T x:B;} s;
>         return s.x = x;
>       }
> 
>       int sx3(int x) { return signextend<signed int,3>(x); }
>       int sx5(int x) { return signextend<signed int,5>(x); }
>       int sx11(int x) { return signextend<signed int,11>(x); }
>       int sx14(int x) { return signextend<signed int,14>(x); }
>       int sx19(int x) { return signextend<signed int,19>(x); }
> 
> I filed a bug on riscv-gcc but I think it is target independent code given 
> there appears to be an issue on Aarch64. AFAICT, Aarch64 should generate a 
> single sbfx for all of the test functions.
> 
> - https://github.com/riscv/riscv-gcc/issues/89
> 
> Should I file a bug on GCC bugzilla given it looks to be target independent?
> 
> On RISC-V, the codegen is much more obviously wrong, but essentially the same 
> thing is happening on Aarch64 but there is only one additional instruction 
> instead of two.
> 
>       sx3(int):
>         slliw a0,a0,5
>         slliw a0,a0,24
>         sraiw a0,a0,24
>         sraiw a0,a0,5
>         ret
>       sx5(int):
>         slliw a0,a0,3
>         slliw a0,a0,24
>         sraiw a0,a0,24
>         sraiw a0,a0,3
>         ret
>       sx11(int):
>         slliw a0,a0,5
>         slliw a0,a0,16
>         sraiw a0,a0,16
>         sraiw a0,a0,5
>         ret
>       sx14(int):
>         slliw a0,a0,2
>         slliw a0,a0,16
>         sraiw a0,a0,16
>         sraiw a0,a0,2
>         ret
>       sx19(int):
>         slliw a0,a0,13
>         sraiw a0,a0,13
>         ret

Reply via email to