On Tue, Nov 5, 2024 at 5:27 PM Richard Biener <richard.guent...@gmail.com> wrote: > > On Tue, Nov 5, 2024 at 10:09 AM Richard Biener > <richard.guent...@gmail.com> wrote: > > > > On Tue, Nov 5, 2024 at 5:23 AM Jeff Law <jeffreya...@gmail.com> wrote: > > > > > > > > > > > > On 11/4/24 8:13 PM, H.J. Lu wrote: > > > > On Tue, Nov 5, 2024 at 10:57 AM Jeff Law <jeffreya...@gmail.com> wrote: > > > >> > > > >> > > > >> > > > >> On 11/4/24 7:52 PM, H.J. Lu wrote: > > > >>> On Tue, Nov 5, 2024 at 8:48 AM Jeff Law <jeffreya...@gmail.com> wrote: > > > >>>> > > > >>>> > > > >>>> > > > >>>> On 11/4/24 5:42 PM, H.J. Lu wrote: > > > >>>>> On Tue, Nov 5, 2024 at 8:07 AM Jeff Law <jeffreya...@gmail.com> > > > >>>>> wrote: > > > >>>>>> > > > >>>>>> > > > >>>>>> > > > >>>>>> On 11/1/24 4:32 PM, H.J. Lu wrote: > > > >>>>>>> For targets, like x86, which define TARGET_PROMOTE_PROTOTYPES to > > > >>>>>>> return > > > >>>>>>> true, all integer arguments smaller than int are passed as int: > > > >>>>>>> > > > >>>>>>> [hjl@gnu-tgl-3 pr14907]$ cat x.c > > > >>>>>>> extern int baz (char c1); > > > >>>>>>> > > > >>>>>>> int > > > >>>>>>> foo (char c1) > > > >>>>>>> { > > > >>>>>>> return baz (c1); > > > >>>>>>> } > > > >>>>>>> [hjl@gnu-tgl-3 pr14907]$ gcc -S -O2 -m32 x.c > > > >>>>>>> [hjl@gnu-tgl-3 pr14907]$ cat x.s > > > >>>>>>> .file "x.c" > > > >>>>>>> .text > > > >>>>>>> .p2align 4 > > > >>>>>>> .globl foo > > > >>>>>>> .type foo, @function > > > >>>>>>> foo: > > > >>>>>>> .LFB0: > > > >>>>>>> .cfi_startproc > > > >>>>>>> movsbl 4(%esp), %eax > > > >>>>>>> movl %eax, 4(%esp) > > > >>>>>>> jmp baz > > > >>>>>>> .cfi_endproc > > > >>>>>>> .LFE0: > > > >>>>>>> .size foo, .-foo > > > >>>>>>> .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)" > > > >>>>>>> .section .note.GNU-stack,"",@progbits > > > >>>>>>> [hjl@gnu-tgl-3 pr14907]$ > > > >>>>>>> > > > >>>>>>> But integer promotion: > > > >>>>>>> > > > >>>>>>> movsbl 4(%esp), %eax > > > >>>>>>> movl %eax, 4(%esp) > > > >>>>>>> > > > >>>>>>> isn't necessary if incoming arguments and outgoing arguments are > > > >>>>>>> the > > > >>>>>>> same. Use unpromoted incoming integer arguments as outgoing > > > >>>>>>> arguments > > > >>>>>>> if incoming integer arguments are the same as outgoing arguments > > > >>>>>>> to > > > >>>>>>> avoid unnecessary integer promotion. > > > >>>>>> Is there a particular reason x86 can't use the same mechanisms that > > > >>>>> > > > >>>>> Other targets define TARGET_PROMOTE_PROTOTYPES to return false > > > >>>>> to avoid this issue. Changing x86 TARGET_PROMOTE_PROTOTYPES > > > >>>>> to return false will break LLVM which assumes that incoming > > > >>>>> char/short > > > >>>>> arguments on x86 are always extended to int. The following targets > > > >>>> Then my suggestion would be to cover this in REE somehow. We started > > > >>>> looking at that a couple years ago and set it aside. But the basic > > > >>>> idea was to expose the ABI guarantees to REE, then let REE do its > > > >>>> thing. > > > >>>> > > > >>>> Jeff > > > >>>> > > > >>> > > > >>> For > > > >>> > > > >>> extern int baz (char c1); > > > >>> > > > >>> int > > > >>> foo (char c1) > > > >>> { > > > >>> return baz (c1); > > > >>> } > > > >>> > > > >>> on i386, we get these > > > >>> > > > >>> (insn 7 4 8 2 (set (reg:SI 0 ax [orig:102 c1 ] [102]) > > > >>> (sign_extend:SI (mem/c:QI (plus:SI (reg/f:SI 7 sp) > > > >>> (const_int 4 [0x4])) [0 c1+0 S1 A32]))) > > > >>> "x.c":6:10 > > > >>> 190 {extendqisi2} > > > >>> (expr_list:REG_EQUIV (mem:SI (reg/f:SI 16 argp) [0 S4 A32]) > > > >>> (nil))) > > > >>> (insn 8 7 9 2 (set (mem:SI (plus:SI (reg/f:SI 7 sp) > > > >>> (const_int 4 [0x4])) [0 S4 A32]) > > > >>> (reg:SI 0 ax [orig:102 c1 ] [102])) "x.c":6:10 95 > > > >>> {*movsi_internal} > > > >>> (nil)) > > > >>> (call_insn/j 9 8 10 2 (set (reg:SI 0 ax) > > > >>> (call (mem:QI (symbol_ref:SI ("baz") [flags 0x41] > > > >>> <function_decl 0x7f27347aae00 > > > >>> baz>) [0 baz S1 A8]) > > > >>> (const_int 4 [0x4]))) "x.c":6:10 1420 {*sibcall_value} > > > >>> (expr_list:REG_CALL_DECL (symbol_ref:SI ("baz") [flags 0x41] > > > >>> <function_decl 0x7f273 > > > >>> 47aae00 baz>) > > > >>> (nil)) > > > >>> (expr_list:SI (use (mem:SI (reg/f:SI 16 argp) [0 S4 A32])) > > > >>> (nil))) > > > >>> > > > >>> before REE. How should REE work to elimate > > > >>> > > > >>> (insn 7 4 8 2 (set (reg:SI 0 ax [orig:102 c1 ] [102]) > > > >>> (sign_extend:SI (mem/c:QI (plus:SI (reg/f:SI 7 sp) > > > >>> (const_int 4 [0x4])) [0 c1+0 S1 A32]))) > > > >>> "x.c":6:10 > > > >>> 190 {extendqisi2} > > > >>> (expr_list:REG_EQUIV (mem:SI (reg/f:SI 16 argp) [0 S4 A32]) > > > >>> (nil))) > > > >>> (insn 8 7 9 2 (set (mem:SI (plus:SI (reg/f:SI 7 sp) > > > >>> (const_int 4 [0x4])) [0 S4 A32]) > > > >>> (reg:SI 0 ax [orig:102 c1 ] [102])) "x.c":6:10 95 > > > >>> {*movsi_internal} > > > >>> (nil)) > > > >> You'll have to write code to describe the ABI how the values in > > > >> question > > > >> are already extended to REE. It's not going to "just work", you'll > > > >> have > > > >> to do some development. I'm not inclined to ACK the expansion patch at > > > >> this time given we've got multiple ways to handle extension removal. > > > >> > > > > > > > > For char and short parameters, x86 ABI leaves the upper bits in > > > > 32 bit fields undefined. If the outgoing arguments are the same > > > > as the incoming arguments, there is no need to extend outgoing > > > > arguments. Also ABI info isn't available in REE. I am not sure if > > > > REE is the best fit here. > > > The whole point is to make it available and utilize it. > > > > I think most of the complication is due to the promotion being performed > > on GIMPLE but GIMPLE not taking the incoming argument as promoted. > > > > On my wishlist is to defer "applying" promote_prototypes to call RTL > > expansion and keeping GIMPLE "type correct", aka _not_ promoting > > arguments in the C/C++/Ada frontends. I'll note that not all frontends > > perform this promotion (sic!), and thus HJs pattern matching of the > > sibling call site is required for correctness even! > > > > I'll also note that there is DECL_ARG_TYPE on PARM_DECLs which > > should reflect the incoming promotion state - using the target hook in > > RTL expansion is "wrong" (see above - not applied consistently), but > > DECL_ARG_TYPE should tell you where it was applied and where not. > > With LTO and calling C from fortran this optimization might go wrong > > (fortran doesn't promote, C sets DECL_ARG_TYPE to promoted type).
My patch only uses DECL_ARG_TYPE to compare against the incoming argument to check if the incoming argument is used as outgoing argument unchanged. My patch doesn't change any functionality whether the caller does promotion or not since GCC always assumes the upper bits are undefined. My patch keeps the incoming argument unchanged only when there is no change in source code. > > > > I'd say we should either > > > > a) drop targetm.promote_prototypes (and hope nobody uses it to > > implement an ABI requirement) > > b) apply targetm.promote_prototypes during RTL expansion > > > > So the patch as-is isn't good I think (if you don't succeed with > > Fortran due to lack of small integer types you can also try D > > or Modula for the LTO wrong-code testcase). > > > > ISTR we do rely on DECL_ARG_TYPE in combine but wrongly so > > as said. > > program test > use iso_c_binding, only: c_short > interface > subroutine foo(a) bind(c) > import c_short > integer(kind=c_short), intent(in), value :: a > end subroutine foo > end interface > integer(kind=c_short) a(5); > call foo (a(3)) > end > > shows > > ;; foo (_7); > > (insn 13 12 14 (set (reg:HI 102) > (mem/c:HI (plus:DI (reg/f:DI 93 virtual-stack-vars) > (const_int -6 [0xfffffffffffffffa])) [1 a[2]+0 S2 > A16])) "t.f90":10:19 -1 > (nil)) > > (insn 14 13 15 (set (reg:HI 5 di) > (reg:HI 102)) "t.f90":10:19 -1 > (nil)) > > (call_insn 15 14 0 (call (mem:QI (symbol_ref:DI ("foo") [flags 0x41] > <function_decl 0x7f7fcc430800 foo>) [0 foo S1 A8]) > (const_int 0 [0])) "t.f90":10:19 -1 > (expr_list:REG_CALL_DECL (symbol_ref:DI ("foo") [flags 0x41] > <function_decl 0x7f7fcc430800 foo>) > (nil)) > (expr_list:HI (use (reg:HI 5 di)) > (nil))) > > and with -Os generates > > movw 10(%rsp), %di > call foo > > but a C TU would assume 'foo' receives promoted arguments. GCC follows x86 psABI and assumes the upper bits are undefined. But LLVM assumes the upper bits are extended: $ cat x1.c extern int baz (int); int foo (char c1) { return baz (c1) + 1; } $ gcc -S -O2 x1.c $ cat x1.s .file "x1.c" .text .p2align 4 .globl foo .type foo, @function foo: .LFB0: .cfi_startproc subq $8, %rsp .cfi_def_cfa_offset 16 movsbl %dil, %edi call baz addq $8, %rsp .cfi_def_cfa_offset 8 addl $1, %eax ret .cfi_endproc .LFE0: .size foo, .-foo .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)" .section .note.GNU-stack,"",@progbits $ clang -S -O2 x1.c $ cat x1.s .text .file "x1.c" .globl foo # -- Begin function foo .p2align 4, 0x90 .type foo,@function foo: # @foo .cfi_startproc # %bb.0: pushq %rax .cfi_def_cfa_offset 16 callq baz incl %eax popq %rcx .cfi_def_cfa_offset 8 retq .Lfunc_end0: .size foo, .Lfunc_end0-foo .cfi_endproc # -- End function .ident "clang version 19.1.0 (Fedora 19.1.0-1.fc41)" .section ".note.GNU-stack","",@progbits .addrsig $ -- H.J.