On Wed, Nov 6, 2024 at 6:01 PM Richard Biener <richard.guent...@gmail.com> wrote: > > On Wed, Nov 6, 2024 at 10:52 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > On Wed, Nov 6, 2024 at 4:29 PM Richard Biener > > <richard.guent...@gmail.com> wrote: > > > > > > On Tue, Nov 5, 2024 at 10:50 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > > > On Tue, Nov 5, 2024 at 5:27 PM Richard Biener > > > > <richard.guent...@gmail.com> wrote: > > > > > > > > > > On Tue, Nov 5, 2024 at 10:09 AM Richard Biener > > > > > <richard.guent...@gmail.com> wrote: > > > > > > > > > > > > On Tue, Nov 5, 2024 at 5:23 AM Jeff Law <jeffreya...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 11/4/24 8:13 PM, H.J. Lu wrote: > > > > > > > > On Tue, Nov 5, 2024 at 10:57 AM Jeff Law > > > > > > > > <jeffreya...@gmail.com> wrote: > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> On 11/4/24 7:52 PM, H.J. Lu wrote: > > > > > > > >>> On Tue, Nov 5, 2024 at 8:48 AM Jeff Law > > > > > > > >>> <jeffreya...@gmail.com> wrote: > > > > > > > >>>> > > > > > > > >>>> > > > > > > > >>>> > > > > > > > >>>> On 11/4/24 5:42 PM, H.J. Lu wrote: > > > > > > > >>>>> On Tue, Nov 5, 2024 at 8:07 AM Jeff Law > > > > > > > >>>>> <jeffreya...@gmail.com> wrote: > > > > > > > >>>>>> > > > > > > > >>>>>> > > > > > > > >>>>>> > > > > > > > >>>>>> On 11/1/24 4:32 PM, H.J. Lu wrote: > > > > > > > >>>>>>> For targets, like x86, which define > > > > > > > >>>>>>> TARGET_PROMOTE_PROTOTYPES to return > > > > > > > >>>>>>> true, all integer arguments smaller than int are passed > > > > > > > >>>>>>> as int: > > > > > > > >>>>>>> > > > > > > > >>>>>>> [hjl@gnu-tgl-3 pr14907]$ cat x.c > > > > > > > >>>>>>> extern int baz (char c1); > > > > > > > >>>>>>> > > > > > > > >>>>>>> int > > > > > > > >>>>>>> foo (char c1) > > > > > > > >>>>>>> { > > > > > > > >>>>>>> return baz (c1); > > > > > > > >>>>>>> } > > > > > > > >>>>>>> [hjl@gnu-tgl-3 pr14907]$ gcc -S -O2 -m32 x.c > > > > > > > >>>>>>> [hjl@gnu-tgl-3 pr14907]$ cat x.s > > > > > > > >>>>>>> .file "x.c" > > > > > > > >>>>>>> .text > > > > > > > >>>>>>> .p2align 4 > > > > > > > >>>>>>> .globl foo > > > > > > > >>>>>>> .type foo, @function > > > > > > > >>>>>>> foo: > > > > > > > >>>>>>> .LFB0: > > > > > > > >>>>>>> .cfi_startproc > > > > > > > >>>>>>> movsbl 4(%esp), %eax > > > > > > > >>>>>>> movl %eax, 4(%esp) > > > > > > > >>>>>>> jmp baz > > > > > > > >>>>>>> .cfi_endproc > > > > > > > >>>>>>> .LFE0: > > > > > > > >>>>>>> .size foo, .-foo > > > > > > > >>>>>>> .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat > > > > > > > >>>>>>> 14.2.1-3)" > > > > > > > >>>>>>> .section .note.GNU-stack,"",@progbits > > > > > > > >>>>>>> [hjl@gnu-tgl-3 pr14907]$ > > > > > > > >>>>>>> > > > > > > > >>>>>>> But integer promotion: > > > > > > > >>>>>>> > > > > > > > >>>>>>> movsbl 4(%esp), %eax > > > > > > > >>>>>>> movl %eax, 4(%esp) > > > > > > > >>>>>>> > > > > > > > >>>>>>> isn't necessary if incoming arguments and outgoing > > > > > > > >>>>>>> arguments are the > > > > > > > >>>>>>> same. Use unpromoted incoming integer arguments as > > > > > > > >>>>>>> outgoing arguments > > > > > > > >>>>>>> if incoming integer arguments are the same as outgoing > > > > > > > >>>>>>> arguments to > > > > > > > >>>>>>> avoid unnecessary integer promotion. > > > > > > > >>>>>> Is there a particular reason x86 can't use the same > > > > > > > >>>>>> mechanisms that > > > > > > > >>>>> > > > > > > > >>>>> Other targets define TARGET_PROMOTE_PROTOTYPES to return > > > > > > > >>>>> false > > > > > > > >>>>> to avoid this issue. Changing x86 > > > > > > > >>>>> TARGET_PROMOTE_PROTOTYPES > > > > > > > >>>>> to return false will break LLVM which assumes that incoming > > > > > > > >>>>> char/short > > > > > > > >>>>> arguments on x86 are always extended to int. The > > > > > > > >>>>> following targets > > > > > > > >>>> Then my suggestion would be to cover this in REE somehow. > > > > > > > >>>> We started > > > > > > > >>>> looking at that a couple years ago and set it aside. But > > > > > > > >>>> the basic > > > > > > > >>>> idea was to expose the ABI guarantees to REE, then let REE > > > > > > > >>>> do its thing. > > > > > > > >>>> > > > > > > > >>>> Jeff > > > > > > > >>>> > > > > > > > >>> > > > > > > > >>> For > > > > > > > >>> > > > > > > > >>> extern int baz (char c1); > > > > > > > >>> > > > > > > > >>> int > > > > > > > >>> foo (char c1) > > > > > > > >>> { > > > > > > > >>> return baz (c1); > > > > > > > >>> } > > > > > > > >>> > > > > > > > >>> on i386, we get these > > > > > > > >>> > > > > > > > >>> (insn 7 4 8 2 (set (reg:SI 0 ax [orig:102 c1 ] [102]) > > > > > > > >>> (sign_extend:SI (mem/c:QI (plus:SI (reg/f:SI 7 sp) > > > > > > > >>> (const_int 4 [0x4])) [0 c1+0 S1 A32]))) > > > > > > > >>> "x.c":6:10 > > > > > > > >>> 190 {extendqisi2} > > > > > > > >>> (expr_list:REG_EQUIV (mem:SI (reg/f:SI 16 argp) [0 S4 > > > > > > > >>> A32]) > > > > > > > >>> (nil))) > > > > > > > >>> (insn 8 7 9 2 (set (mem:SI (plus:SI (reg/f:SI 7 sp) > > > > > > > >>> (const_int 4 [0x4])) [0 S4 A32]) > > > > > > > >>> (reg:SI 0 ax [orig:102 c1 ] [102])) "x.c":6:10 95 > > > > > > > >>> {*movsi_internal} > > > > > > > >>> (nil)) > > > > > > > >>> (call_insn/j 9 8 10 2 (set (reg:SI 0 ax) > > > > > > > >>> (call (mem:QI (symbol_ref:SI ("baz") [flags 0x41] > > > > > > > >>> <function_decl 0x7f27347aae00 > > > > > > > >>> baz>) [0 baz S1 A8]) > > > > > > > >>> (const_int 4 [0x4]))) "x.c":6:10 1420 > > > > > > > >>> {*sibcall_value} > > > > > > > >>> (expr_list:REG_CALL_DECL (symbol_ref:SI ("baz") [flags > > > > > > > >>> 0x41] > > > > > > > >>> <function_decl 0x7f273 > > > > > > > >>> 47aae00 baz>) > > > > > > > >>> (nil)) > > > > > > > >>> (expr_list:SI (use (mem:SI (reg/f:SI 16 argp) [0 S4 > > > > > > > >>> A32])) > > > > > > > >>> (nil))) > > > > > > > >>> > > > > > > > >>> before REE. How should REE work to elimate > > > > > > > >>> > > > > > > > >>> (insn 7 4 8 2 (set (reg:SI 0 ax [orig:102 c1 ] [102]) > > > > > > > >>> (sign_extend:SI (mem/c:QI (plus:SI (reg/f:SI 7 sp) > > > > > > > >>> (const_int 4 [0x4])) [0 c1+0 S1 A32]))) > > > > > > > >>> "x.c":6:10 > > > > > > > >>> 190 {extendqisi2} > > > > > > > >>> (expr_list:REG_EQUIV (mem:SI (reg/f:SI 16 argp) [0 S4 > > > > > > > >>> A32]) > > > > > > > >>> (nil))) > > > > > > > >>> (insn 8 7 9 2 (set (mem:SI (plus:SI (reg/f:SI 7 sp) > > > > > > > >>> (const_int 4 [0x4])) [0 S4 A32]) > > > > > > > >>> (reg:SI 0 ax [orig:102 c1 ] [102])) "x.c":6:10 95 > > > > > > > >>> {*movsi_internal} > > > > > > > >>> (nil)) > > > > > > > >> You'll have to write code to describe the ABI how the values > > > > > > > >> in question > > > > > > > >> are already extended to REE. It's not going to "just work", > > > > > > > >> you'll have > > > > > > > >> to do some development. I'm not inclined to ACK the expansion > > > > > > > >> patch at > > > > > > > >> this time given we've got multiple ways to handle extension > > > > > > > >> removal. > > > > > > > >> > > > > > > > > > > > > > > > > For char and short parameters, x86 ABI leaves the upper bits in > > > > > > > > 32 bit fields undefined. If the outgoing arguments are the same > > > > > > > > as the incoming arguments, there is no need to extend outgoing > > > > > > > > arguments. Also ABI info isn't available in REE. I am not > > > > > > > > sure if > > > > > > > > REE is the best fit here. > > > > > > > The whole point is to make it available and utilize it. > > > > > > > > > > > > I think most of the complication is due to the promotion being > > > > > > performed > > > > > > on GIMPLE but GIMPLE not taking the incoming argument as promoted. > > > > > > > > > > > > On my wishlist is to defer "applying" promote_prototypes to call RTL > > > > > > expansion and keeping GIMPLE "type correct", aka _not_ promoting > > > > > > arguments in the C/C++/Ada frontends. I'll note that not all > > > > > > frontends > > > > > > perform this promotion (sic!), and thus HJs pattern matching of the > > > > > > sibling call site is required for correctness even! > > > > > > > > > > > > I'll also note that there is DECL_ARG_TYPE on PARM_DECLs which > > > > > > should reflect the incoming promotion state - using the target hook > > > > > > in > > > > > > RTL expansion is "wrong" (see above - not applied consistently), but > > > > > > DECL_ARG_TYPE should tell you where it was applied and where not. > > > > > > With LTO and calling C from fortran this optimization might go wrong > > > > > > (fortran doesn't promote, C sets DECL_ARG_TYPE to promoted type). > > > > > > > > My patch only uses DECL_ARG_TYPE to compare against the incoming > > > > argument to check if the incoming argument is used as outgoing argument > > > > unchanged. My patch doesn't change any functionality whether the caller > > > > does promotion or not since GCC always assumes the upper bits are > > > > undefined. My patch keeps the incoming argument unchanged only > > > > when there is no change in source code. > > > > > > > > > > > > > > > > I'd say we should either > > > > > > > > > > > > a) drop targetm.promote_prototypes (and hope nobody uses it to > > > > > > implement an ABI requirement) > > > > > > b) apply targetm.promote_prototypes during RTL expansion > > > > > > > > > > > > So the patch as-is isn't good I think (if you don't succeed with > > > > > > Fortran due to lack of small integer types you can also try D > > > > > > or Modula for the LTO wrong-code testcase). > > > > > > > > > > > > ISTR we do rely on DECL_ARG_TYPE in combine but wrongly so > > > > > > as said. > > > > > > > > > > program test > > > > > use iso_c_binding, only: c_short > > > > > interface > > > > > subroutine foo(a) bind(c) > > > > > import c_short > > > > > integer(kind=c_short), intent(in), value :: a > > > > > end subroutine foo > > > > > end interface > > > > > integer(kind=c_short) a(5); > > > > > call foo (a(3)) > > > > > end > > > > > > > > > > shows > > > > > > > > > > ;; foo (_7); > > > > > > > > > > (insn 13 12 14 (set (reg:HI 102) > > > > > (mem/c:HI (plus:DI (reg/f:DI 93 virtual-stack-vars) > > > > > (const_int -6 [0xfffffffffffffffa])) [1 a[2]+0 S2 > > > > > A16])) "t.f90":10:19 -1 > > > > > (nil)) > > > > > > > > > > (insn 14 13 15 (set (reg:HI 5 di) > > > > > (reg:HI 102)) "t.f90":10:19 -1 > > > > > (nil)) > > > > > > > > > > (call_insn 15 14 0 (call (mem:QI (symbol_ref:DI ("foo") [flags 0x41] > > > > > <function_decl 0x7f7fcc430800 foo>) [0 foo S1 A8]) > > > > > (const_int 0 [0])) "t.f90":10:19 -1 > > > > > (expr_list:REG_CALL_DECL (symbol_ref:DI ("foo") [flags 0x41] > > > > > <function_decl 0x7f7fcc430800 foo>) > > > > > (nil)) > > > > > (expr_list:HI (use (reg:HI 5 di)) > > > > > (nil))) > > > > > > > > > > and with -Os generates > > > > > > > > > > movw 10(%rsp), %di > > > > > call foo > > > > > > > > > > but a C TU would assume 'foo' receives promoted arguments. > > > > > > > > GCC follows x86 psABI and assumes the upper bits are undefined. > > > > But LLVM assumes the upper bits are extended: > > > > > > I'm saying your patch increases the likelyhood we run into problems with > > > out internal inconsistency of handling promote_prototypes - it adds to the > > > number of places we wrongly assume all callers honor it when the > > > callee was compiled in a way to honor the target hook. > > > > > > So IMO the patch is wrong (in the sense of wrong-code), and as given > > > promote_prototypes is only useful for TU local binding functions. > > > > TARGET_PROMOTE_PROTOTYPES isn't defined for psABI purpose. > > x86 psABI doesn't require it. GCC uses only the lower bits of incoming > > arguments. But it isn't the GCC's job to promote incoming arguments > > and copy them to outgoing arguments since not usages of such functions > > are compiled by GCC. > > But if you now elide the promotion for the call to foo in > > static void foo (char c) { ... }
Only calling local functions, like foo, is an issue. There is no issue if foo is global. Optimization for calling global functions is OK. Am I correct? > void bar (char c) { foo (c); } > > then the existing combine optimization breaks since when all calls are > from the local TU (as for foo()) we assume that the promotion happened > but after your optimization it now depends on the callers of bar which > we do not control. > > Richard. > > > > I showed a way to rectify this by applying promote_prototypes at > > > call RTL expansion time (instead of or in addition to frontends). Iff we > > > want to keep doing this "optimization" - given we perform this > > > optimization > > > in combine it got effectively part of the de-factor x86 ABI as we assume > > > our callers perform the promotion, no? > > > > > > Richard. > > > > > > > $ cat x1.c > > > > extern int baz (int); > > > > > > > > int > > > > foo (char c1) > > > > { > > > > return baz (c1) + 1; > > > > } > > > > $ gcc -S -O2 x1.c > > > > $ cat x1.s > > > > .file "x1.c" > > > > .text > > > > .p2align 4 > > > > .globl foo > > > > .type foo, @function > > > > foo: > > > > .LFB0: > > > > .cfi_startproc > > > > subq $8, %rsp > > > > .cfi_def_cfa_offset 16 > > > > movsbl %dil, %edi > > > > call baz > > > > addq $8, %rsp > > > > .cfi_def_cfa_offset 8 > > > > addl $1, %eax > > > > ret > > > > .cfi_endproc > > > > .LFE0: > > > > .size foo, .-foo > > > > .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)" > > > > .section .note.GNU-stack,"",@progbits > > > > $ clang -S -O2 x1.c > > > > $ cat x1.s > > > > .text > > > > .file "x1.c" > > > > .globl foo # -- Begin function foo > > > > .p2align 4, 0x90 > > > > .type foo,@function > > > > foo: # @foo > > > > .cfi_startproc > > > > # %bb.0: > > > > pushq %rax > > > > .cfi_def_cfa_offset 16 > > > > callq baz > > > > incl %eax > > > > popq %rcx > > > > .cfi_def_cfa_offset 8 > > > > retq > > > > .Lfunc_end0: > > > > .size foo, .Lfunc_end0-foo > > > > .cfi_endproc > > > > # -- End function > > > > .ident "clang version 19.1.0 (Fedora 19.1.0-1.fc41)" > > > > .section ".note.GNU-stack","",@progbits > > > > .addrsig > > > > $ > > > > > > > > -- > > > > H.J. > > > > > > > > -- > > H.J. -- H.J.