On Wed, Nov 6, 2024 at 6:01 PM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Wed, Nov 6, 2024 at 10:52 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > On Wed, Nov 6, 2024 at 4:29 PM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> > >
> > > On Tue, Nov 5, 2024 at 10:50 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > >
> > > > On Tue, Nov 5, 2024 at 5:27 PM Richard Biener
> > > > <richard.guent...@gmail.com> wrote:
> > > > >
> > > > > On Tue, Nov 5, 2024 at 10:09 AM Richard Biener
> > > > > <richard.guent...@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Nov 5, 2024 at 5:23 AM Jeff Law <jeffreya...@gmail.com> 
> > > > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 11/4/24 8:13 PM, H.J. Lu wrote:
> > > > > > > > On Tue, Nov 5, 2024 at 10:57 AM Jeff Law 
> > > > > > > > <jeffreya...@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On 11/4/24 7:52 PM, H.J. Lu wrote:
> > > > > > > >>> On Tue, Nov 5, 2024 at 8:48 AM Jeff Law 
> > > > > > > >>> <jeffreya...@gmail.com> wrote:
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>> On 11/4/24 5:42 PM, H.J. Lu wrote:
> > > > > > > >>>>> On Tue, Nov 5, 2024 at 8:07 AM Jeff Law 
> > > > > > > >>>>> <jeffreya...@gmail.com> wrote:
> > > > > > > >>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>>> On 11/1/24 4:32 PM, H.J. Lu wrote:
> > > > > > > >>>>>>> For targets, like x86, which define 
> > > > > > > >>>>>>> TARGET_PROMOTE_PROTOTYPES to return
> > > > > > > >>>>>>> true, all integer arguments smaller than int are passed 
> > > > > > > >>>>>>> as int:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> [hjl@gnu-tgl-3 pr14907]$ cat x.c
> > > > > > > >>>>>>> extern int baz (char c1);
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> int
> > > > > > > >>>>>>> foo (char c1)
> > > > > > > >>>>>>> {
> > > > > > > >>>>>>>       return baz (c1);
> > > > > > > >>>>>>> }
> > > > > > > >>>>>>> [hjl@gnu-tgl-3 pr14907]$ gcc -S -O2 -m32 x.c
> > > > > > > >>>>>>> [hjl@gnu-tgl-3 pr14907]$ cat x.s
> > > > > > > >>>>>>>          .file   "x.c"
> > > > > > > >>>>>>>          .text
> > > > > > > >>>>>>>          .p2align 4
> > > > > > > >>>>>>>          .globl  foo
> > > > > > > >>>>>>>          .type   foo, @function
> > > > > > > >>>>>>> foo:
> > > > > > > >>>>>>> .LFB0:
> > > > > > > >>>>>>>          .cfi_startproc
> > > > > > > >>>>>>>          movsbl  4(%esp), %eax
> > > > > > > >>>>>>>          movl    %eax, 4(%esp)
> > > > > > > >>>>>>>          jmp     baz
> > > > > > > >>>>>>>          .cfi_endproc
> > > > > > > >>>>>>> .LFE0:
> > > > > > > >>>>>>>          .size   foo, .-foo
> > > > > > > >>>>>>>          .ident  "GCC: (GNU) 14.2.1 20240912 (Red Hat 
> > > > > > > >>>>>>> 14.2.1-3)"
> > > > > > > >>>>>>>          .section        .note.GNU-stack,"",@progbits
> > > > > > > >>>>>>> [hjl@gnu-tgl-3 pr14907]$
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> But integer promotion:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>          movsbl  4(%esp), %eax
> > > > > > > >>>>>>>          movl    %eax, 4(%esp)
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> isn't necessary if incoming arguments and outgoing 
> > > > > > > >>>>>>> arguments are the
> > > > > > > >>>>>>> same.  Use unpromoted incoming integer arguments as 
> > > > > > > >>>>>>> outgoing arguments
> > > > > > > >>>>>>> if incoming integer arguments are the same as outgoing 
> > > > > > > >>>>>>> arguments to
> > > > > > > >>>>>>> avoid unnecessary integer promotion.
> > > > > > > >>>>>> Is there a particular reason x86 can't use the same 
> > > > > > > >>>>>> mechanisms that
> > > > > > > >>>>>
> > > > > > > >>>>> Other targets define TARGET_PROMOTE_PROTOTYPES to return 
> > > > > > > >>>>> false
> > > > > > > >>>>> to avoid this issue.   Changing x86 
> > > > > > > >>>>> TARGET_PROMOTE_PROTOTYPES
> > > > > > > >>>>> to return false will break LLVM which assumes that incoming 
> > > > > > > >>>>> char/short
> > > > > > > >>>>> arguments on x86 are always extended to int.   The 
> > > > > > > >>>>> following targets
> > > > > > > >>>> Then my suggestion would be to cover this in REE somehow.  
> > > > > > > >>>> We started
> > > > > > > >>>> looking at that a couple years ago and set it aside.   But 
> > > > > > > >>>> the basic
> > > > > > > >>>> idea was to expose the ABI guarantees to REE, then let REE 
> > > > > > > >>>> do its thing.
> > > > > > > >>>>
> > > > > > > >>>> Jeff
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>> For
> > > > > > > >>>
> > > > > > > >>> extern int baz (char c1);
> > > > > > > >>>
> > > > > > > >>> int
> > > > > > > >>> foo (char c1)
> > > > > > > >>> {
> > > > > > > >>>     return baz (c1);
> > > > > > > >>> }
> > > > > > > >>>
> > > > > > > >>> on i386, we get these
> > > > > > > >>>
> > > > > > > >>> (insn 7 4 8 2 (set (reg:SI 0 ax [orig:102 c1 ] [102])
> > > > > > > >>>           (sign_extend:SI (mem/c:QI (plus:SI (reg/f:SI 7 sp)
> > > > > > > >>>                       (const_int 4 [0x4])) [0 c1+0 S1 A32]))) 
> > > > > > > >>> "x.c":6:10
> > > > > > > >>> 190 {extendqisi2}
> > > > > > > >>>        (expr_list:REG_EQUIV (mem:SI (reg/f:SI 16 argp) [0  S4 
> > > > > > > >>> A32])
> > > > > > > >>>           (nil)))
> > > > > > > >>> (insn 8 7 9 2 (set (mem:SI (plus:SI (reg/f:SI 7 sp)
> > > > > > > >>>                   (const_int 4 [0x4])) [0  S4 A32])
> > > > > > > >>>           (reg:SI 0 ax [orig:102 c1 ] [102])) "x.c":6:10 95 
> > > > > > > >>> {*movsi_internal}
> > > > > > > >>>        (nil))
> > > > > > > >>> (call_insn/j 9 8 10 2 (set (reg:SI 0 ax)
> > > > > > > >>>           (call (mem:QI (symbol_ref:SI ("baz") [flags 0x41]
> > > > > > > >>> <function_decl 0x7f27347aae00
> > > > > > > >>> baz>) [0 baz S1 A8])
> > > > > > > >>>               (const_int 4 [0x4]))) "x.c":6:10 1420 
> > > > > > > >>> {*sibcall_value}
> > > > > > > >>>        (expr_list:REG_CALL_DECL (symbol_ref:SI ("baz") [flags 
> > > > > > > >>> 0x41]
> > > > > > > >>> <function_decl 0x7f273
> > > > > > > >>> 47aae00 baz>)
> > > > > > > >>>           (nil))
> > > > > > > >>>       (expr_list:SI (use (mem:SI (reg/f:SI 16 argp) [0  S4 
> > > > > > > >>> A32]))
> > > > > > > >>>           (nil)))
> > > > > > > >>>
> > > > > > > >>> before REE.  How should REE work to elimate
> > > > > > > >>>
> > > > > > > >>> (insn 7 4 8 2 (set (reg:SI 0 ax [orig:102 c1 ] [102])
> > > > > > > >>>           (sign_extend:SI (mem/c:QI (plus:SI (reg/f:SI 7 sp)
> > > > > > > >>>                       (const_int 4 [0x4])) [0 c1+0 S1 A32]))) 
> > > > > > > >>> "x.c":6:10
> > > > > > > >>> 190 {extendqisi2}
> > > > > > > >>>        (expr_list:REG_EQUIV (mem:SI (reg/f:SI 16 argp) [0  S4 
> > > > > > > >>> A32])
> > > > > > > >>>           (nil)))
> > > > > > > >>> (insn 8 7 9 2 (set (mem:SI (plus:SI (reg/f:SI 7 sp)
> > > > > > > >>>                   (const_int 4 [0x4])) [0  S4 A32])
> > > > > > > >>>           (reg:SI 0 ax [orig:102 c1 ] [102])) "x.c":6:10 95 
> > > > > > > >>> {*movsi_internal}
> > > > > > > >>>        (nil))
> > > > > > > >> You'll have to write code to describe the ABI how the values 
> > > > > > > >> in question
> > > > > > > >> are already extended to REE.  It's not going to "just work", 
> > > > > > > >> you'll have
> > > > > > > >> to do some development.  I'm not inclined to ACK the expansion 
> > > > > > > >> patch at
> > > > > > > >> this time given we've got multiple ways to handle extension 
> > > > > > > >> removal.
> > > > > > > >>
> > > > > > > >
> > > > > > > > For char and short parameters, x86 ABI leaves the upper bits in
> > > > > > > > 32 bit fields undefined.  If the outgoing arguments are the same
> > > > > > > > as the incoming arguments, there is no need to extend outgoing
> > > > > > > > arguments.   Also ABI info isn't available in REE.  I am not 
> > > > > > > > sure if
> > > > > > > > REE is the best fit here.
> > > > > > > The whole point is to make it available and utilize it.
> > > > > >
> > > > > > I think most of the complication is due to the promotion being 
> > > > > > performed
> > > > > > on GIMPLE but GIMPLE not taking the incoming argument as promoted.
> > > > > >
> > > > > > On my wishlist is to defer "applying" promote_prototypes to call RTL
> > > > > > expansion and keeping GIMPLE "type correct", aka _not_ promoting
> > > > > > arguments in the C/C++/Ada frontends.  I'll note that not all 
> > > > > > frontends
> > > > > > perform this promotion (sic!), and thus HJs pattern matching of the
> > > > > > sibling call site is required for correctness even!
> > > > > >
> > > > > > I'll also note that there is DECL_ARG_TYPE on PARM_DECLs which
> > > > > > should reflect the incoming promotion state - using the target hook 
> > > > > > in
> > > > > > RTL expansion is "wrong" (see above - not applied consistently), but
> > > > > > DECL_ARG_TYPE should tell you where it was applied and where not.
> > > > > > With LTO and calling C from fortran this optimization might go wrong
> > > > > > (fortran doesn't promote, C sets DECL_ARG_TYPE to promoted type).
> > > >
> > > > My patch only uses DECL_ARG_TYPE to compare against the incoming
> > > > argument to check if the incoming argument is used as outgoing argument
> > > > unchanged.  My patch doesn't change any functionality whether the caller
> > > > does promotion or not since GCC always assumes the upper bits are
> > > > undefined.  My patch keeps the incoming argument unchanged only
> > > > when there is no change in source code.
> > > >
> > > > > >
> > > > > > I'd say we should either
> > > > > >
> > > > > >  a) drop targetm.promote_prototypes (and hope nobody uses it to
> > > > > >      implement an ABI requirement)
> > > > > >  b) apply targetm.promote_prototypes during RTL expansion
> > > > > >
> > > > > > So the patch as-is isn't good I think (if you don't succeed with
> > > > > > Fortran due to lack of small integer types you can also try D
> > > > > > or Modula for the LTO wrong-code testcase).
> > > > > >
> > > > > > ISTR we do rely on DECL_ARG_TYPE in combine but wrongly so
> > > > > > as said.
> > > > >
> > > > > program test
> > > > >     use iso_c_binding, only: c_short
> > > > >     interface
> > > > >       subroutine foo(a) bind(c)
> > > > >         import c_short
> > > > >         integer(kind=c_short), intent(in), value :: a
> > > > >       end subroutine foo
> > > > >     end interface
> > > > >     integer(kind=c_short) a(5);
> > > > >     call foo (a(3))
> > > > > end
> > > > >
> > > > > shows
> > > > >
> > > > > ;; foo (_7);
> > > > >
> > > > > (insn 13 12 14 (set (reg:HI 102)
> > > > >         (mem/c:HI (plus:DI (reg/f:DI 93 virtual-stack-vars)
> > > > >                 (const_int -6 [0xfffffffffffffffa])) [1 a[2]+0 S2
> > > > > A16])) "t.f90":10:19 -1
> > > > >      (nil))
> > > > >
> > > > > (insn 14 13 15 (set (reg:HI 5 di)
> > > > >         (reg:HI 102)) "t.f90":10:19 -1
> > > > >      (nil))
> > > > >
> > > > > (call_insn 15 14 0 (call (mem:QI (symbol_ref:DI ("foo") [flags 0x41]
> > > > > <function_decl 0x7f7fcc430800 foo>) [0 foo S1 A8])
> > > > >         (const_int 0 [0])) "t.f90":10:19 -1
> > > > >      (expr_list:REG_CALL_DECL (symbol_ref:DI ("foo") [flags 0x41]
> > > > > <function_decl 0x7f7fcc430800 foo>)
> > > > >         (nil))
> > > > >     (expr_list:HI (use (reg:HI 5 di))
> > > > >         (nil)))
> > > > >
> > > > > and with -Os generates
> > > > >
> > > > >         movw    10(%rsp), %di
> > > > >         call    foo
> > > > >
> > > > > but a C TU would assume 'foo' receives promoted arguments.
> > > >
> > > > GCC follows x86 psABI and assumes the upper bits are undefined.
> > > > But LLVM assumes the upper bits are extended:
> > >
> > > I'm saying your patch increases the likelyhood we run into problems with
> > > out internal inconsistency of handling promote_prototypes - it adds to the
> > > number of places we wrongly assume all callers honor it when the
> > > callee was compiled in a way to honor the target hook.
> > >
> > > So IMO the patch is wrong (in the sense of wrong-code), and as given
> > > promote_prototypes is only useful for TU local binding functions.
> >
> > TARGET_PROMOTE_PROTOTYPES isn't defined for psABI purpose.
> > x86 psABI doesn't require it.   GCC uses only the lower bits of incoming
> > arguments.   But it isn't the GCC's job to promote incoming arguments
> > and copy them to outgoing arguments since not usages of such functions
> > are compiled by GCC.
>
> But if you now elide the promotion for the call to foo in
>
> static void foo (char c) { ... }

Only calling local functions, like foo, is an issue.  There is no
issue if foo is global.   Optimization for calling global functions
is OK.  Am I correct?

> void bar (char c) { foo (c); }
>
> then the existing combine optimization breaks since when all calls are
> from the local TU (as for foo()) we assume that the promotion happened
> but after your optimization it now depends on the callers of bar which
> we do not control.
>
> Richard.
>
> > > I showed a way to rectify this by applying promote_prototypes at
> > > call RTL expansion time (instead of or in addition to frontends).  Iff we
> > > want to keep doing this "optimization" - given we perform this 
> > > optimization
> > > in combine it got effectively part of the de-factor x86 ABI as we assume
> > > our callers perform the promotion, no?
> > >
> > > Richard.
> > >
> > > > $ cat x1.c
> > > > extern int baz (int);
> > > >
> > > > int
> > > > foo (char c1)
> > > > {
> > > >   return baz (c1) + 1;
> > > > }
> > > > $ gcc -S -O2 x1.c
> > > > $ cat x1.s
> > > > .file "x1.c"
> > > > .text
> > > > .p2align 4
> > > > .globl foo
> > > > .type foo, @function
> > > > foo:
> > > > .LFB0:
> > > > .cfi_startproc
> > > > subq $8, %rsp
> > > > .cfi_def_cfa_offset 16
> > > > movsbl %dil, %edi
> > > > call baz
> > > > addq $8, %rsp
> > > > .cfi_def_cfa_offset 8
> > > > addl $1, %eax
> > > > ret
> > > > .cfi_endproc
> > > > .LFE0:
> > > > .size foo, .-foo
> > > > .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)"
> > > > .section .note.GNU-stack,"",@progbits
> > > > $ clang -S -O2 x1.c
> > > > $ cat x1.s
> > > > .text
> > > > .file "x1.c"
> > > > .globl foo                             # -- Begin function foo
> > > > .p2align 4, 0x90
> > > > .type foo,@function
> > > > foo:                                    # @foo
> > > > .cfi_startproc
> > > > # %bb.0:
> > > > pushq %rax
> > > > .cfi_def_cfa_offset 16
> > > > callq baz
> > > > incl %eax
> > > > popq %rcx
> > > > .cfi_def_cfa_offset 8
> > > > retq
> > > > .Lfunc_end0:
> > > > .size foo, .Lfunc_end0-foo
> > > > .cfi_endproc
> > > >                                         # -- End function
> > > > .ident "clang version 19.1.0 (Fedora 19.1.0-1.fc41)"
> > > > .section ".note.GNU-stack","",@progbits
> > > > .addrsig
> > > > $
> > > >
> > > > --
> > > > H.J.
> >
> >
> >
> > --
> > H.J.



-- 
H.J.

Reply via email to