Hi!

On 2024-06-27T22:27:21+0200, I wrote:
> On 2024-06-27T18:49:17+0200, I wrote:
>> On 2023-10-24T19:49:10+0100, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>> This patch adds a combine pass that runs late in the pipeline.
>
> [After sending, I realized I replied to a previous thread of this work.]
>
>> I've been looking a bit through recent nvptx target code generation
>> changes for GCC target libraries, and thought I'd also share here my
>> findings for the "late-combine" changes in isolation, for nvptx target.
>> 
>> First the unexpected thing:
>
> So much for "unexpected thing" -- next level of unexpected here...
> Appreciated if anyone feels like helping me find my way through this, but
> I totally understand if you've got other things to do.

OK, I found something already.  (Unexpectedly quickly...)  ;-)

>> there are a few cases where we now see unused
>> registers get declared, for example (random) in
>> 'nvptx-none/newlib/libc/libm_a-s_modf.o:modf'

I've now looked into the former one ('tmp-libm_a-s_modf.i.xz' is
attached), to avoid...

> I first looked into a simpler case: newlib 'libc/locale/lnumeric.c'.

>     ../../../source-gcc/newlib/libc/locale/lnumeric.c:88:10: warning: ‘ret’ is used uninitialized [-Wuninitialized]
>        88 |   return ret;
>           |          ^~~
>     ../../../source-gcc/newlib/libc/locale/lnumeric.c:48:7: note: ‘ret’ was declared here
>        48 |   int ret;
>           |       ^~~
>
> Uh.  Given nothing else is going on in that function, I suppose '%r22'
> relates to the uninitialized 'ret' -- and given undefined behavior, GCC
> is of course free to emit an unused 'reg' in that case...

... the undefined behavior here.

But in fact, for both cases, the unexpected difference goes away if I
inject a 'pass_fast_rtl_dce' after 'pass_late_combine'.  That pass
normally runs as part of 'PUSH_INSERT_PASSES_WITHIN (pass_postreload)'
-- but none of that is active for the nvptx target, as 'reload_completed'
stays false: nvptx is 'targetm.no_register_allocation'.  Maybe we need
to enable a few more passes, or is there something in 'pass_late_combine'
to change, so that we don't run into this?  Does it inadvertently mark
registers live, or something like that?
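For illustration only, a toy sketch of what a dead-code-elimination pass
does on a single basic block: walk backwards, keep an insn only if it has
side effects or sets a register that is still live, then see which
registers are still mentioned.  All names here ('ToyInsn', 'toy_dce',
'mentioned_regs') are hypothetical; this is not GCC's 'pass_fast_rtl_dce'
(which uses df liveness over the whole function), and the assumption that
a register gets declared whenever some remaining insn mentions it is made
up for the sketch, not a statement about the nvptx back end.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy straight-line instruction (hypothetical): optional destination,
// source registers, and whether it has side effects (store, return, ...).
struct ToyInsn {
  std::string dest;               // empty if the insn sets no register
  std::vector<std::string> srcs;  // registers read by the insn
  bool side_effects;              // side effects keep the insn alive
};

// Backward-liveness DCE over one basic block (toy model, not GCC code).
std::vector<ToyInsn> toy_dce(const std::vector<ToyInsn> &insns) {
  std::set<std::string> live;
  std::vector<ToyInsn> kept;
  for (auto it = insns.rbegin(); it != insns.rend(); ++it) {
    bool needed = it->side_effects
                  || (!it->dest.empty() && live.count(it->dest) != 0);
    if (!needed)
      continue;  // result is never read afterwards: drop the insn
    if (!it->dest.empty())
      live.erase(it->dest);  // the definition kills liveness of dest
    for (const std::string &r : it->srcs)
      live.insert(r);  // sources become live before this insn
    kept.push_back(*it);
  }
  std::reverse(kept.begin(), kept.end());  // restore program order
  return kept;
}

// Registers still mentioned by some insn; under the toy assumption above,
// these are the ones that would still get a declaration.
std::set<std::string> mentioned_regs(const std::vector<ToyInsn> &insns) {
  std::set<std::string> regs;
  for (const ToyInsn &i : insns) {
    if (!i.dest.empty())
      regs.insert(i.dest);
    regs.insert(i.srcs.begin(), i.srcs.end());
  }
  return regs;
}
```

With a body like '%r23 = %r24; %r22 = %r23; return %r23', the set of
'%r22' is dead, so after 'toy_dce' only '%r23' and '%r24' remain
mentioned.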

The following makes these two cases work, but evidently needs a lot more
analysis: it enables a lot of other passes, whose effect for
'targetm.no_register_allocation'/nvptx may be anything from beneficial
to harmful.

    --- gcc/passes.cc
    +++ gcc/passes.cc
    @@ -676,17 +676,17 @@ const pass_data pass_data_postreload =
     class pass_postreload : public rtl_opt_pass
     {
     public:
       pass_postreload (gcc::context *ctxt)
         : rtl_opt_pass (pass_data_postreload, ctxt)
       {}
     
       /* opt_pass methods: */
    -  bool gate (function *) final override { return reload_completed; }
    +  bool gate (function *) final override { return reload_completed || targetm.no_register_allocation; }
    --- gcc/regcprop.cc
    +++ gcc/regcprop.cc
    @@ -1305,17 +1305,17 @@ class pass_cprop_hardreg : public rtl_opt_pass
     public:
       pass_cprop_hardreg (gcc::context *ctxt)
         : rtl_opt_pass (pass_data_cprop_hardreg, ctxt)
       {}
     
       /* opt_pass methods: */
       bool gate (function *) final override
         {
    -      return (optimize > 0 && (flag_cprop_registers));
    +      return (optimize > 0 && flag_cprop_registers && !targetm.no_register_allocation);
         }
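For illustration, a toy model of the gating involved (hypothetical types
and names, not GCC's pass manager): a sub-pass inserted with
'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' only runs if the container's
own gate passes, so with the original gate nothing in there runs on a
target where 'reload_completed' stays false.  The patched gates open the
container for 'targetm.no_register_allocation' while keeping
'pass_cprop_hardreg' off.  The real gates check more (for example
'optimize' and 'flag_cprop_registers'); only the parts touched by the diff
are modelled, and the sub-pass order here is arbitrary.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Stand-in for targetm.no_register_allocation (hypothetical).
struct ToyTarget {
  bool no_register_allocation;
};

// Stand-in for reload_completed; stays false if no register allocation runs.
struct ToyState {
  bool reload_completed = false;
};

// A pass with a gate and optional nested sub-passes.
struct ToyPass {
  std::string name;
  std::function<bool(const ToyTarget &, const ToyState &)> gate;
  std::vector<ToyPass> sub_passes;
};

// Run PASS if its gate allows, then its sub-passes; record what executed.
void run_pass(const ToyPass &pass, const ToyTarget &target,
              const ToyState &state, std::vector<std::string> &executed) {
  if (!pass.gate(target, state))
    return;  // gate closed: neither this pass nor its sub-passes run
  executed.push_back(pass.name);
  for (const ToyPass &sub : pass.sub_passes)
    run_pass(sub, target, state, executed);
}

// Toy "postreload" container; PATCHED selects the gates from the diff.
ToyPass make_postreload(bool patched) {
  ToyPass cprop{"cprop_hardreg",
                [patched](const ToyTarget &t, const ToyState &) {
                  return !(patched && t.no_register_allocation);
                },
                {}};
  ToyPass dce{"fast_rtl_dce",
              [](const ToyTarget &, const ToyState &) { return true; },
              {}};
  return ToyPass{"postreload",
                 [patched](const ToyTarget &t, const ToyState &s) {
                   return s.reload_completed
                          || (patched && t.no_register_allocation);
                 },
                 {cprop, dce}};
}
```

On a toy nvptx-like target, the unpatched container runs nothing, while
the patched one runs 'fast_rtl_dce' but not 'cprop_hardreg'; a target
with register allocation is unaffected by the patch.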


Regards
 Thomas


> But: should we expect '-fno-late-combine-instructions' vs.
> '-flate-combine-instructions' to behave in the same way?  (After all,
> '%r22' remains unused also with '-flate-combine-instructions', and
> doesn't need to be emitted.)  Or could this, of course, also be an nvptx
> back-end issue?
>
> I'm happy to supply any dump files etc.  Also, 'tmp-libc_a-lnumeric.i.xz'
> is attached if you'd like to reproduce this with your own nvptx target
> 'cc1':
>
>     $ [...]/configure --target=nvptx-none --enable-languages=c
>     $ make -j12 all-gcc
>     $ gcc/cc1 -fpreprocessed tmp-libc_a-lnumeric.i -quiet \
>         -dumpbase tmp-libc_a-lnumeric.c -dumpbase-ext .c \
>         -misa=sm_30 -g -O2 -fno-builtin -o tmp-libc_a-lnumeric.s \
>         -fdump-rtl-all # -fno-late-combine-instructions
>
>
> Regards
>  Thomas


Attachment: tmp-libm_a-s_modf.i.xz
Description: application/xz
