Hi!

On 2024-06-27T22:27:21+0200, I wrote:
> On 2024-06-27T18:49:17+0200, I wrote:
>> On 2023-10-24T19:49:10+0100, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>> This patch adds a combine pass that runs late in the pipeline.
>
> [After sending, I realized I replied to a previous thread of this work.]
>
>> I've been looking a bit through recent nvptx target code generation
>> changes for GCC target libraries, and thought I'd also share here my
>> findings for the "late-combine" changes in isolation, for nvptx target.
>>
>> First the unexpected thing:
>
> So much for "unexpected thing" -- next level of unexpected here...
> Appreciated if anyone feels like helping me find my way through this, but
> I totally understand if you've got other things to do.

OK, I found something already.  (Unexpectedly quickly...)  ;-)

>> there are a few cases where we now see unused
>> registers get declared, for example (random) in
>> 'nvptx-none/newlib/libc/libm_a-s_modf.o:modf'

I've now looked into the former one ('tmp-libm_a-s_modf.i.xz' is
attached), to avoid...

> I first looked into a simpler case: newlib 'libc/locale/lnumeric.c'.

> ../../../source-gcc/newlib/libc/locale/lnumeric.c:88:10: warning: ‘ret’
> is used uninitialized [-Wuninitialized]
>    88 |   return ret;
>       |          ^~~
> ../../../source-gcc/newlib/libc/locale/lnumeric.c:48:7: note: ‘ret’ was
> declared here
>    48 |   int ret;
>       |       ^~~
>
> Uh.  Given nothing else is going on in that function, I suppose '%r22'
> relates to the uninitialized 'ret' -- and given undefined behavior, GCC
> of course is fine to emit an unused 'reg' in that case...

... the undefined behavior here.

But in fact, for both cases, the unexpected difference goes away if after
'pass_late_combine' I inject a 'pass_fast_rtl_dce'.  That's normally run
as part of 'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' -- but that's
all not active for nvptx target given '!reload_completed', given nvptx is
'targetm.no_register_allocation'.  Maybe we need to enable a few more
passes, or is there anything in 'pass_late_combine' to change, so that we
don't run into this?  Does it inadvertently mark registers live or
something like that?

The following makes these two cases work, but evidently needs a lot more
analysis: a lot of other passes are enabled that may be anything between
beneficial and harmful for 'targetm.no_register_allocation'/nvptx.

--- gcc/passes.cc
+++ gcc/passes.cc
@@ -676,17 +676,17 @@ const pass_data pass_data_postreload =
 class pass_postreload : public rtl_opt_pass
 {
 public:
   pass_postreload (gcc::context *ctxt)
     : rtl_opt_pass (pass_data_postreload, ctxt)
   {}

   /* opt_pass methods: */
-  bool gate (function *) final override { return reload_completed; }
+  bool gate (function *) final override { return reload_completed || targetm.no_register_allocation; }

--- gcc/regcprop.cc
+++ gcc/regcprop.cc
@@ -1305,17 +1305,17 @@ class pass_cprop_hardreg : public rtl_opt_pass
 public:
   pass_cprop_hardreg (gcc::context *ctxt)
     : rtl_opt_pass (pass_data_cprop_hardreg, ctxt)
   {}

   /* opt_pass methods: */
   bool gate (function *) final override
     {
-      return (optimize > 0 && (flag_cprop_registers));
+      return (optimize > 0 && flag_cprop_registers && !targetm.no_register_allocation);
     }


Grüße
 Thomas


> But: should we expect '-fno-late-combine-instructions' vs.
> '-flate-combine-instructions' to behave in the same way?  (After all,
> '%r22' remains unused also with '-flate-combine-instructions', and
> doesn't need to be emitted.)  This could, of course, also be a nvptx back
> end issue?
>
> I'm happy to supply any dump files etc.  Also, 'tmp-libc_a-lnumeric.i.xz'
> is attached if you'd like to reproduce this with your own nvptx target
> 'cc1':
>
>     $ [...]/configure --target=nvptx-none --enable-languages=c
>     $ make -j12 all-gcc
>     $ gcc/cc1 -fpreprocessed tmp-libc_a-lnumeric.i -quiet -dumpbase tmp-libc_a-lnumeric.c -dumpbase-ext .c -misa=sm_30 -g -O2 -fno-builtin -o tmp-libc_a-lnumeric.s -fdump-rtl-all # -fno-late-combine-instructions
>
>
> Grüße
>  Thomas

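
For illustration only, the effect of the two 'gate' changes in the diff can be modelled outside of GCC.  The sketch below is not GCC code: 'State', 'Target', the four gate functions and the "nvptx-like" configuration are hypothetical stand-ins for 'reload_completed', 'optimize', 'flag_cprop_registers' and 'targetm.no_register_allocation'.  It assumes an -O2-style setup in which reload never completes, and only shows which gates would open before and after the change.

```cpp
// Toy model, not GCC code: 'State', 'Target' and the gate functions are
// hypothetical stand-ins for the globals and 'gate' methods in the diff.
#include <cassert>

struct Target {
  bool no_register_allocation;  // stands in for targetm.no_register_allocation
};

struct State {
  bool reload_completed;      // stands in for the global 'reload_completed'
  int optimize;               // stands in for the optimization level
  bool flag_cprop_registers;  // stands in for 'flag_cprop_registers'
};

// 'pass_postreload' gate before the change: needs a completed reload.
inline bool postreload_gate_old(const State &s, const Target &) {
  return s.reload_completed;
}

// 'pass_postreload' gate after the change: also opens for targets that
// skip register allocation.
inline bool postreload_gate_new(const State &s, const Target &t) {
  return s.reload_completed || t.no_register_allocation;
}

// 'pass_cprop_hardreg' gate before the change.
inline bool cprop_hardreg_gate_old(const State &s, const Target &) {
  return s.optimize > 0 && s.flag_cprop_registers;
}

// 'pass_cprop_hardreg' gate after the change: stays closed for targets
// without register allocation.
inline bool cprop_hardreg_gate_new(const State &s, const Target &t) {
  return s.optimize > 0 && s.flag_cprop_registers && !t.no_register_allocation;
}

// Assumed nvptx-like configuration: reload never completes, -O2-style flags.
inline State nvptx_like_state() { return State{false, 2, true}; }
inline Target nvptx_like_target() { return Target{true}; }

// Checks the model for the assumed nvptx-like configuration.
inline void check_nvptx_like() {
  const State s = nvptx_like_state();
  const Target t = nvptx_like_target();
  assert(!postreload_gate_old(s, t));     // post-reload sub-passes skipped
  assert(postreload_gate_new(s, t));      // post-reload sub-passes now run
  assert(cprop_hardreg_gate_old(s, t));   // gate itself was open before
  assert(!cprop_hardreg_gate_new(s, t));  // and is explicitly closed now
}
```

In this model, the old 'pass_postreload' gate stays closed for the nvptx-like configuration, which matches the observation that the passes nested within it (including 'pass_fast_rtl_dce') are not active there.  The model says nothing about whether the other passes enabled this way are safe for such a target.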
tmp-libm_a-s_modf.i.xz
Description: application/xz