Hi, Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi, > > Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > >> Hi, >> >> Segher Boessenkool <seg...@kernel.crashing.org> writes: >> >>> On Fri, Dec 23, 2022 at 08:13:48PM +0100, Richard Biener wrote: >>>> > Am 23.12.2022 um 17:55 schrieb Segher Boessenkool >>>> > <seg...@kernel.crashing.org>: >>>> > There are at least six very different kinds of subreg: >>>> > >>>> > 0) Lvalue subregs. Most archs have no use for it, and it can be >>>> > expressed much more clearly and cleanly always. >>>> > 1) Subregs of mem. Do not use, deprecated. When old reload goes away >>>> > this will go away. >>>> > 2) Subregs of hard registers. Do not use, there are much better ways to >>>> > write subregs of a non-zero byte offset, and for zero offset this is >>>> > non-canonical RTL. >>>> > 3) Bitcast subregs. In principle they go from one mode to another mode >>>> > of the same size (but read on). >>>> > 4) Paradoxical subregs. A concept completely separate from the rest, >>>> > different rules for everything, it has to be special cased almost >>>> > everywhere, it would be better if it was a separate rtx_code imo. >>>> > 5) Finally, normal subregs, taking a contiguous span of bits from some >>>> > value. >>>> > >>>> > Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is >>>> > written as just one subreg, as you say. And a 4) of a 5) is just >>>> > invalid afaics (and let's not talk about 0)..2) anymore :-) ) >>>> > >>>> >> Note whether targets actually support subreg operations needs to be >>>> >> queried and I’m not sure how subreg with offset validation should work >>>> >> there. >>>> > >>>> > But 3) is always valid, no? On pseudos >> I also has similar question: do we need to query/recog if "SF(SI#0)" is >> valid on the target, or it would always work (even through reload)? >> I also hit this during debugging on ppc64le: "SF(SI#0)" is valid, >> and "SF(DI#4)" is not valid. >>>> >>>> Yes, but it will eventually result in a spill/reload which is >>>> undesirable when we created this from CSE from a load. So I think >>>> for CSE we do want to know whether a spill will definitely not >>>> occur. >>> >>> Does it cause reloads though? On any sane backend? If no movsf pattern >>> allows integer registers, can things work at all? >>> >>> Anyway, the normal way to test if some RTL is valid is to just generate >>> it (using validate_change) and then do apply_change_group, which then >>> cancels the changes if they do not work. CSE already does some of >>> this. >> validate_change seems ok. Thanks! >>> >>> (I am doubtful doing any of this in CSE is a good idea fwiw). >> Understand your concern! Especially when we need to emit additional >> inns in CSE. >> While CSE does some similar work. It transforms >> "[sf:DI]=%x:DI; %y:DI=[sf:DI]" to "%y:DI=%x:DI". >> and "see if a MEM has already been loaded with a widening operation; >> if it has, we can use a subreg of that." (only for int modes). >> So, it may be acceptable to do this in CSE (maybe still seems >> hacking). > > This maybe works for "DI to DF", because "mode converting > subreg:DF(x:DI,0)" is cheaper than "mem load DF([sf:DI])". Then > "y:DF=[sf:DI]" can be replaced by "y:DF=x:DI#0". > > While for "subreg:SF(x:SI,0)", in CSE, it may not cheaper. > So, it may be doubtful for "convert DI to SF" in CSE. Considering the limitations of CSE, I try to find other places to handle this issue, and notice DSE can optimize below code: "[sfp:DI]=x:DI ; y:SI=[sfp:DI]" to "y:SI=x:DI#0". So, I drafted a patch to update DSE to handle DI->DF/SF. The patch updates "extract_low_bits" to get mode change with subreg. diff --git a/gcc/expmed.cc b/gcc/expmed.cc index b12b0e000c2..5e36331082c 100644 --- a/gcc/expmed.cc +++ b/gcc/expmed.cc @@ -2439,7 +2439,10 @@ extract_low_bits (machine_mode mode, machine_mode src_mode, rtx src) if (!targetm.modes_tieable_p (src_int_mode, src_mode)) return NULL_RTX; - if (!targetm.modes_tieable_p (int_mode, mode)) + if (!targetm.modes_tieable_p (int_mode, mode) + && !(known_le (GET_MODE_BITSIZE (mode), GET_MODE_BITSIZE (src_mode)) + && GET_MODE_CLASS (mode) == MODE_FLOAT + && GET_MODE_CLASS (src_mode) == MODE_INT)) return NULL_RTX; src = gen_lowpart (src_int_mode, src); ----------- While I am aware that DSE is not good at cross basic-block. e.g. for code, it was not optimized. typedef struct SF {float a[4];int i1; int i2; } SF; int foo_si (SF a, int flag) { if (flag == 2) return a.i1 + a.i2; return 0; } So, we may back to previous ideas which we discussed in early of this thread. https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608872.html BR, Jeff (Jiufu) > > Any comments or suggestions? > > > BR, > Jeff (Jiufu) > >> >> Thanks for so great comments! >> >> BR, >> Jeff (Jiufu) >> >>> >>> >>> Segher