https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70784
--- Comment #4 from ktkachov at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #3)
> Also note that finding the heuristics when to use this (for -Os it is of
> course clearer) is hard, if the pointer is sufficiently aligned or if the
> target is strict alignment it is of course easier. And the RTL DSE patch
> caused some SPEC regressions on powerpc*.

Thanks for pointing me to PR22141.

I was hoping to structure this so that the merged byte sequence is passed down to a target hook, together with information about the original store+setup sequence. The target would then emit the most profitable sequence using any target-specific knowledge and tricks it may have (like preferring two movls instead of a movabsq in your example above). We could provide some helper functions for extracting immediates from the byte vector (like dse_decode_int in your patch, or a wrapper around builtin_strncpy_read_str from builtins.c).

The default implementation of the hook would be conservative: it would emit a sequence of stores of up to word_mode in width (avoiding unaligned stores for STRICT_ALIGN or SLOW_UNALIGNED_ACCESS targets), perhaps reusing some of the store_by_pieces infrastructure.

For the heuristics question, I've found that tracking the setup instructions as well as the stores works well. That is, from the original store sequence, keep track not only of the store insns but also of the instructions that move immediates into the registers to be stored. The new sequence would be rejected if its total number of stores is not smaller, or if its total store+setup cost is higher than the old store+setup cost. Of course, that depends on the backend estimating the cost of synthesising immediates, which the aarch64 target (at least) does.

I think a gimple implementation of this would suffer from exactly that drawback: it would not know whether the new wider immediates are actually profitable unless it consulted RTX costs, which we wouldn't want at gimple level.
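
To make the acceptance rule above concrete, here is a minimal sketch in plain C. It is not GCC code: struct seq_cost and merged_seq_profitable_p are hypothetical names invented for illustration, and the cost fields stand in for whatever the backend's RTX cost estimates would report for the stores and for the instructions that materialise the immediates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a store sequence (not a GCC data structure).
   Costs are in arbitrary units, as a backend cost hook might report them.  */
struct seq_cost
{
  unsigned n_stores;    /* Number of store insns in the sequence.  */
  unsigned store_cost;  /* Summed cost of the store insns.  */
  unsigned setup_cost;  /* Summed cost of the insns that move the
                           immediates into the registers being stored.  */
};

/* Return true if replacing OLD_SEQ with NEW_SEQ looks profitable.
   The new sequence is rejected if it does not use fewer stores, or if
   its total store+setup cost is higher than that of the old one.  */
static bool
merged_seq_profitable_p (const struct seq_cost *old_seq,
                         const struct seq_cost *new_seq)
{
  assert (old_seq != NULL && new_seq != NULL);

  /* Merging must actually reduce the number of stores.  */
  if (new_seq->n_stores >= old_seq->n_stores)
    return false;

  unsigned old_total = old_seq->store_cost + old_seq->setup_cost;
  unsigned new_total = new_seq->store_cost + new_seq->setup_cost;

  /* Fewer stores do not help if the wider immediates cost more to build.  */
  return new_total <= old_total;
}
```

With made-up numbers: four byte stores costing 4 each, with setup cost 4 each, give an old total of 32. A single word store costing 4 with an immediate that costs 12 to synthesise totals 16 and would be accepted; the same store with an immediate costing 40 would be rejected even though it uses fewer stores.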