[Bug rtl-optimization/70784] Merge multiple short stores of immediates into wider stores

ktkachov at gcc dot gnu.org Mon, 25 Apr 2016 03:19:56 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70784


--- Comment #4 from ktkachov at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #3)
> Also note that finding the heuristics when to use this (for -Os it is of
> course clearer) is hard, if the pointer is sufficiently aligned or if the
> target is strict alignment it is of course easier.  And the RTL DSE patch
> caused some SPEC regressions on powerpc*.

Thanks for pointing me to PR22141.

I was hoping to structure this so that the merged byte sequence is passed down
to a target hook, together with information about the original store+setup
sequence, and the target emits the most profitable sequence using whatever
target-specific knowledge and tricks it has (such as preferring two movls
instead of a movabsq in your example above). We could provide some helper
functions for extracting immediates from the byte vector (like dse_decode_int
in your patch, or a wrapper around builtin_strncpy_read_str from builtins.c).
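To make the "merged byte sequence" idea concrete, here is a small standalone
sketch, not GCC code: a run of narrow constant stores is recorded into one
byte image, and a wider immediate is then read back out of it. The names
(merged_stores, record_store, decode_le) are hypothetical, little-endian byte
order is assumed, and decode_le is only loosely analogous to a helper like
dse_decode_int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not GCC API: merge narrow constant stores into a
   byte image and decode wider immediates from it (little-endian assumed). */

#define MAX_BYTES 16

struct merged_stores
{
  uint8_t bytes[MAX_BYTES]; /* merged byte sequence */
  bool known[MAX_BYTES];    /* which bytes some store has written */
};

/* Record a store of the low SIZE bytes of VAL at byte OFFSET.
   Later calls overwrite earlier ones, mirroring program order.  */
static void
record_store (struct merged_stores *m, size_t offset, size_t size,
              uint64_t val)
{
  assert (size <= 8 && offset + size <= MAX_BYTES);
  for (size_t i = 0; i < size; i++)
    {
      m->bytes[offset + i] = (uint8_t) (val >> (8 * i));
      m->known[offset + i] = true;
    }
}

/* Decode SIZE bytes starting at OFFSET as a little-endian immediate.
   Returns false if any byte in the range was never written.  */
static bool
decode_le (const struct merged_stores *m, size_t offset, size_t size,
           uint64_t *out)
{
  assert (size <= 8 && offset + size <= MAX_BYTES);
  uint64_t v = 0;
  for (size_t i = 0; i < size; i++)
    {
      if (!m->known[offset + i])
        return false;
      v |= (uint64_t) m->bytes[offset + i] << (8 * i);
    }
  *out = v;
  return true;
}
```

For example, byte stores of 0x11 and 0x22 at offsets 0 and 1 followed by a
16-bit store of 0x4433 at offset 2 decode as the single 32-bit immediate
0x44332211; a target hook could then decide how best to materialize it.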

The default implementation of the hook would conservatively emit a sequence of
stores no wider than word_mode (avoiding unaligned stores on STRICT_ALIGNMENT
or SLOW_UNALIGNED_ACCESS targets), perhaps reusing some of the store_by_pieces
infrastructure.
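A rough illustration of what such a conservative default might do: greedily
cover the merged region with power-of-two-sized stores no wider than the word
size, and on strict-alignment targets only use naturally aligned chunks. This
is a hypothetical standalone sketch (plan_stores and its parameters are made
up for illustration); it does not use store_by_pieces, and the boolean STRICT
merely stands in for STRICT_ALIGNMENT or SLOW_UNALIGNED_ACCESS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: cover LEN bytes starting at address ADDR with
   power-of-two-sized stores no wider than WORD_SIZE.  When STRICT is true,
   every store must be naturally aligned.  Store sizes are written to SIZES
   (capacity MAX_STORES); the number of stores is returned.  */
static size_t
plan_stores (size_t addr, size_t len, size_t word_size, bool strict,
             size_t *sizes, size_t max_stores)
{
  assert (word_size > 0 && (word_size & (word_size - 1)) == 0);
  size_t n = 0;
  while (len > 0)
    {
      size_t size = word_size;
      /* Shrink until the store fits in the remaining bytes and, if
         required, is aligned at the current address.  Size 1 always
         qualifies, so this terminates.  */
      while (size > len || (strict && addr % size != 0))
        size /= 2;
      assert (n < max_stores);
      sizes[n++] = size;
      addr += size;
      len -= size;
    }
  return n;
}
```

With an 8-byte word, 7 bytes starting at address 1 become stores of 1, 2 and
4 bytes when alignment is enforced, but 4, 2 and 1 bytes (the first one
unaligned) when it is not.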

For the heuristics question, I've found that tracking setup instructions as
well as stores works well. That is, for the original store sequence, keep
track not only of the store insns but also of the instructions that move
immediates into the registers being stored. The new sequence would then be
rejected if it does not contain fewer stores, or if its total store+setup cost
is higher than that of the old sequence. Of course, this depends on the backend
estimating the cost of synthesising immediates, which the aarch64 target (at
least) does.
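The acceptance rule described above can be sketched as a simple comparison.
Everything here is hypothetical: the seq_cost struct, the abstract cost units,
and toy_imm_cost, which crudely approximates immediate synthesis as one insn
per nonzero 16-bit chunk (roughly a MOVZ/MOVK sequence) and ignores the other
tricks a real backend such as aarch64 can use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cost summary of a store sequence, in abstract units.  */
struct seq_cost
{
  unsigned n_stores;   /* number of store insns */
  unsigned store_cost; /* summed cost of the stores */
  unsigned setup_cost; /* summed cost of moving immediates into registers */
};

/* Toy immediate-synthesis cost: one insn per nonzero 16-bit chunk, with a
   minimum of one.  Real backends also consider MOVN, bitmask immediates,
   zero registers and so on.  */
static unsigned
toy_imm_cost (uint64_t imm)
{
  unsigned cost = 0;
  for (int shift = 0; shift < 64; shift += 16)
    if ((imm >> shift) & 0xffff)
      cost++;
  return cost ? cost : 1;
}

/* Accept the merged sequence only if it has strictly fewer stores and its
   combined store+setup cost is not higher than the original's.  */
static bool
merge_profitable (const struct seq_cost *old_seq,
                  const struct seq_cost *new_seq)
{
  if (new_seq->n_stores >= old_seq->n_stores)
    return false;
  return new_seq->store_cost + new_seq->setup_cost
         <= old_seq->store_cost + old_seq->setup_cost;
}
```

Under this toy model, four byte stores (each needing one setup insn) merged
into one 32-bit store of 0x44332211 costs 1 + 2 against 4 + 4 and is accepted,
while two cheap stores replaced by one 64-bit store of an immediate with four
nonzero chunks costs 1 + 4 against 2 + 2 and is rejected.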

I think a GIMPLE implementation of this would suffer from exactly that
drawback: it would not know whether the new, wider immediates are actually
profitable unless it consulted RTX costs, which we wouldn't want to do at the
GIMPLE level.
