https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591
--- Comment #13 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao Liu from comment #12)
> short a;
> short c;
> short d;
> void
> foo (short b, short f)
> {
>   c = b + a;
>   d = f + a;
> }
>
> foo(short, short):
>         addw    a(%rip), %di
>         addw    a(%rip), %si
>         movw    %di, c(%rip)
>         movw    %si, d(%rip)
>         ret
>
> this one is bad since gcc10.1 and there's no subreg. The problem is if the
> operand is used by more than 1 insn, and they all support separate m
> constraint, mem_cost is quite small (just 1, reg move cost is 2), and this
> makes RA more inclined to propagate memory across insns. I guess RA assumes
> the separate m means the insn only supports memory_operand?

I don't see this as problematic. IIRC, there was a discussion in the past that
a couple (two?) memory accesses from the same location close to each other can
be faster (so, -O2, not -Os) than preloading the value to the register first.

In contrast, the example from Comment #11 already has the correct value in
%eax, so there is no need to reload it again from memory, even in a narrower
mode.