https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591
--- Comment #14 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #13)
> (In reply to Hongtao Liu from comment #12)
> > short a;
> > short c;
> > short d;
> > void
> > foo (short b, short f)
> > {
> >   c = b + a;
> >   d = f + a;
> > }
> >
> > foo(short, short):
> >         addw    a(%rip), %di
> >         addw    a(%rip), %si
> >         movw    %di, c(%rip)
> >         movw    %si, d(%rip)
> >         ret
> >
> > this one is bad since gcc10.1 and there's no subreg, The problem is if the
> > operand is used by more than 1 insn, and they all support separate m
> > constraint, mem_cost is quite small(just 1, reg move cost is 2), and this
> > makes RA more inclined to propagate memory across insns. I guess RA assumes
> > the separate m means the insn only support memory_operand?
>
> I don't see this as problematic. IIRC, there was a discussion in the past
> that a couple (two?) memory accesses from the same location close to each
> other can be faster (so, -O2, not -Os) than preloading the value to the
> register first.

Someone just filed a similar issue to the above testcase (the one in comment
#12) as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114688 :).
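
For reference, here is a rough source-level sketch of the two forms being compared: the testcase from comment #12, which reads `a` in two separate statements, and a variant that loads `a` once into a local first. The names `foo_preload` and `tmp` are hypothetical and do not appear in the bug. The compiler is free to emit identical code for both functions, so this only illustrates the intent of "preloading the value to the register", not what RA will actually produce.

```c
short a;
short c;
short d;

/* Testcase from comment #12: `a` is read in two separate statements,
   which GCC may turn into two memory-operand adds. */
void foo(short b, short f)
{
    c = b + a;
    d = f + a;
}

/* Hypothetical variant (not from the bug report): read `a` once into a
   local, the source-level analogue of preloading it into a register.
   Whether the generated code differs is up to the optimizer and RA. */
void foo_preload(short b, short f)
{
    short tmp = a;
    c = b + tmp;
    d = f + tmp;
}
```

Both functions store the same values to `c` and `d`; comparing their output at -O2 and -Os is one way to see which form RA picks for a given GCC version.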