https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87869
--- Comment #4 from Nick Bowler <nbowler at draconx dot ca> ---
(In reply to Richard Biener from comment #3)
> I think a better target for optimizing would be the RTL side,
[...]
> I'm sure arc can store to a register address as well.

Yes, if the shortest possible store encoding were used on ARC instead of
the longest possible encoding, then the unrolled loop would not be nearly
as painful, e.g.,

  00000000 <do_stuff>:
     0:   40c3 f000 0000    mov_s   r0,0xf0000000
     6:   732c              mov_s   r1,3
     8:   a020              st_s    r1,[r0,0]
     a:   a021              st_s    r1,[r0,0x4]
     c:   a022              st_s    r1,[r0,0x8]
     e:   a023              st_s    r1,[r0,0xc]
    10:   a024              st_s    r1,[r0,0x10]
    12:   a025              st_s    r1,[r0,0x14]
    14:   a026              st_s    r1,[r0,0x18]
    16:   a027              st_s    r1,[r0,0x1c]
    18:   a028              st_s    r1,[r0,0x20]
    1a:   a029              st_s    r1,[r0,0x24]
    1c:   a02a              st_s    r1,[r0,0x28]
    1e:   7ee0              j_s     [blink]
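
For readers without the original test case to hand, here is a rough C sketch of a function whose unrolled form would match the listing above: eleven word-sized stores of the constant 3 at offsets 0 through 0x28 from 0xf0000000. It is reconstructed from the disassembly only. The helper name fill_words, the NUM_STORES constant, the use of volatile, and the loop form are all assumptions, not the code attached to this bug.

```c
#include <stdint.h>

/* The listing shows 11 st_s instructions (offsets 0x0 through 0x28). */
enum { NUM_STORES = 11 };

/*
 * Hypothetical helper (not from the bug report): store the constant 3
 * to NUM_STORES consecutive 32-bit words starting at base.
 */
static void fill_words(volatile uint32_t *base)
{
    for (int i = 0; i < NUM_STORES; i++)
        base[i] = 3;
}

/*
 * Assumed shape of do_stuff: the same stores against the fixed address
 * seen in the listing. Only meaningful on a target where 0xf0000000 is
 * a writable location; do not call this on a hosted system.
 */
void do_stuff(void)
{
    fill_words((volatile uint32_t *)(uintptr_t)0xf0000000u);
}
```

With the base address held in r0, each store in the listing is a single 2-byte st_s, and the whole function ends at offset 0x20, i.e. 32 bytes.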