https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294
--- Comment #24 from Mateusz Guzik <mjguzik at gmail dot com> ---

I got it compiled from the top of git, with this as a testcase:

void zero(char *buf)
{
        __builtin_memset(buf, 0, SIZE);
}

compiled like so:

./xgcc -O2 -DSIZE=128 -mno-sse -c ~/zero.c && objdump --disassemble=zero zero.o

The compiler emits completely unrolled stores for sizes up to 128, which raises an eyebrow but is perhaps fine. However, for 129 and higher I see the code going back to rep stos, which is not what is expected. The expected behavior is an unrolled loop storing 32 bytes per iteration for sizes up to 256.
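To make "unrolled loop, 32 bytes per iteration" concrete, here is a rough C model of that instruction shape with SSE disabled: each iteration does four 8-byte general-purpose stores, and the remainder is then cleared. The function name zero_loop32 and the byte-wise tail are illustrative assumptions. This is not the code GCC emits, and the tail handling in particular is simplified. Also note that at -O2 GCC may recognize a loop like this and turn it back into a memset call, so treat it as a model of the expected codegen, not a workaround.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical source-level model of the expected expansion:
   zero 32 bytes per iteration using four 8-byte stores
   (general-purpose registers only, as with -mno-sse). */
static void zero_loop32(char *buf, size_t size)
{
        const uint64_t z = 0;
        size_t i = 0;

        /* Main loop: four 8-byte stores cover one 32-byte chunk. */
        for (; i + 32 <= size; i += 32) {
                memcpy(buf + i, &z, 8);
                memcpy(buf + i + 8, &z, 8);
                memcpy(buf + i + 16, &z, 8);
                memcpy(buf + i + 24, &z, 8);
        }

        /* Simplified tail: clear any remaining bytes one at a time. */
        for (; i < size; i++)
                buf[i] = 0;
}
```

For SIZE=160, for example, this model runs the main loop five times and leaves no tail.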