http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #5 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-06-20 11:48:13 UTC ---
Ok.  A rep movsb; is as slow as a memcpy call
(-mstringop-strategy=rep_byte -minline-all-stringops).
-minline-all-stringops itself is nearly as fast as
-fno-tree-loop-distribute-patterns.

To answer my own question, BC is between zero and 7.

But I really wonder why the rep movsb is slower than the explicit
byte-copy loop ...

We do seem to seriously hose the CFG though - with PGO we get a nice
loop nest CFG and the speed of before the patch - even when it uses a
memcpy call.
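
For readers who do not have the test case at hand, here is a minimal sketch of the kind of code the comment is about. It is not taken from the bug report: the function names and the buffer contents are hypothetical, and only the 0 to 7 byte range comes from the comment. The first function is an explicit byte-copy loop, which loop distribution's pattern matching (-ftree-loop-distribute-patterns) can be expected to recognise and replace with a memcpy call. The second is the same copy written as a direct library call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the bug's test case: an explicit
   byte-copy loop. With loop-distribution patterns enabled, the compiler
   may replace this loop with a call to memcpy. */
void copy_bytes_loop(unsigned char *restrict dst,
                     const unsigned char *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* The same copy expressed directly as a library call. */
void copy_bytes_memcpy(unsigned char *restrict dst,
                       const unsigned char *restrict src, size_t n)
{
    memcpy(dst, src, n);
}

/* Check that both variants produce identical results for the short
   lengths mentioned in the comment (0 to 7 bytes). */
void check_copies_agree(void)
{
    const unsigned char src[8] = {1, 2, 3, 4, 5, 6, 7, 8};

    for (size_t n = 0; n <= 7; n++) {
        unsigned char a[8] = {0};
        unsigned char b[8] = {0};

        copy_bytes_loop(a, src, n);
        copy_bytes_memcpy(b, src, n);
        assert(memcmp(a, b, sizeof a) == 0);
    }
}
```

Compiling such a file once as-is and once with -fno-tree-loop-distribute-patterns, then comparing the generated assembly, is one way to see whether the loop was turned into a memcpy call. For copies this short, the fixed cost of a call or of rep movsb may plausibly outweigh the copy itself, but that is an assumption to measure, not a result from this report.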