http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56199
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-04 09:39:31 UTC ---
It should be faster if the string is not in the cache, which of course it is in your testcase (because you have an artificial loop here). So the benchmark does not show that the transform is bad; instead it shows that when repeatedly initializing something from the same (large) constants, it is profitable to use a smaller instruction encoding. But of course that is again likely only true when 'cpy' is not inlined, in which case we cannot distinguish the cases. A highly suspicious testing method in the end.