http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726
Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2012-06-20
          Component|c                           |tree-optimization
                 CC|                            |rguenth at gcc dot gnu.org
     Ever Confirmed|0                           |1
   Target Milestone|---                         |4.8.0

--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-06-20 09:27:52 UTC ---
You mean the fix led to recognition of memcpy?  At least I see memcpy calls
in the bad assembly.  There is always a cost consideration for memcpy - does
performance recover with -minline-all-stringops?  I suppose BC is actually
very small?  The testcase does not include a runtime part, so I can't check
myself.

Then again, a byte-wise copy loop as in the .good assembly variant,

.L5:
-       .loc 1 14 0 is_stmt 1 discriminator 2
-       movzbl  16(%esp,%eax), %edx
-       movb    %dl, (%esi,%eax)
-       leal    1(%eax), %eax
-.LVL5:
-       cmpl    %ebx, %eax
-       jl      .L5

definitely does not look good - even a rep movsb should be faster, no?
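For reference, a minimal sketch of the kind of byte-wise copy loop that GCC's
loop distribution (-ftree-loop-distribute-patterns) can turn into a memcpy
call, together with the sort of runtime driver this comment asks for.  The
names bc and copy_bytes, the buffer size N, and the iteration count are
assumptions made for illustration only; they are not taken from the bug's
actual testcase.

#include <stdio.h>
#include <time.h>

#define N 32   /* assumed "very small" buffer size, not from the real testcase */

static unsigned char bc[N];   /* stand-in for the bug's BC array (name assumed) */

/* Byte-wise copy of the shape seen in the .good assembly (movzbl/movb loop);
   at -O2 GCC may replace the loop with a call to memcpy, which is what the
   .bad assembly shows. */
__attribute__((noinline))
static void copy_bytes(unsigned char *dst, const unsigned char *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i];
}

int main(void)
{
    unsigned char src[N];
    for (int i = 0; i < N; i++)
        src[i] = (unsigned char)i;

    /* Minimal runtime part (the original report had none): repeat the copy
       often enough that the memcpy-call vs. inline-loop difference shows up. */
    clock_t t0 = clock();
    for (long r = 0; r < 100000000L; r++)
        copy_bytes(bc, src, N);
    clock_t t1 = clock();

    printf("bc[N-1]=%u  %.2fs\n", (unsigned)bc[N - 1],
           (double)(t1 - t0) / CLOCKS_PER_SEC);
    return 0;
}

Timing this at plain -O2 against -O2 -minline-all-stringops (or against
-O2 -fno-tree-loop-distribute-patterns) would show whether the libc memcpy
call itself, rather than the copy, is what costs the time for small sizes.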