https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84719
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |NEW
--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> ---
I compared __builtin_memcpy as compiled by clang and GCC, one size at a
time. Here are the results in cycles (lower is better):
clang 1 bytes: 17193410146
gcc 1 bytes: 15440244966
clang 2 bytes: 8997535880
gcc 2 bytes: 8147449530
clang 3 bytes: 6002276628
gcc 3 bytes: 5430387704
clang 4 bytes: 4497121282
gcc 4 bytes: 4069604454
clang 5 bytes: 3644879742
gcc 5 bytes: 3258094970
clang 6 bytes: 3045612708
gcc 6 bytes: 2728410608
clang 7 bytes: 2574110178
gcc 7 bytes: 2330365680
clang 8 bytes: 969894432
gcc 8 bytes: 6436950208
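GCC is faster for every size from 1 to 7 bytes. The 8-byte size is the
exception: there GCC takes about 6.6 times as many cycles as clang
(6436950208 vs. 969894432).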
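
The benchmark source is not included in this comment, so the following is
only a minimal sketch of how a per-size comparison like this could be set up.
It assumes that each size is a compile-time constant and that a fixed buffer
is copied in chunks of that size. The second assumption is an inference from
the numbers above, which fall roughly in proportion to 1/size. The names
ticks, BUF_SIZE, DEFINE_BENCH, bench_N and report are all hypothetical, and
the tick source falls back to clock() on non-x86 targets, where the result is
not a cycle count.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical tick source: the TSC on x86, clock() elsewhere. */
static inline uint64_t ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
  return __builtin_ia32_rdtsc();
#else
  return (uint64_t) clock();
#endif
}

#define BUF_SIZE 4096

static unsigned char src[BUF_SIZE];
static unsigned char dst[BUF_SIZE];

/* Fill the source with a non-zero pattern so copies can be verified. */
static void fill_source(void)
{
  for (size_t i = 0; i < BUF_SIZE; i++)
    src[i] = (unsigned char) (i % 251 + 1);
}

/* Define bench_SIZE: copy the buffer in SIZE-byte chunks, ROUNDS times.
   SIZE is a constant, so the compiler may expand __builtin_memcpy inline.
   The empty asm is a compiler barrier that keeps the individual copies
   from being merged or hoisted out of the loop.  */
#define DEFINE_BENCH(SIZE)                                      \
  static uint64_t bench_##SIZE(unsigned rounds)                 \
  {                                                             \
    uint64_t start = ticks();                                   \
    for (unsigned r = 0; r < rounds; r++)                       \
      for (size_t i = 0; i + SIZE <= BUF_SIZE; i += SIZE) {     \
        __builtin_memcpy(dst + i, src + i, SIZE);               \
        __asm__ volatile("" ::: "memory");                      \
      }                                                         \
    return ticks() - start;                                     \
  }

DEFINE_BENCH(1) DEFINE_BENCH(2) DEFINE_BENCH(3) DEFINE_BENCH(4)
DEFINE_BENCH(5) DEFINE_BENCH(6) DEFINE_BENCH(7) DEFINE_BENCH(8)

typedef uint64_t (*bench_fn)(unsigned);

static const bench_fn benches[] = {
  bench_1, bench_2, bench_3, bench_4, bench_5, bench_6, bench_7, bench_8
};

/* Run every size once and print one "N bytes: TICKS" line per size. */
void report(unsigned rounds)
{
  fill_source();
  for (size_t n = 0; n < sizeof benches / sizeof benches[0]; n++) {
    size_t size = n + 1;
    memset(dst, 0, BUF_SIZE);
    uint64_t t = benches[n](rounds);
    /* Sanity check: every whole chunk must have been copied. */
    assert(memcmp(dst, src, BUF_SIZE - BUF_SIZE % size) == 0);
    printf("%zu bytes: %llu\n", size, (unsigned long long) t);
  }
}
```

Calling report(100000) from main and building the same file once with each
compiler at the same optimization level would give one line per size for
each, comparable in form to the table above. The absolute numbers will differ
from the ones reported here, since the buffer size and round count are
placeholders.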