This report was prompted by a mail on the lkml which was suggesting to hand-craft memset: http://lkml.org/lkml/2007/8/17/309 . So I wondered if the code generated for __builtin_memset was any good, and could be used instead of hand-crafted code. I tested with (Debian) GCC 3.4.6, 4.1.3, 4.2.1, and also with a snapshot of GCC 4.3. All the results are similar, so I will only show them for GCC 4.2 on x86-64. Compilation was done with -O3.
First, the __builtin_memset code: void fill1(char *s, int a) { __builtin_memset(s, a, 15); } GCC generates: 0: 40 0f b6 c6 movzbl %sil,%eax 4: 48 ba 01 01 01 01 01 mov $0x101010101010101,%rdx b: 01 01 01 e: 40 0f b6 ce movzbl %sil,%ecx 12: 48 0f af c2 imul %rdx,%rax 16: 40 88 77 0e mov %sil,0xe(%rdi) 1a: 48 89 07 mov %rax,(%rdi) 1d: 40 0f b6 c6 movzbl %sil,%eax 21: 69 c0 01 01 01 01 imul $0x1010101,%eax,%eax 27: 89 47 08 mov %eax,0x8(%rdi) 2a: 89 c8 mov %ecx,%eax 2c: c1 e0 08 shl $0x8,%eax 2f: 01 c8 add %ecx,%eax 31: 66 89 47 0c mov %ax,0xc(%rdi) 35: c3 retq Notice that GCC first computes %sil * (01)^8 and puts it into %rax, then it computes %sil * (01)^4 and puts it into %eax (where it already was, due to the previous multiplication), then it computes %sil * (01)^2 and puts it into %ax (where it already was, again). Second, some code where multiplication results are reused: void fill2(char *s, int a) { unsigned long long int v = (unsigned char)a * 0x0101010101010101ull; *(unsigned long long int *)s = v; *(unsigned *)(s + 8) = v; *(unsigned short *)(s + 12) = v; *(s + 15) = v; } GCC generates: 0: 40 0f b6 f6 movzbl %sil,%esi 4: 48 b8 01 01 01 01 01 mov $0x101010101010101,%rax b: 01 01 01 e: 48 0f af f0 imul %rax,%rsi 12: 48 89 37 mov %rsi,(%rdi) 15: 89 77 08 mov %esi,0x8(%rdi) 18: 66 89 77 0c mov %si,0xc(%rdi) 1c: 40 88 77 0f mov %sil,0xf(%rdi) 20: c3 retq The function is 21 bytes smaller (-40%), it does not require two additional registers (c and d), and it will not be slower. The same issue arises on x86_32. The hand-written code (with 32bit integers this time) is 14 bytes smaller for memset(,,15). -- Summary: Redundant multiplications for memset Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: guillaume dot melquiond at ens-lyon dot fr GCC target triplet: x86_64-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33103