On Mon, Feb 16, 2009 at 11:19 AM, Narasimha Datta <datt...@yahoo.com> wrote:
> Hello,
>
> Here's a simple memory copy macro:
>
> #define MYMEMCOPY(dp, sp, len) \
>     do { \
>         long __len = len; \
>         while (--__len >= 0) \
>             (dp)[__len] = (sp)[__len]; \
>     } while (0)
>
> void foo(unsigned char *dp, const unsigned char *sp, unsigned long size) {
>     MYMEMCOPY(dp, sp, size);
> }
>
> void bar(unsigned char *dp, const unsigned char *sp) {
>     MYMEMCOPY(dp, sp, 128);
> }
>
> The code fragments generated for the foo and bar functions with the -O and
> -O2 switches are as follows:
>
> /* ===== With -O switch ===== */
> /* function foo */
> .L4:
>     movzbl  -1(%rcx), %eax
>     movb    %al, -1(%rdx)
>     subq    $1, %rcx
>     subq    $1, %rdx
>     subq    $1, %r8
>     jns     .L4
>
> /* function bar */
>     movl    $126, %edx
> .L8:
> .LBB3:
>     .loc 1 13 0
>     movzbl  1(%rdx,%rsi), %eax
>     movb    %al, 1(%rdx,%rdi)
>     subq    $1, %rdx
>     cmpq    $-2, %rdx
>     jne     .L8
>
> /* ===== With -O2 switch ===== */
> /* function foo */
> .L4:
>     movzbl  -1(%rsi), %eax
>     addq    $1, %rdi
>     subq    $1, %rsi
>     movb    %al, -1(%rcx)
>     subq    $1, %rcx
>     cmpq    %rdx, %rdi
>     jne     .L4
>
> /* function bar */
>     movl    $126, %edx
> .L9:
> .LBB3:
>     .loc 1 13 0
>     movzbl  1(%rdx,%rsi), %eax
>     movb    %al, 1(%rdx,%rdi)
>     subq    $1, %rdx
>     cmpq    $-2, %rdx
>     jne     .L9
>
> Now my questions are:
> (i) Why does the compiler generate an addq, cmpq and jne for the foo function
> with -O2? Isn't subq/jns more efficient, as seen in the -O output?
> (ii) For function bar, why is the "cmpq $-2, %rdx" instruction generated?
> Wouldn't it be better to count down from 128 to 0 instead of 126 to -2?
>
> Here's my OS and compiler version (I'm running a 64-bit FreeBSD):
> $ uname -a
> FreeBSD xxx 7.0-RELEASE FreeBSD 7.0-RELEASE #0: Wed Nov 12 18:54:21 PST 2008
> r...@wc7:/usr/obj/usr/src/sys/SMKERNEL  amd64
> $ cc --version
> cc (GCC) 4.2.1 20070719  [FreeBSD]
> Copyright (C) 2007 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> And these are the commands I used to compile the program:
> cc -S -O -g test.c
> cc -S -O2 -g test.c
>
> Any pointers would be appreciated. Thanks!

1) Try a more recent GCC.
2) Use memcpy. GCC inlines and optimizes it properly.

Richard.
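As a sketch of point 2, here are foo and bar from the quoted message rewritten to call memcpy instead of the MYMEMCOPY loop. The function names and signatures are taken from the question; whether a given GCC version expands the 128-byte copy inline or emits a library call depends on the version, target and flags, so compiling with `cc -S -O2` and reading the output is the way to check. Note that memcpy assumes the source and destination do not overlap; if they can, memmove is the correct call.

```c
#include <string.h>

/* Variable-length copy: GCC treats memcpy as a builtin and chooses the
   copy strategy itself instead of following a hand-written byte loop. */
void foo(unsigned char *dp, const unsigned char *sp, unsigned long size)
{
    memcpy(dp, sp, size);
}

/* Constant-length copy: with the size known at compile time, GCC is
   free to expand the copy inline. */
void bar(unsigned char *dp, const unsigned char *sp)
{
    memcpy(dp, sp, 128);
}
```

Both functions copy bytes exactly as the macro does for non-overlapping buffers; only the generated code is left to the compiler.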