------- Comment #35 from rakdver at gcc dot gnu dot org 2005-11-17 15:09 -------
Created an attachment (id=10263)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10263&action=view)
Patch

After some playing with fold, I arrived at the following patch, which almost
works. With the patch, the code for the loop is

<L0>:;
  MEM[base: ptr]{*ptr} = cleanse_ctr;
  ptr = ptr + 1B;
  cleanse_ctr = (unsigned char) (((signed char) ptr & 15) + (signed char) cleanse_ctr + 17);
  len = len - 1;
  if (len != 0) goto <L0>; else goto <L2>;

which seems just fine. The assembler is

.L3:
        movb    (%edi), %al
        movb    %al, (%ecx)
        incl    %ecx
        movb    %cl, %al
        andl    $15, %eax
        movb    (%edi), %dl
        addl    $17, %edx
        addl    %edx, %eax
        movb    %al, (%edi)
        decl    %esi
        jne     .L3

which also seems OK to me. However, the "ugly" version we produce without
the patch:

.L4:
        movb    (%edi), %al
        movb    %al, (%ecx)
        incl    %ecx
        movb    -16(%ebp), %al
        addl    %esi, %eax
        andl    $15, %eax
        movb    (%edi), %dl
        addl    $17, %edx
        addl    %edx, %eax
        movb    %al, (%edi)
        incl    %esi
        cmpl    12(%ebp), %esi
        jne     .L4

is faster by 30%, for reasons I just don't understand :-(

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923