------- Comment #35 from rakdver at gcc dot gnu dot org 2005-11-17 15:09 -------
Created an attachment (id=10263)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10263&action=view)
Patch
After some playing with fold, I arrived at the following patch, which almost
works. With the patch, the code for the loop is
<L0>:;
MEM[base: ptr]{*ptr} = cleanse_ctr;
ptr = ptr + 1B;
cleanse_ctr = (unsigned char) (((signed char) ptr & 15) + (signed char) cleanse_ctr + 17);
len = len - 1;
if (len != 0) goto <L0>; else goto <L2>;
Which seems just fine. The resulting assembly is
.L3:
movb (%edi), %al
movb %al, (%ecx)
incl %ecx
movb %cl, %al
andl $15, %eax
movb (%edi), %dl
addl $17, %edx
addl %edx, %eax
movb %al, (%edi)
decl %esi
jne .L3
Which also seems OK to me. However, the "ugly" version we produce without the
patch:
.L4:
movb (%edi), %al
movb %al, (%ecx)
incl %ecx
movb -16(%ebp), %al
addl %esi, %eax
andl $15, %eax
movb (%edi), %dl
addl $17, %edx
addl %edx, %eax
movb %al, (%edi)
incl %esi
cmpl 12(%ebp), %esi
jne .L4
Is 30% faster, for reasons I just don't understand :-(
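
For reference, the two loop shapes can be sketched in C roughly as follows. This is only one reading of the dumps above, not the original source: the function names, the plain (non-volatile) global cleanse_ctr, and the starting value of the index i are assumptions. The first function mirrors the patched loop (step ptr, count len down to zero); the second mirrors the unpatched one (keep the low byte of the start address, as in -16(%ebp), and count a separate index up to len).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the counter seen in the dumps. */
static unsigned char cleanse_ctr;

/* Shape of the patched loop: ptr is the only address-like induction
   variable, and len counts down to zero. */
void cleanse_ptr_form(unsigned char *ptr, size_t len)
{
    assert(len != 0);   /* the dumps show only the loop body */
    do {
        *ptr = cleanse_ctr;
        ptr = ptr + 1;
        cleanse_ctr = (unsigned char)(((uintptr_t)ptr & 15) + cleanse_ctr + 17);
        len = len - 1;
    } while (len != 0);
}

/* Shape of the unpatched loop (assumed): the low byte of the start
   address is kept aside and an index i counts up until it equals len. */
void cleanse_index_form(unsigned char *ptr, size_t len)
{
    unsigned char base_low = (unsigned char)(uintptr_t)ptr;
    size_t i = 0;

    assert(len != 0);
    do {
        *ptr = cleanse_ctr;
        ptr = ptr + 1;
        i = i + 1;
        /* Only the low 4 bits matter, so base_low + i stands in for ptr. */
        cleanse_ctr = (unsigned char)(((base_low + i) & 15) + cleanse_ctr + 17);
    } while (i != len);
}
```

Both functions store the same bytes and leave the same counter value, so the sketch says nothing about where the 30% comes from; it only makes the difference in induction variables easier to see.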
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923