------- Comment #35 from rakdver at gcc dot gnu dot org  2005-11-17 15:09 -------
Created an attachment (id=10263)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10263&action=view)
Patch

After some playing with fold, I arrived at the following patch, which almost
works.  With the patch, the code for the loop is

<L0>:;
  MEM[base: ptr]{*ptr} = cleanse_ctr;
  ptr = ptr + 1B;
  cleanse_ctr = (unsigned char) (((signed char) ptr & 15) + (signed char) cleanse_ctr + 17);
  len = len - 1;
  if (len != 0) goto <L0>; else goto <L2>;

Which seems just fine.  The assembly is

.L3:
        movb    (%edi), %al
        movb    %al, (%ecx)
        incl    %ecx
        movb    %cl, %al
        andl    $15, %eax
        movb    (%edi), %dl
        addl    $17, %edx
        addl    %edx, %eax
        movb    %al, (%edi)
        decl    %esi
        jne     .L3

Which also seems OK to me.  However, the "ugly" version we produce without the
patch:

.L4:
        movb    (%edi), %al
        movb    %al, (%ecx)
        incl    %ecx
        movb    -16(%ebp), %al
        addl    %esi, %eax
        andl    $15, %eax
        movb    (%edi), %dl
        addl    $17, %edx
        addl    %edx, %eax
        movb    %al, (%edi)
        incl    %esi
        cmpl    12(%ebp), %esi
        jne     .L4

is faster by 30%, for reasons I just don't understand :-(
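
For anyone who wants to experiment with the two loop shapes, here is a minimal
C sketch reconstructed from the dumps above.  It is not the original test case:
the function names fill_ptr, fill_idx and loops_agree are made up for
illustration, the declaration of cleanse_ctr is assumed, and the meaning of the
stack slot -16(%ebp) (low byte of the start address plus one) is a guess from
the assembly.  Unsigned arithmetic is used instead of the signed char casts in
the dump; the result modulo 256 is the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed stand-in for the global counter seen in the dumps. */
unsigned char cleanse_ctr;

/* Shape of the loop with the patch: the pointer itself feeds the "& 15"
   term and the length counts down to zero.  The dump tests at the bottom
   of the loop; a top test is used here so that len == 0 is safe. */
void fill_ptr(unsigned char *ptr, size_t len)
{
    while (len != 0) {
        *ptr = cleanse_ctr;
        ptr = ptr + 1;
        cleanse_ctr = (unsigned char)(((uintptr_t)ptr & 15) + cleanse_ctr + 17);
        len = len - 1;
    }
}

/* Shape of the loop without the patch: an index counts up and is compared
   against len, and the low address bits are recomputed as (saved byte +
   index) on every iteration. */
void fill_idx(unsigned char *base, size_t len)
{
    /* Guess at what -16(%ebp) holds: low byte of (base + 1). */
    unsigned char lo = (unsigned char)((uintptr_t)base + 1);
    size_t i;

    for (i = 0; i != len; i++) {
        base[i] = cleanse_ctr;
        cleanse_ctr = (unsigned char)(((unsigned char)(lo + i) & 15) + cleanse_ctr + 17);
    }
}

/* Runs both variants over the same buffer (the result depends on the
   address) and reports whether bytes and final counter match. */
int loops_agree(size_t len)
{
    unsigned char buf[256], snap[256];
    unsigned char ctr_ptr;

    assert(len <= sizeof buf);

    memset(buf, 0, sizeof buf);
    cleanse_ctr = 0;
    fill_ptr(buf, len);
    memcpy(snap, buf, sizeof buf);
    ctr_ptr = cleanse_ctr;

    memset(buf, 0, sizeof buf);
    cleanse_ctr = 0;
    fill_idx(buf, len);

    return ctr_ptr == cleanse_ctr && memcmp(snap, buf, sizeof buf) == 0;
}
```

Both functions should compute the same bytes, so any timing difference between
them comes from the loop shape rather than from the work done.  Whether a given
compiler keeps these two shapes apart after optimization is not guaranteed.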


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
