------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-06-25 11:32 -------
Subject: Re: [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
> ------- Additional Comments From steven at gcc dot gnu dot org 2005-06-25 10:15 -------
> Re. comment #25, as far as I can tell there are registers available in
> that loop. To quote the loop from comment #12:
>
> .L4:
> movb (%esi), %al
> movb %al, (%edx)
> leal (%ecx,%edi), %eax
> andl $15, %eax
> incl %ecx
> addb (%esi), %al
> incl %edx
> addl $17, %eax
> cmpl %ecx, 12(%ebp)
> movb %al, (%esi)
> jne .L4
>
> Checking off used registers in this loop:
> %esi x
> %edi x
> %eax x
> %ebx
> %ecx x
> %edx x
>
> So %ebx at least is free (and iiuc, with -fomit-frame-pointer %ebp is
> also free, right?). Maybe the allocator thinks %ebx can't be used
> because it is the PIC register.
Yes, %ebx cannot be used because it is the PIC register, and
-fomit-frame-pointer is off by default, so %ebp is not free either.
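For reference, the loop quoted from comment #12 reads roughly as the C below. This is only a sketch reconstructed from the assembly, not the original openssl source: the function name, the parameter names, and the assumption that the counter in %ecx starts at 0 are all hypothetical.

```c
/* Hypothetical reconstruction of the quoted .L4 loop.
 * state ~ (%esi), out ~ (%edx), i ~ %ecx, base ~ %edi, n ~ 12(%ebp).
 * Assumes n > 0 and that the counter starts at 0 (not shown in the asm). */
void loop_sketch(unsigned char *out, unsigned char *state,
                 unsigned int base, unsigned int n)
{
    unsigned int i = 0;
    do {
        *out++ = *state;                      /* movb (%esi),%al; movb %al,(%edx) */
        *state = (unsigned char)(((i + base) & 15) + *state + 17);
        i++;                                  /* incl %ecx */
    } while (i != n);                         /* cmpl %ecx,12(%ebp); jne .L4 */
}
```

Five values stay live across the iteration (two pointers, the counter, the offset, and a scratch register), which matches the five registers checked off in the quoted list.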
> Here is what mainline today ("GCC: (GNU) 4.1.0 20050625 (experimental)")
> gives me (x86-64 compiler with "-m32 -march=i686 -O3 -fPIC"):
>
> .L4:
> movzbl (%esi), %eax
> movb %al, (%ecx)
> incl %ecx
> movzbl -13(%ebp), %eax
> movzbl (%esi), %edx
> incb -13(%ebp)
> andb $15, %al
> addb $17, %dl
> addb %dl, %al
> cmpl %edi, %ecx
> movb %al, (%esi)
> jne .L4
>
> The .optimized tree dump looks like this:
>
> <bb 0>:
> len.23 = len - 1;
> if (len.23 != 4294967295) goto <L6>; else goto <L2>;
>
> And the first two lines are also just weird; it is probably cheaper on
> almost any machine to do
>
> len.23 = len;
> if (len.23 != 0) goto <L6>; else goto <L2>;
>
> <L6>:
> len.23 = len.23 - 1;
> (etc...)
Not really. On i686, there should be no difference.
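The two guards select the same branch for every input, since len - 1 wraps to 4294967295 only when len is 0. A minimal C sketch of that equivalence, assuming len is a 32-bit unsigned value (the function names are hypothetical and do not come from the dump):

```c
#include <stdint.h>

/* Guard as it appears in the .optimized dump:
 * decrement first, then compare against all-ones. */
int guard_dump(uint32_t len)
{
    uint32_t len_23 = len - 1;   /* wraps to 4294967295 when len == 0 */
    return len_23 != UINT32_MAX;
}

/* Guard as suggested in the quoted comment: compare against zero. */
int guard_suggested(uint32_t len)
{
    return len != 0;
}
```

Both functions return 0 only for len == 0, so the choice between them affects instruction selection only, not behaviour.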
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923