------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz  2005-06-25 11:32 -------
Subject: Re:  [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3

> ------- Additional Comments From steven at gcc dot gnu dot org  2005-06-25 10:15 -------
> Re. comment #25, as far as I can tell there are registers available in
> that loop.  To quote the loop from comment #12:
>
> .L4:
>         movb    (%esi), %al
>         movb    %al, (%edx)
>         leal    (%ecx,%edi), %eax
>         andl    $15, %eax
>         incl    %ecx
>         addb    (%esi), %al
>         incl    %edx
>         addl    $17, %eax
>         cmpl    %ecx, 12(%ebp)
>         movb    %al, (%esi)
>         jne     .L4
>
> Checking off used registers in this loop:
> %esi  x
> %edi  x
> %eax  x
> %ebx
> %ecx  x
> %edx  x
>
> So %ebx at least is free (and iiuc, with -fomit-frame-pointer %ebp is
> also free, right?).  Maybe the allocator thinks %ebx can't be used
> because it is the PIC register.

yes, ebx cannot be used because of pic, and -fomit-frame-pointer is off by
default.

> Here is what mainline today ("GCC: (GNU) 4.1.0 20050625 (experimental)")
> gives me (x86-64 compiler with "-m32 -march=i686 -O3 -fPIC"):
>
> .L4:
>         movzbl  (%esi), %eax
>         movb    %al, (%ecx)
>         incl    %ecx
>         movzbl  -13(%ebp), %eax
>         movzbl  (%esi), %edx
>         incb    -13(%ebp)
>         andb    $15, %al
>         addb    $17, %dl
>         addb    %dl, %al
>         cmpl    %edi, %ecx
>         movb    %al, (%esi)
>         jne     .L4
>
> The .optimized tree dump looks like this:
>
> <bb 0>:
>   len.23 = len - 1;
>   if (len.23 != 4294967295) goto <L6>; else goto <L2>;
>
> And the first two lines are
> also just weird, it is probably cheaper on almost any machine to do
>   len.23 = len;
>   if (len.23 != 0) goto <L6>; else goto <L2>;
>
> <L6>:
>   len.23 = len.23 - 1;
>   (etc...)

Not really.  On i686, there should be no difference.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
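
For reference, the two entry tests discussed in the quoted tree dump can be compared with a small C sketch. This is only an illustration, not code from the bug report: it assumes `len` is a 32-bit unsigned value (the constant 4294967295 in the dump suggests so), and the function names `guard_as_dumped`, `guard_as_suggested` and `check_guards` are made up for the example. It shows that both forms accept and reject the same values; it says nothing about which one is cheaper on a given target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the dump:
 *   len.23 = len - 1;
 *   if (len.23 != 4294967295) goto <L6>; else goto <L2>;
 * Assumes len is a 32-bit unsigned value, so len - 1 wraps
 * around to 4294967295 exactly when len == 0. */
static int guard_as_dumped(uint32_t len)
{
    uint32_t len_23 = len - 1u;
    return len_23 != 4294967295u;
}

/* Hypothetical helper mirroring the form proposed in the quoted comment:
 *   len.23 = len;
 *   if (len.23 != 0) goto <L6>; else goto <L2>; */
static int guard_as_suggested(uint32_t len)
{
    uint32_t len_23 = len;
    return len_23 != 0u;
}

/* Checks that both guards agree on a few boundary values. */
static void check_guards(void)
{
    const uint32_t samples[] = { 0u, 1u, 2u, 16u, 4294967294u, 4294967295u };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(guard_as_dumped(samples[i]) == guard_as_suggested(samples[i]));
}
```

Calling `check_guards()` from a `main` function runs the comparison; the only value for which both guards skip the loop is `len == 0`.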