On Thu, Mar 12, 2009 at 7:03 AM, Paolo Bonzini <bonz...@gnu.org> wrote:
> Toon Moene wrote:
>> Paolo Bonzini wrote:
>>
>>>> Attached you'll find the (preprocessed) source of the routine that
>>>> printed the Infinity's (of course, I cannot be completely certain that
>>>> it actually resulted in the wrong code, but at least it might be studied
>>>> to see if it helps to find the culprit).
>>>
>>> No, this function is sane (the peephole *is* called a lot by this
>>> function, but all is in due order). I looked at the dumps and assembly
>>> for -O2, -O3 and -O3 -fno-schedule-insns (*), and all is as expected.
>>
>> Yeah, it was probably too much to hope for.
>
> No, you were right, and that's great. -ffast-math makes a difference,
> because it enables more vectorization.
>
> It goes as this:
>
> (insn 494 493 495 44 statin.f:703 (set (reg:SF 371)
>         (vec_select:SF (reg:V4SF 367)
>             (parallel [
>                     (const_int 0 [0x0])
>                 ]))) 1408 {*vec_extractv4sf_0} (expr_list:REG_DEAD
>         (reg:V4SF 367)
>         (nil)))
>
> registers 371 and 367 are coalesced into xmm0. Then the vec_select is
> split to just
>
> (set (reg:SF 21 [orig: 371]) (reg:SF 21 [orig: 367]))
>
> and these are indeed !=, but they have the same hard register number so
> the peephole should not apply in this case. Here is a minimized testcase:
>
>       subroutine statin(x,y,pstratr,pconvecr,zhxy,zhxhy,ztmp)
>       integer :: x,y
>       real pstratr(x,y),pconvecr(x,y),zhxy(x,y)
>       real ztmp(4)
>       do j = 1,y
>         do i = 1,x-2
>           zttotrainr = zttotrainr + (pstratr(i,j) + pconvecr(i,j))*zhxy(i,j)
>           ztstratr = ztstratr + pstratr(i,j)
>           ztconvecr = ztconvecr + pconvecr(i,j)
>           ztsenf = ztsenf + zhxy(i,j)
>           ztlatf = ztlatf + zhxy(i,j)
>           ztcldtop = ztcldtop + zhxy(i,j)
>         enddo
>       enddo
>       ztmp(1)=zttotrainr
>       ztmp(2)=ztstratr
>       ztmp(3)=ztconvecr
>       ztmp(4)=ztsenf*ztlatf*ztcldtop
>       end
>
> The following patch should fix it, you're welcome to run it through
> HIRLAM. I'm bootstrapping it in the meanwhile.
>
> Index: gcc/config/i386/i386.md
> ===================================================================
> --- gcc/config/i386/i386.md (revision 144464)
> +++ gcc/config/i386/i386.md (working copy)
> @@ -20795,7 +20795,7 @@
>                      [(match_dup 0)
>                       (match_operand:SI 2 "memory_operand" "")]))
>               (clobber (reg:CC FLAGS_REG))])]
> -  "operands[0] != operands[1]
> +  "!rtx_equal_p (operands[0], operands[1])
>     && GENERAL_REGNO_P (REGNO (operands[0]))
>     && GENERAL_REGNO_P (REGNO (operands[1]))"
>    [(set (match_dup 0) (match_dup 4))
> @@ -20811,7 +20811,7 @@
>          (match_operator 3 "commutative_operator"
>            [(match_dup 0)
>             (match_operand 2 "memory_operand" "")]))]
> -  "operands[0] != operands[1]
> +  "!rtx_equal_p (operands[0], operands[1])
>     && ((MMX_REG_P (operands[0]) && MMX_REG_P (operands[1]))
>         || (SSE_REG_P (operands[0]) && SSE_REG_P (operands[1])))"
>    [(set (match_dup 0) (match_dup 2))
>
Will "REGNO (operands[0]) == REGNO (operands[1])" work here?

--
H.J.
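
The three tests under discussion (the original pointer comparison, the structural comparison in the patch, and the register-number comparison asked about above) can be told apart with a small toy model in C. This is only a sketch under stated assumptions: `struct toy_reg` and every `toy_*` function are hypothetical stand-ins, not GCC's rtx type, `rtx_equal_p`, or `REGNO`, and treating "structurally equal" as "same mode and same register number" is a simplification made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a register operand; NOT GCC's rtx. */
enum toy_mode { TOY_SF, TOY_V4SF };

struct toy_reg {
    enum toy_mode mode;  /* e.g. SF or V4SF */
    unsigned regno;      /* hard register number after allocation */
};

/* Models "operands[0] != operands[1]": compares object identity only. */
static bool toy_ptr_differs(const struct toy_reg *a, const struct toy_reg *b)
{
    return a != b;
}

/* Models a structural comparison (assumed here: same mode, same number). */
static bool toy_struct_equal(const struct toy_reg *a, const struct toy_reg *b)
{
    return a->mode == b->mode && a->regno == b->regno;
}

/* Models "REGNO (a) == REGNO (b)": ignores the mode entirely. */
static bool toy_same_regno(const struct toy_reg *a, const struct toy_reg *b)
{
    return a->regno == b->regno;
}

/* Walks through the situation from the thread in the toy model. */
void toy_demo(void)
{
    /* Two distinct objects that both describe SF register 21,
       like the two sides of the split vec_select. */
    struct toy_reg dst = { TOY_SF, 21 };
    struct toy_reg src = { TOY_SF, 21 };
    /* Same register number, different mode. */
    struct toy_reg vec = { TOY_V4SF, 21 };

    /* The pointer test sees two different objects... */
    assert(toy_ptr_differs(&dst, &src));
    /* ...while both content-based tests see the same register. */
    assert(toy_struct_equal(&dst, &src));
    assert(toy_same_regno(&dst, &src));

    /* The two content-based tests disagree only when the modes differ. */
    assert(!toy_struct_equal(&dst, &vec));
    assert(toy_same_regno(&dst, &vec));
}
```

Calling `toy_demo()` from a `main` exercises all five checks. In this model the pointer test is the only one that misses the coalesced-register case, and the two content-based tests differ only for operands with the same number but different modes; whether that difference matters for these particular peepholes is the open question in the thread and is not settled by the sketch.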