On Thu, Mar 12, 2009 at 7:03 AM, Paolo Bonzini <bonz...@gnu.org> wrote:
> Toon Moene wrote:
>> Paolo Bonzini wrote:
>>
>>>> Attached you'll find the (preprocessed) source of the routine that
>>>> printed the Infinities (of course, I cannot be completely certain that
>>>> it actually resulted in the wrong code, but at least it might be studied
>>>> to see if it helps to find the culprit).
>>>
>>> No, this function is sane (the peephole *is* called a lot by this
>>> function, but all is in due order).  I looked at the dumps and assembly
>>> for -O2, -O3 and -O3 -fno-schedule-insns (*), and all is as expected.
>>
>> Yeah, it was probably too much to hope for.
>
> No, you were right, and that's great.  -ffast-math makes a difference,
> because it enables more vectorization.
>
> It goes like this:
>
> (insn 494 493 495 44 statin.f:703 (set (reg:SF 371)
>        (vec_select:SF (reg:V4SF 367)
>            (parallel [
>                    (const_int 0 [0x0])
>                ]))) 1408 {*vec_extractv4sf_0} (expr_list:REG_DEAD (reg:V4SF 367)
>        (nil)))
>
> Registers 371 and 367 are coalesced into xmm0.  Then the vec_select is
> split to just
>
> (set (reg:SF 21 [orig: 371]) (reg:SF 21 [orig: 367]))
>
> and these are indeed != (they are two distinct REG rtxes), but they have
> the same hard register number, so the peephole should not apply in this
> case.  Here is a minimized testcase:
>
> subroutine statin(x,y,pstratr,pconvecr,zhxy,zhxhy,ztmp)
> integer :: x,y
> real pstratr(x,y),pconvecr(x,y),zhxy(x,y)
> real ztmp(4)
> do j = 1,y
>  do i = 1,x-2
>   zttotrainr = zttotrainr + (pstratr(i,j) + pconvecr(i,j))*zhxy(i,j)
>   ztstratr   = ztstratr   + pstratr(i,j)
>   ztconvecr  = ztconvecr  + pconvecr(i,j)
>   ztsenf     = ztsenf     + zhxy(i,j)
>   ztlatf     = ztlatf     + zhxy(i,j)
>   ztcldtop   = ztcldtop   + zhxy(i,j)
>  enddo
> enddo
> ztmp(1)=zttotrainr
> ztmp(2)=ztstratr
> ztmp(3)=ztconvecr
> ztmp(4)=ztsenf*ztlatf*ztcldtop
> end
>
> The following patch should fix it; you're welcome to run it through
> HIRLAM.  I'm bootstrapping it in the meanwhile.
>
> Index: gcc/config/i386/i386.md
> ===================================================================
> --- gcc/config/i386/i386.md     (revision 144464)
> +++ gcc/config/i386/i386.md     (working copy)
> @@ -20795,7 +20795,7 @@
>                      [(match_dup 0)
>                       (match_operand:SI 2 "memory_operand" "")]))
>               (clobber (reg:CC FLAGS_REG))])]
> -  "operands[0] != operands[1]
> +  "!rtx_equal_p (operands[0], operands[1])
>    && GENERAL_REGNO_P (REGNO (operands[0]))
>    && GENERAL_REGNO_P (REGNO (operands[1]))"
>   [(set (match_dup 0) (match_dup 4))
> @@ -20811,7 +20811,7 @@
>                    (match_operator 3 "commutative_operator"
>                      [(match_dup 0)
>                       (match_operand 2 "memory_operand" "")]))]
> -  "operands[0] != operands[1]
> +  "!rtx_equal_p (operands[0], operands[1])
>    && ((MMX_REG_P (operands[0]) && MMX_REG_P (operands[1]))
>        || (SSE_REG_P (operands[0]) && SSE_REG_P (operands[1])))"
>   [(set (match_dup 0) (match_dup 2))
>

Will "REGNO (operands[0]) == REGNO (operands[1])" work here?


-- 
H.J.
