http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073



--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-11-13 13:04:28 UTC ---

Created attachment 28674

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28674

gcc48-pr54073.patch



On x86_64-linux on a SandyBridge CPU with -O3 -march=corei7-avx, I've tracked it
down to the
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=171341
change, in particular the emit_conditional_move part of that change.

Before the change, emit_conditional_move completely ignored the predicate
on the comparison operand (operands[1]); starting with r171341 it honors it.
As a result, movsicc's ordered_comparison_operator predicate rejects the UNLT
comparison in the MonteCarlo testcase, even though ix86_expand_int_movcc
expands it just fine, and at least on that loop it is beneficial to use

        vucomisd        %xmm0, %xmm1

        cmovae  %eax, %ebp

instead of:

.L4:

        addl    $1, %ebx

...

        vucomisd        %xmm0, %xmm2

        jb      .L4
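
For illustration only, the kind of loop in question has roughly the shape below: a floating-point comparison that decides whether an integer counter is incremented. This is a minimal sketch, not the actual SciMark2 MonteCarlo source; the function name and the array-based interface are made up for the example, and which comparison code GCC ends up with (UNLT in this report) depends on how it canonicalizes the condition.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from SciMark2: counts the points
 * (x[i], y[i]) that lie inside or on the unit circle. */
int count_inside_unit_circle(const double *x, const double *y, size_t n)
{
    int under_curve = 0;

    for (size_t i = 0; i < n; i++) {
        /* A floating-point compare guards an integer update.  The
         * compiler can implement this either with a conditional branch
         * around the increment or with a compare followed by a
         * conditional move that selects between the old and the
         * incremented counter. */
        if (x[i] * x[i] + y[i] * y[i] <= 1.0)
            under_curve++;
    }
    return under_curve;
}
```

Compiling such a function with -O3 -march=corei7-avx -S and looking for cmov versus a jump after the vucomisd is one way to see which form a given compiler revision picks.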



The attached proof-of-concept patch just attempts to restore the 4.6 and
earlier behavior by allowing all comparisons.  It may well need better tuning
than that; I meant it only as a starting point for discussion.



vanilla trunk:



**                                                              **

** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **

** for details. (Results can be submitted to p...@nist.gov)     **

**                                                              **

Using       2.00 seconds min time per kenel.

Composite Score:         1886.79

FFT             Mflops:  1726.97    (N=1024)

SOR             Mflops:  1239.20    (100 x 100)

MonteCarlo:     Mflops:   374.13

Sparse matmult  Mflops:  1956.30    (N=1000, nz=5000)

LU              Mflops:  4137.37    (M=100, N=100)



patched trunk:



**                                                              **

** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **

** for details. (Results can be submitted to p...@nist.gov)     **

**                                                              **

Using       2.00 seconds min time per kenel.

Composite Score:         1910.08

FFT             Mflops:  1726.97    (N=1024)

SOR             Mflops:  1239.20    (100 x 100)

MonteCarlo:     Mflops:   528.94

Sparse matmult  Mflops:  1949.03    (N=1000, nz=5000)

LU              Mflops:  4106.27    (M=100, N=100)
