https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68664

            Bug ID: 68664
           Summary: PowerPC: speculative sqrt in c-ray main loop causes
                    large slow down
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: anton at samba dot org
  Target Milestone: ---

c-ray (a tiny ray tracer) can be found at:

http://www.sgidepot.co.uk/depot/c-ray-1.1.tar.gz

When built and run with the following commands:

gcc -mcpu=power8 -O3 -ffast-math -o c-ray-mt c-ray-mt.c -lpthread -lm
./c-ray-mt -t 1 -s 768x432 -r 8 -i sphfract -o sphfract.ppm

a speculative sqrt is scheduled inside the hottest loop:

    2.46 :        10001a44:   ld      r9,72(r9)
    0.00 :        10001a48:   cmpdi   cr7,r9,0
    0.00 :        10001a4c:   beq     cr7,10001be0 <shade+0x320>
    0.05 :        10001a50:   lfd     f8,0(r9)
    0.00 :        10001a54:   lfd     f9,8(r9)
    0.00 :        10001a58:   lfd     f0,16(r9)
    0.00 :        10001a5c:   lfd     f7,24(r9)
    2.74 :        10001a60:   fmadd   f2,f8,f8,f1
    0.05 :        10001a64:   fmul    f12,f9,f5
    0.00 :        10001a68:   fsub    f11,f5,f9
    0.00 :        10001a6c:   fsub    f6,f4,f8
    0.00 :        10001a70:   fsub    f10,f3,f0
    5.34 :        10001a74:   fmadd   f9,f9,f9,f2
    0.08 :        10001a78:   fnmadd  f8,f8,f4,f12
    0.00 :        10001a7c:   xsmuldp vs11,vs11,vs33
    0.00 :        10001a80:   fmadd   f9,f0,f0,f9
    9.99 :        10001a84:   fnmsub  f12,f0,f3,f8
    0.10 :        10001a88:   xsmaddadp vs11,vs32,vs6
    0.00 :        10001a8c:   fnmsub  f0,f7,f7,f9
    0.00 :        10001a90:   xsmaddadp vs11,vs45,vs10
   11.32 :        10001a94:   xsmaddadp vs0,vs12,vs43
    0.16 :        10001a98:   fadd    f12,f11,f11
    0.00 :        10001a9c:   xsmuldp vs0,vs0,vs44
    0.01 :        10001aa0:   fmsub   f0,f12,f12,f0
   64.34 :        10001aa4:   fcmpu   cr7,f0,f31
    0.97 :        10001aa8:   fsqrt   f11,f0                 <----- here I am
    0.00 :        10001aac:   blt     cr7,10001a44 <shade+0x184>

Building with -fno-sched-spec improves performance by almost 2x:

gcc -mcpu=power8 -O3 -ffast-math -fno-sched-spec -o c-ray-mt c-ray-mt.c -lpthread -lm
