https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68664
            Bug ID: 68664
           Summary: PowerPC: speculative sqrt in c-ray main loop causes large slow down
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: anton at samba dot org
  Target Milestone: ---

c-ray (a tiny ray tracer) can be found at:

  http://www.sgidepot.co.uk/depot/c-ray-1.1.tar.gz

When built and run with the following args:

  gcc -mcpu=power8 -O3 -ffast-math -o c-ray-mt c-ray-mt.c -lpthread -lm
  ./c-ray-mt -t 1 -s 768x432 -r 8 -i sphfract -o sphfract.ppm

a speculative sqrt is scheduled inside the hottest loop:

   2.46 :   10001a44:   ld        r9,72(r9)
   0.00 :   10001a48:   cmpdi     cr7,r9,0
   0.00 :   10001a4c:   beq       cr7,10001be0 <shade+0x320>
   0.05 :   10001a50:   lfd       f8,0(r9)
   0.00 :   10001a54:   lfd       f9,8(r9)
   0.00 :   10001a58:   lfd       f0,16(r9)
   0.00 :   10001a5c:   lfd       f7,24(r9)
   2.74 :   10001a60:   fmadd     f2,f8,f8,f1
   0.05 :   10001a64:   fmul      f12,f9,f5
   0.00 :   10001a68:   fsub      f11,f5,f9
   0.00 :   10001a6c:   fsub      f6,f4,f8
   0.00 :   10001a70:   fsub      f10,f3,f0
   5.34 :   10001a74:   fmadd     f9,f9,f9,f2
   0.08 :   10001a78:   fnmadd    f8,f8,f4,f12
   0.00 :   10001a7c:   xsmuldp   vs11,vs11,vs33
   0.00 :   10001a80:   fmadd     f9,f0,f0,f9
   9.99 :   10001a84:   fnmsub    f12,f0,f3,f8
   0.10 :   10001a88:   xsmaddadp vs11,vs32,vs6
   0.00 :   10001a8c:   fnmsub    f0,f7,f7,f9
   0.00 :   10001a90:   xsmaddadp vs11,vs45,vs10
  11.32 :   10001a94:   xsmaddadp vs0,vs12,vs43
   0.16 :   10001a98:   fadd      f12,f11,f11
   0.00 :   10001a9c:   xsmuldp   vs0,vs0,vs44
   0.01 :   10001aa0:   fmsub     f0,f12,f12,f0
  64.34 :   10001aa4:   fcmpu     cr7,f0,f31
   0.97 :   10001aa8:   fsqrt     f11,f0            <----- here I am
   0.00 :   10001aac:   blt       cr7,10001a44 <shade+0x184>

The fsqrt sits above the blt that branches back to the top of the loop. It therefore executes on every iteration, including those where the comparison sends control back to the loop head and its result is thrown away.

Building with -fno-sched-spec improves performance by almost 2x:

  gcc -mcpu=power8 -O3 -ffast-math -fno-sched-spec -o c-ray-mt c-ray-mt.c -lpthread -lm