https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66623
Bug ID: 66623
Summary: Unsafe FP math reduction used in strict math mode
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: david.sherwood at arm dot com
Target Milestone: ---

Created attachment 35825
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35825&action=edit
Unsafe FP math reduction example

I've found a bug with reductions for Neon whereby we change the ordering of
FP computation in strict math mode. The example looks like this:

    float
    foo (float *__restrict__ i)
    {
      float l = 0;

      for (int a = 0; a < 4; a++)
        for (int b = 0; b < 4; b++)
          l += i[b];

      return l;
    }

When compiled with the flags

    -O2 -ftree-vectorize -fno-inline -march=armv8-a

we generate the asm:

            movi    v0.4s, 0
            mov     x1, x0
            mov     w0, 0
    .L2:
            ldr     s1, [x1, w0, sxtw 2]
            add     w0, w0, 1
            cmp     w0, 4
            dup     v1.4s, v1.s[0]
            fadd    v0.4s, v0.4s, v1.4s
            bne     .L2
            faddp   v0.4s, v0.4s, v0.4s
            faddp   v0.4s, v0.4s, v0.4s

which is

    (i[0] + i[1] + ...) + (i[0] + i[1] + ...) + ...

We know that in general "(a + b) + (a + b)" is not guaranteed to be the same
as "((a + b) + a) + b".
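
To make the last point concrete, here is a minimal sketch in plain C that compares the two evaluation orders. It is not taken from the attachment: the helper names `sum_strict` and `sum_reassociated` and the `sample` input are hypothetical, and the second function only models my reading of the asm above (four lanes that each accumulate i[0..3], followed by two pairwise adds). It assumes IEEE single-precision arithmetic with no -ffast-math and no excess precision.

```c
/* Hypothetical input chosen so that rounding makes the two orders disagree.
   1e8f is exactly representable as a float, but 1e8f + 1.0f rounds back
   to 1e8f because the spacing between floats at that magnitude is 8.  */
static const float sample[4] = { 1e8f, 1.0f, -1e8f, 1.0f };

/* Source order from the report: a single running sum over all 16 adds.  */
static float
sum_strict (const float *i)
{
  float l = 0;

  for (int a = 0; a < 4; a++)
    for (int b = 0; b < 4; b++)
      l += i[b];

  return l;
}

/* Model of the vectorized order (an assumption based on the asm):
   every lane accumulates i[0] .. i[3] in order, then the lanes are
   combined pairwise as the two faddp instructions would do.  */
static float
sum_reassociated (const float *i)
{
  float lane[4] = { 0, 0, 0, 0 };

  for (int b = 0; b < 4; b++)
    for (int k = 0; k < 4; k++)
      lane[k] += i[b];

  return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

Under those assumptions, sum_strict (sample) evaluates to 1.0f while sum_reassociated (sample) evaluates to 4.0f: in the strict order the 1.0f carried over from each pass of the inner loop is absorbed when 1e8f is added to it, whereas in the reassociated order each lane keeps its own 1.0f until the final pairwise adds.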