Hello,

> > > So far OK, but with ter, this becomes
> > >
> > > sum1 = 0;
> > > sum2 = 0;
> > > for (i = 0; i < n; i+=4)
> > >   {
> > >     x_1 = a[i];
> > >     y_1 = b[i];
> > >     x_2 = a[i+1];
> > >     y_2 = b[i+1];
> > >     x_3 = a[i+2];
> > >     y_3 = b[i+2];
> > >     x_4 = a[i+3];
> > >     y_4 = b[i+3];
> > >     sum1 += x_1 * y_1 + x_2 * y_2 + x_3 * y_3 + x_4 * y_4;
> > >     sum2 += x_1 / y_1 + x_2 / y_2 + x_3 / y_3 + x_4 / y_4;
> > >   }
> > >
> > > Now we need some 11 registers for the loop, instead of the original 5
> > > (and the number of registers grows with the unroll factor).
> >
> > The TER hack we settled on for PR17549 was supposed to prevent this kind
> > of thing, but it was already obvious at the time that a better fix is
> > needed in the general case.  You've find a pretty nasty one here.
>
> Why didn't it trigger?  I can't reproduce it by a bit of simple hacking
> around, have you got a little testcase and options to turn on to produce
> this?

-O1 suffices.  The (sum? + 1) is needed to work around the hack
introduced to fix PR17549 (and it is very close to what happens in
sixtrack, except that there the operation with the accumulated variable
is a bit more complicated).

Zdenek

int a[200], b[200];

/* External sink so that sum1 and sum2 stay live.  */
void bla (int, int);

void xxx(void)
{
  int i, sum1 = 0, sum2 = 0, x, y;

  for (i = 0; i < 200; i+=8)
    {
      x = a[i];
      y = b[i];
      sum1 = (sum1 + 1) + x * y;
      sum2 = (sum2 + 1) + x / y;

      x = a[i+1];
      y = b[i+1];
      sum1 = (sum1 + 1) + x * y;
      sum2 = (sum2 + 1) + x / y;

      x = a[i+2];
      y = b[i+2];
      sum1 = (sum1 + 1) + x * y;
      sum2 = (sum2 + 1) + x / y;

      x = a[i+3];
      y = b[i+3];
      sum1 = (sum1 + 1) + x * y;
      sum2 = (sum2 + 1) + x / y;

      x = a[i+4];
      y = b[i+4];
      sum1 = (sum1 + 1) + x * y;
      sum2 = (sum2 + 1) + x / y;

      x = a[i+5];
      y = b[i+5];
      sum1 = (sum1 + 1) + x * y;
      sum2 = (sum2 + 1) + x / y;

      x = a[i+6];
      y = b[i+6];
      sum1 = (sum1 + 1) + x * y;
      sum2 = (sum2 + 1) + x / y;

      x = a[i+7];
      y = b[i+7];
      sum1 = (sum1 + 1) + x * y;
      sum2 = (sum2 + 1) + x / y;
    }

  bla (sum1, sum2);
}
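
The testcase above is meant to be compiled and inspected, not run: bla is
not defined, and the zero-initialized b[] would make x / y divide by zero.
A minimal sketch of a linkable variant follows, in case running it helps
when comparing builds.  The bla stub, the fill_inputs helper, the last_sum
variables and the STEP macro are all hypothetical additions for
illustration; the loop computes the same thing as the testcase, but whether
this variant shows the same register pressure has not been checked.

```c
#define N 200

int a[N], b[N];

/* Hypothetical: records what bla received, so a caller can check it.  */
int last_sum1, last_sum2;

/* Hypothetical stub; the testcase only needs bla to keep the sums live.  */
void bla (int sum1, int sum2)
{
  last_sum1 = sum1;
  last_sum2 = sum2;
}

/* Hypothetical helper: nonzero divisors so that x / y is well defined
   at run time.  */
void fill_inputs (void)
{
  int i;

  for (i = 0; i < N; i++)
    {
      a[i] = 6;
      b[i] = 3;
    }
}

/* One unrolled step, with the same shape as each group of four
   statements in the testcase.  */
#define STEP(k)                       \
  x = a[i + (k)];                     \
  y = b[i + (k)];                     \
  sum1 = (sum1 + 1) + x * y;          \
  sum2 = (sum2 + 1) + x / y;

void xxx (void)
{
  int i, sum1 = 0, sum2 = 0, x, y;

  for (i = 0; i < N; i += 8)
    {
      STEP (0) STEP (1) STEP (2) STEP (3)
      STEP (4) STEP (5) STEP (6) STEP (7)
    }

  bla (sum1, sum2);
}
```

Calling fill_inputs () and then xxx () from a small main is enough to
exercise the loop; the sums passed to bla should be identical whatever the
optimization level, so a mismatch would point at a miscompile and not at a
register allocation problem.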