On Nov 9, 2013, at 02:48, Ondřej Bílka <nel...@seznam.cz> wrote:

>> I've done the overflow checking in Gigi (Ada front end). Benchmarking
>> real world large Ada programs (where every integer operation is checked,
>> including array index computations etc.), I found the performance cost
>> *very* small (less than 1% on typical code, never more than 2%). There
>> is a bit more cost in code size, but that is mostly due to the fact that
>> we want to generate error messages with correct line number information
>> without relying on backtraces.
>>
> Overhead is mostly from additional branches that are not taken. We need
> a more accurate measure of cache effects than code size, for example
> looking at the increased number of icache hits, which will not count
> code that is never executed.
Indeed, and execution time testing shows there isn't a significant
increase in icache pressure.

>> [...]
>> A few things helped to make the cost small: the biggest one is that
>> typically one of the operands is known to be negative or positive.
>> Gigi will use Ada type information, and Natural or Positive integer
>> variables are very common. So, if you compute X + C with C positive,
>> you can write the conditional expression as:
>
> On x64 the effect of this analysis is small, as the processor does
> overflow detection for you.

The problem as always is that of pass ordering. If we would encapsulate
the overflow check in some kind of builtin that we'd expand directly,
very late in the compilation process, to processor-specific
instructions, then the early optimizers cannot do their work.

> By just expanding out to regular additions, comparisons and control
> flow we can avoid this problem.
>
>> (if X < Integer'Last - C then X + C else raise Constraint_Error)
>>
>> On my x86-64 this generates something like:
>> 00 cmpl $0x7fffffff,%edi
>> 06 je 0x0000000c
>> 08 leal 0x01(%rdi),%eax
>> 0b ret
>> [...]
>
> This has a redundant compare instruction that costs a cycle and 6 bytes.
> You can just write
>
> 0: 83 c7 01 add $0x1,%edi
> 3: 71 03 jno 0x8
> [...]
> When you know that one operand is positive, or you deal with unsigned
> types, you can replace jno with jnc, which is a bit faster on Sandy
> Bridge processors and later, as the add, jnc pair is macro-fused but
> add, jno is not.

Indeed that is ideal, assuming the condition flags are dead, but I
think that the right way to do that is by late combine-like
optimizations after instruction selection. Looking at generated code in
actual programs, most checks are optimized away. This is more important
than saving a cycle here or there in the much smaller number of checks
that remain. After all, we all "know" that our programs will never fail
any overflow checks, so it is just a matter of the compiler being smart
enough to prove this.
While it won't achieve that goal (halting problem etc.), for a large
fraction of checks localized analysis is sufficient to prove they
cannot fail. I'm afraid that premature optimization will always be a
loss here. Probably combine, in combination with some machine-specific
instruction patterns, could be taught to do some of the optimizations
you mention, and that would have the advantage that they'd also be
applicable to manual tests people write.

  -Geert