On Mon, Oct 24, 2016 at 06:14:48PM +0200, Eric Botcazou wrote:
> > Maybe the best you can do is generate the double-width result, and then
> > check if the upper half is the sign extension of the lower half. Maybe
> > some trickery can help (for add/sub/neg at least).
>
> That's inefficient, even for additive operations.

It's better than the generic branch sequence below, or yours. It still
sucks, obviously.

Let's see. Completely untested. Inputs in regs 3 and 4, output in reg 3.
32-bit code all the way.

add:
	eqv 9,3,4
	add 3,3,4
	xor 4,3,4
	and. 4,9,4
	blt <overflow>

sub:
	xor 9,3,4
	sub 3,3,4
	eqv 4,3,4
	and. 4,9,4
	blt <overflow>

neg:
	neg 3,3
	xoris. 9,3,0x8000
	beq <overflow>

mul:
	mulhw 9,3,4
	mullw 3,3,4
	srawi 4,9,31
	cmpw 4,9
	bne <overflow>

> > You can also just FAIL the expander if !TARGET_MCRXR. I wonder just how
> > bad the generic code is.
>
> It is branchy. Here's a 32-bit overflow addition at -O2:
>
>         cmpwi 7,4,0
>         add 4,3,4
>         blt- 7,.L4
>         cmpw 7,4,3
>         blt- 7,.L3
> .L5:
>         mr 3,4
>         blr
> .L4:
>         cmpw 7,4,3
>         ble+ 7,.L5
> .L3:
>         <overflow>
>
> You can do it manually with just one branch:
>
>         add 10,4,3
>         srwi 4,4,31
>         cmpw 7,10,3
>         mfcr 9
>         rlwinm 9,9,29,1
>         cmpw 7,9,4
>         bne- 7,.L5
>         mr 3,10
>         blr
> .L5:
>         <overflow>
>
> and of course with -mmcrxr:
>
>         addo 3,3,4
>         mcrxr 7
>         bgt- 7,.L10
>         blr
> .L10:
>         <overflow>

Or using mcrxr (or mtxer) and SO:

	mcrxr 0    # clear XER[SO], can use mtxer instead
	...
	addo. 3,3,4
	bso .L10
	blr
.L10:
	etc.

(but keeping track of when your SO flag is clear is a pain, and if you
have to reset it all the time there is no big win).


Segher
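
For anyone who wants to sanity-check the sign-bit conditions without a PowerPC machine at hand, here is a rough C rendering of the ideas above. It is only a sketch: the function names are made up for illustration, it assumes a 32-bit two's complement int32_t, and it does not claim to match the instruction sequences one for one. The add and sub tests look at the sign bit of an and of two xor-style terms, the neg test checks for the single value that cannot be negated, and the mul test is the "upper half is the sign extension of the lower half" check on the double-width product.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helpers, not from the thread.  All arithmetic is done on
   unsigned values so that wraparound is well defined; converting the
   wrapped result back to int32_t is implementation-defined before C23
   and behaves as two's complement wraparound on common compilers. */

/* a + b overflows iff a and b have the same sign and the sum's sign
   differs from it: sign bit of ~(a ^ b) & (sum ^ b). */
static bool add_overflows(int32_t a, int32_t b, int32_t *res)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;
    *res = (int32_t)sum;
    return (~(ua ^ ub) & (sum ^ ub)) >> 31;
}

/* a - b overflows iff a and b differ in sign and the difference has
   the sign of b: sign bit of (a ^ b) & ~(diff ^ b). */
static bool sub_overflows(int32_t a, int32_t b, int32_t *res)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t diff = ua - ub;
    *res = (int32_t)diff;
    return ((ua ^ ub) & ~(diff ^ ub)) >> 31;
}

/* -a overflows only for a == INT32_MIN, where the wrapped result is
   again 0x80000000, so xoring with 0x80000000 gives zero. */
static bool neg_overflows(int32_t a, int32_t *res)
{
    uint32_t r = 0u - (uint32_t)a;
    *res = (int32_t)r;
    return (r ^ UINT32_C(0x80000000)) == 0;
}

/* a * b overflows iff the high word of the 64-bit product is not the
   sign extension of the low word. */
static bool mul_overflows(int32_t a, int32_t b, int32_t *res)
{
    uint64_t wide = (uint64_t)((int64_t)a * b);
    uint32_t lo = (uint32_t)wide;
    uint32_t hi = (uint32_t)(wide >> 32);
    uint32_t ext = (lo >> 31) ? UINT32_C(0xFFFFFFFF) : 0u;
    *res = (int32_t)lo;
    return hi != ext;
}
```

Each helper stores the wrapped 32-bit result through res and returns true on signed overflow, so a caller can test the return value in the place where the assembly branches to <overflow>.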