On Mon, Oct 24, 2016 at 06:14:48PM +0200, Eric Botcazou wrote:
> > Maybe the best you can do is generate the double-width result, and then
> > check if the upper half is the sign extension of the lower half.  Maybe
> > some trickery can help (for add/sub/neg at least).
> 
> That's inefficient, even for additive operations.

It's better than the generic branch sequence below, or yours.  It still
sucks, obviously.

Let's see.  Completely untested.  Inputs in regs 3 and 4, output in reg 3.
32-bit code all the way.

add:
        eqv 9,3,4
        add 3,3,4
        xor 4,3,4
        and. 4,9,4
        blt <overflow>
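A rough C model of the add sequence, for checking the logic rather than as a drop-in implementation. The helper name add_overflows is made up for illustration, and uint32_t stands in for a 32-bit GPR so the wrap-around is well defined in C.

```c
#include <stdint.h>

/* Hypothetical model of the "add" sequence; returns 1 on signed overflow. */
static int add_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t same = ~(ua ^ ub);   /* eqv 9,3,4: sign bit set iff a and b agree in sign */
    uint32_t sum  = ua + ub;      /* add 3,3,4: wrapping 32-bit sum */
    uint32_t flip = sum ^ ub;     /* xor 4,3,4: sign bit set iff sum and b differ in sign */
    return (same & flip) >> 31;   /* and. 4,9,4 / blt: CR0[LT] is that sign bit */
}
```

Overflow needs both operands to have the same sign and the result to have the other sign; the and. folds both conditions into one sign bit, so a single blt suffices.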

sub:
        xor 9,3,4
        sub 3,3,4
        eqv 4,3,4
        and. 4,9,4
        blt <overflow>
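The same kind of C model for the sub sequence (sub 3,3,4 computes r3 - r4). Again sub_overflows is a hypothetical name and uint32_t models the registers.

```c
#include <stdint.h>

/* Hypothetical model of the "sub" sequence; returns 1 if a - b overflows. */
static int sub_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t differ = ua ^ ub;       /* xor 9,3,4: sign bit set iff a and b differ in sign */
    uint32_t diff   = ua - ub;       /* sub 3,3,4: wrapping a - b */
    uint32_t agree  = ~(diff ^ ub);  /* eqv 4,3,4: sign bit set iff diff has b's sign */
    return (differ & agree) >> 31;   /* and. 4,9,4 / blt */
}
```

Subtraction can only overflow when the operands differ in sign, and it has overflowed exactly when the result ends up with the subtrahend's sign.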

neg:
        neg 3,3
        xoris 9,3,0x8000
        cmpwi 9,0
        beq <overflow>
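A C model of the neg check, under the same assumptions (hypothetical helper name, uint32_t for the register). Two's-complement negation overflows only for INT32_MIN, whose negation is INT32_MIN again, so flipping the sign bit of the result and testing for zero is enough.

```c
#include <stdint.h>

/* Hypothetical model of the "neg" sequence; returns 1 if -a overflows. */
static int neg_overflows(int32_t a)
{
    uint32_t r = 0u - (uint32_t)a;   /* neg 3,3: wrapping -a */
    uint32_t t = r ^ 0x80000000u;    /* xoris 9,3,0x8000: flip the sign bit */
    return t == 0;                   /* zero iff r == 0x80000000 */
}
```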

mul:
        mulhw 9,3,4
        mullw 3,3,4
        srawi 4,3,31
        cmpw 4,9
        bne <overflow>
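The mul case is the double-width idea quoted at the top: form the full 64-bit product and check that the high word is the sign extension of the low word. A C sketch, with mul_overflows a made-up name; int64_t plays the role of the mulhw:mullw pair.

```c
#include <stdint.h>

/* Hypothetical model of the "mul" sequence; returns 1 if a * b overflows. */
static int mul_overflows(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;                          /* exact 64-bit product */
    uint32_t lo = (uint32_t)p;                           /* mullw: low word */
    uint32_t hi = (uint32_t)((uint64_t)p >> 32);         /* mulhw: high word */
    uint32_t ext = (lo & 0x80000000u) ? 0xFFFFFFFFu : 0; /* srawi of the LOW word by 31 */
    return hi != ext;                                    /* cmpw / bne */
}
```

Note that the sign extension has to come from the low word: 65536 * 32768 has a high word of 0 but overflows, because the low word 0x80000000 is negative.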

> > You can also just FAIL the expander if !TARGET_MCRXR.  I wonder just how
> > bad the generic code is.
> 
> It is branchy.  Here's a 32-bit overflow addition at -O2:
> 
>       cmpwi 7,4,0
>       add 4,3,4
>       blt- 7,.L4
>       cmpw 7,4,3
>       blt- 7,.L3
> .L5:
>       mr 3,4
>       blr
> .L4:
>       cmpw 7,4,3
>       ble+ 7,.L5
> .L3:
>       <overflow>
> 
> You can do it manually with just one branch:
> 
>       add 10,4,3
>       srwi 4,4,31
>       cmpw 7,10,3
>       mfcr 9
>       rlwinm 9,9,29,1
>       cmpw 7,9,4
>       bne- 7,.L5
>       mr 3,10
>       blr
> .L5:
>       <overflow>
> 
> and of course with -mmcrxr:
> 
>       addo 3,3,4
>       mcrxr 7
>       bgt- 7,.L10
>       blr
> .L10:
>       <overflow>
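Both of the quoted add sequences test the same predicate, just with different branch structure. A C sketch of the two, with made-up helper names; the wrapping add relies on GCC/Clang converting out-of-range uint32_t to int32_t modulo 2^32 (implementation-defined in ISO C).

```c
#include <stdint.h>

/* Wrapping 32-bit add (modulo conversion as GCC and Clang implement it). */
static int32_t wrap_add(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Branchy -O2 form: with b < 0 the sum must not exceed a,
   otherwise it must not drop below a. */
static int add_ovf_branchy(int32_t a, int32_t b)
{
    int32_t sum = wrap_add(a, b);
    if (b < 0)
        return sum > a;   /* .L4: cmpw 7,4,3 / ble+ */
    return sum < a;       /* cmpw 7,4,3 / blt- */
}

/* One-branch form: the LT bit of "cmpw 7,sum,a" (pulled out with
   mfcr / rlwinm) must equal b's sign bit (srwi 4,4,31). */
static int add_ovf_onebranch(int32_t a, int32_t b)
{
    int32_t sum = wrap_add(a, b);
    unsigned lt = sum < a;
    unsigned sign = (uint32_t)b >> 31;
    return lt != sign;    /* cmpw 7,9,4 / bne- */
}
```

The two agree because sum == a is impossible when b is nonzero, so for b < 0 "not sum < a" is the same as "sum > a".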

Or using mcrxr (or mtxer) and SO:

        mcrxr 0 # clear XER[SO], can use mtxer instead
        ...
        addo. 3,3,4
        bso .L10
        blr
.L10:
        etc.

(but keeping track of when your SO flag is clear is a pain, and if you
have to reset it all the time there is no big win).
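Why SO needs that bookkeeping: OV describes only the last overflow-enabled instruction, while SO is sticky and stays set until something like mcrxr clears it. A toy C model of just those two bits; struct xer, xer_clear and addo are hypothetical names, and CA and the CR0 update from the result are left out.

```c
#include <stdint.h>

/* Hypothetical model of XER[OV] and the sticky XER[SO]. */
struct xer { unsigned ov, so; };

/* mcrxr clears the XER bits it copies out; only OV and SO are modelled. */
static void xer_clear(struct xer *x) { x->ov = 0; x->so = 0; }

/* addo.: OV reflects this add only, SO accumulates every overflow since the last clear. */
static uint32_t addo(struct xer *x, int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b, sum = ua + ub;
    x->ov = (~(ua ^ ub) & (sum ^ ub)) >> 31;
    x->so |= x->ov;
    return sum;
}
```

After one overflowing addo. every later bso is taken, even for adds that did not overflow, until the bit is cleared again.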


Segher
