Re: SH optimized software floating point routines

Joern Rennecke Thu, 22 Jul 2010 09:14:46 -0700

Quoting Christian Bruel <christian.br...@st.com>:

> Using the ieee-sf.S + this patch
> OK

Is this only a proof-of-concept, because you only change the ne[sd]f2
implementation?  And you go out of your way to only accept a restricted
set of values.  Plus, the overuse of the arithmetic unit hurts SH4-100 /
SH4-200 instruction pairing.

AFAICT you need only one cycle penalty, in the check_nan path:

GLOBAL(nesf2):
        /* If the raw values are unequal, the result is unequal, unless
           both values are +-zero.
           If the raw values are equal, the result is equal, unless
           the values are NaN.  */
        cmp/eq  r4,r5
        mov.l   LOCAL(inf2),r1
        bt/s    LOCAL(check_nan)
        mov     r4,r0
        or      r5,r0
        rts
        add     r0,r0
LOCAL(check_nan):
        add     r0,r0
        cmp/hi  r1,r0
        rts
        movt    r0
        .balign 4
LOCAL(inf2):
        .long 0xff000000

You could even save four bytes by putting the check_nan label into the
delay slot, but I'm not sure if that'll discomfit any branch prediction
mechanism.


Disclaimer: I've not tested this code.
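
Since the assembly above is untested, a C model of the same bit logic might help for checking it on a host first.  This is only a sketch: the names nesf2_model, ne_native and check_sf are made up for illustration, and the reference comparison assumes the host uses IEEE 754 single precision for float and is not compiled with fast-math options.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the nesf2 sketch: takes the raw IEEE single
   bit patterns and returns nonzero when the values compare unequal.  */
uint32_t nesf2_model(uint32_t a, uint32_t b)
{
    if (a != b)
        /* Unequal bits: unequal unless both are +-0.
           The shift drops the sign bit, as "add r0,r0" does.  */
        return (a | b) << 1;
    /* Equal bits: unequal only for NaN, i.e. (a << 1) > 0xff000000.  */
    return (uint32_t)(a << 1) > 0xff000000u;
}

/* Reference result from the host's own float comparison
   (assumes an IEEE 754 host).  */
int ne_native(uint32_t a, uint32_t b)
{
    float x, y;
    memcpy(&x, &a, sizeof x);
    memcpy(&y, &b, sizeof y);
    return x != y;
}

/* Compare the model with the host for one pair of bit patterns.  */
void check_sf(uint32_t a, uint32_t b)
{
    assert((nesf2_model(a, b) != 0) == ne_native(a, b));
}
```

Calling check_sf on the corner cases (+-0, +-Inf, quiet and signalling NaNs, denormals) should cover both paths of the routine.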

For the DFmode case, what about NaNs denoted by the low word, e.g.
0x7ff00000 00000001 ?

If those need to be handled, the DFmode code could become something like this:

GLOBAL(nedf2):
        cmp/eq  DBL0L,DBL1L
        mov.l   LOCAL(inf2),r1
        bf      LOCAL(ne)
        cmp/eq  DBL0H,DBL1H
        bt/s    LOCAL(check_nan)
        mov     DBL0H,r0
        or      DBL1H,r0
        add     r0,r0
        rts
        or      DBL0L,r0
LOCAL(check_nan):
        tst     DBL0L,DBL0L
        add     r0,r0
        subc    r1,r0
        mov     #-1,r0
        rts
        negc    r0,r0
LOCAL(ne):
        rts
        mov     #1,r0
        .balign 4
LOCAL(inf2):
        .long 0xffe00000
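
The subc / negc sequence in the check_nan path is a bit dense, so here is a C model of what I read it as doing.  Again only a sketch: nedf2_model and its argument names are hypothetical, and the high and low words are passed separately so that the model does not depend on endianness.

```c
#include <stdint.h>

/* Hypothetical C model of the nedf2 sketch; hi/lo are the high and low
   32-bit words of each IEEE double.  Returns nonzero when unequal.  */
uint32_t nedf2_model(uint32_t hi0, uint32_t lo0, uint32_t hi1, uint32_t lo1)
{
    if (lo0 != lo1)
        return 1;                           /* LOCAL(ne) path */
    if (hi0 != hi1)
        /* Unequal unless both are +-0: the sign is shifted out and
           the (shared) low word is or'ed in.  */
        return ((hi0 | hi1) << 1) | lo0;

    /* Identical bits: NaN iff (hi0 << 1):lo0 > 0xffe00000:00000000.
       "Low word is zero" enters the subtraction as the borrow-in.  */
    uint32_t h = hi0 << 1;                  /* add  r0,r0 */
    uint32_t t = (lo0 == 0);                /* tst  DBL0L,DBL0L */
    uint64_t need = (uint64_t)0xffe00000u + t;
    uint32_t borrow = (uint64_t)h < need;   /* subc r1,r0 */
    return 1u - borrow;                     /* mov #-1,r0; negc r0,r0 */
}
```

With the example above, nedf2_model(0x7ff00000, 1, 0x7ff00000, 1) gives 1 (a NaN is unequal to itself), while the same high word with a zero low word is infinity and gives 0.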
