Re: SH optimized software floating point routines

Christian Bruel Fri, 23 Jul 2010 02:32:24 -0700

Joern Rennecke wrote:

Quoting Christian Bruel <christian.br...@st.com>:
Using the ieee-sf.S + this patch
OK
Is this only a proof-of-concept, because you only change the ne[sd]f2implementation?

I changed also the unordered comparison patterns. (cmpunsf_i1,cmpundf_i1). But yes, the other functions that would need the same kindof check would be unordsf2, and all the comparisons (gtsf2, gesf2f...)for floats and doubles.But I will only consider those after/if we all agree that this needs tobe done instead of keeping the current QNaN only restrictions.


And you go out of your way to only accept a restricted

set of values.

This hold for the original optimized implementation as well, for exampleI don't think that 0x7f800001 was caught. In fact implementing correctlythe isnan check without restricted set of value makes the originaldiscussion pointless, since the Q/S bits are a subpart of all possiblecodings, with any fractional part != 0.


Plus, the overuse of the arithmetic unit hurts SH4-100 /

SH4-200 instruction pairing.

AFAICT you need only one cycle penalty, in the check_nan path:

GLOBAL(nesf2):
         /* If the raw values are unequal, the result is unequal, unless
            both values are +-zero.
            If the raw values are equal, the result is equal, unless
            the values are NaN.  */
         cmp/eq  r4,r5
         mov.l   LOCAL(inf2),r1
         bt/s     LOCAL(check_nan)
         mov     r4,r0
         or      r5,r0
         rts
         add     r0,r0
LOCAL(check_nan):
         add     r0,r0
         cmp/hi  r1,r0
         rts
         movt    r0
         .balign 4
LOCAL(inf2):
         .long 0xff000000

You could even save four bytes by putting the check_nan label into the

delay slot, but I'm not sure if that'll discomfit any branchprediction mechanism.

Thanks a lot of this one, It should fix the original problem on therestricted set of values as well. The cmpund patterns fix shouldprobably have a similar checks.


Disclaimer: I've not tested this code.

For the DFmode case, what about NaNs denoted by the low word, e.g.
0x7ff00000 000000001 ?

If so, the DFmode code could become something like this:

GLOBAL(nedf2):
         cmp/eq  DBL0L,DBL1L
         mov.l   LOCAL(inf2),r1
         bf LOCAL(ne)
         cmp/eq  DBL0H,DBL1H
         bt/s    LOCAL(check_nan)
         mov     DBL0H,r0
         or      DBL1H,r0

         add     r0,r0
         rts
         or      DBL0L,r0
LOCAL(check_nan):
         tst     DBL0L,DBL0L
         add     r0,r0
         subc    r1,r0
         mov     #-1,r0
         rts
         negc    r0,r0
LOCAL(ne):
         rts
         mov #1,r0
         .balign 4
LOCAL(inf2):
         .long 0xffe00000


> For an actual patch, you need to use the SL* macros from
> config/sh/lib1funcs.h because the SH1 does not have delayed branches.

OK, thanks

Re: SH optimized software floating point routines

Reply via email to