Quoting Christian Bruel <christian.br...@st.com>:
> Using the ieee-sf.S + this patch
> OK
Is this only a proof of concept, given that you only change the
ne[sd]f2 implementation? You also go out of your way to accept only a
restricted set of values, and the heavy use of the arithmetic unit
hurts SH4-100 / SH4-200 instruction pairing.
AFAICT you need only a one-cycle penalty, and only in the check_nan
path:
GLOBAL(nesf2):
	/* If the raw values are unequal, the result is unequal, unless
	   both values are +-zero.
	   If the raw values are equal, the result is equal, unless
	   the values are NaN.  */
	cmp/eq	r4,r5
	mov.l	LOCAL(inf2),r1
	bt/s	LOCAL(check_nan)
	mov	r4,r0		/* delay slot */
	or	r5,r0
	rts
	add	r0,r0		/* delay slot: drop the sign bit; zero only for +-zero */
LOCAL(check_nan):
	add	r0,r0		/* drop the sign bit */
	cmp/hi	r1,r0		/* T = 1 if above infinity, i.e. NaN */
	rts
	movt	r0		/* delay slot: result = T */
	.balign	4
LOCAL(inf2):
	.long	0xff000000
You could even save four bytes by putting the check_nan label into the
delay slot, but I'm not sure if that'll discomfit any branch
prediction mechanism.
Disclaimer: I've not tested this code.
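Since the sequence above is untested, the logic (though not the SH
encoding, delay slots, or scheduling) can at least be checked on a host
with a C model. The following is only a sketch: the names nesf2_model,
u2f and nesf2_selfcheck are made up for illustration and are not part
of the patch, and it assumes the host uses IEEE 754 binary32 for float.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the nesf2 sequence: nonzero means "not equal". */
static uint32_t nesf2_model(uint32_t r4, uint32_t r5)
{
    const uint32_t inf2 = 0xff000000u;  /* +Inf with the sign bit shifted out */
    uint32_t r0;

    if (r4 == r5) {
        /* check_nan path: equal raw values are unequal only for NaN. */
        r0 = r4 + r4;
        return r0 > inf2;               /* cmp/hi ; movt */
    }
    /* Raw values differ: the result is zero only if both are +-zero. */
    r0 = r4 | r5;
    return r0 + r0;
}

/* Illustrative helper: reinterpret raw bits as a float. */
static float u2f(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Compare the model with the host's != for a few interesting values. */
static void nesf2_selfcheck(void)
{
    static const uint32_t v[] = {
        0x00000000u, 0x80000000u,       /* +0, -0 */
        0x00000001u, 0x80000001u,       /* smallest denormals */
        0x3f800000u, 0xbf800000u,       /* +1, -1 */
        0x7f800000u, 0xff800000u,       /* +Inf, -Inf */
        0x7f800001u, 0x7fc00000u,       /* NaNs */
        0xffc00000u
    };
    const size_t n = sizeof v / sizeof v[0];

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            assert((nesf2_model(v[i], v[j]) != 0)
                   == (u2f(v[i]) != u2f(v[j])));
}
```

Calling nesf2_selfcheck() from a small test program exercises both
paths, including the +-zero and NaN special cases.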
For the DFmode case, what about NaNs that are denoted only by the low
word, e.g. 0x7ff00000 00000001? If those have to be handled, the DFmode
code could become something like this:
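GLOBAL(nedf2):
	cmp/eq	DBL0L,DBL1L
	mov.l	LOCAL(inf2),r1
	bf	LOCAL(ne)		/* low words differ: unequal */
	cmp/eq	DBL0H,DBL1H
	bt/s	LOCAL(check_nan)
	mov	DBL0H,r0		/* delay slot */
	or	DBL1H,r0
	add	r0,r0			/* drop the sign bit */
	rts
	or	DBL0L,r0		/* delay slot: zero only for +-zero */
LOCAL(check_nan):
	tst	DBL0L,DBL0L		/* T = 1 if the low word is zero */
	add	r0,r0			/* drop the sign bit; T is unchanged */
	subc	r1,r0			/* T = borrow of r0 - inf2 - T; set unless NaN */
	mov	#-1,r0
	rts
	negc	r0,r0			/* delay slot: r0 = 0 - (-1) - T = 1 - T */
LOCAL(ne):
	rts
	mov	#1,r0			/* delay slot */
	.balign	4
LOCAL(inf2):
	.long	0xffe00000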
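
The same kind of host-side check works for the DFmode sequence above,
in particular for the tst / subc / negc trick that folds the low word
into the NaN test. Again this is only a sketch under assumptions: the
names nedf2_model, u2d and nedf2_selfcheck are invented here, the
high/low words are passed as separate arguments instead of
DBL0H/DBL0L/DBL1H/DBL1L, and the host is assumed to use IEEE 754
binary64 for double.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the nedf2 sequence: nonzero means "not equal". */
static uint32_t nedf2_model(uint32_t h0, uint32_t l0,
                            uint32_t h1, uint32_t l1)
{
    const uint32_t inf2 = 0xffe00000u;  /* high word of +Inf, sign shifted out */
    uint32_t r0, t, borrow;

    if (l0 != l1)
        return 1;                       /* LOCAL(ne) */
    if (h0 != h1) {
        /* Zero only if both values are +-zero. */
        r0 = h0 | h1;
        r0 += r0;
        return r0 | l0;
    }
    /* check_nan path. */
    t = (l0 == 0);                      /* tst DBL0L,DBL0L */
    r0 = h0 + h0;                       /* add r0,r0 */
    /* subc r1,r0: only the borrow (new T bit) matters afterwards. */
    borrow = (uint64_t)r0 < (uint64_t)inf2 + t;
    return 1u - borrow;                 /* mov #-1,r0 ; negc r0,r0 */
}

/* Illustrative helper: reinterpret raw bits as a double. */
static double u2d(uint64_t u)
{
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* Compare the model with the host's != for a few interesting values. */
static void nedf2_selfcheck(void)
{
    static const uint64_t v[] = {
        0x0000000000000000u, 0x8000000000000000u,   /* +0, -0 */
        0x0000000000000001u, 0x8000000000000001u,   /* denormals */
        0x3ff0000000000000u, 0x3ff0000000000001u,   /* 1, next after 1 */
        0x7ff0000000000000u, 0xfff0000000000000u,   /* +Inf, -Inf */
        0x7ff0000000000001u,                        /* NaN in low word only */
        0x7ff8000000000000u, 0xfff8000000000000u    /* quiet NaNs */
    };
    const size_t n = sizeof v / sizeof v[0];

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            assert((nedf2_model((uint32_t)(v[i] >> 32), (uint32_t)v[i],
                                (uint32_t)(v[j] >> 32), (uint32_t)v[j]) != 0)
                   == (u2d(v[i]) != u2d(v[j])));
}
```

The value 0x7ff00000 00000001 is the interesting one here: with the low
word nonzero the borrow is clear when the shifted high word equals
inf2, so the model reports "not equal" for it, as a NaN should.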