Joern Rennecke wrote:
Quoting Christian Bruel <christian.br...@st.com>:
Using the ieee-sf.S + this patch
OK
Is this only a proof-of-concept, because you only change the ne[sd]f2
implementation?
I changed also the unordered comparison patterns. (cmpunsf_i1,
cmpundf_i1). But yes, the other functions that would need the same kind
of check would be unordsf2, and all the comparisons (gtsf2, gesf2f...)
for floats and doubles.
But I will only consider those after/if we all agree that this needs to
be done instead of keeping the current QNaN only restrictions.
And you go out of your way to only accept a restricted
set of values.
This hold for the original optimized implementation as well, for example
I don't think that 0x7f800001 was caught. In fact implementing correctly
the isnan check without restricted set of value makes the original
discussion pointless, since the Q/S bits are a subpart of all possible
codings, with any fractional part != 0.
Plus, the overuse of the arithmetic unit hurts SH4-100 /
SH4-200 instruction pairing.
>
AFAICT you need only one cycle penalty, in the check_nan path:
GLOBAL(nesf2):
/* If the raw values are unequal, the result is unequal, unless
both values are +-zero.
If the raw values are equal, the result is equal, unless
the values are NaN. */
cmp/eq r4,r5
mov.l LOCAL(inf2),r1
bt/s LOCAL(check_nan)
mov r4,r0
or r5,r0
rts
add r0,r0
LOCAL(check_nan):
add r0,r0
cmp/hi r1,r0
rts
movt r0
.balign 4
LOCAL(inf2):
.long 0xff000000
You could even save four bytes by putting the check_nan label into the
delay slot, but I'm not sure if that'll discomfit any branch
prediction mechanism.
Thanks a lot of this one, It should fix the original problem on the
restricted set of values as well. The cmpund patterns fix should
probably have a similar checks.
Disclaimer: I've not tested this code.
For the DFmode case, what about NaNs denoted by the low word, e.g.
0x7ff00000 000000001 ?
If so, the DFmode code could become something like this:
GLOBAL(nedf2):
cmp/eq DBL0L,DBL1L
mov.l LOCAL(inf2),r1
bf LOCAL(ne)
cmp/eq DBL0H,DBL1H
bt/s LOCAL(check_nan)
mov DBL0H,r0
or DBL1H,r0
add r0,r0
rts
or DBL0L,r0
LOCAL(check_nan):
tst DBL0L,DBL0L
add r0,r0
subc r1,r0
mov #-1,r0
rts
negc r0,r0
LOCAL(ne):
rts
mov #1,r0
.balign 4
LOCAL(inf2):
.long 0xffe00000
> For an actual patch, you need to use the SL* macros from
> config/sh/lib1funcs.h because the SH1 does not have delayed branches.
OK, thanks