On 18/05/16 01:51, Joseph Myers wrote:
On Tue, 17 May 2016, Matthew Wahab wrote:

In most cases the instructions are added using non-standard pattern
names. This is to force operations on __fp16 values to be done, by
conversion, using the single-precision instructions. The exceptions are
the precision-preserving operations ABS and NEG.

But why do you need to force that?  If the instructions follow IEEE
semantics including for exceptions and rounding modes, then X OP Y
computed directly with binary16 arithmetic has the same value as results
from promoting to binary32, doing binary32 arithmetic and converting back
to binary16, for OP in + - * /.  (Double-rounding problems can only occur
in round-to-nearest and if the binary32 result is exactly half way between
two representable binary16 values but the exact result is not exactly half
way between.  It's obvious that this can't occur for + - * and only a bit
harder to see this for /.  According to the logic used in
convert.c:convert_to_real_1, double rounding can't occur in this case for
square root either, though I haven't verified that.)

AArch64 follows IEEE-754 but ARM (AArch32) adds restrictions like flush-to-zero that could affect the outcome of a calculation.

So I'd expect e.g.

__fp16 a, b;
__fp16 c = a / b;

to generate the new instructions, because direct binary16 arithmetic is a
correct implementation of (__fp16) ((float) a / (float) b).

Something like

__fp16 a, b, c;
__fp16 d = (a / b) * c;

would be done as the sequence of single precision operations:

vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s1, s1
vcvtb.f32.f16 s2, s2
vdiv.f32 s15, s0, s1
vmul.f32 s0, s15, s2
vcvtb.f16.f32 s0, s0

Doing this with vdiv.f16 and vmul.f16 could change the calculated result, because flush-to-zero depends on the precision of the operation: an intermediate quotient that is subnormal in half precision, and so flushed to zero by vdiv.f16, can still be a normal number for vdiv.f32.

(At least, that's my understanding.)
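To make the flush-to-zero concern concrete, here is a small software model in C. It is a sketch under assumptions, not a description of the hardware: flush_f16 is a hypothetical helper that replaces any nonzero value below the smallest normal binary16 magnitude (2^-14) with zero, and rounding is ignored because the inputs are chosen so that every intermediate is exact.

```c
#include <assert.h>

/* Smallest positive normal binary16 value, 2^-14. */
#define F16_MIN_NORMAL (1.0 / 16384.0)

/* Assumed model of flush-to-zero at binary16 precision: a nonzero result
   smaller in magnitude than the smallest normal becomes a signed zero. */
static double flush_f16(double x)
{
    double mag = x < 0.0 ? -x : x;

    if (mag != 0.0 && mag < F16_MIN_NORMAL)
        return x < 0.0 ? -0.0 : 0.0;
    return x;
}

/* (a / b) * c with each step carried out, and flushed, at binary16
   precision. Only meaningful for inputs whose results need no rounding. */
double div_mul_f16_ftz(double a, double b, double c)
{
    double q = flush_f16(a / b);

    return flush_f16(q * c);
}

/* The same expression using binary32 intermediates. The binary32 flush
   threshold (2^-126) is far below anything reached here, so no flushing
   is modelled; the final conversion back to binary16 is applied by
   the caller's choice of inputs being exactly representable. */
double div_mul_via_f32(double a, double b, double c)
{
    float q = (float)a / (float)b;
    float p = q * (float)c;

    return (double)p;
}
```

With a = 2^-14, b = 4 and c = 4, the quotient 2^-16 is subnormal in binary16, so div_mul_f16_ftz returns 0, while div_mul_via_f32 returns 2^-14, which is a normal binary16 value and survives the final conversion.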

Matthew
