Hi!

Per <https://gcc.gnu.org/PR98321> we've found that GCC's nvptx back end
doesn't make use of the PTX 32-bit floating-point atomic add instruction,
<https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-atom>,
as declared in 'gcc/config/nvptx/nvptx.md:atomic_fetch_addsf'.

Trying '__atomic_fetch_add (&a, b, __ATOMIC_RELAXED);' (also outside of
the nvptx target context), I found this works fine for integer types, but
for 'float' GCC emits "error: operand type ‘float *’ is incompatible with
argument 1 of ‘__atomic_fetch_add’".  In 'gcc/doc/extend.texi' we read for
'__atomic_fetch_add' referring to '__atomic_add_fetch' that "The object
pointed to by the first argument must be of integer or pointer type",
which explains where the error diagnostic is coming from.  That's
'gcc/c-family/c-common.c:sync_resolve_size', and dates back to 2005
PR14311 "builtins for atomic operations needed" commit r98154 (Git
commit 48ae6c138ca30c4c5e876a0be47c9a0b5c8bf5c2).  (I haven't studied
that in detail, yet.)
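
For illustration only, here is a minimal sketch of how such an operation
can be emulated today in user code, assuming the generic
'__atomic_load' and '__atomic_compare_exchange' built-ins (which accept
non-integer types of a supported size).  The helper name
'fetch_add_float' is hypothetical, not anything GCC provides, and a
compare-and-swap loop like this is not what 'atom.add.f32' does in
hardware; it merely shows the semantics one would expect from
'__atomic_fetch_add' on a 'float'.

```c
/* Hypothetical helper (not a GCC built-in): emulate an atomic
   fetch-and-add on a 'float' with a compare-and-swap loop.  */
static float
fetch_add_float (float *ptr, float val)
{
  float expected, desired;

  /* Read the current value as a starting point.  */
  __atomic_load (ptr, &expected, __ATOMIC_RELAXED);

  do
    /* Compute the new value from the last value we observed.  */
    desired = expected + val;
  /* The comparison is on the object representation; on failure,
     'expected' is updated with the value currently in memory and we
     retry.  */
  while (!__atomic_compare_exchange (ptr, &expected, &desired,
                                     1 /* weak */,
                                     __ATOMIC_RELAXED, __ATOMIC_RELAXED));

  /* Like '__atomic_fetch_add', return the value before the addition.  */
  return expected;
}
```

With 'float a = 1.0f;', calling 'fetch_add_float (&a, 2.5f)' returns the
old value and leaves the sum in 'a'.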

But: why, what's the rationale?  Are potential floating point exceptions
the problem, maybe?  "Simply, because nobody bothered implementing it
(common hardware doesn't provide it)" certainly is fine as an answer -- I
just wanted to make sure I'm not missing some "obvious detail".

As far as that's relevant here, I do understand that floating-point
'a + (b + c)' may be different from '(a + b) + c' etc.; '-ffast-math'
doesn't seem to help here.  (I have not yet worked out how to adequately
model in GCC the PTX ISA stating that the "Current implementation of
'atom.add.f32' on global memory flushes subnormal inputs and results to
sign-preserving zero; whereas 'atom.add.f32' on shared memory supports
subnormal inputs and results and doesn't flush them to zero".)
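
To make the reassociation point concrete, here is a small sketch with
values picked purely for illustration (the function names are
hypothetical).  It assumes IEEE 754 binary32 'float' and no
'-ffast-math'; the 'volatile' temporaries are only there to force each
intermediate result to be rounded to 'float'.

```c
/* Evaluate (a + b) + c, rounding each step to 'float'.  */
static float
sum_left (float a, float b, float c)
{
  volatile float t = a + b;
  volatile float r = t + c;
  return r;
}

/* Evaluate a + (b + c), rounding each step to 'float'.  */
static float
sum_right (float a, float b, float c)
{
  volatile float t = b + c;
  volatile float r = a + t;
  return r;
}

/* With a = 1e8f, b = -1e8f, c = 1.0f:
   sum_left:  (1e8 + -1e8) + 1 = 0 + 1 = 1
   sum_right: -1e8 + 1 rounds back to -1e8 (the spacing between
              adjacent floats near 1e8 is 8), so 1e8 + -1e8 = 0.  */
```

So the result of several concurrent atomic additions to the same
location may depend on the order in which they happen to be applied.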


Grüße
 Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
