https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104345
--- Comment #9 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
OK! Putting your "nvptx: Add support for 64-bit mul.hi (and other) instructions" on top, that considerably changes (simplifies!) the generated '__muldc3' PTX code; the regression disappears. :-)

(I have, so far, only manually tested 'libgomp.oacc-c-c++-common/reduction-cplx-dbl.c'. I'll report later in the unlikely case that any other/new issues should appear.)

(And, will later test your "nvptx: Tweak constraints on copysign instructions" on top, too.)