https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120814
--- Comment #4 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
What is also interesting: in

template <typename T, typename CA, typename CB, typename CC>
bool matrix_multiply_dot(mdspan<T, CA>& A, mdspan<T, CB>& B, mdspan<T, CC>& C,
                         bool on_gpu=false, bool default_device=true, int devicenum=0)

you can use the upload member functions of mdspan. If you replace the device alloc call by device_datastruct_alloc(dC,devicenum); then host_update will download the correct values from the GPU.

At first this looks like an error in the function device_datastruct_alloc. But a matrix multiplication needs three matrices: storage reserved for the result, and the two input matrices A and B. Since device_upload then surely works, you can, of course, also upload the initialised values for the result matrix C, which is initialized with 1 elements. Unfortunately, this fails too, leaving me with the only conclusion that there is an error in the compilation.

If one does not download the values from the device, they arrive, so it looks like the computation was done on the host. But the device number supplied is that of the target.

All this leads me to suspect that there is an error in GCC, either in the mapping, in nvptx, or in OpenMP and the reading of the pragmas, or wherever.
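A minimal sketch of how the two suspicions above (the target region silently running on the host, and the device-to-host download not delivering the computed values) could be checked in isolation. It is not taken from the reporter's library: the helper names target_runs_on_host and roundtrip_ok are hypothetical, and a plain std::vector stands in for the mdspan wrapper. Only omp_is_initial_device() and the target enter data, target update and target exit data directives are standard OpenMP.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: reports whether a target region addressed to
// `devicenum` actually executes on the host (initial device).
bool target_runs_on_host(int devicenum) {
    int on_host = 1;  // stays 1 when OpenMP is disabled at compile time
#ifdef _OPENMP
    #pragma omp target device(devicenum) map(tofrom: on_host)
    {
        on_host = omp_is_initial_device();
    }
#else
    (void)devicenum;
#endif
    return on_host != 0;
}

// Hypothetical helper: upload an initialised buffer, overwrite it in a
// target region, download it with "target update from", and compare.
bool roundtrip_ok(int devicenum, int n = 16) {
    assert(n > 0);
    std::vector<double> c(n, 1.0);  // result buffer, initialised on the host
    double* p = c.data();

    // Allocate on the device and upload the initial values.
    #pragma omp target enter data map(to: p[0:n]) device(devicenum)

    // Compute on the device (or on the host if offloading falls back).
    #pragma omp target teams distribute parallel for device(devicenum)
    for (int i = 0; i < n; ++i) p[i] = 2.0 * i;

    // Download the result, then release the device copy.
    #pragma omp target update from(p[0:n]) device(devicenum)
    #pragma omp target exit data map(delete: p[0:n]) device(devicenum)

    for (int i = 0; i < n; ++i)
        if (c[i] != 2.0 * i) return false;
    return true;
}
```

If target_runs_on_host(devicenum) returns true although devicenum is the number of the GPU, the region fell back to the host, which would explain why the values are present without a download. Setting the environment variable OMP_TARGET_OFFLOAD=MANDATORY turns such a silent fallback into a runtime error. roundtrip_ok should return true both with and without real offloading, so a false result would point at the mapping or the update.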