https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120814

--- Comment #4 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
What is also interesting:

template <typename T, typename CA, typename CB, typename CC>
bool matrix_multiply_dot(mdspan<T, CA>& A, mdspan<T, CB>& B, mdspan<T, CC>& C,
                         bool on_gpu = false, bool default_device = true, int devicenum = 0)


You can use the upload member functions of mdspan if you replace the device
alloc call with

   device_datastruct_alloc(dC,devicenum);

Then host_update downloads the correct values from the GPU.
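For comparison, the working sequence described above (reserve device storage for C, compute on the device, then download C) can be sketched with plain OpenMP directives on raw arrays. This assumes that device_datastruct_alloc and host_update wrap something equivalent to `target enter data map(alloc: ...)` and `target update from(...)`; their implementation is not shown in this comment. The function name multiply_alloc_then_update and the row-major layout are hypothetical, not taken from the report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: C = A * B for n x n row-major matrices, using the
// pattern "upload A and B, only allocate C on the device, compute there,
// then copy C back". Plain OpenMP directives stand in for the report's
// device_datastruct_alloc / host_update helpers.
std::vector<double> multiply_alloc_then_update(const std::vector<double>& a,
                                               const std::vector<double>& b,
                                               std::size_t n, int devicenum) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    const double* pa = a.data();
    const double* pb = b.data();
    double* pc = c.data();
    const std::size_t len = n * n;

    // Copy A and B to the device; reserve storage for C without copying it.
    #pragma omp target enter data map(to: pa[0:len], pb[0:len]) map(alloc: pc[0:len]) device(devicenum)

    // The pointers are already mapped, so the kernel uses the device copies.
    #pragma omp target teams distribute parallel for collapse(2) device(devicenum)
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += pa[i * n + k] * pb[k * n + j];
            pc[i * n + j] = sum;
        }

    // Download the device copy of C into the host vector, then unmap all three.
    #pragma omp target update from(pc[0:len]) device(devicenum)
    #pragma omp target exit data map(delete: pa[0:len], pb[0:len], pc[0:len]) device(devicenum)
    return c;
}
```

Compiled without -fopenmp the pragmas are ignored and the loops simply run on the host, and with -fopenmp but no usable device the runtime normally falls back to the host as well. The result is the same in every case, which is also why a correct result alone does not prove that the kernel ran on the GPU.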


At first glance this looks like an error in the function device_datastruct_alloc.
But a matrix multiplication needs three matrices: A and B, plus storage
reserved for the result. Since device_upload evidently works, one can of course
also upload the initialized values of the result matrix C, which is
initialized with ones. Unfortunately, this fails too, which leaves me with the
only conclusion that there is an error in the compilation. If one does not
download the values from the device, they still arrive on the host, so it looks
as if the computation was done on the host. Yet the device number supplied is
that of the target device.
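One way to test the suspicion that the computation silently ran on the host is to ask the OpenMP runtime from inside a target region issued with the same device number. The sketch below is a hypothetical diagnostic (the function names target_ran_on_host and report_offload are illustrative); it relies only on the standard routines omp_is_initial_device, omp_get_num_devices and omp_get_default_device, and it also compiles without OpenMP support, in which case it always reports the host.

```cpp
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical diagnostic: returns true if a target region issued with
// device(devicenum) actually executed on the host (the initial device).
bool target_ran_on_host(int devicenum) {
    int on_host = 1;  // without OpenMP support everything runs on the host
#ifdef _OPENMP
    #pragma omp target map(from: on_host) device(devicenum)
    {
        // omp_is_initial_device() returns nonzero when executing on the host.
        on_host = omp_is_initial_device();
    }
#endif
    return on_host != 0;
}

// Prints how many offload devices the runtime sees and where the region ran.
void report_offload(int devicenum) {
#ifdef _OPENMP
    std::printf("devices: %d, default device: %d\n",
                omp_get_num_devices(), omp_get_default_device());
#endif
    std::printf("target region with device(%d) ran on the %s\n", devicenum,
                target_ran_on_host(devicenum) ? "host" : "device");
}
```

Running the program with OMP_TARGET_OFFLOAD=MANDATORY (an OpenMP 5.0 environment variable) should additionally make the runtime stop with an error instead of falling back to the host when the requested device cannot be used.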

All this leads me to suspect an error in GCC, either in the mapping, in the
nvptx offloading, in the OpenMP handling of the pragmas, or somewhere else.
