https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120865

--- Comment #3 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Options were:

-O1 -fopenmp -foffload=nvptx-none  -fno-stack-protector  -Wall


Note that without -O I get the following output (i.e. without optimization,
the program terminates normally):


Ordinary matrix multiplication, on gpu
1 2 3 4 
5 6 7 8 
9 10 11 12 
13 14 15 16 

0 1 2 3 
4 5 6 7 
8 9 10 11 
12 13 14 15 

80 90 100 110 
176 202 228 254 
272 314 356 398 
368 426 484 542 

A Cholesky decomposition with the multiplication on gpu
4 12 -16 
12 37 -43 
-16 -43 98 

2 0 0 
6 1 0 
-8 5 3 

Now the cholesky decomposition is entirely done on gpu
2 0 0 
6 1 0 
-8 5 3 

Now we do the same with the lu decomposition
1 -2 -2 -3 
3 -9 0 -9 
-1 2 4 7 
-3 -6 26 2 

Just the multiplication on gpu
1 0 0 0 
3 1 0 0 
-1 -0 1 0 
-3 4 -2 1 

1 -2 -2 -3 
0 -3 6 0 
0 0 2 4 
0 0 0 1 

Entirely on gpu
1 0 0 0 
3 1 0 0 
-1 -0 1 0 
-3 4 -2 1 

1 -2 -2 -3 
0 -3 6 0 
0 0 2 4 
0 0 0 1 

Now we do the same with the qr decomposition
12 -51 4 
6 167 -68 
-4 24 -41 

Just the multiplication on gpu
0.857143 -0.394286 -0.331429 
0.428571 0.902857 0.0342857 
-0.285714 0.171429 -0.942857 

14 21 -14 
-2.22045e-16 175 -70 
-3.10862e-15 -4.79616e-14 35 

Entirely on gpu
0.857143 -0.394286 -0.626059 
0.428571 0.902857 -0.127334 
-0.285714 0.171429 -0.769309 

14 21 -14 
-2.22045e-16 175 -70 
-5.19947 -7.7992 37.6962 
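
The two R factors printed in the QR section differ in their last row: in the "Entirely on gpu" output the entries below the diagonal are no longer close to zero. A minimal sketch of how this could be checked follows. The helper name is_upper_triangular, the constant names and the tolerance are illustrative assumptions, not part of the reported program; the matrix values are copied from the output above.

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// R as printed under "Just the multiplication on gpu".
const Mat3 r_mult_only = {{{14, 21, -14},
                           {-2.22045e-16, 175, -70},
                           {-3.10862e-15, -4.79616e-14, 35}}};

// R as printed under "Entirely on gpu".
const Mat3 r_full_gpu = {{{14, 21, -14},
                          {-2.22045e-16, 175, -70},
                          {-5.19947, -7.7992, 37.6962}}};

// Returns true if every entry strictly below the diagonal
// has an absolute value of at most tol.
bool is_upper_triangular(const Mat3& r, double tol) {
    for (int i = 1; i < 3; ++i) {
        for (int j = 0; j < i; ++j) {
            if (std::fabs(r[i][j]) > tol) {
                return false;
            }
        }
    }
    return true;
}
```

With a tolerance of 1e-9 (an arbitrary choice), is_upper_triangular(r_mult_only, 1e-9) holds, while is_upper_triangular(r_full_gpu, 1e-9) does not.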


Process returned 0 (0x0)   execution time : 0.829 s
Press ENTER to continue.
