https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82629
--- Comment #4 from Thorsten Kurth ---
Hello Richard,
Was the test case received?
Best Regards
Thorsten Kurth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82629
--- Comment #3 from Thorsten Kurth ---
One more thing,
In the test case I sent, please change the $(XPPFLAGS) in the main.x target
compilation to $(CXXFLAGS), so that -fopenmp is used at link time also.
However, that does not solve the problem b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82629
--- Comment #2 from Thorsten Kurth ---
Created attachment 42420
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42420&action=edit
This is the test case demonstrating the problem.
Linking this code will produce:
-bash-4.2$ make main.x
g++ -
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81896
--- Comment #2 from Thorsten Kurth ---
Hello,
another data point:
when I create a dummy variable, it works: for example alias data to tmp and
then use tmp. I think this is not working for the same reason one cannot
arbitrarily put class member v
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #28 from Thorsten Kurth ---
Hello,
can someone please give me an update on this bug?
Best Regards
Thorsten Kurth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81896
--- Comment #1 from Thorsten Kurth ---
Hello,
is this report actually being worked on? It has been in the unconfirmed state for
quite a while now.
Best Regards
Thorsten Kurth
++
Assignee: unassigned at gcc dot gnu.org
Reporter: thorstenkurth at me dot com
Target Milestone: ---
Dear Sir/Madam,
I ran into linking issues with gcc (GCC) 7.1.1 20170718 and OpenMP 4.5 target
offloading. I am compiling a mixed Fortran/C++ code where target regions can be
in
++
Assignee: unassigned at gcc dot gnu.org
Reporter: thorstenkurth at me dot com
Target Milestone: ---
Created attachment 42005
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42005&action=edit
small test case
Dear Sir/Madam,
I am not sure if my report got posted the fir
++
Assignee: unassigned at gcc dot gnu.org
Reporter: thorstenkurth at me dot com
Target Milestone: ---
Created attachment 41990
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41990&action=edit
Test case
Dear Sir/Madam,
g++ 7.1.1 cannot compile correct OpenMP 4.5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #27 from Thorsten Kurth ---
Hello Jakub,
I wanted to follow up on this. Is there any progress on this issue?
Best Regards
Thorsten Kurth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #26 from Thorsten Kurth ---
Hello Jakub,
thanks for the clarification. So a team maps to a CTA which is somewhat
equivalent to a block in CUDA language, correct? And it is good to have some
categorical equivalency between GPU and CPU
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #24 from Thorsten Kurth ---
Hello Jakub,
I know that the section you mean is racy and that the way it gets the number of
threads is not right, but I put this in to see if I get the correct numbers on
a CPU (I am not working on a GPU y
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #22 from Thorsten Kurth ---
Hello Jakub,
that is stuff for Intel VTune. I have commented it out and added the NUM_TEAMS
defines in the GNUmakefile. Please pull the latest changes.
Best and thanks
Thorsten
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #20 from Thorsten Kurth ---
To compile the code, edit the GNUmakefile to suit your needs (feel free to ask
any questions). To run it, launch the generated executable, called
something like
main3d.XXX...
and the XXX tell
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #19 from Thorsten Kurth ---
Thank you very much. I am sorry that I do not have a simpler test case. The
kernel which is executed is in the same directory as ABecLaplacian and called
MG_3D_cpp.cpp.
We have seen similar problems with
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #17 from Thorsten Kurth ---
The result is correct, though: I verified that both codes generate the correct
output.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #16 from Thorsten Kurth ---
FYI, the code is:
https://github.com/zronaghi/BoxLib.git
in branch
cpp_kernels_openmp4dot5
and then in Src/LinearSolvers/C_CellMG
the file ABecLaplacian.cpp. For example, lines 542 and 543 can be comme
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #15 from Thorsten Kurth ---
The code I care about definitely has optimization enabled. For the Fortran
stuff it does (for example):
ftn -g -O3 -ffree-line-length-none -fno-range-check -fno-second-underscore
-Jo/3d.gnu.MPI.OMP.EXE -
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #13 from Thorsten Kurth ---
Hello Jakub,
the compiler options are just -fopenmp. I am sure it does not have anything to do
with vectorization as I compare the code runtime with and without the
target directives and thus vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #11 from Thorsten Kurth ---
Hello Jakub,
yes, you are right. I thought that map(tofrom:) is the default mapping but
I might be wrong. In any case, teams is always 1. So this code is basically
just data streaming so there is no n
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #9 from Thorsten Kurth ---
Sorry, in the second run I set the number of threads to 12. I think the code
works as expected.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #8 from Thorsten Kurth ---
Here is the output of the get_num_threads section:
[tkurth@cori02 omp_3_vs_45_test]$ export OMP_NUM_THREADS=32
[tkurth@cori02 omp_3_vs_45_test]$ ./nested_test_omp_4dot5.x
We got 1 teams and 32 threads.
and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #7 from Thorsten Kurth ---
Hello Jakub,
thanks for your comment but I think the parallel for is not racy. Every thread
is working on a block of i-indices so that is fine. The dotprod kernel is actually
a kernel from the OpenMP standard
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #5 from Thorsten Kurth ---
To clarify the problem:
I think that the additional movq, pushq and other instructions generated when
using the target directive can cause a big hit on the performance. I understand
that these instructions a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #4 from Thorsten Kurth ---
Created attachment 41415
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41415&action=edit
Testcase
This is the test case. The files ending in .as contain the assembly code with
and without target regi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #3 from Thorsten Kurth ---
Created attachment 41414
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41414&action=edit
OpenMP 4.5 Testcase
This is the source code
++
Assignee: unassigned at gcc dot gnu.org
Reporter: thorstenkurth at me dot com
Target Milestone: ---
Dear Sir/Madam,
I am working on the Cori HPC system, a Cray XC-40 with Intel Xeon Phi 7250. I
probably found a performance "bug" when using the OpenMP 4.5 target
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: thorstenkurth at me dot com
Created attachment 32071
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32071&action=edit
Archive which includes test ca