[ Perhaps we need a somewhat larger audience for this one, as it isn't a
gfortran-specific issue (despite the COMMONs). ]
The reporter of this problem (perhaps it's necessary to open a Bugzilla
PR) uses GNU/Linux on x86_64, Fedora 10:
kernel 2.6.27.12-170.2.5.fc10.x86_64
glibc-2.9-3.x86_64
--
Toon Moene - e-mail: t...@moene.org (*NEW*) - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.4/changes.html
--- Begin Message ---
Hello,
We have parallelized a relatively large f77 project (GEANT3, ~200k loc) using
OpenMP.
Now we are running comparisons between the standard and parallel versions, and
it turns out that just making the commons threadprivate results in a 20% speed
penalty. This extra time is spent in the __tls_get_addr() function, which seems
to be called for every access of a threadprivate variable.
Would it be in principle possible to optimize this access?
I figure that the base address of all referenced commons could be obtained once
per function thus drastically reducing the __tls_get_addr() call count.
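
To make the idea concrete, here is a minimal sketch in C rather than Fortran, since the issue is not language specific. The struct gcbank, its fields, and the three functions are hypothetical stand-ins for a threadprivate common and the code that touches it; none of them come from GEANT3. The first reader names the thread-local object on every access, the second takes its address once and goes through a plain pointer afterwards, which is the transformation asked about. The sketch says nothing about whether a given compiler version already performs this hoisting on its own.

```c
#include <assert.h>

#define NWORDS 1024

/* Hypothetical stand-in for a threadprivate COMMON block. */
struct gcbank {
    int    nfill;
    double q[NWORDS];
};

/* One private copy per thread (C11 thread-local storage). */
static _Thread_local struct gcbank gcbank_;

/* Fill the first n words of this thread's copy with 1, 2, ..., n. */
void fill_block(int n)
{
    assert(n >= 0 && n <= NWORDS);
    for (int i = 0; i < n; i++)
        gcbank_.q[i] = (double)(i + 1);
    gcbank_.nfill = n;
}

/* Direct form: every reference names the thread-local object.
 * In position-independent code using the general-dynamic TLS model,
 * each such reference may turn into an address lookup unless the
 * compiler combines them. */
double sum_direct(int n)
{
    double s = 0.0;
    assert(n >= 0 && n <= NWORDS);
    for (int i = 0; i < n; i++)
        s += gcbank_.q[i];
    return s;
}

/* Hoisted form: resolve the block's address once on entry, then use
 * an ordinary pointer for all further accesses in this function. */
double sum_hoisted(int n)
{
    struct gcbank *const b = &gcbank_;  /* single TLS address lookup */
    double s = 0.0;
    assert(n >= 0 && n <= NWORDS);
    for (int i = 0; i < n; i++)
        s += b->q[i];
    return s;
}
```

Both readers return the same value for the same thread; only the number of address lookups can differ. Note that a non-PIC executable normally uses a cheaper TLS model that does not call __tls_get_addr() at all, so the difference is only expected to show up in code built as PIC, for example with -fPIC for a shared library.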
We are using the gcc-4.3 branch from the beginning of February, with patches to
allow equivalence statements among threadprivate data.
Callgrind output of a sample run is available at:
-O2 <https://mtadel.web.cern.ch/mtadel/callgrind.out.13032>
-O2 -g <https://mtadel.web.cern.ch/mtadel/callgrind.out.13055>
Best,
Matevz
--- End Message ---