[ Perhaps we need a somewhat larger audience for this one, as it isn't a
  gfortran-specific issue (despite the COMMONs). ]

The reporter of this problem (perhaps we should open a Bugzilla PR) uses:

GNU/Linux on x86_64, Fedora 10

kernel 2.6.27.12-170.2.5.fc10.x86_64
glibc-2.9-3.x86_64

--
Toon Moene - e-mail: t...@moene.org (*NEW*) - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.4/changes.html
--- Begin Message ---
Hello,

We have parallelized a relatively large f77 project (GEANT3, ~200k lines of code) using OpenMP.

Now we are comparing the standard and parallel versions, and it turns out that just making the commons threadprivate results in a 20% speed penalty. The extra time is spent in the __tls_get_addr() function, which seems to be called on every access to a threadprivate variable.

Would it be possible, in principle, to optimize this access?

I figure that the base address of each referenced common could be obtained once per function, thus drastically reducing the number of __tls_get_addr() calls.
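Below is a minimal C sketch of that idea. It models one threadprivate COMMON block as a `_Thread_local` struct. The names `gctrak_common`, `gctrak`, `sum_naive()` and `sum_hoisted()` are hypothetical and are not taken from GEANT3 or gfortran.

- `sum_naive()` names the thread-local object on every access.
- `sum_hoisted()` takes its address once per call and then works through an ordinary pointer, so only one TLS address computation per call is needed.

Whether a given compiler already merges the calls in the naive form depends on its version, the TLS model and the optimization level. Inspecting the assembly (e.g. `gcc -O2 -fPIC -S`) is the way to check.

```c
/* Hypothetical stand-in for a COMMON block; layout is illustrative only. */
struct gctrak_common {
    double vect[7];
    double step;
    int    nstep;
};

/* One instance per thread, analogous to !$OMP THREADPRIVATE(/GCTRAK/). */
static _Thread_local struct gctrak_common gctrak;

/* Naive form: every access names the thread-local object directly.
   In PIC code using a dynamic TLS model, locating it may involve a
   __tls_get_addr() call; the compiler may or may not reuse one result. */
double sum_naive(int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        gctrak.nstep++;
        s += gctrak.vect[i % 7] * gctrak.step;
    }
    return s;
}

/* Hoisted form: compute the block's base address once per call,
   then reach every member through a plain pointer. */
double sum_hoisted(int n)
{
    struct gctrak_common *c = &gctrak;   /* single TLS address lookup */
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        c->nstep++;
        s += c->vect[i % 7] * c->step;
    }
    return s;
}
```

The cached pointer `c` refers to the calling thread's copy only. It is safe to use within that call, but it must not be stored anywhere another thread could pick it up.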

We are using the gcc-4.3 branch from the beginning of February, with patches that allow EQUIVALENCE statements among threadprivate data.

Callgrind output of a sample run is available at:

-O2     <https://mtadel.web.cern.ch/mtadel/callgrind.out.13032>
-O2 -g  <https://mtadel.web.cern.ch/mtadel/callgrind.out.13055>

Best,
Matevz


--- End Message ---
