Hi,

I have been exploring non-deterministic failures in cactusADM (when autopar is enabled with a low threshold) on a Power7 multi-core machine.
The failure also occurs in several other spec2006 benchmarks when the threshold is lowered to allow more loops to be parallelized. The scenario is that the program gets stuck: one of the threads exits while the others remain waiting on a team barrier (futex_wait).

I disabled autopar completely and MANUALLY parallelized (using OpenMP pragmas) only one loop in cactusADM. The code before and after my changes is attached. Running cactusADM with this modified loop produces the exact same problem. This makes me more confident that the problem is indeed in libgomp and not in autopar, and that it is probably a race condition.

One of the threads somehow passes the two barriers that it is supposed to be stuck on (the team barrier and the docking barrier) and exits, while the other threads are waiting for its arrival at the team barrier.

The barriers in libgomp are implemented using futex:

    static inline void
    futex_wait (int *addr, int val)
    {
      long err = sys_futex0 (addr, gomp_futex_wait, val);
      if (__builtin_expect (err == ENOSYS, 0))
        {
          gomp_futex_wait &= ~FUTEX_PRIVATE_FLAG;
          gomp_futex_wake &= ~FUTEX_PRIVATE_FLAG;
          sys_futex0 (addr, gomp_futex_wait, val);
        }
    }

    static inline long
    sys_futex0 (int *addr, int op, int val)
    {
      register long int r0 __asm__ ("r0");
      register long int r3 __asm__ ("r3");
      register long int r4 __asm__ ("r4");
      register long int r5 __asm__ ("r5");
      register long int r6 __asm__ ("r6");

      r0 = SYS_futex;
      r3 = (long) addr;
      r4 = op;
      r5 = val;
      r6 = 0;

      /* ??? The powerpc64 sysdep.h file clobbers ctr; the powerpc32 sysdep.h
         doesn't.  It doesn't much matter for us.  In the interest of unity,
         go ahead and clobber it always.  */

      __asm volatile ("sc; mfcr %0"
                      : "=r"(r0), "=r"(r3), "=r"(r4), "=r"(r5), "=r"(r6)
                      : "r"(r0), "r"(r3), "r"(r4), "r"(r5), "r"(r6)
                      : "r7", "r8", "r9", "r10", "r11", "r12",
                        "cr0", "ctr", "memory");
      if (__builtin_expect (r0 & (1 << 28), 0))
        return r3;
      return 0;
    }

I've opened this PR for the problem:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50977
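For reference, here is a minimal stand-alone sketch of the kind of experiment described above: one small loop parallelized by hand with an OpenMP pragma and entered many times, so that the threads spend most of their time going through the barriers at the end of each parallel region. This is not the cactusADM loop (that one is in the attachment). The function names, the loop body, and the sizes are made up for illustration, and I am only assuming, not claiming, that a loop like this would hit the same hang.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the hand-parallelized loop.  Fills out[i]
   from in[i] in parallel, then checks the result serially.  Returns the
   number of mismatching elements (0 when everything is correct).  */
static int
run_small_parallel_loop (const double *in, double *out, int n)
{
  int i;
  int bad = 0;

  /* Ignored (loop runs serially) unless compiled with -fopenmp.  */
#pragma omp parallel for
  for (i = 0; i < n; i++)
    out[i] = 2.0 * in[i] + 1.0;

  for (i = 0; i < n; i++)
    if (out[i] != 2.0 * in[i] + 1.0)
      bad++;

  return bad;
}

/* Enter the parallel region `rounds` times on an array of `n` elements.
   Returns the total number of mismatches, or -1 on allocation failure.
   A hang, if it happens, shows up as this call never returning.  */
int
stress_parallel_loop (int rounds, int n)
{
  double *in = malloc (n * sizeof *in);
  double *out = malloc (n * sizeof *out);
  int r, i;
  int bad = 0;

  if (in == NULL || out == NULL)
    {
      free (in);
      free (out);
      return -1;
    }

  for (i = 0; i < n; i++)
    in[i] = (double) i;

  for (r = 0; r < rounds; r++)
    bad += run_small_parallel_loop (in, out, n);

  free (in);
  free (out);
  return bad;
}

#ifdef STRESS_MAIN
int
main (void)
{
  /* Round count and array size are arbitrary illustration values.  */
  int bad = stress_parallel_loop (100000, 64);
  printf ("mismatches: %d\n", bad);
  return bad != 0;
}
#endif
```

Built with something like "gcc -O2 -fopenmp -DSTRESS_MAIN", the small trip count keeps the work per region tiny, so nearly all the time goes into thread wake-up and barrier synchronization. If a run got stuck, attaching gdb and looking for threads sitting in futex_wait would be the thing to compare against the cactusADM hang.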
These failures prevent changing autopar's cost model to allow more parallelization to take place, which has shown great performance potential. Any help or comments would therefore be much appreciated.

Thanks,
Razys
cactusADM.rtf
Description: RTF file