Hello, I found a problem in openmpi's ptmalloc2. The problem is that
TSD (thread-specific data) does not work properly, which may cause
performance loss and segfaults. In my case, applications that allocate
memory heavily sometimes segfault.

Please see opal/mca/memory/linux/sysdeps/pthread/malloc-machine.h.

When USE_TSD_DATA_HACK is defined, which is the default in openmpi, the
hacked TSD is used, as shown below.

  #if defined(__sgi) || defined(USE_TSD_DATA_HACK)
  typedef void *tsd_key_t[256];
  #define tsd_key_create(key, destr) do { \
    int i; \
    for(i=0; i<256; i++) (*key)[i] = 0; \
  } while(0)
  #define tsd_setspecific(key, data) \
   (key[(unsigned)pthread_self() % 256] = (data))
  #define tsd_getspecific(key, vptr) \
   (vptr = key[(unsigned)pthread_self() % 256])

On the other hand, the thread ID (= pthread_self()) generated by pthread
is not a consecutive number, at least in my environment.

  An example of threads created by t-test1 included in ptmalloc2:
  [mishima@manage ptmalloc2]$ ./t-test1 4 4
  Using posix threads.
  total=4 threads=4 i_max=10000 size=10000 bins=200
  Created thread 41cb4940.
  Created thread 41eb5940.
  Created thread 420b6940.
  Created thread 422b7940.

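Since the interval between thread IDs is much larger than 256 (0x201000
in the run above, which is a multiple of 256), the threads can end up
sharing the same slot of the key array. In the example above,
pthread_self() % 256 is 64 (0x40) for every thread, which means that
the hacked TSD does not function at all.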
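
To make the collision concrete, here is a minimal sketch that applies
the same "% 256" mapping to the four thread IDs printed above. The
helper names (tsd_slot, count_distinct_slots, sample_ids) are only for
illustration and do not exist in ptmalloc2.

```c
#include <assert.h>
#include <stddef.h>

#define TSD_SLOTS 256  /* size of the hacked key array */

/* Slot index the hacked TSD picks for a given thread ID value. */
static unsigned tsd_slot(unsigned thread_id)
{
    return thread_id % TSD_SLOTS;
}

/* Count how many distinct slots a set of thread IDs maps to. */
static size_t count_distinct_slots(const unsigned *ids, size_t n)
{
    unsigned char used[TSD_SLOTS] = {0};
    size_t distinct = 0;
    size_t i;

    assert(ids != NULL);
    for (i = 0; i < n; i++) {
        unsigned s = tsd_slot(ids[i]);
        if (!used[s]) {
            used[s] = 1;
            distinct++;
        }
    }
    return distinct;
}

/* Thread IDs printed by the t-test1 run above. */
static const unsigned sample_ids[] = {
    0x41cb4940u, 0x41eb5940u, 0x420b6940u, 0x422b7940u
};
```

count_distinct_slots(sample_ids, 4) returns 1: all four threads map to
slot 64, so they overwrite each other's "thread-specific" pointer.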

I think -DUSE_TSD_DATA_HACK=1 should be removed from openmpi's
configuration. As far as I have checked, the problem goes away when I
use pthread's TSD by adding "#undef USE_TSD_DATA_HACK".
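
For comparison, here is a rough sketch of how real pthread TSD behaves.
It is not code from ptmalloc2; demo_key, demo_barrier, worker and
tsd_is_per_thread are hypothetical names, and I assume a platform that
provides POSIX barriers (Linux does). Two threads each store a value
under the same key, wait until both have stored, and then read it back.

```c
#define _POSIX_C_SOURCE 200112L
#include <pthread.h>
#include <stdint.h>

static pthread_key_t demo_key;          /* hypothetical, illustration only */
static pthread_barrier_t demo_barrier;  /* makes both stores happen first */

/* Store this thread's value, wait for the other thread, read it back. */
static void *worker(void *arg)
{
    pthread_setspecific(demo_key, arg);
    pthread_barrier_wait(&demo_barrier);
    return pthread_getspecific(demo_key);
}

/* Returns 1 if each thread read back its own value, 0 otherwise. */
static int tsd_is_per_thread(void)
{
    pthread_t t1, t2;
    void *v1 = (void *)(uintptr_t)1, *v2 = (void *)(uintptr_t)2;
    void *r1 = NULL, *r2 = NULL;
    int ok = 0;

    if (pthread_key_create(&demo_key, NULL) != 0)
        return 0;
    if (pthread_barrier_init(&demo_barrier, NULL, 2) != 0) {
        pthread_key_delete(demo_key);
        return 0;
    }
    if (pthread_create(&t1, NULL, worker, v1) == 0) {
        if (pthread_create(&t2, NULL, worker, v2) == 0) {
            pthread_join(t1, &r1);
            pthread_join(t2, &r2);
            ok = (r1 == v1 && r2 == v2);
        } else {
            /* Release t1 from the barrier ourselves, then reap it. */
            pthread_barrier_wait(&demo_barrier);
            pthread_join(t1, &r1);
        }
    }
    pthread_barrier_destroy(&demo_barrier);
    pthread_key_delete(demo_key);
    return ok;
}
```

With pthread's TSD each thread gets its own value back regardless of
what pthread_self() returns, so no assumption about the thread ID is
needed.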

One more request concerns the PGI compiler. The PGI compiler does not
predefine the macro __GNUC__. Therefore, PGI does not use the fast inline
mutex_lock written in malloc-machine.h. Please consider adding the
following 4 lines near the top of malloc.c.

--- opal/mca/memory/linux/malloc.c.org  2012-08-30 16:15:19.000000000 +0900
+++ opal/mca/memory/linux/malloc.c      2012-08-31 07:57:16.000000000 +0900
@@ -43,6 +43,11 @@
 #define MORECORE opal_memory_linux_free_ptmalloc2_sbrk
 #define munmap(a,b) opal_memory_linux_free_ptmalloc2_munmap(a,b,1)

+/* For PGI compiler to activate inline mutex_lock */
+#if defined(__PGI)
+#define __GNUC__ 1
+#endif
+
 /* make some non-GCC compilers happy */
 #ifndef __GNUC__
 #define __const const

P.S.
Since the GNU and Intel compilers use the inline mutex_lock, mutex
initialization is very fast and the hacked TSD problem does not cause a
segfault. It can only cause performance loss. The reason is a very long
story, so please let me omit it today.

Best regards,
Tetsuya Mishima
