Thanks for bringing this to our attention. Brian just committed a fix on the trunk (https://svn.open-mpi.org/trac/ompi/changeset/27371). We'll let that soak for a day or three and then bring it over to v1.6 and v1.7.
On Sep 20, 2012, at 8:25 AM, <tmish...@jcity.maeda.co.jp> wrote:

> Hello, I found a problem in openmpi's ptmalloc2. The problem is that
> TSD (thread-specific data) does not work properly, which may cause
> performance loss and segfaults. In my case, applications that allocate
> memory heavily sometimes segfault.
>
> Please see opal/mca/memory/linux/sysdeps/pthread/malloc-machine.h.
>
> When USE_TSD_DATA_HACK is defined, which is the default in openmpi, the
> hacked TSD is used as shown below.
>
> #if defined(__sgi) || defined(USE_TSD_DATA_HACK)
> typedef void *tsd_key_t[256];
> #define tsd_key_create(key, destr) do { \
>   int i; \
>   for(i=0; i<256; i++) (*key)[i] = 0; \
> } while(0)
> #define tsd_setspecific(key, data) \
>  (key[(unsigned)pthread_self() % 256] = (data))
> #define tsd_getspecific(key, vptr) \
>  (vptr = key[(unsigned)pthread_self() % 256])
>
> On the other hand, the thread IDs (= pthread_self()) generated by pthread
> are not consecutive numbers, at least in my environment.
>
> An example of threads created by t-test1, which is included in ptmalloc2:
> [mishima@manage ptmalloc2]$ ./t-test1 4 4
> Using posix threads.
> total=4 threads=4 i_max=10000 size=10000 bins=200
> Created thread 41cb4940.
> Created thread 41eb5940.
> Created thread 420b6940.
> Created thread 422b7940.
>
> Since the interval between ID numbers is much larger than 256, threads
> may share the same key-array slot. Every ID shown above ends in 0x40,
> so [pthread_self() % 256] is 64 for all of them, which means that the
> hacked TSD does not function at all.
>
> I think -DUSE_TSD_DATA_HACK=1 should be removed from openmpi's
> configuration. As far as I checked, when I use pthread's TSD via
> "#undef USE_TSD_DATA_HACK", the problem goes away.
>
> One more request concerns the PGI compiler. The PGI compiler does not
> predefine the macro __GNUC__. Therefore, PGI does not use the fast inline
> mutex_lock written in malloc-machine.h. Please consider adding 4 lines
> near the top of malloc.c.
>
> --- opal/mca/memory/linux/malloc.c.org 2012-08-30 16:15:19.000000000 +0900
> +++ opal/mca/memory/linux/malloc.c 2012-08-31 07:57:16.000000000 +0900
> @@ -43,6 +43,11 @@
>  #define MORECORE opal_memory_linux_free_ptmalloc2_sbrk
>  #define munmap(a,b) opal_memory_linux_free_ptmalloc2_munmap(a,b,1)
>
> +/* For PGI compiler to activate inline mutex_lock */
> +#if defined(__PGI)
> +#define __GNUC__ 1
> +#endif
> +
>  /* make some non-GCC compilers happy */
>  #ifndef __GNUC__
>  #define __const const
>
> P.S.
> Since the GNU and Intel compilers use the inline mutex_lock, mutex
> initialization is very fast and the hacked TSD problem does not cause a
> segfault. It can only cause performance loss. The reason is a very long
> story, so please let me omit it today.
>
> Best regards,
> Tetsuya Mishima

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/