Thanks for bringing this to our attention.

Brian just committed a fix on the trunk 
(https://svn.open-mpi.org/trac/ompi/changeset/27371).  We'll let that soak for 
a day or three and then bring it over to v1.6 and v1.7.


On Sep 20, 2012, at 8:25 AM, <tmish...@jcity.maeda.co.jp> 
<tmish...@jcity.maeda.co.jp> wrote:

> 
> Hello, I found a problem in openmpi's ptmalloc2. The problem is that
> TSD (thread-specific data) does not work properly, which may cause
> performance loss and segfaults.  In my case, applications that allocate
> memory heavily sometimes segfault.
> 
> Please see opal/mca/memory/linux/sysdeps/pthread/malloc-machine.h.
> 
> When USE_TSD_DATA_HACK is defined, which is the default in openmpi, the
> hacked TSD is used as shown below.
> 
>  #if defined(__sgi) || defined(USE_TSD_DATA_HACK)
>  typedef void *tsd_key_t[256];
>  #define tsd_key_create(key, destr) do { \
>    int i; \
>    for(i=0; i<256; i++) (*key)[i] = 0; \
>  } while(0)
>  #define tsd_setspecific(key, data) \
>   (key[(unsigned)pthread_self() % 256] = (data))
>  #define tsd_getspecific(key, vptr) \
>   (vptr = key[(unsigned)pthread_self() % 256])
> 
> On the other hand, the thread IDs (= pthread_self()) generated by pthread
> are not consecutive numbers, at least in my environment.
> 
>  An example of threads created by t-test1 included in ptmalloc2:
>  [mishima@manage ptmalloc2]$ ./t-test1 4 4
>  Using posix threads.
>  total=4 threads=4 i_max=10000 size=10000 bins=200
>  Created thread 41cb4940.
>  Created thread 41eb5940.
>  Created thread 420b6940.
>  Created thread 422b7940.
> 
> Since the interval between ID numbers is much larger than 256, threads may
> share the same key-array slot. Most values of [pthread_self() % 256] are
> 64 (all four in the example above), which means that the hacked TSD does
> not function at all.
> 
> I think -DUSE_TSD_DATA_HACK=1 should be removed from openmpi's
> configuration. As far as I have checked, when I use pthread's TSD via
> "#undef USE_TSD_DATA_HACK", the problem goes away.
> 
> One more request concerns the PGI compiler. The PGI compiler does not
> predefine the macro __GNUC__. Therefore, PGI does not use the fast inline
> mutex_lock written in malloc-machine.h. Please consider adding 4 lines
> near the top of malloc.c.
> 
> --- opal/mca/memory/linux/malloc.c.org  2012-08-30 16:15:19.000000000 +0900
> +++ opal/mca/memory/linux/malloc.c      2012-08-31 07:57:16.000000000 +0900
> @@ -43,6 +43,11 @@
> #define MORECORE opal_memory_linux_free_ptmalloc2_sbrk
> #define munmap(a,b) opal_memory_linux_free_ptmalloc2_munmap(a,b,1)
> 
> +/* For PGI compiler to activate inline mutex_lock */
> +#if defined(__PGI)
> +#define __GNUC__ 1
> +#endif
> +
> /* make some non-GCC compilers happy */
> #ifndef __GNUC__
> #define __const const
> 
> P.S.
> Since the GNU and Intel compilers use the inline mutex_lock, mutex
> initialization is very fast and the hacked TSD problem does not cause a
> segfault. Only the performance loss could be induced. The reason is a
> very long story, so please let me omit it today.
> 
> Best regards,
> Tetsuya Mishima
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

