Dear Open MPI developers,

Some people (Michele Martone, Rome University Tor Vergata) found a bug in Open MPI (1.2.5 and 1.2.6) when it is compiled with PGI (7.1-4 and 7.2).
This bug doesn't involve the interconnect fabric (InfiniBand, GE, or other), because it concerns only a simple memory allocation. You can reproduce the bug with this simple code:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    /*
     * memory allocations simulation for ~50M nonzeros:
     * nd=180 md=350 mdy=420
     *
     * if this program crashes, there is a compiler problem
     */
    printf("memory allocations simulation for ~50M nonzeros: nd=180 md=350 mdy=420\n");
    printf("if this program crashes, check your compiler/environment configuration\n");
    /* sizeof yields a size_t, so print it with %zu rather than %d */
    printf("sizeof(int) %zu\n", sizeof(int));
    printf("sizeof(int*) %zu\n", sizeof(int*));
    printf("sizeof(size_t) %zu\n", sizeof(size_t));
    if (sizeof(size_t) < 8 || sizeof(int*) < 8)
    {
        printf("please compile this program for a 64 bit environment!\n");
        return -1;
    }
    /* the earlier blocks are deliberately never freed: the test only
       simulates the allocation pattern */
    int *p;
    printf("allocation 1/4..\n");
    p = calloc(47109185, 16);   /* 753746960 bytes */
    if (!p) printf("..failed.\n");
    printf("allocation 2/4..\n");
    p = calloc(47109185, 4);    /* 188436740 bytes */
    if (!p) printf("..failed.\n");
    printf("allocation 3/4..\n");
    p = calloc(47109185, 4);    /* 188436740 bytes */
    if (!p) printf("..failed.\n");
    printf("allocation 4/4..\n");
    p = calloc(622947588, 16);  /* 9967161408 bytes, about 9.28 GiB */
    if (!p) printf("..failed.\n");
    if (!p) return -1;
    printf("allocations test passed (no crash)\n");
    return 0;
}

So we tested:

1. the above code compiled with gcc4 and with PGI (7.1-4 or 7.2): OK
2. the above code compiled with Open MPI (1.2.5 or 1.2.6) with gcc4: OK
3. the above code compiled with Open MPI (1.2.5 or 1.2.6) with PGI (7.1-4 or 7.2): the test doesn't pass (segmentation fault)

Some output of ldd:

libmpi.so.0 => /opt/mpi/openmpi-1.2.5/pgi/lib/libmpi.so.0 (0x0000002a95558000)
libopen-rte.so.0 => /opt/mpi/openmpi-1.2.5/pgi/lib/libopen-rte.so.0 (0x0000002a957b2000)
libopen-pal.so.0 => /opt/mpi/openmpi-1.2.5/pgi/lib/libopen-pal.so.0 (0x0000002a9599c000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x0000003d7b600000)
librt.so.1 => /lib64/tls/librt.so.1 (0x0000003d80d00000)
libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000002a95b30000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003d7bd00000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003d81500000)
libutil.so.1 => /lib64/libutil.so.1 (0x0000002a95c35000)
libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003d7c100000)
libm.so.6 => /lib64/tls/libm.so.6 (0x0000003d7bb00000)
libc.so.6 => /lib64/tls/libc.so.6 (0x0000003d7b800000)
libpgc.so => /afs/efda-itm.eu/project/compilers/pgi/linux86-64/7.1-4/libso/libpgc.so (0x0000002a95d3a000)
/lib64/ld-linux-x86-64.so.2 (0x0000003d7b400000)

I think the bug lies in the wrapping of the calloc function.

Greetings,

Dr. Francesco Iannone
Associazione EURATOM-ENEA sulla Fusione
C.R. ENEA Frascati
Via E. Fermi 45
00044 Frascati (Roma) Italy
phone 00-39-06-9400-5124
fax 00-39-06-9400-5524
mailto:francesco.iann...@frascati.enea.it
http://www.afs.enea.it/iannone