What interconnect are you using? InfiniBand? Use the "--without-memory-manager" option when building Open MPI (at configure time) to disable ptmalloc.
Regards
--Nysal

On Sun, Aug 8, 2010 at 7:49 PM, Nicolas Deladerriere <nicolas.deladerri...@gmail.com> wrote:

> Yes, I am using a 24G machine on a 64-bit Linux OS.
> If I compile without the wrapper, I do not get any problems.
>
> It seems that when I link with openmpi, my program uses a kind of
> openmpi-implemented malloc. Is it possible to switch it off in order to only
> use malloc from libc?
>
> Nicolas
>
> 2010/8/8 Terry Frankcombe <te...@chem.gu.se>
>
>> You're trying to do a 6GB allocate. Can your underlying system handle
>> that? If you compile without the wrapper, does it work?
>>
>> I see your executable is using the OMPI memory stuff. IIRC there are
>> switches to turn that off.
>>
>>
>> On Fri, 2010-08-06 at 15:05 +0200, Nicolas Deladerriere wrote:
>> > Hello,
>> >
>> > I am getting a SIGSEGV error when using a simple program compiled and
>> > linked with openmpi.
>> > I have reproduced the problem using really simple Fortran code. It
>> > actually does not even use MPI, but just links with the mpi shared
>> > libraries. (The problem does not appear when I do not link with the mpi
>> > libraries.)
>> > % cat allocate.F90
>> > program test
>> >   implicit none
>> >   integer, dimension(:), allocatable :: z
>> >   integer(kind=8) :: l
>> >
>> >   write(*,*) "l ?"
>> >   read(*,*) l
>> >
>> >   ALLOCATE(z(l))
>> >   z(1) = 111
>> >   z(l) = 222
>> >   DEALLOCATE(z)
>> >
>> > end program test
>> >
>> > I am using openmpi 1.4.2 and gfortran for my tests. Here is the
>> > compilation:
>> >
>> > % ./openmpi-1.4.2/build/bin/mpif90 --showme -g -o testallocate
>> > allocate.F90
>> > gfortran -g -o testallocate allocate.F90
>> > -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/include -pthread
>> > -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib
>> > -L/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib -lmpi_f90
>> > -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl
>> > -lutil -lm -ldl -pthread
>> >
>> > When I run that test with different lengths, I sometimes get a
>> > "Segmentation fault" error. Here are two examples using two specific
>> > values, but the error happens for many other values of length (I did not
>> > manage to find which values of length give that error).
>> >
>> > % ./testallocate
>> > l ?
>> > 1600000000
>> > Segmentation fault
>> > % ./testallocate
>> > l ?
>> > 2000000000
>> >
>> > I used a debugger with a re-compiled version of openmpi built with the
>> > debug flag. I got the following error in function sYSMALLOc:
>> >
>> > Program received signal SIGSEGV, Segmentation fault.
>> > 0x00002aaaab70b3b3 in sYSMALLOc (nb=6400000016, av=0x2aaaab930200)
>> > at malloc.c:3239
>> > 3239 set_head(remainder, remainder_size | PREV_INUSE);
>> > Current language: auto; currently c
>> > (gdb) bt
>> > #0 0x00002aaaab70b3b3 in sYSMALLOc (nb=6400000016,
>> > av=0x2aaaab930200) at malloc.c:3239
>> > #1 0x00002aaaab70d0db in opal_memory_ptmalloc2_int_malloc
>> > (av=0x2aaaab930200, bytes=6400000000) at malloc.c:4322
>> > #2 0x00002aaaab70b773 in opal_memory_ptmalloc2_malloc
>> > (bytes=6400000000) at malloc.c:3435
>> > #3 0x00002aaaab70a665 in opal_memory_ptmalloc2_malloc_hook
>> > (sz=6400000000, caller=0x2aaaabf8534d) at hooks.c:667
>> > #4 0x00002aaaabf8534d in _gfortran_internal_free ()
>> > from /usr/lib64/libgfortran.so.1
>> > #5 0x0000000000400bcc in MAIN__ () at allocate.F90:11
>> > #6 0x0000000000400c4e in main ()
>> > (gdb) display
>> > (gdb) list
>> > 3234 if ((unsigned long)(size) >= (unsigned long)(nb +
>> > MINSIZE)) {
>> > 3235   remainder_size = size - nb;
>> > 3236   remainder = chunk_at_offset(p, nb);
>> > 3237   av->top = remainder;
>> > 3238   set_head(p, nb | PREV_INUSE | (av != &main_arena ?
>> > NON_MAIN_ARENA : 0));
>> > 3239   set_head(remainder, remainder_size | PREV_INUSE);
>> > 3240   check_malloced_chunk(av, p, nb);
>> > 3241   return chunk2mem(p);
>> > 3242 }
>> > 3243
>> >
>> >
>> > I also did the same test in C and I got the same problem.
>> >
>> > Does someone have any idea that could help me understand what's going
>> > on?
>> >
>> > Regards
>> > Nicolas
>> >
>> > _______________________________________________
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
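
The original post mentions a C version of the test but does not show it. As a rough sketch only, assuming it mirrors allocate.F90 (allocate n 4-byte integers, write the first and last element, free), it might look like the following. The function name touch_ends and its return codes are hypothetical and are not taken from the thread.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper mirroring allocate.F90 from the thread:
 * allocate n 32-bit integers, write the first and last element,
 * then free the buffer.
 * Returns 0 on success, 1 if n is zero or the byte count would
 * overflow size_t, and 2 if malloc reports failure. */
int touch_ends(size_t n)
{
    if (n == 0 || n > SIZE_MAX / sizeof(int32_t)) {
        return 1;
    }

    /* volatile keeps the compiler from discarding the two stores. */
    volatile int32_t *z = malloc(n * sizeof(int32_t));
    if (z == NULL) {
        return 2;
    }

    z[0] = 111;      /* same as z(1) = 111 in the Fortran test */
    z[n - 1] = 222;  /* same as z(l) = 222 in the Fortran test */

    free((void *)z);
    return 0;
}
```

Calling touch_ends(1600000000) from a small main() requests 6400000000 bytes, which matches the bytes=6400000000 argument in the backtrace. Building such a program once with plain gcc and once with the mpicc wrapper would be one way to check whether the crash depends on linking the Open MPI libraries.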