What interconnect are you using? InfiniBand? Use the "--without-memory-manager"
configure option when building Open MPI in order to disable ptmalloc.

Regards
--Nysal

On Sun, Aug 8, 2010 at 7:49 PM, Nicolas Deladerriere <
nicolas.deladerri...@gmail.com> wrote:

> Yes, I am using a machine with 24 GB of memory running 64-bit Linux.
> If I compile without the wrapper, I do not get any problems.
>
> It seems that when I link with Open MPI, my program uses a malloc
> implemented by Open MPI. Is it possible to switch it off in order to use
> only the malloc from libc?
>
> Nicolas
>
> 2010/8/8 Terry Frankcombe <te...@chem.gu.se>
>
>> You're trying to do a 6GB allocate.  Can your underlying system handle
>> that?  If you compile without the wrapper, does it work?
>>
>> I see your executable is using the OMPI memory stuff.  IIRC there are
>> switches to turn that off.
>>
>>
>> On Fri, 2010-08-06 at 15:05 +0200, Nicolas Deladerriere wrote:
>> > Hello,
>> >
>> > I am getting a SIGSEGV error when running a simple program compiled and
>> > linked with Open MPI.
>> > I have reproduced the problem using really simple Fortran code. It
>> > does not even use MPI; it just links with the MPI shared
>> > libraries. (The problem does not appear when I do not link with the MPI
>> > libraries.)
>> >    % cat allocate.F90
>> >    program test
>> >    implicit none
>> >        integer, dimension(:), allocatable :: z
>> >        integer(kind=8) :: l
>> >
>> >        write(*,*) "l ?"
>> >        read(*,*) l
>> >
>> >        ALLOCATE(z(l))
>> >        z(1) = 111
>> >        z(l) = 222
>> >        DEALLOCATE(z)
>> >
>> >    end program test
>> >
>> > I am using Open MPI 1.4.2 and gfortran for my tests. Here is the
>> > compile command:
>> >
>> >    % ./openmpi-1.4.2/build/bin/mpif90 --showme -g -o testallocate allocate.F90
>> >    gfortran -g -o testallocate allocate.F90 -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/include -pthread -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib -L/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl -pthread
>> >
>> > When I run that test with different lengths, I sometimes get a
>> > "Segmentation fault" error. Here are two examples using two specific
>> > values, but the error happens for many other values of length (I did not
>> > manage to find which values of length give that error).
>> >
>> >    %  ./testallocate
>> >     l ?
>> >    1600000000
>> >    Segmentation fault
>> >    % ./testallocate
>> >     l ?
>> >    2000000000
>> >
>> > I ran it under a debugger with a version of Open MPI recompiled with the
>> > debug flag. I got the following error in function sYSMALLOc:
>> >
>> >    Program received signal SIGSEGV, Segmentation fault.
>> >    0x00002aaaab70b3b3 in sYSMALLOc (nb=6400000016, av=0x2aaaab930200) at malloc.c:3239
>> >    3239        set_head(remainder, remainder_size | PREV_INUSE);
>> >    Current language:  auto; currently c
>> >    (gdb) bt
>> >    #0  0x00002aaaab70b3b3 in sYSMALLOc (nb=6400000016, av=0x2aaaab930200) at malloc.c:3239
>> >    #1  0x00002aaaab70d0db in opal_memory_ptmalloc2_int_malloc (av=0x2aaaab930200, bytes=6400000000) at malloc.c:4322
>> >    #2  0x00002aaaab70b773 in opal_memory_ptmalloc2_malloc (bytes=6400000000) at malloc.c:3435
>> >    #3  0x00002aaaab70a665 in opal_memory_ptmalloc2_malloc_hook (sz=6400000000, caller=0x2aaaabf8534d) at hooks.c:667
>> >    #4  0x00002aaaabf8534d in _gfortran_internal_free () from /usr/lib64/libgfortran.so.1
>> >    #5  0x0000000000400bcc in MAIN__ () at allocate.F90:11
>> >    #6  0x0000000000400c4e in main ()
>> >    (gdb) display
>> >    (gdb) list
>> >    3234      if ((unsigned long)(size) >= (unsigned long)(nb + MINSIZE)) {
>> >    3235        remainder_size = size - nb;
>> >    3236        remainder = chunk_at_offset(p, nb);
>> >    3237        av->top = remainder;
>> >    3238        set_head(p, nb | PREV_INUSE | (av != &main_arena ? NON_MAIN_ARENA : 0));
>> >    3239        set_head(remainder, remainder_size | PREV_INUSE);
>> >    3240        check_malloced_chunk(av, p, nb);
>> >    3241        return chunk2mem(p);
>> >    3242      }
>> >    3243
>> >
>> >
>> > I also did the same test in C and I got the same problem.
>> >
>> > Does anyone have any idea that could help me understand what's going
>> > on?
>> >
>> > Regards
>> > Nicolas
>> >
>> > _______________________________________________
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
