Hi, can you try Open MPI version 1.3?
On 3/9/09, Prasanna Ranganathan <prasa...@searchme.com> wrote:
>
> Hi all,
>
> I have a distributed program running on 400+ nodes and using OpenMPI. I
> have run the same binary with nearly the same setup successfully previously.
> However in my last two runs the program seems to be getting stuck after a
> while before it completes. The stack trace at the time it gets stuck is as
> follows:
>
> #0 0x00002ad0000c00df in poll () from /lib/libc.so.6
> #1 0x00002acfffa49c27 in opal_poll_dispatch () from /usr/lib64/libopen-pal.so.0
> #2 0x00002acfffa47add in opal_event_base_loop () from /usr/lib64/libopen-pal.so.0
> #3 0x00002acfffa43203 in opal_progress () from /usr/lib64/libopen-pal.so.0
> #4 0x00002acfff78b315 in ompi_request_test_some () from /usr/lib64/libmpi.so.0
> #5 0x00002acfff7adf7a in PMPI_Testsome () from /usr/lib64/libmpi.so.0
> ....
>
> I checked all the nodes and they seem to be up and doing fine. Any
> suggestions/hints on what might be happening here would help greatly. Thanks
> in advance.
>
> I am using OpenMPI 1.2.7 on gentoo linux.
>
> Regards,
>
> Prasanna.
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>