You can use a debugger (plain gdb will do, no TotalView needed) to find out which MPI send and receive calls are hanging on the distributed cluster, and then check whether the offending send/receive pair matches one of the deadlock patterns described at:
Deadlock avoidance in your MPI programs:
http://www.cs.ucsb.edu/~hnielsen/cs140/mpi-deadlocks.html

Rayson

=================================
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net

Wikipedia Commons
http://commons.wikimedia.org/wiki/User:Raysonho


On Fri, Sep 30, 2011 at 11:06 AM, Jack Bryan <dtustud...@hotmail.com> wrote:
> Hi,
>
> I have an Open MPI program which works well on a Linux shared-memory
> multicore (2 x 6 cores) machine.
>
> But it does not work well on a distributed cluster running Linux and
> Open MPI.
>
> I found that a process sends out some messages to other processes,
> which cannot receive them.
>
> What is the possible reason?
>
> I did not change anything in the program.
>
> Any help is really appreciated.
>
> Thanks
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
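[Editor's note: the symptom described above (works on one shared-memory node, hangs on a cluster) is typical of the head-to-head deadlock pattern from the linked page. Small messages are often delivered eagerly through internal buffers, so the bug stays hidden until a larger message or a slower transport forces MPI_Send into rendezvous mode. The sketch below illustrates the pattern and one standard fix; the buffer size, tag, and two-rank layout are illustrative choices, not taken from the original program.]

```c
/* Sketch of the classic head-to-head MPI deadlock and a safe rewrite.
 * Assumes exactly two ranks: mpirun -np 2 ./a.out
 */
#include <mpi.h>
#include <stdio.h>

#define N (1 << 20)  /* large enough to exceed typical eager limits */

static int sendbuf[N], recvbuf[N];

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int peer = 1 - rank;  /* the other of the two ranks */

    /* DEADLOCK-PRONE: both ranks send first, then receive.
     * Once MPI_Send blocks waiting for a matching receive,
     * neither rank ever reaches its MPI_Recv.
     *
     *   MPI_Send(sendbuf, N, MPI_INT, peer, 0, MPI_COMM_WORLD);
     *   MPI_Recv(recvbuf, N, MPI_INT, peer, 0, MPI_COMM_WORLD,
     *            MPI_STATUS_IGNORE);
     */

    /* SAFE: MPI_Sendrecv lets the library pair the two operations
     * without either side blocking the other. */
    MPI_Sendrecv(sendbuf, N, MPI_INT, peer, 0,
                 recvbuf, N, MPI_INT, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    if (rank == 0)
        printf("exchange completed without deadlock\n");

    MPI_Finalize();
    return 0;
}
```

To locate the hang in the real program, one can attach gdb to a stuck rank on its node (`gdb -p <pid>`, then `bt`); the backtrace will show which MPI_Send or MPI_Recv call is blocking.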