You can use a debugger (just gdb will do, no TotalView needed) to find
out which MPI send and receive calls are hanging the code on the
distributed cluster, and then check whether the hanging send/receive
pair matches the deadlock pattern described at:

Deadlock avoidance in your MPI programs:
http://www.cs.ucsb.edu/~hnielsen/cs140/mpi-deadlocks.html
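
To make the idea concrete, here is a minimal sketch (my own example,
not the original poster's code) of the classic ordering deadlock that
page describes. Attach gdb to the stuck rank on the remote node
(gdb -p <pid>, then "bt") and the backtrace will usually show both
ranks parked inside MPI_Send. Small messages may go through the eager
protocol, which is why the same code can appear to work on a
shared-memory machine and hang only on the cluster:

  /* deadlock_demo.c -- a sketch, assuming 2 ranks exchanging large buffers */
  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  #define N (1 << 20)   /* large enough to exceed typical eager limits */

  int main(int argc, char **argv)
  {
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      if (size != 2) MPI_Abort(MPI_COMM_WORLD, 1);

      double *sendbuf = malloc(N * sizeof(double));
      double *recvbuf = malloc(N * sizeof(double));
      int peer = 1 - rank;                 /* rank 0 <-> rank 1 */

      /* DEADLOCK-PRONE: both ranks block in MPI_Send and never reach
       * the matching MPI_Recv once the message is too big to buffer.
      MPI_Send(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
      MPI_Recv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
               MPI_STATUS_IGNORE);
      */

      /* SAFE: let MPI pair the send and the receive for us. */
      MPI_Sendrecv(sendbuf, N, MPI_DOUBLE, peer, 0,
                   recvbuf, N, MPI_DOUBLE, peer, 0,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      printf("rank %d done\n", rank);
      free(sendbuf);
      free(recvbuf);
      MPI_Finalize();
      return 0;
  }

(Reordering the calls on one rank, or switching to MPI_Isend/MPI_Irecv
plus MPI_Wait, avoids the hang just as well as MPI_Sendrecv here.)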

Rayson

=================================
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net

Wikimedia Commons
http://commons.wikimedia.org/wiki/User:Raysonho


On Fri, Sep 30, 2011 at 11:06 AM, Jack Bryan <dtustud...@hotmail.com> wrote:
> Hi,
>
> I have an Open MPI program, which works well on a Linux shared-memory
> multicore (2 x 6 cores) machine.
>
> But, it does not work well on a distributed cluster with Linux Open MPI.
>
> I found that the process sends out some messages to other processes,
> which cannot receive them.
>
> What is the possible reason?
>
> I did not change anything in the program.
>
> Any help is really appreciated.
>
> Thanks
