On Sep 19, 2011, at 10:23 PM, Ole Nielsen wrote:

> Hi all - and sorry for the multiple postings, but I have more information.

+1 on Eugene's comments.  The test program looks fine to me.

FWIW, you don't need -lmpi to compile your program; OMPI's wrapper compiler 
allows you to just:

    mpicc mpi_test.c -o mpi_test -Wall
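
If you're curious what the wrapper actually adds (include paths, -lmpi, etc.), you can ask it to show the underlying command without compiling anything:

    mpicc mpi_test.c -o mpi_test -Wall --showme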

> 1: After a reboot of two nodes I ran again, and the inter-node freeze didn't 
> happen until the third iteration. I take that to mean that the basic 
> communication works, but that something is saturating. Is there some notion 
> of buffer size somewhere in the MPI system that could explain this?

Hmm.  This is not a good sign; it suggests a problem at the OS level.  
Based on this email and your prior emails, I'm guessing you're using TCP for 
communication, and that the problem is based on inter-node communication (e.g., 
the problem would occur even if you only run 1 process per machine, but does 
not occur if you run all N processes on a single machine, per your #4, below).
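
A quick way to test that split (node1/node2 are just placeholders for your hostnames): run one rank on each of two machines,

  mpirun -np 2 --host node1,node2 mpi_test

and then run both ranks on a single machine,

  mpirun -np 2 --host node1,node1 mpi_test

If only the first one hangs, that points squarely at the inter-node (TCP) path.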

> 2: The nodes have 4 ethernet cards each. Could the mapping be a problem?

Shouldn't be.  If it runs at all, then it should run fine.

Do you have all your ethernet cards on a single subnet, or multiple subnets?  I 
have heard of problems when you have multiple ethernet cards on the same subnet 
-- I believe there's some non-determinism in that case about which wire/NIC a 
packet will actually go out on, which may be problematic for OMPI.
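
To check how they're laid out, look at the address/netmask on each interface:

  /sbin/ifconfig -a
or
  ip addr show

and compare the subnets across the 4 NICs on each node.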

> 3: The cpus are running at a 100% for all processes involved in the freeze

That's probably right.  OMPI aggressively polls for progress as a way to 
decrease latency.  So all processes are trying to make progress, and therefore 
are aggressively polling, eating up 100% of the CPU.
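
If the spinning itself is a nuisance (it won't fix the hang), you can ask OMPI to yield the processor inside its progress loop, at some cost in latency:

  mpirun --mca mpi_yield_when_idle 1 -np 16 --hostfile hostfile mpi_test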

> 4: The same test program 
> (http://code.google.com/p/pypar/source/browse/source/mpi_test.c) works fine 
> when run within one node so the problem must be with MPI and/or our network. 

This helps narrow the issue down to the TCP communication path, not the shared 
memory path.
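
If you want to double-check that, you can force the TCP BTL even within a single node (i.e., take shared memory out of the picture) and see whether the freeze then reproduces on one machine:

  mpirun --mca btl tcp,self -np 4 mpi_test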

> 5: The network and ssh works otherwise fine.

Good.

> Again many thanks for any hint that can get us going again. The main thing we 
> need is some diagnostics that may point to what causes this problem for MPI.

If you are running with multiple NICs on the same subnet, change them to 
multiple subnets and see if it starts working fine.

If they're on different subnets, try using the btl_tcp_if_include / 
btl_tcp_if_exclude MCA parameters to exclude certain networks and see if 
they're the problematic ones.  Keep in mind that ..._include and ..._exclude 
are mutually exclusive; you should only specify one.  And if you specify 
exclude, be sure to also exclude the loopback interface.  E.g.:

  mpirun --mca btl_tcp_if_include eth0,eth1 -np 16 --hostfile hostfile mpi_test
or
  mpirun --mca btl_tcp_if_exclude lo,eth1 -np 16 --hostfile hostfile mpi_test
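
You can also see the TCP BTL's parameters (and what they're currently set to) with:

  ompi_info --param btl tcp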

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

