On Aug 8, 2014, at 1:24 AM, Lane, William <william.l...@cshs.org> wrote:

> Using the "--mca btl tcp,self" switch to mpirun solved all the issues (in 
> addition to
> the requirement to include the --mca btl_tcp_if_include eth0 switch). I 
> believe
> the "--mca btl tcp,self" switch limits inter-process communication within a 
> node to using the TCP
> loopback rather than shared memory.

Correct.  You will not be using shared memory for MPI communication at all -- 
just TCP.
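
For reference, the full command line would look something like the
following (the process count and executable name here are placeholders):

    mpirun -np 16 --mca btl tcp,self --mca btl_tcp_if_include eth0 ./my_mpi_app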

> I should also point out that all of the nodes
> on this cluster feature NUMA architecture.
> 
> Will using the "--mca btl tcp,self" switch to mpirun result in any
> performance degradation compared to using shared memory?

Generally yes, but it depends on your application.  If your application does 
very little MPI communication, then the difference between shared memory and 
TCP is likely negligible.
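
If you want to quantify the difference on your cluster, a minimal
ping-pong sketch along these lines (message size and iteration count are
arbitrary choices) reports the average round-trip time; run it once with
"--mca btl tcp,self" and once with the default BTLs on a single node and
compare:

    /* pingpong.c -- minimal two-rank round-trip timing sketch */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, i;
        char buf[1024] = {0};
        const int iters = 10000;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < iters; i++) {
            if (rank == 0) {
                /* rank 0 sends, then waits for the echo */
                MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                /* rank 1 echoes each message back */
                MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("avg round trip: %g usec\n", (t1 - t0) / iters * 1e6);

        MPI_Finalize();
        return 0;
    }

Compile with "mpicc pingpong.c -o pingpong" and run, e.g.:

    mpirun -np 2 --mca btl tcp,self ./pingpong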

I'd strongly suggest two things:

- Upgrade to at least Open MPI 1.6.5 (1.8.x would be better, if possible)
- Run your program through a memory-checking debugger such as Valgrind
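
For example (the process count and application name are again placeholders):

    mpirun -np 4 --mca btl tcp,self --mca btl_tcp_if_include eth0 \
        valgrind ./my_mpi_app

If your installation ships a share/openmpi/openmpi-valgrind.supp file,
passing it to valgrind via --suppressions will cut down on false
positives from Open MPI's internals.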

Seg faults like the ones you initially described can be caused by errors in
your MPI application itself. The fact that using TCP only (and not shared
memory) avoids the segvs does not mean the issue is actually fixed; it may
well mean the error is still there, but is now happening in a way that
doesn't do enough damage to trigger a segv.
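
To illustrate the kind of latent error Valgrind will catch, here is a
contrived sketch (the buffer sizes are invented for illustration): the
receive buffer is smaller than the incoming message, so MPI_Recv writes
past the end of the heap allocation. Whether that overflow actually
produces a segv can depend on allocator layout and on which BTL is in
use, which is exactly the symptom described above.

    /* overflow.c -- contrived receive-buffer overflow */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int data[64] = {0};
            MPI_Send(data, 64, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int *buf = malloc(16 * sizeof(int));   /* too small: 16 < 64 */
            MPI_Recv(buf, 64, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);           /* writes past the end */
            free(buf);
        }

        MPI_Finalize();
        return 0;
    }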

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
