On Jul 12, 2011, at 3:26 PM, Steve Kargl wrote:

> % /usr/local/ompi/bin/mpiexec -machinefile mf --mca btl self,tcp \
>  --mca btl_base_verbose 30 ./z
> 
> with mf containing 
> 
> node11 slots=1   (node11 contains a single bge0=192.168.0.11)
> node16 slots=1   (node16 contains a single bge0=192.168.0.16)
> 
> or
> 
> node11 slots=2   (communication on memory bus)
> 
> However, if mf contains
> 
> node10 slots=1   (node10 contains bge0=10.208.xx and bge1=192.168.0.10)
> node16 slots=1   (node16 contains a single bge0=192.168.0.16)
> 
> I see the same problem where node10 cannot communicate with node16.

If you ever get the time to check into the code to see why this is happening, 
I'd be curious to hear what you find (per my explanation of the TCP BTL here: 
http://www.open-mpi.org/community/lists/users/2011/07/16872.php).

> Good News:
> 
> Adding 'btl_tcp_if_include=192.168.0.0/16' to my ~/.openmpi/mca-params.conf
> file seems to cure the communication problem.

Good.
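For anyone wondering why restricting the TCP BTL to 192.168.0.0/16 helps: with that value, node10's 10.208.xx interface is no longer eligible, so node10 and node16 only ever use their 192.168.0.x addresses. The Python sketch below illustrates just the subnet-membership check that such a CIDR value implies. It is not Open MPI's implementation. The function name select_interfaces is hypothetical, and 10.208.0.1 is a placeholder because the real address was elided above.

```python
import ipaddress

def select_interfaces(interfaces, include):
    """Return names of interfaces whose IPv4 address lies in one of the
    comma-separated CIDR subnets in `include` (loosely modelled on a
    btl_tcp_if_include value; hypothetical helper, not Open MPI code)."""
    nets = [ipaddress.ip_network(s.strip()) for s in include.split(",")]
    return [name for name, addr in interfaces.items()
            if any(ipaddress.ip_address(addr) in net for net in nets)]

# Interfaces as described in this thread. node10's 10.208.xx address was
# elided, so 10.208.0.1 is a hypothetical placeholder.
node10 = {"bge0": "10.208.0.1", "bge1": "192.168.0.10"}
node16 = {"bge0": "192.168.0.16"}

print(select_interfaces(node10, "192.168.0.0/16"))  # ['bge1']
print(select_interfaces(node16, "192.168.0.0/16"))  # ['bge0']
```

With the filter applied, the only interfaces left on both nodes share the 192.168.0.0/16 subnet, which matches the working node11/node16 case.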

> Thanks for the help.  If I run into any other problems with trunk,
> I'll report those here.

Keep in mind the usual disclaimers with development trunks -- it's *usually* 
stable, but sometimes it does break.  

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

