Hi,

I want to use Open MPI across two machines, each of which has more than one
NIC:

wukong: eth0 (152.48.249.102, no MPI traffic), eth1 (128.109.34.20, yes MPI
traffic)

zelda01: eth0 (130.207.252.131, yes MPI traffic), eth2 (10.0.0.12, no MPI
traffic)

On wukong, I have:
[humphrey@wukong ~]$ more ~/.openmpi/mca-params.conf
btl_tcp_if_include=eth1

On zelda01, I have:
[humphrey@zelda01 humphrey]$ more ~/.openmpi/mca-params.conf
btl_tcp_if_include=eth0

The debug output at the end of this message is what I get when I run the job
from wukong (128.109.34.20). It hangs after the last line shown. I believe the
remote machine (zelda01) is trying to contact wukong on the inaccessible
interface (152.48.249.102). This is with openmpi-1.0rc5r7944.

What am I doing wrong?

Thanks,
Marty

Marty Humphrey
Assistant Professor
Department of Computer Science
University of Virginia


[humphrey@wukong ~]$ mpirun -d --mca btl tcp --host 128.109.34.20,130.207.252.131 -np 2 a.out
[wukong.ncren.net:17236] [0,0,0] setting up session dir with
[wukong.ncren.net:17236]        universe default-universe
[wukong.ncren.net:17236]        user humphrey
[wukong.ncren.net:17236]        host wukong.ncren.net
[wukong.ncren.net:17236]        jobid 0
[wukong.ncren.net:17236]        procid 0
[wukong.ncren.net:17236] procdir: /tmp/openmpi-sessions-humphrey@wukong.ncren.net_0/default-universe/0/0
[wukong.ncren.net:17236] jobdir: /tmp/openmpi-sessions-humphrey@wukong.ncren.net_0/default-universe/0
[wukong.ncren.net:17236] unidir: /tmp/openmpi-sessions-humphrey@wukong.ncren.net_0/default-universe
[wukong.ncren.net:17236] top: openmpi-sessions-humphrey@wukong.ncren.net_0
[wukong.ncren.net:17236] tmp: /tmp
[wukong.ncren.net:17236] [0,0,0] contact_file /tmp/openmpi-sessions-humphrey@wukong.ncren.net_0/default-universe/universe-setup.txt
[wukong.ncren.net:17236] [0,0,0] wrote setup file
[wukong.ncren.net:17236] pls:rsh: local csh: 0, local bash: 1
[wukong.ncren.net:17236] pls:rsh: assuming same remote shell as local shell
[wukong.ncren.net:17236] pls:rsh: remote csh: 0, remote bash: 1
[wukong.ncren.net:17236] pls:rsh: final template argv:
[wukong.ncren.net:17236] pls:rsh:     ssh <template> orted --debug --bootproxy 1 --name <template> --num_procs 3 --vpid_start 0 --nodename <template> --universe humph...@wukong.ncren.net:default-universe --nsreplica "0.0.0;tcp://152.48.249.102:33964;tcp://128.109.34.20:33964" --gprreplica "0.0.0;tcp://152.48.249.102:33964;tcp://128.109.34.20:33964" --mpi-call-yield 0
[wukong.ncren.net:17236] pls:rsh: launching on node 128.109.34.20
[wukong.ncren.net:17236] pls:rsh: not oversubscribed -- setting mpi_yield_when_idle to 0
[wukong.ncren.net:17236] pls:rsh: 128.109.34.20 is a LOCAL node
[wukong.ncren.net:17236] pls:rsh: executing: orted --debug --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --nodename 128.109.34.20 --universe humph...@wukong.ncren.net:default-universe --nsreplica "0.0.0;tcp://152.48.249.102:33964;tcp://128.109.34.20:33964" --gprreplica "0.0.0;tcp://152.48.249.102:33964;tcp://128.109.34.20:33964" --mpi-call-yield 0
[wukong.ncren.net:17237] [0,0,1] setting up session dir with
[wukong.ncren.net:17237]        universe default-universe
[wukong.ncren.net:17237]        user humphrey
[wukong.ncren.net:17237]        host 128.109.34.20
[wukong.ncren.net:17237]        jobid 0
[wukong.ncren.net:17237]        procid 1
[wukong.ncren.net:17237] procdir: /tmp/openmpi-sessions-humphrey@128.109.34.20_0/default-universe/0/1
[wukong.ncren.net:17237] jobdir: /tmp/openmpi-sessions-humphrey@128.109.34.20_0/default-universe/0
[wukong.ncren.net:17237] unidir: /tmp/openmpi-sessions-humphrey@128.109.34.20_0/default-universe
[wukong.ncren.net:17237] top: openmpi-sessions-humphrey@128.109.34.20_0
[wukong.ncren.net:17237] tmp: /tmp
[wukong.ncren.net:17236] pls:rsh: launching on node 130.207.252.131
[wukong.ncren.net:17236] pls:rsh: not oversubscribed -- setting mpi_yield_when_idle to 0
[wukong.ncren.net:17236] pls:rsh: 130.207.252.131 is a REMOTE node
[wukong.ncren.net:17236] pls:rsh: executing: ssh 130.207.252.131 orted --debug --bootproxy 1 --name 0.0.2 --num_procs 3 --vpid_start 0 --nodename 130.207.252.131 --universe humph...@wukong.ncren.net:default-universe --nsreplica "0.0.0;tcp://152.48.249.102:33964;tcp://128.109.34.20:33964" --gprreplica "0.0.0;tcp://152.48.249.102:33964;tcp://128.109.34.20:33964" --mpi-call-yield 0
[zelda01.localdomain:08631] [0,0,2] setting up session dir with
[zelda01.localdomain:08631]     universe default-universe
[zelda01.localdomain:08631]     user humphrey
[zelda01.localdomain:08631]     host 130.207.252.131
[zelda01.localdomain:08631]     jobid 0
[zelda01.localdomain:08631]     procid 2
[zelda01.localdomain:08631] procdir: /tmp/openmpi-sessions-humphrey@130.207.252.131_0/default-universe/0/2
[zelda01.localdomain:08631] jobdir: /tmp/openmpi-sessions-humphrey@130.207.252.131_0/default-universe/0
[zelda01.localdomain:08631] unidir: /tmp/openmpi-sessions-humphrey@130.207.252.131_0/default-universe
[zelda01.localdomain:08631] top: openmpi-sessions-humphrey@130.207.252.131_0
[zelda01.localdomain:08631] tmp: /tmp
[wukong.ncren.net:17239] [0,1,0] setting up session dir with
[wukong.ncren.net:17239]        universe default-universe
[wukong.ncren.net:17239]        user humphrey
[wukong.ncren.net:17239]        host 128.109.34.20
[wukong.ncren.net:17239]        jobid 1
[wukong.ncren.net:17239]        procid 0
[wukong.ncren.net:17239] procdir: /tmp/openmpi-sessions-humphrey@128.109.34.20_0/default-universe/1/0
[wukong.ncren.net:17239] jobdir: /tmp/openmpi-sessions-humphrey@128.109.34.20_0/default-universe/1
[wukong.ncren.net:17239] unidir: /tmp/openmpi-sessions-humphrey@128.109.34.20_0/default-universe
[wukong.ncren.net:17239] top: openmpi-sessions-humphrey@128.109.34.20_0
[wukong.ncren.net:17239] tmp: /tmp
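
The --nsreplica and --gprreplica arguments in the output above carry a contact
string that lists both of wukong's addresses, including 152.48.249.102. To show
what the remote orted is being handed, here is a minimal Python sketch that
splits that string into host/port pairs. It assumes the format is a process
name followed by semicolon-separated tcp:// URIs, which is only inferred from
this output. The function name advertised_endpoints is made up for the example
and is not part of Open MPI.

```python
# Contact string copied verbatim from the --nsreplica argument in the log.
CONTACT = "0.0.0;tcp://152.48.249.102:33964;tcp://128.109.34.20:33964"


def advertised_endpoints(contact):
    """Split 'name;tcp://host:port;...' into (name, [(host, port), ...]).

    The layout is an assumption inferred from the debug output, not a
    documented Open MPI format.
    """
    name, *uris = contact.split(";")
    endpoints = []
    for uri in uris:
        # Drop the scheme, then split host from port at the last colon.
        _scheme, _, rest = uri.partition("://")
        host, _, port = rest.rpartition(":")
        endpoints.append((host, int(port)))
    return name, endpoints


name, endpoints = advertised_endpoints(CONTACT)
for host, port in endpoints:
    print(f"{name} advertises {host}:{port}")
```

Both addresses come back in the order they appear in the string, so the
152.48.249.102 entry is listed ahead of 128.109.34.20.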


