Hi Kevin,
Thanks for your reply.
Dasher is physically located under my desk and vixen is in a
secure data center.
> does dasher have any network interfaces that vixen does not?
No, I don't think so.
Here is more definitive info:
[tsakai@dasher Rmpi]$ ifconfig
eth0 Link encap:Ethernet HWaddr 00:1A:A0:E1:84:A9
inet addr:172.16.0.116 Bcast:172.16.3.255 Mask:255.255.252.0
inet6 addr: fe80::21a:a0ff:fee1:84a9/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2347 errors:0 dropped:0 overruns:0 frame:0
TX packets:1005 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:531809 (519.3 KiB) TX bytes:269872 (263.5 KiB)
Memory:c2200000-c2220000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:74 errors:0 dropped:0 overruns:0 frame:0
TX packets:74 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:7824 (7.6 KiB) TX bytes:7824 (7.6 KiB)
[tsakai@dasher Rmpi]$
However, vixen has two ethernet interfaces:
[tsakai@vixen Rmpi]$ cat moo
[root@vixen ec2]# /sbin/ifconfig
eth0 Link encap:Ethernet HWaddr 00:1A:A0:1C:00:31
inet addr:10.1.1.2 Bcast:192.168.255.255 Mask:255.0.0.0
inet6 addr: fe80::21a:a0ff:fe1c:31/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:61913135 errors:0 dropped:0 overruns:0 frame:0
TX packets:61923635 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:47832124690 (44.5 GiB) TX bytes:54515478860 (50.7 GiB)
Interrupt:185 Memory:ea000000-ea012100
eth1 Link encap:Ethernet HWaddr 00:1A:A0:1C:00:33
inet addr:172.16.1.107 Bcast:172.16.3.255 Mask:255.255.252.0
inet6 addr: fe80::21a:a0ff:fe1c:33/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:5204431112 errors:0 dropped:0 overruns:0 frame:0
TX packets:8935796075 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:371123590892 (345.6 GiB) TX bytes:13424246629869 (12.2 TiB)
Interrupt:193 Memory:ec000000-ec012100
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:244169216 errors:0 dropped:0 overruns:0 frame:0
TX packets:244169216 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1190976360356 (1.0 TiB) TX bytes:1190976360356 (1.0 TiB)
[root@vixen ec2]#
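As a rough cross-check of the two listings, here is a small Python
sketch that works out which subnets the two machines have in common.
The addresses and netmasks are copied from the ifconfig output; the
HOSTS table and the helper names (networks, shared_networks) are made
up for this illustration and are not part of Open MPI or any tool
mentioned here.

```python
import ipaddress

# Addresses and netmasks copied from the ifconfig listings in this mail.
# The HOSTS layout itself is just an assumption for this sketch.
HOSTS = {
    "dasher": {"eth0": "172.16.0.116/255.255.252.0"},
    "vixen": {
        "eth0": "10.1.1.2/255.0.0.0",
        "eth1": "172.16.1.107/255.255.252.0",
    },
}


def networks(interfaces):
    """Map each interface name to the IPv4 network it sits on."""
    return {
        name: ipaddress.ip_interface(addr).network
        for name, addr in interfaces.items()
    }


def shared_networks(host_a, host_b):
    """Return (iface_a, iface_b, network) for each subnet both hosts are on."""
    nets_a = networks(HOSTS[host_a])
    nets_b = networks(HOSTS[host_b])
    return [
        (if_a, if_b, net_a)
        for if_a, net_a in nets_a.items()
        for if_b, net_b in nets_b.items()
        if net_a == net_b
    ]


if __name__ == "__main__":
    for if_a, if_b, net in shared_networks("dasher", "vixen"):
        print(f"dasher {if_a} and vixen {if_b} share {net}")
```

With the values above, the only subnet the two hosts share is
172.16.0.0/22, reached through eth0 on dasher and eth1 on vixen;
vixen's eth0 sits on 10.0.0.0/8, which dasher has no address on.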
Please see the mail posting that follows this one, my reply to Ashley,
who nailed the problem precisely.
Regards,
Tena
On 2/14/11 1:35 PM, "[email protected]"
<[email protected]> wrote:
>
> This probably shows my lack of understanding as to how OpenMPI
> negotiates the connectivity between nodes when given a choice
> of interfaces but anyway:
>
> does dasher have any network interfaces that vixen does not?
>
> The scenario I am imagining would be that you ssh into dasher
> from vixen using a "network" that both share and similarly, when
> you mpirun from vixen, the network that OpenMPI uses is constrained
> by the interfaces that can be seen from vixen, so you are fine.
>
> However when you are on dasher, mpirun sees another interface which
> it takes a liking to and so tries to use that, but that interface
> is not available to vixen so the OpenMPI processes spawned there
> terminate when they can't find that interface so as to talk back
> to dasher's controlling process.
>
> I know that you are no longer working with VMs but it's along those
> lines that I was thinking: extra network interfaces that you assume
> won't be used but which are and which could then be overcome by use
> of an explicit
>
> --mca btl_tcp_if_exclude virbr0
>
> or some such construction (virbr0 used as an example here).
>
> Kevin