netstat doesn't show the loopback interface even on the head node, while
ifconfig shows loopback up and running on the compute nodes as well as
the master node.
[root@pmd ~]# netstat -nr
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
192.168.3.0 0.0.0.0
But it is running on your head node, isn't it?
You might want to double check why there is no loopback interface
listed on your compute nodes.
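Note that netstat -nr only prints the routing table, not the interfaces
themselves, so a missing lo line there does not necessarily mean the
interface is down. To check the interface itself, try something like
(assuming these tools are available on your nodes):
$ netstat -ni
$ ifconfig lo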
In the meantime, you can exclude the lo and ib0 interfaces.
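For example (untested, adjust hosts as needed):
[pmdtest@pmd ~]$ mpirun --mca btl_tcp_if_exclude lo,ib0 --host
compute-01-01,compute-01-06 ring_c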
Cheers,
Gilles
On 2014/11/13 16:59, Syed Ahsan Ali wrote:
> I don't see it running
>
> [pmdtest@compute-01-01 ~]$ netstat -nr [...]
I don't see it running
[pmdtest@compute-01-01 ~]$ netstat -nr
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
192.168.108.0 0.0.0.0 255.255.255.0 U 0 0 0 ib0
169.254.0.0 0.0.0.0 255.255.0.0 U
This is really weird.
Is the loopback interface up and running on both nodes, and with the
same IP?
Can you run netstat -nr on both compute nodes?
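For example, from the head node (assuming passwordless ssh is set up):
$ ssh compute-01-01 netstat -nr
$ ssh compute-01-06 netstat -nr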
On 2014/11/13 16:50, Syed Ahsan Ali wrote:
> Now it appears to go through the loopback address
>
> [pmdtest@pmd ~]$ mpirun --host compute-01-01,compute-01-06 --mca
> btl_tcp_if_exclude ib0 ring_c [...]
Ok ok I can disable that as well.
Thank you guys. :)
On Thu, Nov 13, 2014 at 12:50 PM, Syed Ahsan Ali wrote:
> Now it appears to go through the loopback address
>
> [pmdtest@pmd ~]$ mpirun --host compute-01-01,compute-01-06 --mca
> btl_tcp_if_exclude ib0 ring_c
> Process 0 sending 10 to 1, tag 201 (2 processes in ring) [...]
Now it appears to go through the loopback address
[pmdtest@pmd ~]$ mpirun --host compute-01-01,compute-01-06 --mca
btl_tcp_if_exclude ib0 ring_c
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
[compute-01-01.private.dns.zone][[37713,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_conne
--mca btl ^openib
disables the openib btl, which is native InfiniBand only.
ib0 is treated like any other TCP interface and is then handled by the
tcp btl.
Another option is for you to use
--mca btl_tcp_if_exclude ib0
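If it helps, you can also make that persistent for your shell via the
standard OMPI_MCA_* environment mechanism, e.g.:
$ export OMPI_MCA_btl_tcp_if_exclude=ib0
after which a plain mpirun --host compute-01-01,compute-01-06 ring_c
picks it up.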
On 2014/11/13 16:43, Syed Ahsan Ali wrote:
> You are right, it is running on the 10.0.0.0 interface [...]
You are right, it is running on the 10.0.0.0 interface.
[pmdtest@pmd ~]$ mpirun --mca btl ^openib --host
compute-01-01,compute-01-06 --mca btl_tcp_if_include 10.0.0.0/8 ring_c
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
Process 0 decremented
mpirun complains about the 192.168.108.10 IP address, but ping reports
a 10.0.0.8 address.
Is the 192.168.* network a point-to-point network (for example, between
a host and a MIC), so that the two nodes cannot ping each other via
this address?
/* e.g. from compute-01-01, can you ping the 192.168.108.* ip address
of compute-01-06 ? */
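For example (the target address is hypothetical, substitute the real
192.168.108.* address of compute-01-06):
[pmdtest@compute-01-01 ~]$ ping -c 3 192.168.108.16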
Same result in both cases
[pmdtest@pmd ~]$ mpirun --mca btl ^openib --host
compute-01-01,compute-01-06 ring_c
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
[compute-01-01.private.dns.zone][[47139,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp
Hi,
it seems you messed up the command line.
Could you try
$ mpirun --mca btl ^openib --host compute-01-01,compute-01-06 ring_c
Can you also try to run mpirun from a compute node instead of the head
node?
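i.e. something like:
[pmdtest@compute-01-01 ~]$ mpirun --mca btl ^openib --host
compute-01-01,compute-01-06 ring_c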
Cheers,
Gilles
On 2014/11/13 16:07, Syed Ahsan Ali wrote:
> Here is what I see when disabling openib support. [...]
Here is what I see when disabling openib support.
[pmdtest@pmd ~]$ mpirun --host --mca btl ^openib
compute-01-01,compute-01-06 ring_c
ssh: orted: Temporary failure in name resolution
ssh: orted: Temporary failure in name resolution
Hi Jeff
No firewall is enabled. Running the diagnostics, I found that the
non-communicating MPI job runs fine, while ring_c remains stuck. There
are of course warnings for OpenFabrics, but in my case I am running the
application with openib disabled. Please see below
[pmdtest@pmd ~]$ mpirun --host co
Do you have firewalling enabled on either server?
See this FAQ item:
http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems
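For example, on RHEL-like systems you can check with something like
(commands vary by distro):
$ service iptables status
$ iptables -L -n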
On Nov 12, 2014, at 4:57 AM, Syed Ahsan Ali wrote:
> Dear All
>
> I need your advice. While trying to run an mpirun job across nodes I
> get the following error. [...]
Dear All
I need your advice. While trying to run an mpirun job across nodes I
get the following error. It seems that the two nodes, i.e.
compute-01-01 and compute-01-06, are not able to communicate with each
other, although the nodes can see each other via ping.
[pmdtest@pmd ERA_CLM45]$ mpirun -np 16 -hostfile hostli
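The hostfile just lists the nodes in the standard Open MPI format,
something like this (slot counts illustrative):
compute-01-01 slots=8
compute-01-06 slots=8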