Try adding "-mca btl sm,self,tcp" to your command line. Does everything work then?
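For reference, a minimal sketch of what that command line could look like. The "-mca btl sm,self,tcp" option is the one quoted above, and it keeps the openib BTL out of the selection. The process count (4) and the program path (./test) are placeholders rather than values from this thread, and tcp_only_cmd is a hypothetical helper name.

```shell
# Print an mpirun command line restricted to the sm, self and tcp BTLs.
# First argument: number of processes. Remaining arguments: the program
# and its arguments. Nothing is executed; the line is only printed.
tcp_only_cmd() {
  np=$1
  shift
  printf 'mpirun -mca btl sm,self,tcp -np %s' "$np"
  printf ' %s' "$@"
  printf '\n'
}

# Example (placeholder values):
#   tcp_only_cmd 4 ./test
```

If the job runs cleanly with this restriction, that points at the InfiniBand side of the setup rather than at the application.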
I'm thinking the problem is that we detect something not quite right about the
OFED installation and abort, but earlier versions of OMPI may have just warned
and continued by running TCP instead. IIRC, some users comp
On issuing the ibhosts command I see this:
*# ibhosts | sort*
Ca: 0x00228870a432 ports 2 "sv-2 qib0"
Ca: 0x00228870a47c ports 2 "sv-3 qib0"
Ca: 0x00228870a4a8 ports 2 "sv-1 qib0"
Ca: 0x00228877ca2c ports 1 "@ HCA-1"
Ca: 0x00228877d7f4 ports 1 "SERVER-14 HC
On the first 7 nodes:
*[mpidemo@SERVER-3 ~]$ ofed_info | head -n 1*
OFED-1.5.3.2:
*[mpidemo@SERVER-3 ~]$ which ofed_info*
/usr/bin/ofed_info
On the last 4 nodes:
*[mpidemo@sv-2 ~]$ ofed_info | head -n 1*
-bash: ofed_info: command not found
*[mpidemo@sv-2 ~]$ which ofed_info*
/usr/bin/which: no ofed_i
Are the OFED versions the same across all the machines? I would suspect that
might be the problem.
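One way to answer that quickly is to collect the first line of ofed_info from every node and group the hosts by what they report. The following is only a sketch: summarize_ofed_versions is a hypothetical helper, hosts.txt is an assumed file with one hostname per line, and passwordless ssh to each node is assumed.

```shell
# Read lines of the form "<host> <version>" on stdin and print one line per
# distinct version, followed by the hosts reporting it. A host with no
# version field (for example, ofed_info not installed) is listed as "(none)".
summarize_ofed_versions() {
  awk '{
    v = (NF >= 2) ? $2 : "(none)"
    sub(/:$/, "", v)               # ofed_info prints a trailing colon
    hosts[v] = hosts[v] " " $1
  }
  END { for (v in hosts) print v ":" hosts[v] }' | sort
}

# Example usage (hosts.txt is an assumed file name):
#   for h in $(cat hosts.txt); do
#     printf '%s %s\n' "$h" "$(ssh "$h" 'ofed_info 2>/dev/null | head -n 1' </dev/null)"
#   done | summarize_ofed_versions
```

More than one output line means the nodes do not all report the same OFED release, or some of them do not have ofed_info at all.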
On Aug 3, 2013, at 4:06 PM, RoboBeans wrote:
> Hi Ralph, I tried using 1.5.4, 1.6.5 and 1.7.2 (compiled from source code)
> with no configuration arguments but I am facing the same issue. When I
Hi Ralph, I tried using 1.5.4, 1.6.5 and 1.7.2 (compiled from source
code) with no configuration arguments but I am facing the same issue.
When I run a job using 1.5.4 (installed using yum), I get warnings, but
they don't affect my output.
Example of a warning that I get:
sv-2.7960ipath_userinit
Hmmm...strange indeed. I would remove those four configure options and give it
a try. That will eliminate all the obvious things, I would think, though they
aren't generally involved in the issue shown here. Still, worth taking out
potential trouble sources.
What is the connectivity between SER
Thanks for looking into it, Ralph. I modified the hosts file but I am
still getting the same error. Any other pointers you can think of? The
difference between this 1.7.2 installation and 1.5.4 is that I installed
1.5.4 using yum and for 1.7.2, I used the source code and configured
with *--enabl
It looks like SERVER-2 cannot talk to your x.x.x.100 machine. I note that you
have some entries at the end of the hostfile that I don't understand - a list
of hosts that can be reached? And I see that your x.x.x.22 machine isn't on it.
Is that SERVER-2 by chance?
Our hostfile parsing changed be
Hello everyone,
I have installed openmpi 1.5.4 on an 11-node cluster using "yum install
openmpi openmpi-devel" and everything seems to be working fine. For
testing I am using this test program
//**
*$ cat test.cpp*
#include
#incl