Hi All,
Just to polish this thread off:
To make openmpi work on my OS X 10.5 machine I needed only:
./configure --prefix=/Network/Xgrid/openmpi
make
make install
I then edited
/Network/Xgrid/openmpi/etc/openmpi-mca-params.conf
and added
# set ports so that they are more valid than the defaul
On Aug 12, 2009, at 19:09 PM, Ralph Castain wrote:
Hmmm...well, I'm going to ask our TCP friends for some help here.
Meantime, I do see one thing that stands out. Port 4 is an awfully
low port number that usually sits in the reserved range. I checked
the /etc/services file on my Mac, and
Agreed -- ports 4 and 260 should be in the reserved ports range. Are
you running as root, perchance?
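For reference, a minimal sketch of the check being discussed, assuming the usual Unix convention that ports below 1024 are privileged and normally need root to bind. The function name is made up for illustration, and port 10000 is only a sample unprivileged value, not one from this thread.

```shell
# Hypothetical helper: succeeds if the port is in the privileged range (0-1023).
is_reserved_port() {
    [ "$1" -ge 0 ] && [ "$1" -lt 1024 ]
}

# 4 and 260 are the ports reported in this thread; 10000 is an illustrative value.
for port in 4 260 10000; do
    if is_reserved_port "$port"; then
        echo "$port: reserved range (binding normally requires root)"
    else
        echo "$port: unprivileged range"
    fi
done
```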
On Aug 12, 2009, at 10:09 PM, Ralph Castain wrote:
Hmmm...well, I'm going to ask our TCP friends for some help here.
Meantime, I do see one thing that stands out. Port 4 is an awfully
low
Hmmm...well, I'm going to ask our TCP friends for some help here.
Meantime, I do see one thing that stands out. Port 4 is an awfully low
port number that usually sits in the reserved range. I checked the
/etc/services file on my Mac, and it was commented out as unassigned,
which should mean
Hi Jody
Jody Klymak wrote:
On Aug 11, 2009, at 18:55 PM, Gus Correa wrote:
Did you wipe off the old directories before reinstalling?
Check.
I prefer to install on a NFS mounted directory,
Check
Have you tried to ssh from node to node on all possible pairs?
check - fixed this toda
On Aug 12, 2009, at 12:46 PM, Jody Klymak wrote:
So I think ranks 0 and 2 are on xserve02 and rank 1 is on xserve01,
Should read xserve03,
--
Jody Klymak
http://web.uvic.ca/~jklymak/
On Aug 12, 2009, at 12:31 PM, Ralph Castain wrote:
Well, it is getting better! :-)
On your cmd line, what btl's are you specifying? You should try -mca
btl sm,tcp,self for this to work. Reason: sometimes systems block
tcp loopback on the node. What I see below indicates that inter-node
Well, it is getting better! :-)
On your cmd line, what btl's are you specifying? You should try -mca btl
sm,tcp,self for this to work. Reason: sometimes systems block tcp loopback
on the node. What I see below indicates that inter-node comm was fine, but
the two procs that share a node couldn't co
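A rough sketch of the command line Ralph suggests, written as a dry run that only prints the command. The process count of 3 matches the three ranks mentioned in the thread; the program name is a placeholder, and the helper function is hypothetical.

```shell
# Hypothetical helper: prints an mpirun command that enables the
# shared-memory (sm), tcp, and self BTLs, as suggested above.
build_mpirun_cmd() {
    np="$1"
    prog="$2"
    echo "mpirun -np $np -mca btl sm,tcp,self $prog"
}

# ./my_mpi_program is a placeholder name, not from the thread.
build_mpirun_cmd 3 ./my_mpi_program
```

Copy the printed line into a shell on the head node to run it for real.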
Hi Ralph,
That gives me something more to work with...
On Aug 12, 2009, at 9:44 AM, Ralph Castain wrote:
I believe TCP works fine, Jody, as it is used on Macs fairly widely.
I suspect this is something funny about your installation.
One thing I have found is that you can get this error me
I believe TCP works fine, Jody, as it is used on Macs fairly widely. I
suspect this is something funny about your installation.
One thing I have found is that you can get this error message when you have
multiple NICs installed, each with a different subnet, and the procs try to
connect across dif
On Aug 11, 2009, at 18:55 PM, Gus Correa wrote:
Did you wipe off the old directories before reinstalling?
Check.
I prefer to install on a NFS mounted directory,
Check
Have you tried to ssh from node to node on all possible pairs?
check - fixed this today, works fine with the spawni
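The "all possible pairs" ssh check can be sketched roughly as below. This is a dry run that prints the commands rather than executing them. The host names are the ones mentioned in this thread; the helper function and the use of BatchMode (so ssh fails instead of prompting for a password) are assumptions for illustration.

```shell
# Host names taken from the thread; adjust for your own cluster.
HOSTS="xserve01 xserve02 xserve03"

# Hypothetical helper: prints one "from to" line per ordered pair of distinct hosts.
host_pairs() {
    for from in $1; do
        for to in $1; do
            [ "$from" != "$to" ] && echo "$from $to"
        done
    done
    return 0
}

host_pairs "$HOSTS" | while read -r from to; do
    # Dry run: print the nested ssh command instead of executing it.
    echo ssh "$from" ssh -o BatchMode=yes "$to" true
done
```

Removing the `echo` would run each check; any pair that still asks for a password then shows up as a failure.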
Hi Jody
Jody Klymak wrote:
On Aug 11, 2009, at 17:35 PM, Gus Correa wrote:
You can check this, say, by logging in to each node and doing
/usr/local/openmpi/bin/ompi_info and comparing the output.
Yep, they are all the same 1.3.3, SVN r21666, July 14th 2009.
Did you wipe off the old dir
On Aug 11, 2009, at 17:35 PM, Gus Correa wrote:
You can check this, say, by logging in to each node and doing
/usr/local/openmpi/bin/ompi_info and comparing the output.
Yep, they are all the same 1.3.3, SVN r21666, July 14th 2009.
What about passwords? ssh from server to node is password
Hi Jody
Are you sure you have the same OpenMPI version installed on
/usr/local/openmpi on *all* nodes?
The fact that the programs run on the xserver0, but hang
when you try xserver0 and xserver1 together suggests
some inconsistency in the runtime environment,
which may come from different OpenM
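One way to sketch the version comparison: collect one "host version" line per node, then count the distinct versions. The helper name and the sample input are illustrative only. The real lines would come from running /usr/local/openmpi/bin/ompi_info on each node, which is not done here.

```shell
# Hypothetical helper: reads "host version" lines on stdin and prints
# how many distinct versions appear (1 means all nodes agree).
count_versions() {
    awk '{print $2}' | sort -u | wc -l | tr -d ' '
}

# Sample input for illustration; real data would be gathered from each node.
printf '%s\n' "xserve01 1.3.3" "xserve02 1.3.3" "xserve03 1.3.3" | count_versions
```

A result greater than 1 would point to the kind of mismatch described above.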
I can't speak to the tcp problem, but the following:
[xserve02.local:43625] [[28627,0],2] orte:daemon:send_relay -
recipient list is empty!
is not an error message. It is perfectly normal operation.
Ralph
On Aug 11, 2009, at 1:54 PM, Jody Klymak wrote:
Hello,
On Aug 11, 2009, at 8:15