Re: [OMPI users] open-mpi ssh hostname problem

2009-02-09 Thread Bernhard Knapp
Thanks for the hint. If I set the hostname with the console command
hostname, it does not work, but if I change the name through the GUI
instead, it works fine (problem solved). Maybe more commands than just
hostname are needed to make it work from the console?

Bernhard

On Fri, 6 Feb 2009, Jeff Squyres wrote:

I'm not quite sure what you did here; did you set the IP address and
hostname to something that is resolvable via gethostbyname()? E.g.,
does the hostname exist in DNS or in /etc/hosts and match the IP
address that you set?

On Feb 6, 2009, at 6:18 AM, Bernhard Knapp wrote:



Dear users

I am using the parallel software Gromacs on Fedora 8 nodes. I
installed the software and ran it without problems, but afterwards I
moved the node to our server room and did the following:

- set the IP address, subnet mask and gateway
- changed the ssh port in /etc/ssh/sshd_config (since we use port
  forwarding on our router) and ran
  /usr/sbin/semanage port -a -t inetd_child_port_t -p tcp 5101
- changed the firewall settings to additionally allow the new port
- changed the hostname via the hostname command

Then I started exactly the same simulation (same command, same data)
as before the network reconfiguration, and it fails with the
following error:



ssh: quoVadis01: Name or service not known
--
A daemon (pid 5039) died unexpectedly with status 255 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--
--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--
mpirun: clean termination accomplished


Currently the simulation is only running in parallel on the local 4
cores and not using the network at all.


Why is it a problem for Open MPI if the hostname changes from
"localhost" to "quoVadis01"? If I change the hostname back, it works
again. How can I make Open MPI run with a hostname other than
localhost? Simply reinstalling MPI after changing the hostname does
not help.


cheers
Bernhard
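
Jeff's question (is the new name resolvable via gethostbyname()?) can be checked on each node with a few lines of Python. This is only a sketch: resolve() is a hypothetical helper, and it assumes an IPv4 setup. Note that the hostname command only changes the running system's host name; it does not add an entry to /etc/hosts or DNS, which may be why ssh reported "Name or service not known".

```python
import socket


def resolve(hostname):
    """Return the IPv4 address the system resolver gives for hostname,
    or None if the name cannot be resolved (DNS and /etc/hosts both miss)."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return None
```

Calling resolve("quoVadis01") on every node that appears in the hostfile should return the address that was configured for that node; a None result points to a missing /etc/hosts or DNS entry.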





___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
 




-- Jeff Squyres Cisco Systems



Re: [OMPI users] Job hangs when daemon does not report back from remote machine

2009-02-09 Thread Ralph Castain
The default launcher is ssh - the "rsh" things you see are the name of  
the particular component, not the name of the actual command being  
used. That launcher looks for "ssh" first, and then falls back to  
"rsh" if ssh isn't found.


OMPI currently doesn't support restricted port ranges. We are working  
on a new release that does, but it won't be out for a few weeks. Until  
that time, my only suggestion would be to look at removing the  
firewall on every node in favor of a firewall on the outside of the  
cluster. I'm not sure any other solution is available just yet.


Ralph
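
The fallback Ralph describes (try ssh first, then rsh) matches the "ssh : rsh" value of plm_rsh_agent quoted further down in this thread. The sketch below only illustrates that described behaviour; it is not Open MPI's code, and pick_agent and its which parameter are hypothetical names introduced for the example.

```python
import shutil


def pick_agent(agent_spec="ssh : rsh", which=shutil.which):
    """Return the first entry of a colon-separated agent list whose
    program is found on PATH, or None if none of them is available."""
    for candidate in agent_spec.split(":"):
        entry = candidate.strip()
        if not entry:
            continue
        # Only the first word is the program name; anything after it
        # would be arguments passed to that program.
        program = entry.split()[0]
        if which(program):
            return entry
    return None
```

On a machine with ssh installed, pick_agent() returns "ssh", so the component named [rsh] in the verbose output still launches through ssh.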

On Feb 8, 2009, at 2:08 PM, Kersey Black wrote:


Many thanks.  The firewall is the issue.

On Feb 9, 2009, at 5:56 AM, Ralph Castain wrote:
It sounds to me like TCP communication isn't getting through for  
some reason. Try the following:


mpirun --mca plm_base_verbose 5 --hostfile myh3 -pernode hostname

black@ccn3:~/Documents/mp> mpirun --mca plm_base_verbose 5 --hostfile myh3 -pernode hostname
[ccn3:26932] mca:base:select:(  plm) Querying component [rsh]
[ccn3:26932] mca:base:select:(  plm) Query of component [rsh] set priority to 10
[ccn3:26932] mca:base:select:(  plm) Querying component [slurm]
[ccn3:26932] mca:base:select:(  plm) Skipping component [slurm]. Query failed to return a module
[ccn3:26932] mca:base:select:(  plm) Selected component [rsh]
- hangs here

But when I turn off the firewall for a moment on both machines,
local and remote, everything works:

black@ccn3:~/Documents/mp> mpirun --mca plm_base_verbose 5 --hostfile myh3 -pernode hostname
[ccn3:26442] mca:base:select:(  plm) Querying component [rsh]
[ccn3:26442] mca:base:select:(  plm) Query of component [rsh] set priority to 10
[ccn3:26442] mca:base:select:(  plm) Querying component [slurm]
[ccn3:26442] mca:base:select:(  plm) Skipping component [slurm]. Query failed to return a module
[ccn3:26442] mca:base:select:(  plm) Selected component [rsh]
ccn3
ccn4

2 Questions:
1)  Is it really trying to use 'rsh', or is that just part of the  
language in the debugging reporting?  I assume it is actually using  
ssh under the hood, but it is worth asking.  I am using the default  
configuration on this.

black@ccn3:~/Documents/mp> ompi_info --param all all | grep pls
MCA plm: parameter "plm_rsh_agent" (current value: "ssh : rsh", data source: default value, synonyms: pls_rsh_agent)

2)  Since it is a firewall issue, I read what I could find, and it
seems there is no means of restricting port ranges.  Right now,
each node in this small cluster is running its own firewall rather
than all being hidden behind some other machine or switch.  Any
pointers for handling this most easily?


Cheers, Kersey

You should see output from the receipt of a daemon callback for
each daemon, then the sending of the launch command. My guess is
that you won't see all the daemons call back, which is why you hang.


This should tell you which node isn't getting a message back to  
wherever mpirun is executing. You might then check to ensure no  
firewalls are in the way to that node, there is a TCP path back  
from it, etc.
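
A quick way to test whether a TCP path exists from one node to another is to try opening a connection. The sketch below is an assumption-laden illustration: can_connect is a hypothetical helper, and the port to probe has to be one you know a service is listening on (for example the ssh port), since the ports the Open MPI daemons use for their callback are chosen dynamically.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened
    within timeout seconds, False if it is refused, filtered or times out."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running can_connect("ccn3", 22) from ccn4, and the reverse from ccn3, shows whether each direction is open. A firewall that silently drops packets shows up as a False result after the timeout rather than an immediate refusal.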


I can help with additional diagnostics once we get that far.
Ralph

On Feb 7, 2009, at 2:40 PM, Kersey Black wrote:





[OMPI users] Linux Itanium Configure and Make Logs for 1.2.8

2009-02-09 Thread Iannetti, Anthony C. (GRC-RTB0)
I have attached the ./configure and make all output for version 1.2.8 as 
directed in the Open MPI "Getting Help" section.   Hopefully, this will guide 
us on what is going on with the 1.3 assembler code.

Tony


Anthony C. Iannetti, P.E.

NASA Glenn Research Center

Aeropropulsion Division, Combustion Branch

21000 Brookpark Road, MS 5-10

Cleveland, OH 44135

phone: (216)433-5586

email: anthony.c.ianne...@nasa.gov

Please note:  All opinions expressed in this message are my own and NOT of 
NASA.  Only the NASA Administrator can speak on behalf of NASA.




LinuxItanium-output.tar.gz
Description: LinuxItanium-output.tar.gz


Re: [OMPI users] Job hangs when daemon does not report back from remote machine

2009-02-09 Thread Kersey Black

Thanks much for all the help.
I will work to wall things off, but as the means of doing that is not  
obvious with the way the network is configured, I will also be  
watchful for new versions which might provide options for this  
situation.


Cheers, Kersey




Re: [OMPI users] Linux Itanium Configure and Make Logs for 1.2.8

2009-02-09 Thread Joe Griffin
Tony,

 

My compile line with the error was the following. I believe the one you
had with the error was similar:

 

icc -DHAVE_CONFIG_H -I. -I../../opal/include -I../../orte/include \
    -I../../ompi/include -I../../opal/mca/paffinity/linux/plpa/src/libplpa \
    -I../.. \
    -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict \
    -MT atomic-asm.lo -MD -MP -MF .deps/atomic-asm.Tpo -c atomic-asm.S \
    -fPIC -DPIC -o .libs/atomic-asm.o

 

However, your 1.2.8 output had:

 

icc -DHAVE_CONFIG_H -I. -I../../opal/include -I../../orte/include \
    -I../../ompi/include -I../.. \
    -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -pthread \
    -MT asm.lo -MD -MP -MF .deps/asm.Tpo -c asm.c -fPIC -DPIC \
    -o .libs/asm.o

 

If I use these options, the error goes away.  Here is output from my
screen:

 

ia64b <94> pwd
/scratch/open13/openmpi-1.3/opal/asm

ia64b <95> icc -DHAVE_CONFIG_H -I. -I../../opal/include
-I../../orte/include -I../../ompi/include
-I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../.. -O3 -DNDEBUG
-finline-functions -fno-strict-aliasing -restrict -MT atomic-asm.lo -MD
-MP -MF .deps/atomic-asm.Tpo -c atomic-asm.S -fPIC -DPIC -o
.libs/atomic-asm.o
/scratch/icc777XKf.s(1) : error A2040: Unexpected token: Unary Diez Operator at: Start
/scratch/icc777XKf.s(2) : error A2040: Unexpected token: Unary Diez Operator at: Start
/scratch/icc777XKf.s(3) : error A2040: Unexpected token: Unary Diez Operator at: Start
/scratch/icc777XKf.s(4) : error A2040: Unexpected token: Unary Diez Operator at: Start
.libs/atomic-asm.o - 4 error(s), 0 warning(s)

ia64b <96> icc -DHAVE_CONFIG_H -I. -I../../opal/include
-I../../orte/include -I../../ompi/include -I../.. -O3 -DNDEBUG
-finline-functions -fno-strict-aliasing -restrict -pthread -MT asm.lo
-MD -MP -MF .deps/asm.Tpo -c asm.c  -fPIC -DPIC -o .libs/asm.o

ia64b <97> ls -l .libs/asm.o
-rw-r--r--  1 jjg develop 472 Feb  9 16:27 .libs/asm.o

 

So ... for some reason the compiler options changed.  Can you please:

1. cd into the .../opal/asm directory

2. Issue the BAD command I have at my prompt 95 and verify the error.

3. Issue the GOOD command I have at my prompt 96 and verify it works.

Now ... as to why the options are different ... I don't know.
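
To see exactly which arguments differ between the two compile lines, they can be split and compared mechanically. The following is a small sketch using the commands from prompts 95 and 96 as plain strings; argument_diff is a hypothetical helper written for this comparison only.

```python
import shlex

# The two compile lines from prompts 95 (fails) and 96 (works).
BAD = (
    "icc -DHAVE_CONFIG_H -I. -I../../opal/include -I../../orte/include "
    "-I../../ompi/include -I../../opal/mca/paffinity/linux/plpa/src/libplpa "
    "-I../.. -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict "
    "-MT atomic-asm.lo -MD -MP -MF .deps/atomic-asm.Tpo -c atomic-asm.S "
    "-fPIC -DPIC -o .libs/atomic-asm.o"
)
GOOD = (
    "icc -DHAVE_CONFIG_H -I. -I../../opal/include -I../../orte/include "
    "-I../../ompi/include -I../.. -O3 -DNDEBUG -finline-functions "
    "-fno-strict-aliasing -restrict -pthread -MT asm.lo -MD -MP "
    "-MF .deps/asm.Tpo -c asm.c -fPIC -DPIC -o .libs/asm.o"
)


def argument_diff(first, second):
    """Return (arguments only in first, arguments only in second),
    each in the order they appear on the command line."""
    a, b = shlex.split(first), shlex.split(second)
    only_first = [arg for arg in a if arg not in b]
    only_second = [arg for arg in b if arg not in a]
    return only_first, only_second


only_bad, only_good = argument_diff(BAD, GOOD)
print("only at prompt 95:", only_bad)
print("only at prompt 96:", only_good)
```

Apart from the extra libplpa include path and -pthread, the differences are all file names: prompt 95 compiles atomic-asm.S while prompt 96 compiles asm.c, so the two commands build different source files rather than the same file with different options.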

 

Just trying to help,

Joe

 

 

 






Re: [OMPI users] Open MPI 1.3 segfault on amd64 with Rmpi

2009-02-09 Thread Dirk Eddelbuettel

To bring closure to this thread, we found that the following simple patch to
Rmpi/src/Rmpi.c fixes the problem:


--- rmpi-0.5-6.orig/src/Rmpi.c
+++ rmpi-0.5-6/src/Rmpi.c
@@ -63,7 +63,7 @@
else {

 #ifdef OPENMPI
-   dlopen("libmpi.so.0", RTLD_GLOBAL);
+   dlopen("libmpi.so.0", RTLD_GLOBAL | RTLD_LAZY);
 #endif

 #ifndef MPI2


The fix has been applied to Debian's package and should also be forthcoming
in future releases of Rmpi.  Big thanks to Jeff Squyres for patient help with
the debugging.
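
For context on the patch: the Linux dlopen man page states that one of RTLD_LAZY or RTLD_NOW must be included in the flags, with RTLD_GLOBAL and similar values ORed in on top. The original call passed only RTLD_GLOBAL. The sketch below is in Python rather than C and merely checks a flag value for a binding mode, using the constants that the os module exposes on Unix; has_binding_mode is a hypothetical helper, not part of Rmpi or Open MPI.

```python
import os


def has_binding_mode(flags):
    """Return True if the dlopen flags include RTLD_LAZY or RTLD_NOW."""
    return bool(flags & (os.RTLD_LAZY | os.RTLD_NOW))


before_patch = os.RTLD_GLOBAL                # flags used before the fix
after_patch = os.RTLD_GLOBAL | os.RTLD_LAZY  # flags used after the fix
```

With these values, has_binding_mode(before_patch) is False and has_binding_mode(after_patch) is True, which is the difference the one-line patch makes.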

Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Linux Itanium Configure and Make Logs for 1.2.8

2009-02-09 Thread Brian W. Barrett

Joe -

There are two different files being discussed, which might be the cause of 
the confusion.  And this is really complicated, undocumented code I'm 
shamefully responsible for, so the confusion is quite understandable :).


There's asm.c, which on all non-Sparc v8 platforms just pre-processes down 
to the line:


  #include "opal/sys/atomic.h"

That header file includes all the inlined versions of the assembly, if the 
compiler is detected as supporting inline assembly.


There's then atomic-asm.S, which on all platforms is an assembly file 
(obviously) of all the functions which would be defined by 
opal/sys/atomic.h, to help deal with weird compilerisms.  This file is 
generated from opal/sys/atomic.h by hand, which is a pain.  The file is 
then preprocessed at configure time to generate a file that should work 
with the given compiler.


Anyway, that describes the difference between your two commands, the one
that works and the one that doesn't.  Why there's a failure, I'm not sure,
and unfortunately I don't have time to look into it in detail for the
next month or so (in that mad, must-finish-dissertation-this-month mode).


Brian




[OMPI users] undefined symbol: tm_init

2009-02-09 Thread Brett Pemberton

Hey,

I've just installed OpenMPI 1.3 on our cluster, and am getting this 
issue on jobs > 1 node.


mpiexec: symbol lookup error: /usr/local/openmpi/1.3-pgi/lib/openmpi/mca_plm_tm.so: undefined symbol: tm_init


I saw an earlier report from someone who solved this by configuring with:
--enable-mca-static=plm:tm


A new install built with this configure option does work for me, but only for
code recompiled with the new mpicc.  Existing code doesn't spawn properly.


As such, I'd much rather get the existing install working again.

It was suggested that I need the Torque libraries on the compute nodes,
and they are installed there.  However, adding them to ld.so.conf has not
solved this, so I'm not sure what more needs to be done to fix it without
recompiling Open MPI.
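
One thing that can be checked directly on a compute node is whether the Torque library that the dynamic loader finds there can be loaded and exposes tm_init. The sketch below is an illustration only: exports_symbol is a hypothetical helper, and the library name to pass (for example "libtorque.so") is an assumption that depends on how Torque was installed.

```python
import ctypes


def exports_symbol(library, symbol):
    """Return True if library (a path or soname) can be loaded by the
    dynamic loader and a lookup of symbol in it succeeds."""
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        # The loader could not find or open the library at all.
        return False
    # ctypes raises AttributeError on a failed symbol lookup,
    # which hasattr() turns into False.
    return hasattr(handle, symbol)
```

If exports_symbol("libtorque.so", "tm_init") returns False on a node, the library is either not on the loader's search path there or does not provide the symbol. If it returns True, the library itself is reachable and the problem is more likely in how mca_plm_tm.so was linked.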


Thanks in advance for any help.

/ Brett

--
Brett Pemberton - VPAC Senior Systems Administrator
http://www.vpac.org/ - (03) 9925 4899



signature.asc
Description: OpenPGP digital signature


Re: [OMPI users] Linux Itanium Configure and Make Logs for 1.2.8

2009-02-09 Thread Joe Griffin
Hi Brian, 
 
First off I want to thank you and Jeff for all the work you do.
 
The issue was actually Tony's.  I got involved just because I have a few
Itaniums and was willing to try.   Sorry, I did not notice the difference
between asm.c and atomic-asm.S ... argh ... it has been too long a day.
 
So the real question is why atomic-asm.S is now in the compile.
I am fine, as I do not have the problem.  Perhaps when we upgrade
(probably in 2010) I will hunt more.
 
Joe
 


