Yes, there is a system firewall. I don't think the sysadmin will
allow it to be disabled. Each Linux machine runs the built-in RHEL
firewall, though SSH is allowed through it.
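For anyone hitting the same thing, a hedged sketch of how to check and
open the RHEL (iptables) firewall between nodes; the subnet and port
numbers below are assumptions, so substitute your own:

  # List the active rules as root; REJECT/DROP entries on the cluster
  # interface are the usual suspects:
  iptables -L -n

  # One option: trust the private cluster subnet entirely
  # (192.168.1.0/24 is an assumed example):
  iptables -I INPUT -s 192.168.1.0/24 -j ACCEPT
  service iptables save

If the firewall must stay restrictive, Open MPI can instead be pinned
to known TCP port ranges so that only those need opening. Verify first
that your build has the parameters:

  ompi_info --param oob tcp
  ompi_info --param btl tcp

  # Pin daemon (oob) and MPI (btl) traffic to fixed ranges; the
  # specific ranges here are assumptions:
  mpirun --mca oob_tcp_port_min_v4 10000 --mca oob_tcp_port_range_v4 100 \
         --mca btl_tcp_port_min_v4 10200 --mca btl_tcp_port_range_v4 100 \
         -hostfile hostfile -np 16 hello_c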
--- On Tue, 7/6/10, Ralph Castain <r...@open-mpi.org> wrote:
From: Ralph Castain <r...@open-mpi.org>
Subject: Re: [OMPI users] OpenMPI Hangs, No Error
To: "Open MPI Users" <us...@open-mpi.org>
Date: Tuesday, July 6, 2010, 4:19 PM
It looks like the remote daemon is starting - is there a firewall
in the way?
On Jul 6, 2010, at 2:04 PM, Robert Walters wrote:
Hello all,
I am using Open MPI 1.4.2 on RHEL. I have a cluster of AMD
Opterons, and right now I am just working on getting Open MPI
itself up and running. configure and make all install both
completed successfully, and the LD_LIBRARY_PATH and PATH variables
were edited correctly. mpirun -np 8 hello_c works on every
machine. I have set up my two test machines with DSA key pairs
that work with each other.
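For completeness, a minimal sketch of the DSA key-pair setup just
described, assuming OpenSSH defaults:

  # On machine1, as the user that will run mpirun:
  ssh-keygen -t dsa          # default file, empty passphrase
  ssh-copy-id machine2       # append the public key to machine2's authorized_keys

  # No password or passphrase prompt should appear:
  ssh machine2 hostname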
The problem comes when I use my hostfile to try to launch across
machines. The hostfile is set up correctly with
<host_name> <slots> <max-slots> entries, as sketched below.
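For concreteness, a minimal hostfile sketch (the exact keyword
spelling can vary between versions, so check mpirun(1) for yours):

  # hostfile
  machine1 slots=8 max_slots=8
  machine2 slots=8 max_slots=8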
options enabled "mpirun --mca plm_base_verbose 99 --debug-daemons
--mca btl_base_verbose 30 --mca oob_base_verbose 99 --mca
pml_base_verbose 99 -hostfile hostfile -np 16 hello_c" I receive
the following text output.
[machine1:03578] mca: base: components_open: Looking for plm components
[machine1:03578] mca: base: components_open: opening plm components
[machine1:03578] mca: base: components_open: found loaded component rsh
[machine1:03578] mca: base: components_open: component rsh has no register function
[machine1:03578] mca: base: components_open: component rsh open function successful
[machine1:03578] mca: base: components_open: found loaded component slurm
[machine1:03578] mca: base: components_open: component slurm has no register function
[machine1:03578] mca: base: components_open: component slurm open function successful
[machine1:03578] mca:base:select: Auto-selecting plm components
[machine1:03578] mca:base:select:( plm) Querying component [rsh]
[machine1:03578] mca:base:select:( plm) Query of component [rsh] set priority to 10
[machine1:03578] mca:base:select:( plm) Querying component [slurm]
[machine1:03578] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
[machine1:03578] mca:base:select:( plm) Selected component [rsh]
[machine1:03578] mca: base: close: component slurm closed
[machine1:03578] mca: base: close: unloading component slurm
[machine1:03578] mca: base: components_open: Looking for oob components
[machine1:03578] mca: base: components_open: opening oob components
[machine1:03578] mca: base: components_open: found loaded component tcp
[machine1:03578] mca: base: components_open: component tcp has no register function
[machine1:03578] mca: base: components_open: component tcp open function successful
Daemon was launched on machine2 - beginning to initialize
[machine2:01962] mca: base: components_open: Looking for oob components
[machine2:01962] mca: base: components_open: opening oob components
[machine2:01962] mca: base: components_open: found loaded component tcp
[machine2:01962] mca: base: components_open: component tcp has no register function
[machine2:01962] mca: base: components_open: component tcp open function successful
Daemon [[1418,0],1] checking in as pid 1962 on host machine2
Daemon [[1418,0],1] not using static ports
At this point the system hangs indefinitely. While running top in
a terminal on machine2, I briefly see several processes appear:
sshd (root), tcsh (myuser), orted (myuser), and mcstransd (root).
Does sshd need to be initiated by myuser? That is currently turned
off in sshd_config via "UsePAM yes". The sysadmin set this up, but
it can be worked around if necessary.
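One related check worth doing (a suggestion, not something from the
logs above): orted is started over a non-interactive ssh session, so
PATH and LD_LIBRARY_PATH must also be set for non-interactive shells:

  # Run from machine1; the output should point at the Open MPI install:
  ssh machine2 env | grep PATH
  ssh machine2 which orted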
So in summary: mpirun works on each machine individually, but
hangs when launched through a hostfile or with the -host flag.
./configure was run with defaults plus --prefix, and
LD_LIBRARY_PATH and PATH are set up correctly. Any help is
appreciated. Thanks!
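For reference, a hedged sketch of the build and environment setup just
summarized; the install prefix is an assumed example, and tcsh syntax
is shown because tcsh appears in the top output above:

  ./configure --prefix=/opt/openmpi-1.4.2
  make all install

  # In ~/.tcshrc on every node; guard LD_LIBRARY_PATH in case it is
  # unset:
  setenv PATH /opt/openmpi-1.4.2/bin:$PATH
  if ($?LD_LIBRARY_PATH) then
      setenv LD_LIBRARY_PATH /opt/openmpi-1.4.2/lib:$LD_LIBRARY_PATH
  else
      setenv LD_LIBRARY_PATH /opt/openmpi-1.4.2/lib
  endif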
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users