I've been investigating, and there is no firewall that could be blocking TCP
traffic in the cluster. With the option --mca plm_base_verbose 30 I get the
following output:
[itanium1] /home/otro > mpirun --mca plm_base_verbose 30 --host itanium2 helloworld.out
[itanium1:08311] mca: base: components_open: Looking for plm components
[itanium1:08311] mca: base: components_open: opening plm components
[itanium1:08311] mca: base: components_open: found loaded component rsh
[itanium1:08311] mca: base: components_open: component rsh has no register function
[itanium1:08311] mca: base: components_open: component rsh open function successful
[itanium1:08311] mca: base: components_open: found loaded component slurm
[itanium1:08311] mca: base: components_open: component slurm has no register function
[itanium1:08311] mca: base: components_open: component slurm open function successful
[itanium1:08311] mca:base:select: Auto-selecting plm components
[itanium1:08311] mca:base:select:( plm) Querying component [rsh]
[itanium1:08311] mca:base:select:( plm) Query of component [rsh] set priority to 10
[itanium1:08311] mca:base:select:( plm) Querying component [slurm]
[itanium1:08311] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
[itanium1:08311] mca:base:select:( plm) Selected component [rsh]
[itanium1:08311] mca: base: close: component slurm closed
[itanium1:08311] mca: base: close: unloading component slurm
-- it hangs here
It seems to be a Slurm problem?
Thanks for any ideas.
On Fri, 19 March 2010, at 17:57, Ralph Castain wrote:
Did you configure OMPI with --enable-debug? You should do this so that
more diagnostic output is available.
You can also add the following to your cmd line to get more info:
--debug --debug-daemons --leave-session-attached
Something is likely blocking proper launch of the daemons and processes, so
you aren't getting to the BTLs at all.
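For example, something along these lines (remotehost and the executable name are just placeholders for your own host and program):

mpirun --debug --debug-daemons --leave-session-attached --host remotehost ./your_mpi_program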
On Mar 19, 2010, at 9:42 AM, uriz.49...@e.unavarra.es wrote:
The processes are running on the remote nodes, but they don't send any
response back to the origin node. I don't know why.
With the option --mca btl_base_verbose 30 I have the same problem, and it
doesn't show any message.
Thanks
On Wed, Mar 17, 2010 at 1:41 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
On Mar 17, 2010, at 4:39 AM, <uriz.49...@e.unavarra.es> wrote:
Hi everyone, I'm a new Open MPI user and I have just installed Open MPI on a
6-node cluster running Scientific Linux. When I execute it locally it works
perfectly, but when I try to execute it on the remote nodes with the --host
option, it hangs and gives no message. I think the problem could be with the
shared libraries, but I'm not sure. In my opinion the problem is not ssh,
because I can access the nodes without a password.
You might want to check that Open MPI processes are actually running on the
remote nodes -- check with ps whether you see any "orted" or other
MPI-related processes (e.g., your processes).
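For example, something like this (remotenode is a placeholder for one of the hosts you passed to --host):

ssh remotenode ps -ef | grep orted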
Do you have any TCP firewall software running between the nodes? If so,
you'll need to disable it (at least for Open MPI jobs).
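On RHEL-type systems such as Scientific Linux you can check with something like the following (assuming iptables is the firewall in use):

/sbin/service iptables status
/sbin/iptables -L -n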
I also recommend running mpirun with the option --mca btl_base_verbose 30
to troubleshoot TCP issues.
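For example (remotehost and the executable name are placeholders):

mpirun --mca btl_base_verbose 30 --host remotehost ./your_mpi_program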
In some environments, you need to explicitly tell mpirun what network
interfaces it can use to reach the hosts. Read the following FAQ
section for more information:
http://www.open-mpi.org/faq/?category=tcp
Item 7 of the FAQ might be of special interest.
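For example, to restrict Open MPI to a particular interface, you can use something along these lines (eth0 is just a placeholder for whatever interface actually reaches the other nodes):

mpirun --mca oob_tcp_if_include eth0 --mca btl_tcp_if_include eth0 --host remotehost ./your_mpi_program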
Regards,
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users