I changed the SELinux config to permissive (log only), and it didn't change anything. Back to the drawing board.
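For anyone following along, the standard way to switch SELinux to permissive (log-only) mode is something like this (usual SELinux steps; paths may differ slightly by distro):

  # take effect immediately, until next reboot
  setenforce 0
  # make it persistent: edit /etc/selinux/config and set
  SELINUX=permissive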

Robert Collyer wrote:
I've been having similar problems using Fedora core 9. I believe the issue may be with SELinux, but this is just an educated guess. In my setup, shortly after a login via MPI, the following entry shows up in /var/log/messages on the compute node:

Mar 30 12:39:45 <node_name> kernel: type=1400 audit(1269970785.534:588): avc: denied { read } for pid=8047 comm="unix_chkpwd" name="hosts" dev=dm-0 ino=24579 scontext=system_u:system_r:system_chkpwd_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:etc_runtime_t:s0 tclass=file

which says SELinux denied unix_chkpwd read access to hosts.
Are you getting anything like this?

In the meantime, I'll check if allowing unix_chkpwd read access to hosts eliminates the problem on my system, and if it works, I'll post the steps involved.
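For anyone who wants to try it in the meantime, one common way to grant that access is to build a small local policy module from the logged denials with the standard audit2allow/semodule tools (a sketch; it assumes auditd is logging to /var/log/audit/audit.log, and the module name is arbitrary):

  # collect the unix_chkpwd denials and generate a local policy module
  grep unix_chkpwd /var/log/audit/audit.log | audit2allow -M local_unix_chkpwd
  # load the generated module
  semodule -i local_unix_chkpwd.pp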

uriz.49...@e.unavarra.es wrote:
I've been investigating and there is no firewall that could block TCP traffic in the cluster. With the option --mca plm_base_verbose 30 I get the following output:

[itanium1] /home/otro > mpirun --mca plm_base_verbose 30 --host itanium2 helloworld.out
[itanium1:08311] mca: base: components_open: Looking for plm components
[itanium1:08311] mca: base: components_open: opening plm components
[itanium1:08311] mca: base: components_open: found loaded component rsh
[itanium1:08311] mca: base: components_open: component rsh has no register function
[itanium1:08311] mca: base: components_open: component rsh open function successful
[itanium1:08311] mca: base: components_open: found loaded component slurm
[itanium1:08311] mca: base: components_open: component slurm has no register function
[itanium1:08311] mca: base: components_open: component slurm open function successful
[itanium1:08311] mca:base:select: Auto-selecting plm components
[itanium1:08311] mca:base:select:(  plm) Querying component [rsh]
[itanium1:08311] mca:base:select:(  plm) Query of component [rsh] set priority to 10
[itanium1:08311] mca:base:select:(  plm) Querying component [slurm]
[itanium1:08311] mca:base:select:(  plm) Skipping component [slurm]. Query failed to return a module
[itanium1:08311] mca:base:select:(  plm) Selected component [rsh]
[itanium1:08311] mca: base: close: component slurm closed
[itanium1:08311] mca: base: close: unloading component slurm

--Hangs here

Could it be a slurm problem?

Thanks for any ideas.

On Fri, 19 March 2010, at 17:57, Ralph Castain wrote:
Did you configure OMPI with --enable-debug? You should do this so that
more diagnostic output is available.

You can also add the following to your cmd line to get more info:

--debug --debug-daemons --leave-session-attached

Something is likely blocking proper launch of the daemons and processes so
you aren't getting to the btl's at all.
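Putting that together, the invocation would look roughly like this (a sketch based on the flags above; substitute your own host and executable, and note that Open MPI itself has to be rebuilt with --enable-debug for the extra diagnostics to appear):

  ./configure --enable-debug [your other configure options]
  mpirun --mca plm_base_verbose 30 --debug-daemons --leave-session-attached --host itanium2 helloworld.out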


On Mar 19, 2010, at 9:42 AM, uriz.49...@e.unavarra.es wrote:

The processes are running on the remote nodes, but they never send a response back to the originating node. I don't know why. With the option --mca btl_base_verbose 30 I see the same problem, and it doesn't print any messages.

Thanks

On Wed, Mar 17, 2010 at 1:41 PM, Jeff Squyres <jsquy...@cisco.com>
wrote:
On Mar 17, 2010, at 4:39 AM, <uriz.49...@e.unavarra.es> wrote:

Hi everyone, I'm a new Open MPI user and I have just installed Open MPI on a 6-node cluster with Scientific Linux. When I execute it locally it works perfectly, but when I try to execute it on the remote nodes with the --host option it hangs and gives no message. I think the problem could be with the shared libraries, but I'm not sure. In my opinion the problem is not ssh, because I can access the nodes with no password.

You might want to check that Open MPI processes are actually running on the remote nodes -- check with ps whether you see any "orted" or other MPI-related processes (e.g., your processes).
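For example, something like this on each remote node (a rough one-liner; the exact pattern to grep for depends on your executable's name):

  ps aux | grep -E 'orted|helloworld'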

Do you have any TCP firewall software running between the nodes?  If so, you'll need to disable it (at least for Open MPI jobs).
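A quick way to check is something like the following on each node (assuming an iptables-based firewall, as is typical on Scientific Linux; other distros may use a different tool):

  /sbin/iptables -L -n
  service iptables status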
I also recommend running mpirun with the option --mca btl_base_verbose
30 to troubleshoot tcp issues.

In some environments, you need to explicitly tell mpirun what network
interfaces it can use to reach the hosts. Read the following FAQ
section for more information:

http://www.open-mpi.org/faq/?category=tcp

Item 7 of the FAQ might be of special interest.
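For example, restricting Open MPI's TCP traffic to a specific interface would look roughly like this (a sketch; eth0 is a placeholder for whatever interface actually connects your nodes):

  mpirun --mca btl_tcp_if_include eth0 --host itanium2 helloworld.out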

Regards,
