Hi Götz,

I have tried using SSH instead of rsh before, but not with Kerberos authentication. I can see that you already ran qrsh -inherit via ssh before the mpirun line and verified that the connection works.

I believe the "Permission denied, please try again." message comes from the ssh daemon (sshd) on geminide2 and geminide7, which is rejecting the connections from geminide8. That in turn prevents orted from launching on those two nodes.

Can you enable sshd debugging (with -d, or -ddd for more detail) in the SGE cluster configuration with qconf -mconf, to see why sshd sometimes blocks the ssh connection? You may get a lot of output, but it should show why the permission is denied. The cause could be a setting in sshd_config or something else we don't know about yet.
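As a rough sketch of what that change could look like: this assumes your qrsh client side is set through rsh_command as you described, and that the remote side starts sshd in inetd mode through rsh_daemon. The /usr/sbin/sshd path and the exact rsh_daemon value are assumptions about your setup, so check your current values before editing.

```
rsh_command    /usr/bin/ssh -tt -o GSSAPIDelegateCredentials=no
rsh_daemon     /usr/sbin/sshd -i -ddd
```

You could also limit the change to the failing hosts with qconf -mconf geminide2 instead of editing the global configuration. Where the -d output ends up depends on how SGE handles the daemon's stderr; if you can't find it, running sshd with -o LogLevel=DEBUG3 instead of -ddd should send verbose debug messages to syslog.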

Götz Waschk wrote:
Hello everyone,

I have trouble with the Gridengine integration of openmpi. When I run
a job with only 4 processes, it runs fine. With more processes, mpirun
sometimes fails to connect to the remote nodes; the qrsh calls fail.

I'll attach a job script and the error output. As you can see from the
'for' loop, I can connect to all nodes just fine, it is the qrsh
executed by mpirun that fails. Qrsh was configured to run ssh with
Kerberos authentication (ssh -tt -o GSSAPIDelegateCredentials=no).

My versions are openmpi 1.2.2, SGE 6.0u9, RHEL5. Any idea where the
problem could be?

Regards, Götz Waschk


------------------------------------------------------------------------

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--

- Pak Lui
pak....@sun.com
