Hi everybody! I hope this list is the right place for my problem concerning Open MPI with Sun Grid Engine. I'm running Open MPI with Grid Engine support:

    MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.7)
    MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.7)

on 4 Debian Lenny systems with Sun Grid Engine 6.2.

I've written a small test program that only prints the hostname of the node each MPI process is running on, and I start it via a simple script submitted with qsub:

    #!/bin/bash
    #$ -V
    ### number of processors and parallel environment
    #$ -pe sol 32
    ### Job name
    #$ -N "mpi_test"
    ### Start from current working directory
    #$ -cwd
    #$ -l arch=lx26-amd64
    /usr/bin/mpirun.openmpi --mca pls_gridengine_verbose 1 -v ~/grid/mpi_test/main

Grid Engine starts the job, but it fails with "Host key verification failed." The log files show:

    local configuration sol2.XXX not defined - using global configuration
    Starting server daemon at host "sol2.XXX"
    Starting server daemon at host "sol3.XXX"
    Starting server daemon at host "sol4.XXX"
    Starting server daemon at host "sol1.XXX"
    Server daemon successfully started with task id "1.sol2"
    Server daemon successfully started with task id "1.sol4"
    Server daemon successfully started with task id "1.sol1"
    Server daemon successfully started with task id "1.sol3"
    Establishing /usr/bin/ssh session to host sol2.XXX ...
    Host key verification failed.
    /usr/bin/ssh exited with exit code 255
    reading exit code from shepherd ... 129
    [sol2:22892] ERROR: A daemon on node sol2.XXX failed to start as expected.
    [sol2:22892] ERROR: There may be more information available from
    [sol2:22892] ERROR: the 'qstat -t' command on the Grid Engine tasks.
    [sol2:22892] ERROR: If the problem persists, please restart the
    [sol2:22892] ERROR: Grid Engine PE job
    [sol2:22892] ERROR: The daemon exited unexpectedly with status 129.
    ...

The host keys for all four solX hosts are in the known_hosts file of the user submitting the job and in root's known_hosts file.

Any hints why this could go wrong?

Regards
Tobias
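A minimal sketch for double-checking the known_hosts claim above, assuming bash and OpenSSH's ssh-keygen are available on each node; the function name check_known_hosts is hypothetical. It relies on `ssh-keygen -F`, which also finds entries that were hashed (HashKnownHosts, enabled in Debian's default ssh_config), something a plain grep for the host name would miss. ssh matches the name exactly as it was given (here the fully qualified sol2.XXX), and it reads the known_hosts file of the user running ssh on the machine where ssh is started, so the check is most meaningful when run on each execution host as the job owner.

```shell
#!/bin/bash
# Hypothetical helper: for each host name given, report whether the
# known_hosts file passed as the first argument contains an entry for it.
# ssh-keygen -F prints matching lines (plain or hashed) and prints
# nothing on stdout when there is no match.
check_known_hosts() {
    local file="$1"; shift
    local host missing=0
    for host in "$@"; do
        if [ -n "$(ssh-keygen -F "$host" -f "$file" 2>/dev/null)" ]; then
            echo "found:   $host"
        else
            echo "MISSING: $host"
            missing=1
        fi
    done
    # Non-zero exit status if at least one host had no entry.
    return "$missing"
}
```

For example, on each node: `check_known_hosts ~/.ssh/known_hosts sol1.XXX sol2.XXX sol3.XXX sol4.XXX`. Short names such as sol2 are separate entries and would need to be checked separately if ssh is ever called with them.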