I set up ompi before I configured Torque. Do I need to recompile ompi with appropriate torque configure options to get better integration?

Sam Adams
General Dynamics Information Technology
Phone: 210.536.5945

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
Sent: Friday, July 27, 2007 12:14 PM
To: Open MPI Users
Subject: Re: [OMPI users] torque and openmpi

Are you not using the built-in OMPI support for Torque? The ssh keys
should be irrelevant if using the TM API in Torque (i.e., OMPI won't be
using ssh to launch remote processes; we use the internal TM API in
Torque).

On Jul 27, 2007, at 11:38 AM, Adams, Samuel D Contr AFRL/HEDR wrote:

> I deleted all of the entries out of the know_hosts file, but that didn't
> seem to help. I can run jobs just fine without torque on multiple
> nodes. I can also ssh to all nodes without using passwords, so I am not
> sure what the deal is.
>
> ...
>
> Okay, I found the problem. The keys that I had in know_hosts were for
> only the hostname i.e. prodnode2; whereas, the hostname that torque was
> using were fully qualified names i.e. prodnode2.brooks.af.mil and the
> keys did not exist for the fully qualified names.
>
> Thanks for the help.
>
> Sam Adams
> General Dynamics Information Technology
> Phone: 210.536.5945
>
> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-bounces@open-mpi.org] On Behalf Of George Bosilca
> Sent: Friday, July 27, 2007 10:13 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] torque and openmpi
>
> The key is in the first line of the provided output. One of the
> connection failed because a wrong ssh key. Clean your .ssh/known_hosts
> and the problem will vanish.
>
> Thanks,
> george.
>
> On Jul 27, 2007, at 11:01 AM, Adams, Samuel D Contr AFRL/HEDR wrote:
>
>> When I run jobs with torque, I get this error message. Any ideas?
>>
>> [sam@prodnode1 all]$ cat script.sh.err
>> Host key verification failed.
>> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
>> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
>> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>> [prodnode3.brooks.af.mil:03321] ERROR: A daemon on node prodnode2.brooks.af.mil failed to start as expected.
>> [prodnode3.brooks.af.mil:03321] ERROR: There may be more information available from
>> [prodnode3.brooks.af.mil:03321] ERROR: the remote shell (see above).
>> [prodnode3.brooks.af.mil:03321] ERROR: The daemon exited unexpectedly with status 255.
>> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
>> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
>> --------------------------------------------------------------------------
>> mpirun was unable to cleanly terminate the daemons for this job.
>> Returned value Timeout instead of ORTE_SUCCESS.
>> --------------------------------------------------------------------------
>>
>> Sam Adams
>> General Dynamics Information Technology
>> Phone: 210.536.5945
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
Cisco Systems
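
For anyone hitting the same "Host key verification failed" error when the rsh/ssh launcher is in use: the cause found in this thread was that known_hosts held keys for the short host names only, while Torque handed out fully qualified names. Below is a rough Python sketch of how one might check for that mismatch. The function names are made up for illustration, the key data in the demo is a placeholder, and the parsing is deliberately simple: it assumes plain-text (non-hashed) known_hosts entries and does not handle wildcard or [host]:port patterns.

```python
def known_host_names(known_hosts_text):
    """Collect the plain-text host names listed in known_hosts content.

    Hypothetical helper: hashed entries (starting with "|1|") are skipped
    because they cannot be compared by name.
    """
    names = set()
    for line in known_hosts_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if fields[0].startswith("@"):
            fields = fields[1:]  # drop a marker such as @cert-authority
        if not fields or fields[0].startswith("|1|"):
            continue
        # The first field is a comma-separated list of names/addresses.
        names.update(fields[0].split(","))
    return names


def hosts_missing_keys(node_names, known_hosts_text):
    """Return node names (in order, without repeats) that have no entry."""
    known = known_host_names(known_hosts_text)
    missing = []
    for name in node_names:
        if name not in known and name not in missing:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Placeholder key material; only the host-name field matters here.
    sample = "prodnode2 ssh-rsa AAAAplaceholderkey\n"
    nodes = ["prodnode2", "prodnode2.brooks.af.mil"]
    print(hosts_missing_keys(nodes, sample))  # ['prodnode2.brooks.af.mil']
```

In a real job the node names would presumably come from the file Torque points to via $PBS_NODEFILE, and the text from ~/.ssh/known_hosts. As Jeff notes above, none of this should matter once Open MPI launches through the TM API instead of ssh.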