I set up ompi before I configured Torque.  Do I need to recompile ompi
with appropriate torque configure options to get better integration?  

Sam Adams
General Dynamics Information Technology
Phone: 210.536.5945

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: Friday, July 27, 2007 12:14 PM
To: Open MPI Users
Subject: Re: [OMPI users] torque and openmpi

Are you not using the built-in OMPI support for Torque?  The ssh keys  
should be irrelevant if using the TM API in Torque (i.e., OMPI won't  
be using ssh to launch remote processes; we use the internal TM API  
in Torque).
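
One way to check whether a given build has the TM support compiled in is to look for "tm" components in the output of ompi_info. The Python sketch below does that for a captured copy of the output. The sample text and the function name tm_frameworks are assumptions made for illustration, and the framework names (pls, ras) are guessed from the pls_rsh_module.c lines in the quoted error log, so they may differ in other releases. If no "tm" lines show up, rebuilding with configure's --with-tm option pointing at the Torque installation is the usual next step.

```python
import re

# Matches component lines such as:
#   MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.3)
_MCA_LINE = re.compile(r"^\s*MCA\s+(\w+):\s+(\w+)\s+\(")


def tm_frameworks(ompi_info_text):
    """Return the MCA frameworks that list a 'tm' component."""
    found = []
    for line in ompi_info_text.splitlines():
        match = _MCA_LINE.match(line)
        if match and match.group(2) == "tm":
            found.append(match.group(1))
    return found


# Hypothetical sample; real output lists many more components.
sample = """\
                 MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.3)
                 MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.3)
                 MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.3)
"""

print(tm_frameworks(sample))
```

On a real system the text could come from subprocess.run(["ompi_info"], capture_output=True, text=True).stdout instead of the sample string. An empty list would suggest that mpirun falls back to rsh/ssh, which matches the error log quoted below.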


On Jul 27, 2007, at 11:38 AM, Adams, Samuel D Contr AFRL/HEDR wrote:

> I deleted all of the entries out of the known_hosts file, but that
> didn't seem to help.  I can run jobs just fine without torque on
> multiple nodes.  I can also ssh to all nodes without using passwords,
> so I am not sure what the deal is.
>
> ...
>
> Okay, I found the problem.  The keys that I had in known_hosts were
> for only the short hostname, i.e. prodnode2, whereas the hostnames
> that torque was using were fully qualified names, i.e.
> prodnode2.brooks.af.mil, and the keys did not exist for the fully
> qualified names.
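
The short-name versus fully-qualified-name mismatch described above can be checked mechanically. The following is a minimal Python sketch, assuming plain-text (unhashed) known_hosts entries with no wildcard patterns. The function name missing_host_keys and the sample key text are hypothetical.

```python
def missing_host_keys(known_hosts_text, hostnames):
    """Return the hostnames that have no plain-text entry in known_hosts."""
    known = set()
    for line in known_hosts_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # Skip an optional marker such as @cert-authority or @revoked.
        if fields[0].startswith("@"):
            fields = fields[1:]
        if not fields:
            continue
        # The first field is a comma-separated list of names and addresses.
        known.update(fields[0].split(","))
    return [host for host in hostnames if host not in known]


# Hypothetical entry: only the short name has a key.
sample = "prodnode2 ssh-rsa AAAAB3...hypothetical-key\n"

print(missing_host_keys(sample, ["prodnode2", "prodnode2.brooks.af.mil"]))
```

Hashed entries (lines starting with "|1|") are not matched by this sketch; for those, ssh-keygen -F with the hostname is the more reliable check.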
>
> Thanks for the help.
>
> Sam Adams
> General Dynamics Information Technology
> Phone: 210.536.5945
>
> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-bounces@open-mpi.org] On
> Behalf Of George Bosilca
> Sent: Friday, July 27, 2007 10:13 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] torque and openmpi
>
> The key is in the first line of the provided output. One of the
> connections failed because of a wrong ssh key. Clean your
> .ssh/known_hosts and the problem will vanish.
>
>    Thanks,
>      george.
>
> On Jul 27, 2007, at 11:01 AM, Adams, Samuel D Contr AFRL/HEDR wrote:
>
>> When I run jobs with torque, I get this error message.  Any ideas?
>>
>> [sam@prodnode1 all]$ cat script.sh.err
>> Host key verification failed.
>> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
>> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
>> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>> [prodnode3.brooks.af.mil:03321] ERROR: A daemon on node prodnode2.brooks.af.mil failed to start as expected.
>> [prodnode3.brooks.af.mil:03321] ERROR: There may be more information available from
>> [prodnode3.brooks.af.mil:03321] ERROR: the remote shell (see above).
>> [prodnode3.brooks.af.mil:03321] ERROR: The daemon exited unexpectedly with status 255.
>> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
>> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
>> --------------------------------------------------------------------------
>> mpirun was unable to cleanly terminate the daemons for this job.
>> Returned value Timeout instead of ORTE_SUCCESS.
>>
>> --------------------------------------------------------------------------
>>
>> Sam Adams
>> General Dynamics Information Technology
>> Phone: 210.536.5945
>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


-- 
Jeff Squyres
Cisco Systems

