I run an Open Grid Scheduler (OGS) 2011.11p1 cluster in which the primary
submit host is a separate machine from the queue master. When I run qrsh from
the submit node, I get a commlib error (Levi-Montalcini01 is the queue master,
Levi-Montalcini86 is a compute node):

$ qrsh  -verbose
Your job 590725 ("QRLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 590725 has been successfully scheduled.
Establishing builtin session to host Levi-Montalcini86 ...
error: commlib error: local host name error (IP based host name resolving 
"Levi-Montalcini01" doesn't match client host name from connect message 
"Levi-Montalcini86")
$
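
The message reads as if the name obtained by resolving the peer's IP address
differs from the name the peer announced. A minimal sketch of a forward/reverse
lookup check that could be run on each of the three hosts, assuming Python 3 is
available there. The function name `roundtrip` is only illustrative, and this
is not how commlib itself resolves names:

```python
import socket
import sys


def short(name):
    """Lower-cased host name without its domain part."""
    return name.split(".")[0].lower()


def roundtrip(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name -> IP -> name and report whether the two names agree.

    Returns (ip, reverse_name, matches). The resolver functions are
    parameters so the check can be exercised without touching DNS.
    """
    ip = forward(name)
    reverse_name, aliases, _addrs = reverse(ip)
    candidates = {short(reverse_name)} | {short(a) for a in aliases}
    return ip, reverse_name, short(name) in candidates


if __name__ == "__main__":
    # Host names come from the command line, e.g. the submit node,
    # the queue master, and one compute node.
    for host in sys.argv[1:]:
        try:
            ip, back, ok = roundtrip(host)
            print(f"{host} -> {ip} -> {back}: {'ok' if ok else 'MISMATCH'}")
        except OSError as exc:  # gaierror and herror both derive from OSError
            print(f"{host}: lookup failed ({exc})")
```

Running it on the submit node, the queue master, and a compute node with all
three host names as arguments would show whether any host's forward and
reverse entries disagree; a MISMATCH line is only a hint, not proof of the
cause.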

When I use qrsh from the queue master, it works fine:

$ qrsh -verbose
Your job 590750 ("QRLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 590750 has been successfully scheduled.
Establishing builtin session to host Levi-Montalcini88 ...
Levi-Montalcini88|~>

During the failed attempt, I see traffic from the compute node back to the
queue master, but no traffic to the submit node from either the queue master or
the compute node.

Is qrsh from a separate submit node expected to work?

Thanks,

John