I have an OGS 2011.11p1 cluster. The primary submit host is a separate machine from the queue master. When I try to use qrsh from the submit node, I get a commlib error (Levi-Montalcini01 is the queue master, Levi-Montalcini86 is a compute node):
$ qrsh -verbose
Your job 590725 ("QRLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 590725 has been successfully scheduled.
Establishing builtin session to host Levi-Montalcini86 ...
error: commlib error: local host name error (IP based host name resolving "Levi-Montalcini01" doesn't match client host name from connect message "Levi-Montalcini86")
$

When I use qrsh from the queue master, it works fine:

$ qrsh -verbose
Your job 590750 ("QRLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 590750 has been successfully scheduled.
Establishing builtin session to host Levi-Montalcini88 ...
Levi-Montalcini88|~>

During the failed attempt, I see traffic from the compute node back to the queue master, but no traffic to the submit node from either the queue master or the compute node.

Is qrsh from a separate submit node expected to work?

Thanks,
John
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users