Hi,

On 12.05.2020 at 23:27, Mun Johl wrote:
> Hi,
>
> Just some additional testing results ...
>
> Our IT guy turned off the firewall on a Submit Host and an Execution Host
> for experimental purposes. That got me further, but not all the way. Here
> is the verbose log from qrsh:
>
> waiting for interactive job to be scheduled ...
> Your interactive job 460937 has been successfully scheduled.
> Establishing /usr/bin/ssh -X session to host sim.domain.com ...
> ssh_exchange_identification: Connection closed by remote host
> /usr/bin/ssh -X exited with exit code 255
> reading exit code from shepherd ... 129
>
> We aren't yet able to get around the ssh -X error. Any ideas?

But does a plain `ssh` to the nodes work? In case a different hostname must
be used, there is an option "HostbasedUsesNameFromPacketOnly" in
"sshd_config".

> But even if we could, we still need to figure out which ports of the
> firewall need to be opened up. Every time we ran an experiment, the port
> number that was used for SSH was different. I hope we don't have to open
> up too big a range of ports.

Unfortunately, the port is chosen at random for each new connection. But it
should be possible to adjust the firewall to allow all ports only for
connections coming from the nodes in the cluster (are the nodes in a VLAN
behind a head node, or are all submit machines and nodes also connected to
the Internet?). Within SSH itself, the "Match" option in "sshd_config" also
makes it possible to allow only certain users from certain hosts.

Nevertheless: adding "-v" to the `ssh` command may produce additional
information, and the messages of `sshd` should be in some log file as well.

-- Reuti

> Feedback would be welcomed.
>
> Best regards,
>
> --
> Mun
>
>
>> -----Original Message-----
>> Hi William, et al.,
>>
>>> On Mon, May 11, 2020 at 09:30:14PM +0000, Mun Johl wrote:
>>>> Hi William, et al.,
>>>> [Mun] Thanks for the tip; I'm still trying to get back to where I can
>>>> launch qrsh again.
>>>> Even after I put the requisite /etc/pam.d/sshd line at the head of the
>>>> file, I'm still getting the "Your "qrsh" request could not be
>>>> scheduled, try again later." message for some reason. But I will
>>>> continue to debug that issue.
>>>
>>> The pam_sge-qrsh-setup.so shouldn't have anything to do with this,
>>> since the message occurs before any attempt to launch the job. You
>>> could try running qrsh -w p and/or qrsh -w v to get a report on why the
>>> qrsh isn't being scheduled. They aren't always easy to read, and -w v
>>> doesn't reliably ignore exclusive vars in use, but they can
>>> nevertheless be helpful.
>>
>> [Mun] With 'qrsh -w p' and 'qrsh -w v' I got the following output:
>> verification: found suitable queue(s)
>>
>> I then replaced the -w option with -verbose, which produced the
>> following output:
>>
>> waiting for interactive job to be scheduled ...timeout (54 s) expired
>> while waiting on socket fd 4
>> Your "qrsh" request could not be scheduled, try again later.
>>
>> I have no idea what is meant by "socket fd 4"; but that leads me to
>> believe we have some sort of blocked port or something.
>>
>> Are there any additional ports that need to be opened up in order to use
>> 'qrsh & ssh -X'?
>>
>> One last noteworthy item that recently occurred to me is that when SGE
>> was initially installed on our servers, we had a different domain name.
>> Late last year we were acquired and our domain changed. However, our
>> /etc/hosts still has the old domain, simply because SGE couldn't deal
>> with the change in the domain--or rather, it was the easiest course of
>> action for me to take and keep SGE working. I wonder if that is in some
>> way interfering with 'qrsh & ssh -X'?
>>
>> I am going to try and do some additional debugging today and will report
>> any progress.
>>
>> Thank you and regards,
>>
>> --
>> Mun

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
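P.S. Reuti's sshd_config suggestions could be sketched roughly like this; a minimal sketch only, where the subnet "10.0.0.0/24" and the group "sgeusers" are placeholders, not values from this cluster:

```
# Use the hostname the client sends in the packet for host-based checks
# (useful when hosts are still known under the old domain name).
HostbasedUsesNameFromPacketOnly yes

# Allow X11 forwarding only for cluster users connecting from cluster
# nodes; address and group below are placeholders.
Match Address 10.0.0.0/24 Group sgeusers
    X11Forwarding yes
    AllowTcpForwarding yes
```

Directives after a "Match" line apply only to connections matching its criteria, so the general policy above the block stays restrictive.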
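Since qrsh picks a random port for each connection, the firewall side could be handled by trusting the cluster subnet as a whole rather than individual ports. A sketch of such rules, assuming a placeholder internal subnet 10.0.0.0/24 (this is firewall configuration, to be adapted to the actual setup):

```
# iptables: accept any TCP port, but only from the cluster subnet.
iptables -A INPUT -s 10.0.0.0/24 -p tcp -j ACCEPT
# Keep plain SSH reachable from elsewhere; drop other TCP traffic.
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp -j DROP

# Or, with firewalld, put the cluster subnet into the trusted zone:
firewall-cmd --permanent --zone=trusted --add-source=10.0.0.0/24
firewall-cmd --reload
```

This only makes sense if the nodes sit in a VLAN behind a head node; if they are directly on the Internet, the subnet-wide trust would be too broad.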
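To narrow down whether a blocked port (rather than sshd itself) is behind the "socket fd 4" timeout, a small reachability probe can be run from the submit host against an execution host. This is a hypothetical helper, not part of SGE; the host and port you probe (e.g. the sge_execd or sshd port of a node) must be filled in for the real setup:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a throwaway local listener; in practice, probe the execution
# host instead, e.g. port_open("sim.domain.com", 22).
server = socket.socket()
server.bind(("127.0.0.1", 0))   # ephemeral port
server.listen(1)
port = server.getsockname()[1]
print(port_open("127.0.0.1", port))   # True: something is listening
server.close()
print(port_open("127.0.0.1", port))   # False: connection refused
```

Running this from each submit host against each execution host quickly shows whether the firewall (or TCP wrappers) is dropping the connection before sshd ever answers.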