Hi,
On 17.02.2017 at 21:55, Raymond Norris wrote:

>> Hi,
>>
>>> On 17.02.2017 at 18:54, Raymond Norris <raymond.nor...@mathworks.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm using SoGE 8.1.9.
>>>
>>> When we set up our jobs to run with loose integration and SSH, things work
>>> fine. I'm now trying to configure this with tight integration and qrsh.
>>> Sans a dozen or so ports, everything's closed. My understanding is that qrsh
>>> uses rsh underneath and that we need to open port 514, which we've done.
>>> However, we get the following:
>>
>> No. If you set:
>>
>> $ qconf -mconf
>> …
>> rsh_command builtin
>> rsh_daemon builtin
>>
>> it uses its own protocol. Even when set to `rsh` and `rshd` respectively, it
>> will in both cases use a random port, not 514.
>>
>> https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html might
>> give some insight.
>>
>>> qrsh -inherit -verbose node-01 smpd -s -phrase ... -port ...
>>
>> This looks like an old startup of MPICH2. You don't use Hydra?
>
> We ship MPICH2 with our application, so I'm starting there. I've built it
> with Hydra, but it fails.

The smpd startup is outdated:

https://arc.liv.ac.uk/SGE/howto/mpich2-integration/mpich2-integration.html

Which exact version of MPICH are you using?

To ship something with your application, you could also try Open MPI or IBM Platform MPI Community Edition:

https://www.ibm.com/developerworks/downloads/im/mpi/

> I'm guessing, as Simao Afonso references, each ssh connection requires a
> different ephemeral port.

Correct.

>>> Starting server daemon at host "node-01"
>>> error: executing task of job 1032 failed: failed sending task to execd@node-01: got send error
>>> error: commlib error: got select error (No route to host)
>>>
>>> To me, that tells me we don't have 514 open ("No route to host"). Also, I
>>> think the issue is that we're not running rshd on node-01, which I'm not sure
>>> they're OK with running - which we'd need to, right?
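For reference, a sketch of the ssh variant that the remote_startup page describes: the same two entries as above, pointing at `ssh`/`sshd` instead of `builtin`. The binary paths /usr/bin/ssh and /usr/sbin/sshd are assumptions and depend on the distribution on the nodes.

```text
$ qconf -mconf   # paths below are assumed, check them on the execution hosts
…
rsh_command /usr/bin/ssh
rsh_daemon /usr/sbin/sshd -i
```

With `builtin`, no rshd has to run on node-01 at all; with this variant, `sshd -i` (inetd mode) is started per task rather than as a permanently running service. In both cases the connection still goes to a port chosen at startup rather than a fixed one, so the ephemeral-port problem remains.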
>>> So my question is, how do I get qrsh working so that I can track resources,
>>> but configure it with SSH? Or can this not work, and do I need to use rsh?
>>
>> See the link provided above; `ssh` is also covered near the end of the
>> document.
>
> I can test the PAM, but in the end, will I be in the same situation where
> ssh/sshd will use an array of ports? As it stands now, keeping most of the
> ports closed, these are my options:
>
> 1. loose integration with SSH: works

Startup works as it uses port 22 only. Doesn't it need any additional ports for communication later on during operation?

> 2. tight integration with Hydra: fails -- needs ephemeral ports

Yep.

> 3. tight integration with qrsh: fails -- needs ephemeral ports

Yep.

> 4. tight integration with qrsh, using SSH: not tested yet (needs ephemeral
> ports?)

Yes.

> 5. recompile SoGE w/ tight SSH

Doesn't change the original problem. The idea behind recompiling SoGE with SSH support is to attach an additional group ID for proper accounting and job termination (even when a process jumps out of the process tree), also for processes started by the job's private `sshd`.

> Am I overlooking any other options?

I see additional problems: at least Open MPI will use a range of ports on its own after it has been started (by whatever means). I would assume that the other MPI implementations behave the same. As we have no firewalls on the nodes (we just disabled `ssh` on the default port 22 for ordinary users), I never ran into the question of whether MPICH needs additional ports.

--
Reuti

_______________________________________________
SGE-discuss mailing list
SGE-discuss@liv.ac.uk
https://arc.liv.ac.uk/mailman/listinfo/sge-discuss