
Hi,

On 17.02.2017 at 21:55, Raymond Norris wrote:

>> Hi,
>> 
>>> On 17.02.2017 at 18:54, Raymond Norris <raymond.nor...@mathworks.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I'm using SoGE 8.1.9.
>>> 
>>> When we set up our jobs to run with loose integration and SSH, things work
>>> fine.  I'm now trying to configure this with tight integration and qrsh.
>>> Sans a dozen or so ports, everything's closed.  My understanding is that
>>> qrsh uses rsh underneath and that we need to open port 514, which we've
>>> done.  However, we get the following:
>> 
>> No. If you set:
>> 
>> $ qconf -mconf
>> …
>> rsh_command                  builtin
>> rsh_daemon                   builtin
>> 
>> it uses its own protocol. Even when these are set to `rsh` and `rshd`
>> respectively, it will in both cases use a random port, not 514.
>> 
>> https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html might
>> give some insight.
>> 
>> 
>>> qrsh -inherit -verbose node-01 smpd -s -phrase ... -port ...
>> 
>> This looks like an old startup of MPICH2. You don't use Hydra?
> 
> We ship MPICH2 with our application, so I'm starting there.  I've built it 
> with Hydra, but it fails.

smpd startup is outdated:

https://arc.liv.ac.uk/SGE/howto/mpich2-integration/mpich2-integration.html

Which exact version of MPICH are you using?

To ship something with your application you could also try Open MPI or the
Community Edition of IBM's Platform MPI:
https://www.ibm.com/developerworks/downloads/im/mpi/


>  I'm guessing, as Simao Afonso references, each ssh connection requires a 
> different ephemeral port.

Correct.
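
To illustrate what "ephemeral" means here, a minimal Python sketch (purely
illustrative, not code from SGE or `ssh`; the helper name is made up): a
listener that binds to port 0 gets whatever free port the kernel hands out, so
two listeners end up on two different, unpredictable ports. A firewall rule for
one fixed port such as 514 cannot cover that.

```python
import socket


def listen_on_random_port(host="127.0.0.1"):
    """Open a TCP listener on a kernel-chosen port (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 means "let the OS pick any free port"
    sock.listen(1)
    return sock, sock.getsockname()[1]


if __name__ == "__main__":
    first, port_a = listen_on_random_port()
    second, port_b = listen_on_random_port()
    # Both listeners are open at the same time, so the ports must differ.
    print("listeners got ports", port_a, "and", port_b)
    first.close()
    second.close()
```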


>>> Starting server daemon at host "node-01"
>>> error: executing task of job 1032 failed: failed sending task to execd@node-01: got send error
>>> error: commlib error: got select error (No route to host)
>>> 
>>> To me, that tells me we don't have 514 open ("No route to host").  Also, I
>>> think the issue is that we're not running rshd on node-01, which I'm not
>>> sure they're OK with running, and which we'd need to, right?  So my
>>> question is: how do I get qrsh working so that I can track resources, but
>>> configure it with SSH?  Or can this not work, and do I need rsh?
>> 
>> See the link provided above; `ssh` is also covered near the end of the
>> document.
> 
> I can test the PAM, but in the end, will I be in the same situation where 
> ssh/sshd will use an array of ports?  As it stands now, keeping most of the 
> ports closed, these are my options:
> 
> 1. loose integration with SSH: works

Startup works because it uses port 22 only. Doesn't it need any additional
ports for communication later on, during operation?


> 2. tight integration with Hydra: fails -- needs ephemeral ports

Yep.


> 3. tight integration with qrsh: fails -- needs ephemeral ports

Yep.


> 4. tight integration with qrsh, using SSH: not tested yet (needs ephemeral 
> ports?)

Yes.


> 5. recompile SoGE w/ tight SSH

That doesn't change the original problem. The point of recompiling SoGE with
SSH support is that processes started by the job's private `sshd` also get the
additional group ID, so that accounting and job termination work properly even
when a process jumps out of the process tree.


> Am I overlooking any other options?

I see an additional problem: at least Open MPI will use a range of ports of its
own once it has been started (by whatever means). I would assume that the other
MPI implementations behave the same way.

As we have no firewalls on the nodes (we only disabled `ssh` on the default
port 22 for ordinary users), I never had to find out whether MPICH needs
additional ports.
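
To find out which ports are actually reachable between two nodes, a small probe
can help. This is a minimal sketch, assuming Python is available on the nodes;
the function name is my own, and the host and port list are only placeholders
taken from this thread. It reports a port as unreachable for "connection
refused", "no route to host" and timeouts alike.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, "No route to host" and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder host and ports; adjust them to your cluster.
    for port in (22, 514):
        state = "open" if port_reachable("node-01", port) else "blocked/closed"
        print("node-01 port", port, "is", state)
```

Run it from the node that submits the `qrsh -inherit` call towards the target
node; a port that shows up as blocked there is one the firewall (or a missing
daemon) is keeping closed.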

-- Reuti