I finally tracked this one down. The global conf had a default value of 0 for 
h_vmem, which is also set as requestable and consumable, and apparently 
GridEngine does not like that. Nothing in the docs says it should be anything 
other than 0. A user reported that qrsh worked with an -l h_vmem of 10G, so I 
did the logical noodle dance and put a default value of 1G in the global conf. 
Now it works. 
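
For anyone who wants to check their own cluster for the same condition, here 
is a rough sketch in Python. It assumes the setting in question is the h_vmem 
line of the complex list as printed by `qconf -sc` (eight columns: name, 
shortcut, type, relop, requestable, consumable, default, urgency). The 
function name and the sample text are made up for illustration; this is not 
output from our cluster.

```python
from typing import List


def zero_default_consumables(sc_output: str) -> List[str]:
    """Return names of consumable complexes whose default value is 0.

    `sc_output` is expected to look like the text printed by `qconf -sc`.
    """
    flagged = []
    for line in sc_output.splitlines():
        line = line.strip()
        # Skip blank lines and the commented header/separator lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 8:
            continue  # not a complex definition line
        name, _shortcut, _type, _relop, _requestable, consumable, default, _urgency = fields
        if consumable.upper() in ("YES", "JOB") and default == "0":
            flagged.append(name)
    return flagged


# Hypothetical sample in the `qconf -sc` column layout.
SAMPLE = """\
#name     shortcut  type    relop requestable consumable default urgency
#-----------------------------------------------------------------------
h_vmem    h_vmem    MEMORY  <=    YES         YES        0       0
slots     s         INT     <=    YES         YES        1       1000
h_rt      h_rt      TIME    <=    YES         NO         0:0:0   0
"""

print(zero_default_consumables(SAMPLE))
```

On a real system the text would come from `qconf -sc`, and the default would 
be changed with `qconf -mc`.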

Can somebody explain why the error message related to this has absolutely, 
positively NOTHING to do with the root cause?

Juan Jimenez
System Administrator, HPC
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800

From: Reuti [re...@staff.uni-marburg.de]
Sent: Tuesday, May 30, 2017 11:36
To: Jimenez, Juan Esteban
Cc: SGE-discuss@liv.ac.uk
Subject: Re: [SGE-discuss] Another QRSH problem

> On 30.05.2017 at 11:32, juanesteban.jime...@mdc-berlin.de wrote:
> Ok, now I understand how this works and what the limitations and advantages 
> of each option are. Our users are not forwarding X so we will try to go back 
> to built-in and see what happens. Does the qmaster need to be restarted after 
> I make the change to the conf?

No. These entries are interpreted live. Just wait one or two minutes after the 
change before you use it, until all exechosts honor the new setting.
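
A rough way to confirm the switch afterwards, sketched in Python: it assumes 
the text of `qconf -sconf` is passed in, and it lists the remote startup 
entries that are present but not set to builtin. The six parameter names are 
the ones described in the remote_startup man page; the function name and the 
sample values are only illustrative.

```python
from typing import Dict

# Remote startup parameters from the cluster configuration.
REMOTE_STARTUP_KEYS = (
    "qlogin_command", "qlogin_daemon",
    "rlogin_command", "rlogin_daemon",
    "rsh_command", "rsh_daemon",
)


def non_builtin_entries(sconf_output: str) -> Dict[str, str]:
    """Return the remote startup entries whose value is not 'builtin'."""
    result = {}
    for line in sconf_output.splitlines():
        parts = line.split(None, 1)  # parameter name, then the rest of the line
        if len(parts) != 2:
            continue
        key, value = parts[0], parts[1].strip()
        if key in REMOTE_STARTUP_KEYS and value != "builtin":
            result[key] = value
    return result


# Hypothetical excerpt in the `qconf -sconf` layout (values are examples).
SAMPLE_CONF = """\
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  /usr/bin/ssh
rsh_daemon                   /usr/sbin/sshd -i
"""

print(non_builtin_entries(SAMPLE_CONF))
```

An empty result would mean all listed entries are already on builtin.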

-- Reuti

> Best regards,
> Juan Jimenez
> System Administrator, BIH HPC Cluster
> MDC Berlin / IT-Dept.
> Tel.: +49 30 9406 2800
> On 29.05.17, 19:45, "SGE-discuss on behalf of 
> juanesteban.jime...@mdc-berlin.de" <sge-discuss-boun...@liverpool.ac.uk on 
> behalf of juanesteban.jime...@mdc-berlin.de> wrote:
>    How does the shepherd bring up this separate sshd daemon? What arguments 
> are being used?
>    Best regards,
>    Juan Jimenez
>    System Administrator, HPC
>    MDC Berlin / IT-Dept.
>    Tel.: +49 30 9406 2800
>    ________________________________________
>    From: Reuti [re...@staff.uni-marburg.de]
>    Sent: Monday, May 29, 2017 18:14
>    To: Jimenez, Juan Esteban
>    Cc: SGE-discuss@liv.ac.uk
>    Subject: Re: [SGE-discuss] Another QRSH problem
>> On 29.05.2017 at 18:00, juanesteban.jime...@mdc-berlin.de wrote:
>> On 29.05.17, 17:56, "Reuti" <re...@staff.uni-marburg.de> wrote:
>>> On 29.05.2017 at 17:26, juanesteban.jime...@mdc-berlin.de wrote:
>>> I am getting this very specific error:
>>> debug1: ssh_exchange_identification: /usr/sbin/sshd: error while loading 
>>> shared libraries: libselinux.so.1: failed to map segment from shared object
>>>  I don't have a specific idea as this seems to be a permission problem. Are 
>>> you running selinux and could disable it?
>> SELINUX is disabled on all nodes.
>    Ok.
>>> However, ssh works fine outside of qrsh. Every single test succeeds, from 
>>> all nodes to all nodes.
>>>  This is usually handled by the default running `sshd` on port 22, but the 
>>> one started by SGE runs on a different port.
>> Started where??? On the node where the qrsh will be sent, by the exec daemon?
>    It's the shepherd who will start it.
>> That’s a heck of a big clue! Is there a way to disable this and use the 
>> existing sshd?
>    Not in the default setting.
>    sgeadmin root     /usr/sge/bin/lx24-em64t/sge_execd
>    sgeadmin root      \_ sge_shepherd-224557 -bg
>    root     root          \_ sshd: reuti [priv]
>    reuti    reuti             \_ sshd: reuti@pts/0
>    reuti    reuti                 \_ -bash
>    reuti    reuti                     \_ ps -e f -o user,ruser,command
>    This is completely unrelated to the default `sshd` to log in on port 22. 
> In the standard configuration it will use the same config files though.
>    ==
>    You could try to use wrappers for both entries and ignore the port, but 
> then you will lose job control and accounting (I have no clue whether this 
> will work). The detailed startup is explained here:
>    https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html
>    -- Reuti
>    _______________________________________________
>    SGE-discuss mailing list
>    SGE-discuss@liv.ac.uk
>    https://arc.liv.ac.uk/mailman/listinfo/sge-discuss
