Hi Reuti, all,

On 21 April 2017 at 18:21, Reuti <re...@staff.uni-marburg.de> wrote:

> > I want to switch from using qrsh directly to using a wrapper ('qrshx')
> > that gives me a session in which all the env vars set in qsub/qsh
> > sessions (e.g. JOB_ID) are defined:
> >
> > $ cat /usr/local/scripts/qrshx
> > #!/bin/sh
> > # ...
> > exec qrsh $( [ -z ${DISPLAY+x} ] || echo '-v DISPLAY' ) -pty y "$@" $SHELL
> > (from https://gist.github.com/willfurnass/10277756070c4f374e6149a281324841)
> >
> > However, I find that unless I specify '-now n', qrshx does not give
> > me a session, whereas attempts to start a qrsh session directly with
> > the same resource requests succeed.
> >
> > [te1st@sharc-login1 ~]$ qrsh -P rse -l gpu=1
> > [te1st@sharc-node126 ~]$ # works
>
> Without a command, it will go to "qtype INTERACTIVE"
>
> > [te1st@sharc-login1 ~]$ qrshx -P rse -l gpu=1
> > [te1st@sharc-login1 ~]$ # failed
>
> This has a command: $SHELL, and will go to "qtype BATCH"
>
> > [te1st@sharc-login1 ~]$ qrshx -P rse -l gpu=1 -now n
> > [te1st@sharc-node126 ~]$ # works
>
> Same here.
>
> Do you have more than one queue in the cluster? Do you use a JSV which
> could influence this behavior?


I've gotten to the bottom of the issue, and it wasn't due to our JSV: `qrshx
-P rse -l gpu=1` was triggering an SGE prolog script that took ~5s to run,
causing some kind of timeout.  I've made the prolog script much faster [1]
and now my `qrshx` script can start interactive sessions. I'm going to
promote qrshx to our users as a nicer alternative to qrsh/qsh.

[1] I'm now using /proc/driver/nvidia/gpus to determine the number of
NVIDIA GPUs on the execution host as `nvidia-smi -L` is very slow.
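
For anyone wanting to do something similar, here is a minimal sketch of the
idea in [1]. It is not our actual prolog: the function name
`count_nvidia_gpus` and its optional directory argument are made up for
illustration, and it assumes the NVIDIA driver exposes one subdirectory per
GPU under /proc/driver/nvidia/gpus.

```sh
#!/bin/sh
# Sketch only: count NVIDIA GPUs by counting the per-device directories
# that the driver creates in procfs, rather than calling `nvidia-smi -L`.
# The optional first argument (a hypothetical addition) overrides the
# directory to scan, which makes the function easy to test.
count_nvidia_gpus() {
    dir="${1:-/proc/driver/nvidia/gpus}"
    count=0
    for d in "$dir"/*/; do
        # If the directory is missing or empty the glob stays unexpanded,
        # so check that each match really is a directory.
        if [ -d "$d" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Calling `count_nvidia_gpus` with no arguments prints the count for the
current host, and prints 0 if the driver's procfs directory is absent.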

Cheers,

Will


-- 
Dr Will Furnass | Research Software Engineer
Dept of Computer Science | University of Sheffield
https://rse.shef.ac.uk | @willfurnass | +44 (0)114 22 21872
