Re: [gridengine users] qrsh session failed to execute prolog script?

Reuti Thu, 10 Jan 2019 03:11:01 -0800

> On 09.01.2019 at 23:39, Derrick Lin <klin...@gmail.com> wrote:
> 
> Hi Reuti and Iyad,
> 
> Here is my prolog script (below). It does just one thing: it sets a quota on
> the XFS volume for each job.
> 
> The prolog_exec_xx_xx.log file was generated, so I assume the first exec
> command was executed.
> 
> Since the generated log file is empty, I think nothing after that was
> executed.
> 
> Cheers
> 
> [root@zeta-4-12 common]# cat prolog_exec.sh
> #!/bin/sh


Are the shells the same, i.e. the same version? Maybe you can also use the full
path /bin/bash here, as bash switches to a compatibility mode that mimics the
original sh when it is invoked as /bin/sh.
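
As a rough sketch of what I mean (not tested on your cluster): the first line
shows what /bin/sh really is on a node, and the rest accumulates exit codes
with plain POSIX arithmetic. The "((rc+=$?))" form in your prolog is a
bash/ksh extension, so it only works where /bin/sh happens to be bash. The
helper name run_step and the true/false stand-ins for the two xfs_quota calls
are made up for illustration.

```shell
#!/bin/sh
# Show which binary /bin/sh really is on this host (often bash or dash).
sh_target=$(readlink -f /bin/sh 2>/dev/null || echo unknown)
echo "/bin/sh resolves to: $sh_target"

rc=0

# Hypothetical helper: run a command and add its exit status to rc.
# Uses POSIX $((...)) arithmetic, which works in dash, bash and ksh alike.
run_step() {
    "$@" || rc=$((rc + $?))
}

# "false" and "true" stand in for the two xfs_quota invocations.
run_step false
run_step true

echo "accumulated rc: $rc"
```

If this ends with a non-zero rc in the real prolog, the existing "exit 100"
branch can stay as it is.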

-- Reuti

> 
> exec >> /tmp/prolog_exec_"$JOB_ID"_"$SGE_TASK_ID".log
> exec 2>&1
> 
> SGE_TMP_ROOT="/scratch_local"
> 
> pe_num=$(cat $PE_HOSTFILE | grep $HOSTNAME | awk '{print $2}')
> 
> tmp_req_var=$(echo "$tmp_requested" | grep -o -E '[0-9]+')
> tmp_req_unit=$(echo "$tmp_requested" | sed 's/[0-9]*//g')
> 
> if [ -z "$pe_num" ]; then
>         quota=$tmp_requested
> else
>         quota=$(expr $tmp_req_var \* $pe_num)$tmp_req_unit
> fi
> 
> echo "############################# [$HOSTNAME PROLOG] - JOB_ID:$JOB_ID TASK_ID:$SGE_TASK_ID #############################"
> echo "`date` [$HOSTNAME PROLOG]: xfs_quota -x -c 'project -s -p $TMP $JOB_ID' $SGE_TMP_ROOT"
> echo "`date` [$HOSTNAME PROLOG]: xfs_quota -x -c 'limit -p bhard=$quota $JOB_ID' $SGE_TMP_ROOT"
> 
> xfs_quota_rc=0
> 
> /usr/sbin/xfs_quota -x -c "project -s -p $TMP $JOB_ID" $SGE_TMP_ROOT
> ((xfs_quota_rc+=$?))
> 
> /usr/sbin/xfs_quota -x -c "limit -p bhard=$quota $JOB_ID" $SGE_TMP_ROOT
> ((xfs_quota_rc+=$?))
> 
> if [ $xfs_quota_rc -eq 0 ]; then
>         exit 0
> else
>         exit 100 # Put job in error state
> fi
> 
> 
> On Wed, Jan 9, 2019 at 7:36 PM Reuti <re...@staff.uni-marburg.de> wrote:
> Hi,
> 
> > On 09.01.2019 at 01:14, Derrick Lin <klin...@gmail.com> wrote:
> > 
> > Hi guys,
> > 
> > I just brought up a new SGE cluster, but somehow the qrsh session does not 
> > work:
> > 
> > tester@login-gpu:~$ qrsh
> > ^Cerror: error while waiting for builtin IJS connection: "got select timeout"
> > 
> > After I hit Enter, the session just hung forever instead of bringing me to
> > a compute node. I had to press Ctrl+C to terminate it, which gave the
> > above error.
> > 
> > I noticed that SGE did send my qrsh request to a compute node, as I could
> > tell from qstat:
> > 
> > ---------------------------------------------------------------------------------
> > short.q@zeta-4-15.local        BIP   0/1/80         0.01     lx-amd64
> >      15 0.55500 QRLOGIN    tester       r    01/09/2019 10:47:13     1
> > 
> > We have a prolog script configured globally. The script deals with the
> > local disk quota and writes all output to a per-job log file. So I went to
> > that compute node and checked, and found that a log file had been created
> > but was empty.
> > 
> > So my thinking so far is that my qrsh is stuck because the prolog script
> > was not fully executed.
> 
> Is there any statement in the prolog which could wait for stdin? In a batch
> job there is just no stdin, hence it continues there. This could be tested
> by passing "-i" to a batch job.
> 
> -- Reuti
> 
> 
> > qsub jobs are working fine.
> > 
> > Any ideas will be appreciated.
> > 
> > Cheers,
> > Derrick
> > _______________________________________________
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users
> 


