> On 09.01.2019 at 23:39, Derrick Lin <klin...@gmail.com> wrote:
>
> Hi Reuti and Iyad,
>
> Here is my prolog script, it just does one thing, setting quota on the XFS
> volume for each job:
>
> The prolog_exec_xx_xx.log file was generated, so I assumed the first exec
> command got executed.
>
> Since the generated log file is empty, I think nothing was executed after
> that.
>
> Cheers
>
> [root@zeta-4-12 common]# cat prolog_exec.sh
> #!/bin/sh

Are the shells the same, i.e. the same version? Maybe you can also use the full path /bin/bash here, as bash will switch on some kind of compatibility mode to the original sh in case it is invoked by the name /bin/sh.

-- Reuti

>
> exec >> /tmp/prolog_exec_"$JOB_ID"_"$SGE_TASK_ID".log
> exec 2>&1
>
> SGE_TMP_ROOT="/scratch_local"
>
> pe_num=$(cat $PE_HOSTFILE | grep $HOSTNAME | awk '{print $2}')
>
> tmp_req_var=$(echo "$tmp_requested" | grep -o -E '[0-9]+')
> tmp_req_unit=$(echo "$tmp_requested" | sed 's/[0-9]*//g')
>
> if [ -z "$pe_num" ]; then
>     quota=$tmp_requested
> else
>     quota=$(expr $tmp_req_var \* $pe_num)$tmp_req_unit
> fi
>
> echo "############################# [$HOSTNAME PROLOG] - JOB_ID:$JOB_ID TASK_ID:$SGE_TASK_ID #############################"
> echo "`date` [$HOSTNAME PROLOG]: xfs_quota -x -c 'project -s -p $TMP $JOB_ID' $SGE_TMP_ROOT"
> echo "`date` [$HOSTNAME PROLOG]: xfs_quota -x -c 'limit -p bhard=$quota $JOB_ID' $SGE_TMP_ROOT"
>
> xfs_quota_rc=0
>
> /usr/sbin/xfs_quota -x -c "project -s -p $TMP $JOB_ID" $SGE_TMP_ROOT
> ((xfs_quota_rc+=$?))
>
> /usr/sbin/xfs_quota -x -c "limit -p bhard=$quota $JOB_ID" $SGE_TMP_ROOT
> ((xfs_quota_rc+=$?))
>
> if [ $xfs_quota_rc -eq 0 ]; then
>     exit 0
> else
>     exit 100 # Put job in error state
> fi
>
>
> On Wed, Jan 9, 2019 at 7:36 PM Reuti <re...@staff.uni-marburg.de> wrote:
> Hi,
>
> > On 09.01.2019 at 01:14, Derrick Lin <klin...@gmail.com> wrote:
> >
> > Hi guys,
> >
> > I just brought up a new SGE cluster, but somehow the qrsh session does not
> > work:
> >
> > tester@login-gpu:~$ qrsh
> > ^Cerror: error while waiting for builtin IJS connection: "got select
> > timeout"
> >
> > After I hit enter, the session just gets stuck there forever instead of
> > bringing me to a compute node. I have to press Ctrl+C to terminate it, and
> > it gives the above error.
> >
> > I noticed that SGE did send my qrsh request to a compute node, as I could
> > tell from qstat:
> >
> > ---------------------------------------------------------------------------------
> > short.q@zeta-4-15.local BIP 0/1/80 0.01 lx-amd64
> > 15 0.55500 QRLOGIN tester r 01/09/2019 10:47:13 1
> >
> > We have a prolog script configured globally. The script deals with the
> > local disk quota and keeps all output in a log file for each job. So I went
> > to that compute node, checked, and found that a log file was created but it
> > was empty.
> >
> > So my thinking so far is that my qrsh is stuck because the prolog script is
> > not fully executed.
>
> Is there any statement in the prolog, which could wait for stdin – and in a
> batch job there is just no stdin, hence it continues? Could be tested with
> "-i" to a batch job.
>
> -- Reuti
>
>
> > qsub jobs are working fine.
> >
> > Any idea will be appreciated
> >
> > Cheers,
> > Derrick
> > _______________________________________________
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users
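
One statement in the quoted prolog that could wait for stdin is `cat $PE_HOSTFILE`: if PE_HOSTFILE is unset or empty, the unquoted expansion leaves a bare `cat`, which reads stdin until EOF. Whether PE_HOSTFILE is really unset for this qrsh session is an assumption, not something the thread confirms. The following is only a sketch of a guarded lookup; `slots_for_host` is a hypothetical helper name, and it matches the host name exactly in the first column, which is stricter than the original `grep $HOSTNAME`. It also shows POSIX arithmetic for summing exit codes, since `((rc+=$?))` is a bash extension and may not work in a plain /bin/sh.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original prolog: print the slot count
# for one host from a PE hostfile, without ever falling back to stdin.
slots_for_host() {
    hostfile=$1
    host=$2
    # With an empty or unreadable path, print nothing instead of letting a
    # bare "cat" block on whatever is attached to stdin.
    [ -n "$hostfile" ] && [ -r "$hostfile" ] || return 0
    # Column 1 is the host name, column 2 the slot count.
    awk -v h="$host" '$1 == h { print $2 }' "$hostfile" < /dev/null
}

# POSIX arithmetic expansion instead of the bash-only "((rc+=$?))".
rc=0
true;  rc=$((rc + $?))
false; rc=$((rc + $?))
echo "rc=$rc"
```

In the prolog this would be called as `pe_num=$(slots_for_host "$PE_HOSTFILE" "$HOSTNAME")`, so an empty result still selects the existing `quota=$tmp_requested` branch.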