Hi Reuti and Iyad,

Here is my prolog script (included below). It does just one thing: it sets a quota on the XFS volume for each job.
The prolog_exec_xx_xx.log file was generated, so I assume the first exec command was executed. Since the generated log file is empty, I think nothing was executed after that.

Cheers

[root@zeta-4-12 common]# cat prolog_exec.sh
#!/bin/sh

exec >> /tmp/prolog_exec_"$JOB_ID"_"$SGE_TASK_ID".log
exec 2>&1

SGE_TMP_ROOT="/scratch_local"

pe_num=$(cat $PE_HOSTFILE | grep $HOSTNAME | awk '{print $2}')

tmp_req_var=$(echo "$tmp_requested" | grep -o -E '[0-9]+')
tmp_req_unit=$(echo "$tmp_requested" | sed 's/[0-9]*//g')

if [ -z "$pe_num" ]; then
    quota=$tmp_requested
else
    quota=$(expr $tmp_req_var \* $pe_num)$tmp_req_unit
fi

echo "############################# [$HOSTNAME PROLOG] - JOB_ID:$JOB_ID TASK_ID:$SGE_TASK_ID #############################"

echo "`date` [$HOSTNAME PROLOG]: xfs_quota -x -c 'project -s -p $TMP $JOB_ID' $SGE_TMP_ROOT"
echo "`date` [$HOSTNAME PROLOG]: xfs_quota -x -c 'limit -p bhard=$quota $JOB_ID' $SGE_TMP_ROOT"

xfs_quota_rc=0

/usr/sbin/xfs_quota -x -c "project -s -p $TMP $JOB_ID" $SGE_TMP_ROOT
((xfs_quota_rc+=$?))

/usr/sbin/xfs_quota -x -c "limit -p bhard=$quota $JOB_ID" $SGE_TMP_ROOT
((xfs_quota_rc+=$?))

if [ $xfs_quota_rc -eq 0 ]; then
    exit 0
else
    exit 100 # Put job in error state
fi

On Wed, Jan 9, 2019 at 7:36 PM Reuti <re...@staff.uni-marburg.de> wrote:

> Hi,
>
> > On 09.01.2019 at 01:14, Derrick Lin <klin...@gmail.com> wrote:
> >
> > Hi guys,
> >
> > I just brought up a new SGE cluster, but somehow the qrsh session does not work:
> >
> > tester@login-gpu:~$ qrsh
> > ^Cerror: error while waiting for builtin IJS connection: "got select timeout"
> >
> > After I hit Enter, the session just hung there forever instead of bringing me to a compute node. I had to press Ctrl+C to terminate it, and it gave the above error.
> >
> > I noticed that SGE did send my qrsh request to a compute node, as I could tell from qstat:
> >
> > ---------------------------------------------------------------------------------
> > short.q@zeta-4-15.local BIP 0/1/80 0.01 lx-amd64
> > 15 0.55500 QRLOGIN tester r 01/09/2019 10:47:13 1
> >
> > We have a prolog script configured globally; the script deals with local disk quota and keeps all output in a log file for each job. So I went to that compute node and checked, and found that a log file had been created but was empty.
> >
> > So my thinking so far is that my qrsh is stuck because the prolog script is not fully executed.
>
> Is there any statement in the prolog which could wait for stdin? In a batch job there is just no stdin, hence it continues. Could be tested with "-i" to a batch job.
>
> -- Reuti
>
> > qsub jobs are working fine.
> >
> > Any ideas will be appreciated.
> >
> > Cheers,
> > Derrick
> > _______________________________________________
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users
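
Regarding Reuti's question about stdin: one candidate in the script above is the `cat $PE_HOSTFILE` pipeline. If PE_HOSTFILE is unset or empty (which I assume can happen for a job without a parallel environment, such as a plain qrsh), the unquoted expansion leaves `cat` with no arguments, so it reads from stdin. The following is only a minimal sketch of a lookup that never reads stdin, not a confirmed fix. `slots_on_host` is a hypothetical helper name, and the exact match on the first column is an assumption (the original uses a substring `grep`, which would also match a fully qualified name).

```shell
#!/bin/sh
# Sketch: look up this host's slot count without ever reading from stdin.
# slots_on_host is a hypothetical helper, not something SGE provides.
slots_on_host() {
    host=$1
    hostfile=${PE_HOSTFILE:-}

    # Without a usable PE_HOSTFILE, print nothing and return at once
    # instead of letting a command fall back to reading stdin.
    if [ -z "$hostfile" ] || [ ! -r "$hostfile" ]; then
        return 0
    fi

    # Assumes column 1 is the host name and column 2 the slot count,
    # as implied by the awk '{print $2}' in the original prolog.
    awk -v h="$host" '$1 == h { print $2; exit }' "$hostfile" </dev/null
}

# Fall back to uname -n if the shell does not set HOSTNAME.
pe_num=$(slots_on_host "${HOSTNAME:-$(uname -n)}")
echo "pe_num=${pe_num:-<none>}"
```

With this, an empty `pe_num` still takes the `quota=$tmp_requested` branch, so the rest of the prolog would not need to change.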