Hello, two things. First, you don't actually seem to have the '--x11' flag on your srun command. Does 'srun --x11 --nodelist=compute-0-5 -n 1 -c 6 --mem=8G -A y8 -p RUBY xclock' get you any further?
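(As a quick sanity check first - just a sketch, and I'm assuming you reach the frontend with 'ssh -X' or 'ssh -Y' and have xclock installed there - the built-in forwarding needs a working X display on the machine you submit from:

  # on the frontend, before calling srun
  echo $DISPLAY    # should print something like localhost:10.0
  xclock           # a clock should pop up on your own desktop

If that already fails, srun's --x11 has nothing to forward.)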
Second, I had some trouble getting the built-in X forwarding to work myself, which came down to hostnames and xauth magic cookies. If you do something like 'srun --x11 --pty /bin/bash' to just get an interactive session, and then run 'xauth list | grep $(hostname)' (note: $(hostname), not $HOSTNAME - you want the compute node's own hostname, and $HOSTNAME may well still be the frontend's because the environment gets propagated), does that find a ticket for your session, i.e. does it print anything? If it does, you should be good; try running 'xclock' or something from that session. Needless to say, if you haven't got a magic cookie, it won't work.
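Roughly what I mean (again just a sketch - the exact display number and cookie will differ on your system, and this assumes xclock is installed on the compute node):

  srun --x11 --pty /bin/bash
  # ...now on the compute node:
  echo $DISPLAY                    # should be set, e.g. localhost:20.0
  xauth list | grep $(hostname)    # should print an MIT-MAGIC-COOKIE-1 entry
  xclock                           # if the cookie is there, this should appear

If 'xauth list' comes back empty for that hostname, that's where I'd start digging.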
Tina

On 17/11/2018 17:24, Mahmood Naderan wrote:
>
> >What does this command say?
> >scontrol show config | fgrep PrologFlags
>
> [root@rocks7 ~]# scontrol show config | fgrep PrologFlags
> PrologFlags = Alloc,Contain,X11
>
> That means x11 has been compiled in the code (while Werner created the roll).
>
> >>Check your slurmd logs on the compute node. What errors are there?
>
> In one terminal, I run the following command
>
> [mahmood@rocks7 ~]$ srun --nodelist=compute-0-5 -n 1 -c 6 --mem=8G -A y8 -p RUBY xclock
> Error: Can't open display :1
> srun: error: compute-0-5: task 0: Exited with exit code 1
>
> At the same time, in another terminal I see this
>
> [root@compute-0-5 ~]# tail -f /var/log/slurm/slurmd.log
> [2018-11-17T20:47:23.017] _run_prolog: run job script took usec=4
> [2018-11-17T20:47:23.017] _run_prolog: prolog with lock for job 1580 ran for 0 seconds
> [2018-11-17T20:47:23.131] launch task 1580.0 request from UID:1000 GID:1000 HOST:10.1.1.1 PORT:54950
> [2018-11-17T20:47:23.131] lllp_distribution jobid [1580] implicit auto binding: sockets,one_thread, dist 1
> [2018-11-17T20:47:23.131] _task_layout_lllp_cyclic
> [2018-11-17T20:47:23.131] _lllp_generate_cpu_bind jobid [1580]: mask_cpu,one_thread, 0x00000070000007
> [2018-11-17T20:47:23.204] [1580.0] task_p_pre_launch: Using sched_affinity for tasks
> [2018-11-17T20:47:23.231] [1580.0] done with job
> [2018-11-17T20:47:23.263] [1580.extern] done with job
> ^C
>
> Also, at the same time, I see this in the frontend log
>
> [root@rocks7 ~]# tail -f /var/log/slurm/slurmctld.log
> [2018-11-17T20:52:10.908] Fairhare priority of job 1582 for user mahmood in acct y8 is 0.242424
> [2018-11-17T20:52:10.908] Weighted Age priority is 0.000000 * 10 = 0.00
> [2018-11-17T20:52:10.908] Weighted Fairshare priority is 0.242424 * 10000 = 2424.24
> [2018-11-17T20:52:10.908] Weighted JobSize priority is 0.097756 * 100 = 9.78
> [2018-11-17T20:52:10.908] Weighted Partition priority is 0.001000 * 10000 = 10.00
> [2018-11-17T20:52:10.908] Weighted QOS priority is 0.000000 * 0 = 0.00
> [2018-11-17T20:52:10.908] Weighted TRES:cpu is 0.041667 * 2000.00 = 83.33
> [2018-11-17T20:52:10.908] Weighted TRES:mem is 0.031884 * 1.00 = 0.03
> [2018-11-17T20:52:10.908] Job 1582 priority: 0.00 + 2424.24 + 9.78 + 10.00 + 0.00 + 83 - 0 = 2527.38
> [2018-11-17T20:52:10.909] BillingWeight: JobId=1582 is either new or it was resized
> [2018-11-17T20:52:10.909] sched: _slurm_rpc_allocate_resources JobId=1582 NodeList=compute-0-5 usec=977
> [2018-11-17T20:52:11.123] _job_complete: JobId=1582 WEXITSTATUS 1
> [2018-11-17T20:52:11.123] priority_p_job_end: called for job 1582
> [2018-11-17T20:52:11.123] job 1582 ran for 1 seconds with TRES counts of
> [2018-11-17T20:52:11.123] TRES cpu: 6
> [2018-11-17T20:52:11.123] TRES mem: 8192
> [2018-11-17T20:52:11.123] TRES node: 1
> [2018-11-17T20:52:11.123] TRES billing: 6
> [2018-11-17T20:52:11.123] _handle_qos_tres_run_secs: job 1582: Removed 15552000 unused seconds from QOS normal TRES cpu grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_qos_tres_run_secs: job 1582: Removed 21233664000 unused seconds from QOS normal TRES mem grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_qos_tres_run_secs: job 1582: Removed 2592000 unused seconds from QOS normal TRES node grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_qos_tres_run_secs: job 1582: Removed 15552000 unused seconds from QOS normal TRES billing grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_qos_tres_run_secs: job 1582: Removed 0 unused seconds from QOS normal TRES fs/disk grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_qos_tres_run_secs: job 1582: Removed 0 unused seconds from QOS normal TRES vmem grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_qos_tres_run_secs: job 1582: Removed 0 unused seconds from QOS normal TRES pages grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_qos_tres_run_secs: job 1582: Removed 0 unused seconds from QOS normal TRES gres/gpu grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] Adding 5.999997 new usage to assoc 42 (y8/mahmood/ruby) raw usage is now 437603.824918. Group wall added 0.999999 making it 72831.944878.
> [2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 15552000 unused seconds from assoc 42 TRES cpu grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 21233664000 unused seconds from assoc 42 TRES mem grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 2592000 unused seconds from assoc 42 TRES node grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 15552000 unused seconds from assoc 42 TRES billing grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 42 TRES fs/disk grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 42 TRES vmem grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 42 TRES pages grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 42 TRES gres/gpu grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] Adding 5.999997 new usage to assoc 41 (y8/(null)/(null)) raw usage is now 28311279.361228. Group wall added 0.999999 making it 1466496.669595.
> [2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 15552000 unused seconds from assoc 41 TRES cpu grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 21233664000 unused seconds from assoc 41 TRES mem grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 2592000 unused seconds from assoc 41 TRES node grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.123] _handle_assoc_tres_run_secs: job 1582: Removed 15552000 unused seconds from assoc 41 TRES billing grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 41 TRES fs/disk grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 41 TRES vmem grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 41 TRES pages grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 41 TRES gres/gpu grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.124] Adding 5.999997 new usage to assoc 1 (root/(null)/(null)) raw usage is now 107651994.109022. Group wall added 0.999999 making it 4989938.597661.
> [2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 15552000 unused seconds from assoc 1 TRES cpu grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 21233664000 unused seconds from assoc 1 TRES mem grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 2592000 unused seconds from assoc 1 TRES node grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 15552000 unused seconds from assoc 1 TRES billing grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 1 TRES fs/disk grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 1 TRES vmem grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 1 TRES pages grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.124] _handle_assoc_tres_run_secs: job 1582: Removed 0 unused seconds from assoc 1 TRES gres/gpu grp_used_tres_run_secs = 0
> [2018-11-17T20:52:11.124] _job_complete: JobId=1582 done
>
> All those happened with the following two entries in slurm.conf
>
> PrologFlags=x11
> X11Parameters=local_xauthority
>
> Regards,
> Mahmood