Hello,
check your sshd settings (here are ours; X11UseLocalhost no is the important one):
X11Forwarding yes
X11DisplayOffset 10
X11UseLocalhost no
Add PrologFlags to slurm.conf:
PrologFlags=x11
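A quick way to verify both settings took effect (a sketch; it assumes sshd
and the Slurm daemons were restarted after the change, and sshd -T needs
root):

    # on the login node: print the effective sshd configuration
    sshd -T | grep -i x11
    # on any node with Slurm client tools: confirm the flag was picked up
    scontrol show config | grep -i prologflags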
Cheers,
Barbara
On 11/16/20 7:20 PM, Russell Jones wrote:
Here are some debug logs from the compute node after launching an
interactive shell with the x11 flag. I see it report that X11 forwarding
was established, then it ends with a connection timeout.
[2020-11-16T12:12:34.097] debug: Checking credential with 492 bytes of sig data
[2020-11-16T12:12:34.098] _run_prolog: run job script took usec=1284
[2020-11-16T12:12:34.098] _run_prolog: prolog with lock for job 30873 ran for 0 seconds
[2020-11-16T12:12:34.111] debug: AcctGatherEnergy NONE plugin loaded
[2020-11-16T12:12:34.112] debug: AcctGatherProfile NONE plugin loaded
[2020-11-16T12:12:34.113] debug: AcctGatherInterconnect NONE plugin loaded
[2020-11-16T12:12:34.114] debug: AcctGatherFilesystem NONE plugin loaded
[2020-11-16T12:12:34.115] debug: switch NONE plugin loaded
[2020-11-16T12:12:34.116] debug: init: Gres GPU plugin loaded
[2020-11-16T12:12:34.116] [30873.extern] debug: Job accounting gather LINUX plugin loaded
[2020-11-16T12:12:34.117] [30873.extern] debug: cont_id hasn't been set yet not running poll
[2020-11-16T12:12:34.117] [30873.extern] debug: Message thread started pid = 18771
[2020-11-16T12:12:34.119] [30873.extern] debug: task NONE plugin loaded
[2020-11-16T12:12:34.120] [30873.extern] Munge credential signature plugin loaded
[2020-11-16T12:12:34.121] [30873.extern] debug: job_container none plugin loaded
[2020-11-16T12:12:34.121] [30873.extern] debug: spank: opening plugin stack /apps/slurm/cluster/20.02.0/etc/plugstack.conf
[2020-11-16T12:12:34.121] [30873.extern] debug: X11Parameters: (null)
[2020-11-16T12:12:34.133] [30873.extern] X11 forwarding established on DISPLAY=cluster-cn02.domain:66.0
[2020-11-16T12:12:34.133] [30873.extern] debug: jag_common_poll_data: Task 0 pid 18775 ave_freq = 4023000 mem size/max 6750208/6750208 vmem size/max 147521536/147521536, disk read size/max (7200/7200), disk write size/max (374/374), time 0.000000(0+0) Energy tot/max 0/0 TotPower 0 MaxPower 0 MinPower 0
[2020-11-16T12:12:34.133] [30873.extern] debug: x11 forwarding local display is 66
[2020-11-16T12:12:34.133] [30873.extern] debug: x11 forwarding local xauthority is /tmp/.Xauthority-MkU8aA
[2020-11-16T12:12:34.202] launch task 30873.0 request from UID:1368 GID:512 HOST:172.21.150.10 PORT:4795
[2020-11-16T12:12:34.202] debug: Checking credential with 492 bytes of sig data
[2020-11-16T12:12:34.202] [30873.extern] debug: Handling REQUEST_X11_DISPLAY
[2020-11-16T12:12:34.202] [30873.extern] debug: Leaving _handle_get_x11_display
[2020-11-16T12:12:34.202] debug: Leaving stepd_get_x11_display
[2020-11-16T12:12:34.202] debug: Waiting for job 30873's prolog to complete
[2020-11-16T12:12:34.202] debug: Finished wait for job 30873's prolog to complete
[2020-11-16T12:12:34.213] debug: AcctGatherEnergy NONE plugin loaded
[2020-11-16T12:12:34.214] debug: AcctGatherProfile NONE plugin loaded
[2020-11-16T12:12:34.214] debug: AcctGatherInterconnect NONE plugin loaded
[2020-11-16T12:12:34.214] debug: AcctGatherFilesystem NONE plugin loaded
[2020-11-16T12:12:34.215] debug: switch NONE plugin loaded
[2020-11-16T12:12:34.215] debug: init: Gres GPU plugin loaded
[2020-11-16T12:12:34.216] [30873.0] debug: Job accounting gather LINUX plugin loaded
[2020-11-16T12:12:34.216] [30873.0] debug: cont_id hasn't been set yet not running poll
[2020-11-16T12:12:34.216] [30873.0] debug: Message thread started pid = 18781
[2020-11-16T12:12:34.216] debug: task_p_slurmd_reserve_resources: 30873 0
[2020-11-16T12:12:34.217] [30873.0] debug: task NONE plugin loaded
[2020-11-16T12:12:34.217] [30873.0] Munge credential signature plugin loaded
[2020-11-16T12:12:34.217] [30873.0] debug: job_container none plugin loaded
[2020-11-16T12:12:34.217] [30873.0] debug: mpi type = pmix
[2020-11-16T12:12:34.244] [30873.0] debug: spank: opening plugin stack /apps/slurm/cluster/20.02.0/etc/plugstack.conf
[2020-11-16T12:12:34.244] [30873.0] debug: mpi type = pmix
[2020-11-16T12:12:34.244] [30873.0] debug: (null) [0] mpi_pmix.c:153 [p_mpi_hook_slurmstepd_prefork] mpi/pmix: start
[2020-11-16T12:12:34.244] [30873.0] debug: mpi/pmix: setup sockets
[2020-11-16T12:12:34.273] [30873.0] debug: cluster-cn02 [0] pmixp_client_v2.c:69 [_errhandler_reg_callbk] mpi/pmix: Error handler registration callback is called with status=0, ref=0
[2020-11-16T12:12:34.273] [30873.0] debug: cluster-cn02 [0] pmixp_client.c:697 [pmixp_libpmix_job_set] mpi/pmix: task initialization
[2020-11-16T12:12:34.273] [30873.0] debug: cluster-cn02 [0] pmixp_agent.c:229 [_agent_thread] mpi/pmix: Start agent thread
[2020-11-16T12:12:34.273] [30873.0] debug: cluster-cn02 [0] pmixp_agent.c:330 [pmixp_agent_start] mpi/pmix: agent thread started: tid = 70366934331824
[2020-11-16T12:12:34.273] [30873.0] debug: cluster-cn02 [0] pmixp_agent.c:335 [pmixp_agent_start] mpi/pmix: timer thread started: tid = 70366933283248
[2020-11-16T12:12:34.273] [30873.0] debug: cluster-cn02 [0] pmixp_agent.c:267 [_pmix_timer_thread] mpi/pmix: Start timer thread
[2020-11-16T12:12:34.273] [30873.0] debug: stdin uses a pty object
[2020-11-16T12:12:34.274] [30873.0] debug: init pty size 34:159
[2020-11-16T12:12:34.274] [30873.0] in _window_manager
[2020-11-16T12:12:34.274] [30873.0] debug level = 2
[2020-11-16T12:12:34.274] [30873.0] debug: IO handler started pid=18781
[2020-11-16T12:12:34.275] [30873.0] starting 1 tasks
[2020-11-16T12:12:34.276] [30873.0] task 0 (18801) started 2020-11-16T12:12:34
[2020-11-16T12:12:34.276] [30873.0] debug: task_p_pre_launch_priv: 30873.0
[2020-11-16T12:12:34.288] [30873.0] debug: jag_common_poll_data: Task 0 pid 18801 ave_freq = 4023000 mem size/max 9961472/9961472 vmem size/max 512557056/512557056, disk read size/max (0/0), disk write size/max (0/0), time 0.000000(0+0) Energy tot/max 0/0 TotPower 0 MaxPower 0 MinPower 0
[2020-11-16T12:12:34.288] [30873.0] debug: Sending launch resp rc=0
[2020-11-16T12:12:34.288] [30873.0] debug: mpi type = pmix
[2020-11-16T12:12:34.288] [30873.0] debug: cluster-cn02 [0] mpi_pmix.c:180 [p_mpi_hook_slurmstepd_task] mpi/pmix: Patch environment for task 0
[2020-11-16T12:12:34.289] [30873.0] debug: task_p_pre_launch: 30873.0, task 0
[2020-11-16T12:12:39.475] [30873.extern] error: _x11_socket_read: slurm_open_msg_conn: Connection timed out
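That timeout looks like the extern step on the compute node failing to
connect back to the X11 listener on the node where srun was launched. A
sketch for checking reachability (the port and address here are
illustrative: the X11 port is 6000 plus the display number, so :10.0
means 6010, and 172.21.150.10 is the host from the launch request above;
it also assumes nc is installed):

    # on the login node: which address is the forwarded X11 port bound to?
    # 127.0.0.1 (the X11UseLocalhost yes behavior) is unreachable off-node
    ss -tln | grep ':60'
    # from the compute node: can we reach the login node's X11 port at all?
    nc -zv 172.21.150.10 6010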
On Mon, Nov 16, 2020 at 11:50 AM Russell Jones <arjone...@gmail.com> wrote:
Hello,
Thanks for the reply!
We are using Slurm 20.02.0.
On Mon, Nov 16, 2020 at 10:59 AM sathish <sathish.sathishku...@gmail.com> wrote:
Hi Russell Jones,
I believe you are using a Slurm version older than 19.05. The X11
forwarding code was revamped and works as expected starting with
version 19.05.0.
On Mon, Nov 16, 2020 at 10:02 PM Russell Jones <arjone...@gmail.com> wrote:
Hi all,
Hoping I can get pointed in the right direction here.
I have X11 forwarding enabled in Slurm, but I cannot seem to get it
working properly. It works when I test with "ssh -Y" from the login
node to the compute node, but when I go through Slurm the DISPLAY
variable looks very different and I get an error. Example below:
[user@cluster-1 ~]$ ssh -Y cluster-cn02
Last login: Mon Nov 16 10:09:18 2020 from 172.21.150.10
-bash-4.2$ env | grep -i display
DISPLAY=172.21.150.102:10.0
-bash-4.2$ xclock
Warning: Missing charsets in String to FontSet conversion
** Clock pops up and works **
[user@cluster-1 ~]$ srun -p cluster -w cluster-cn02 --x11
--pty bash -l
bash-4.2$ env | grep -i display
DISPLAY=localhost:28.0
bash-4.2$ xclock
Error: Can't open display: localhost:28.0
Any ideas on where to begin looking? I'm not sure why the DISPLAY
variable is being set to localhost instead of the login node.
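(A quick comparison that may narrow this down, sketched on the assumption
that the cluster-1 session itself came in over ssh: check DISPLAY on the
login node before calling srun. If it already reads localhost:N there,
sshd has bound the X11 listener to loopback and nothing off-node can
connect to it.)

    # on the login node, before srun
    echo $DISPLAY
    # list the X authority cookies available to this session
    xauth list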
Thanks!
--
Regards.....
Sathish