Thank you! I do have X11UseLocalhost set to no and X11Forwarding set to yes:
[root@cluster-cn02 ssh]# sshd -T | grep -i X11
x11displayoffset 10
x11maxdisplays 1000
x11forwarding yes
x11uselocalhost no

There are no firewalls on this network between the login node and the compute node.

On Tue, Nov 17, 2020 at 1:21 AM Patrick Bégou <patrick.be...@legi.grenoble-inp.fr> wrote:

> Hi Russell Jones,
>
> did you try to stop the firewall on the client cluster-cn02?
>
> Patrick
>
> On 16/11/2020 at 19:20, Russell Jones wrote:
>
> Here are some debug logs from the compute node after launching an
> interactive shell with the --x11 flag. I see it show X11 forwarding
> established, then it ends with a connection timeout.
>
> [2020-11-16T12:12:34.097] debug: Checking credential with 492 bytes of sig data
> [2020-11-16T12:12:34.098] _run_prolog: run job script took usec=1284
> [2020-11-16T12:12:34.098] _run_prolog: prolog with lock for job 30873 ran for 0 seconds
> [2020-11-16T12:12:34.111] debug: AcctGatherEnergy NONE plugin loaded
> [2020-11-16T12:12:34.112] debug: AcctGatherProfile NONE plugin loaded
> [2020-11-16T12:12:34.113] debug: AcctGatherInterconnect NONE plugin loaded
> [2020-11-16T12:12:34.114] debug: AcctGatherFilesystem NONE plugin loaded
> [2020-11-16T12:12:34.115] debug: switch NONE plugin loaded
> [2020-11-16T12:12:34.116] debug: init: Gres GPU plugin loaded
> [2020-11-16T12:12:34.116] [30873.extern] debug: Job accounting gather LINUX plugin loaded
> [2020-11-16T12:12:34.117] [30873.extern] debug: cont_id hasn't been set yet not running poll
> [2020-11-16T12:12:34.117] [30873.extern] debug: Message thread started pid = 18771
> [2020-11-16T12:12:34.119] [30873.extern] debug: task NONE plugin loaded
> [2020-11-16T12:12:34.120] [30873.extern] Munge credential signature plugin loaded
> [2020-11-16T12:12:34.121] [30873.extern] debug: job_container none plugin loaded
> [2020-11-16T12:12:34.121] [30873.extern] debug: spank: opening plugin stack /apps/slurm/cluster/20.02.0/etc/plugstack.conf
> [2020-11-16T12:12:34.121] [30873.extern] debug: X11Parameters: (null)
> [2020-11-16T12:12:34.133] [30873.extern] X11 forwarding established on DISPLAY=cluster-cn02.domain:66.0
> [2020-11-16T12:12:34.133] [30873.extern] debug: jag_common_poll_data: Task 0 pid 18775 ave_freq = 4023000 mem size/max 6750208/6750208 vmem size/max 147521536/147521536, disk read size/max (7200/7200), disk write size/max (374/374), time 0.000000(0+0) Energy tot/max 0/0 TotPower 0 MaxPower 0 MinPower 0
> [2020-11-16T12:12:34.133] [30873.extern] debug: x11 forwarding local display is 66
> [2020-11-16T12:12:34.133] [30873.extern] debug: x11 forwarding local xauthority is /tmp/.Xauthority-MkU8aA
> [2020-11-16T12:12:34.202] launch task 30873.0 request from UID:1368 GID:512 HOST:172.21.150.10 PORT:4795
> [2020-11-16T12:12:34.202] debug: Checking credential with 492 bytes of sig data
> [2020-11-16T12:12:34.202] [30873.extern] debug: Handling REQUEST_X11_DISPLAY
> [2020-11-16T12:12:34.202] [30873.extern] debug: Leaving _handle_get_x11_display
> [2020-11-16T12:12:34.202] debug: Leaving stepd_get_x11_display
> [2020-11-16T12:12:34.202] debug: Waiting for job 30873's prolog to complete
> [2020-11-16T12:12:34.202] debug: Finished wait for job 30873's prolog to complete
> [2020-11-16T12:12:34.213] debug: AcctGatherEnergy NONE plugin loaded
> [2020-11-16T12:12:34.214] debug: AcctGatherProfile NONE plugin loaded
> [2020-11-16T12:12:34.214] debug: AcctGatherInterconnect NONE plugin loaded
> [2020-11-16T12:12:34.214] debug: AcctGatherFilesystem NONE plugin loaded
> [2020-11-16T12:12:34.215] debug: switch NONE plugin loaded
> [2020-11-16T12:12:34.215] debug: init: Gres GPU plugin loaded
> [2020-11-16T12:12:34.216] [30873.0] debug: Job accounting gather LINUX plugin loaded
> [2020-11-16T12:12:34.216] [30873.0] debug: cont_id hasn't been set yet not running poll
> [2020-11-16T12:12:34.216] [30873.0] debug: Message thread started pid = 18781
> [2020-11-16T12:12:34.216] debug: task_p_slurmd_reserve_resources: 30873 0
> [2020-11-16T12:12:34.217] [30873.0] debug: task NONE plugin loaded
> [2020-11-16T12:12:34.217] [30873.0] Munge credential signature plugin loaded
> [2020-11-16T12:12:34.217] [30873.0] debug: job_container none plugin loaded
> [2020-11-16T12:12:34.217] [30873.0] debug: mpi type = pmix
> [2020-11-16T12:12:34.244] [30873.0] debug: spank: opening plugin stack /apps/slurm/cluster/20.02.0/etc/plugstack.conf
> [2020-11-16T12:12:34.244] [30873.0] debug: mpi type = pmix
> [2020-11-16T12:12:34.244] [30873.0] debug: (null) [0] mpi_pmix.c:153 [p_mpi_hook_slurmstepd_prefork] mpi/pmix: start
> [2020-11-16T12:12:34.244] [30873.0] debug: mpi/pmix: setup sockets
> [2020-11-16T12:12:34.273] [30873.0] debug: cluster-cn02 [0] pmixp_client_v2.c:69 [_errhandler_reg_callbk] mpi/pmix: Error handler registration callback is called with status=0, ref=0
> [2020-11-16T12:12:34.273] [30873.0] debug: cluster-cn02 [0] pmixp_client.c:697 [pmixp_libpmix_job_set] mpi/pmix: task initialization
> [2020-11-16T12:12:34.273] [30873.0] debug: cluster-cn02 [0] pmixp_agent.c:229 [_agent_thread] mpi/pmix: Start agent thread
> [2020-11-16T12:12:34.273] [30873.0] debug: cluster-cn02 [0] pmixp_agent.c:330 [pmixp_agent_start] mpi/pmix: agent thread started: tid = 70366934331824
> [2020-11-16T12:12:34.273] [30873.0] debug: cluster-cn02 [0] pmixp_agent.c:335 [pmixp_agent_start] mpi/pmix: timer thread started: tid = 70366933283248
> [2020-11-16T12:12:34.273] [30873.0] debug: cluster-cn02 [0] pmixp_agent.c:267 [_pmix_timer_thread] mpi/pmix: Start timer thread
> [2020-11-16T12:12:34.273] [30873.0] debug: stdin uses a pty object
> [2020-11-16T12:12:34.274] [30873.0] debug: init pty size 34:159
> [2020-11-16T12:12:34.274] [30873.0] in _window_manager
> [2020-11-16T12:12:34.274] [30873.0] debug level = 2
> [2020-11-16T12:12:34.274] [30873.0] debug: IO handler started pid=18781
> [2020-11-16T12:12:34.275] [30873.0] starting 1 tasks
> [2020-11-16T12:12:34.276] [30873.0] task 0 (18801) started 2020-11-16T12:12:34
> [2020-11-16T12:12:34.276] [30873.0] debug: task_p_pre_launch_priv: 30873.0
> [2020-11-16T12:12:34.288] [30873.0] debug: jag_common_poll_data: Task 0 pid 18801 ave_freq = 4023000 mem size/max 9961472/9961472 vmem size/max 512557056/512557056, disk read size/max (0/0), disk write size/max (0/0), time 0.000000(0+0) Energy tot/max 0/0 TotPower 0 MaxPower 0 MinPower 0
> [2020-11-16T12:12:34.288] [30873.0] debug: Sending launch resp rc=0
> [2020-11-16T12:12:34.288] [30873.0] debug: mpi type = pmix
> [2020-11-16T12:12:34.288] [30873.0] debug: cluster-cn02 [0] mpi_pmix.c:180 [p_mpi_hook_slurmstepd_task] mpi/pmix: Patch environment for task 0
> [2020-11-16T12:12:34.289] [30873.0] debug: task_p_pre_launch: 30873.0, task 0
> [2020-11-16T12:12:39.475] [30873.extern] error: _x11_socket_read: slurm_open_msg_conn: Connection timed out
>
> On Mon, Nov 16, 2020 at 11:50 AM Russell Jones <arjone...@gmail.com> wrote:
>
>> Hello,
>>
>> Thanks for the reply!
>>
>> We are using Slurm 20.02.0.
>>
>> On Mon, Nov 16, 2020 at 10:59 AM sathish <sathish.sathishku...@gmail.com> wrote:
>>
>>> Hi Russell Jones,
>>>
>>> I believe you are using a Slurm version older than 19.05. The X11
>>> forwarding code was revamped, and it works as expected starting with
>>> the 19.05.0 release.
>>>
>>> On Mon, Nov 16, 2020 at 10:02 PM Russell Jones <arjone...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Hoping I can get pointed in the right direction here.
>>>>
>>>> I have X11 forwarding enabled in Slurm; however, I cannot seem to get it
>>>> working properly. It works when I test with "ssh -Y" to the compute node
>>>> from the login node, but when I try through Slurm the DISPLAY variable
>>>> looks very different, and I get an error.
>>>> Example below:
>>>>
>>>> [user@cluster-1 ~]$ ssh -Y cluster-cn02
>>>> Last login: Mon Nov 16 10:09:18 2020 from 172.21.150.10
>>>> -bash-4.2$ env | grep -i display
>>>> DISPLAY=172.21.150.102:10.0
>>>> -bash-4.2$ xclock
>>>> Warning: Missing charsets in String to FontSet conversion
>>>> ** Clock pops up and works **
>>>>
>>>> [user@cluster-1 ~]$ srun -p cluster -w cluster-cn02 --x11 --pty bash -l
>>>> bash-4.2$ env | grep -i display
>>>> DISPLAY=localhost:28.0
>>>> bash-4.2$ xclock
>>>> Error: Can't open display: localhost:28.0
>>>>
>>>> Any ideas on where to begin looking? I'm not sure why the DISPLAY
>>>> variable is being set to localhost instead of the login node.
>>>>
>>>> Thanks!
>>>
>>> --
>>> Regards.....
>>> Sathish
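[Editor's note, not part of the thread: since the failure is a plain TCP
connection timeout, one generic thing to check is whether the X11 port itself
is reachable. An X display number :N served over TCP corresponds to port
6000+N, so the displays quoted in this thread (:10, :28, :66) map to ports
6010, 6028, and 6066. A minimal sketch of that mapping; the `nc` check at the
end is hypothetical and assumes netcat is installed:]

```shell
# X11 over TCP: display :N listens on port 6000+N.
# Compute the ports for the display numbers seen in this thread.
for display in 10 28 66; do
    echo "display :$display -> tcp port $((6000 + display))"
done
# prints:
#   display :10 -> tcp port 6010
#   display :28 -> tcp port 6028
#   display :66 -> tcp port 6066

# Hypothetical follow-up from the node that must reach the listener,
# e.g. for the timed-out display :66 on cluster-cn02:
#   nc -vz cluster-cn02 6066
```

[If that port is unreachable even with no firewall configured, the listener
may be bound to a loopback or wrong interface, which is worth comparing
against the X11UseLocalhost setting shown earlier in the thread.]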