Perhaps just a very trivial question, but it doesn't look like you mentioned it: does your X11 forwarding work on the login node itself? Maybe the X server on your client is the problem, and trying xclock on the login node would clarify that.
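A minimal sketch of that check, assuming a bash shell on the login node after connecting with `ssh -X`. The function name `check_login_x11` is hypothetical. It only confirms that `DISPLAY` was set by sshd and then tries `xclock`, so Slurm is not involved at all:

```shell
# Hypothetical helper: run on the login node after `ssh -X <login-node>`.
# Tests X11 forwarding to the login node without involving Slurm.
check_login_x11() {
  # sshd sets DISPLAY only when X11 forwarding is active for this session.
  if [ -z "${DISPLAY:-}" ]; then
    echo "DISPLAY is not set: ssh -X forwarding is not active on this node"
    return 1
  fi
  echo "DISPLAY=$DISPLAY"
  # If forwarding works, a clock window opens on your workstation;
  # close it to return. An error here points at the client X server or sshd.
  xclock
}
```

If xclock shows up on your workstation from the login node but still fails under `srun --x11`, the problem is most likely on the Slurm or compute-node side rather than with your local X server.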
On Wed, Oct 5, 2022 at 12:03 PM Allan Streib <astr...@indiana.edu> wrote:
>
> Hi everyone,
>
> I'm trying to get X11 forwarding working on my cluster. I've read some
> of the threads and web posts on X11 forwarding and most of the common
> issues I'm finding seem to pertain to older versions of Slurm.
>
> I log in from my workstation to the login node with ssh -X. I have x11
> apps installed on a test compute node, j-096. Here is what I see:
>
> From the config.log when I built slurm:
>
> $ grep X11 config.log
> configure:19906: checking whether Slurm internal X11 support is enabled
> | #define WITH_SLURM_X11 1
> | #define WITH_SLURM_X11 1
> | #define WITH_SLURM_X11 1
> #define WITH_SLURM_X11 1
>
>
> From the login node:
>
> $ scontrol show config | grep X11
> PrologFlags = Alloc,Contain,X11
> X11Parameters = home_xauthority
>
> $ grep ^X11 /etc/ssh/sshd_config
> X11Forwarding yes
> X11UseLocalhost no
>
>
> Here is what I see when I try to run "xclock" on my test node:
>
> $ srun --x11 -w j-096 xclock
> Error: Can't open display: localhost:64.0
> srun: error: j-096: task 0: Exited with exit code 1
>
>
> From the sshd_config on the test node:
>
> $ grep ^X11 /etc/ssh/sshd_config
> X11Forwarding yes
>
> We are using hostbased ssh authentication in this cluster.
>
> From the slurmd.log on the test node:
>
> [2022-10-05T13:29:51.065] [2822.extern] X11 forwarding established on DISPLAY=j-096:64.0
> [2022-10-05T13:29:51.165] launch task StepId=2822.0 request from UID:8348 GID:100 HOST:172.16.100.132 PORT:58948
> [2022-10-05T13:29:51.165] task/affinity: lllp_distribution: JobId=2822 auto binding off: mask_cpu
> [2022-10-05T13:29:51.311] [2822.extern] error: _x11_socket_read: slurm_open_msg_conn(127.0.0.1:34811): Connection refused
> [2022-10-05T13:29:51.330] [2822.0] done with job
> [2022-10-05T13:29:51.346] [2822.extern] done with job
> [2022-10-05T13:29:51.436] [2822.extern] x11 forwarding shutdown complete
>
> Is the issue the two different DISPLAY values, i.e. j-096:64.0
> vs. localhost:64.0. Not sure how/where to reconcile these? I have tried
> with and without "X11UseLocalhost no" on the login node.
>
> Best wishes,
>
> Allan
>