On Tuesday, 16 October 2018, at 09:30:13 (-0400), Dave Botsch wrote:

> Hrm... it looks like the default install of OHPC went with DHA keys
> instead:
>
> .ssh]$ cat config
> # Added by Warewulf 2018-10-08
> Host *
> IdentityFile ~/.ssh/cluster
> StrictHostKeyChecking=no
> $ file cluster
> cluster: PEM DSA private key

That's not OHPC. That's a (rather unfortunate) part of Warewulf called `cluster-env`, a tool used to seamlessly make passphrase-less SSH work within a cluster without admin/user intervention. You can see the code here:

https://github.com/warewulf/warewulf3/blob/master/cluster/bin/cluster-env

If you install the warewulf-cluster RPM, a script installed as /etc/profile.d/cluster-env.sh will run /usr/bin/cluster-env on each login (for sh/ksh/bash users...and an equivalent script is installed for csh/tcsh users). See e.g.

https://github.com/warewulf/warewulf3/blob/master/cluster/etc/cluster-env.sh

for the stub script.

The above version on GitHub has been updated to use RSA keys instead of DSA, but the *actually* correct solution, rather than forcibly altering each user's SSH configuration and ~/.ssh/ contents, is to enable Host-based authentication for SSH in /etc/ssh/sshd_config (or GSSAPI authentication, or host-based certificates, or any of the other options available to have machines authenticate themselves so that users can move between cluster hosts seamlessly and securely).

When that utility was written, DSA was the state of the art, and it unfortunately went untouched for a very long time. The key type should not have been hard-coded with no way to permit site-specific configuration, but it was. As I said, though, there are better ways to accomplish user auth between nodes without passphrases, and I recommend disabling `cluster-env` and using one of those alternatives instead. (In fact, it's probably best to remove the entire warewulf-cluster RPM. wwinit and wwfirstboot are similarly ancient tools in need of updating/replacement.)

As for X11 forwarding/authentication, there is no easy/simple answer to why it won't work. Lots of things need to be in sync for it to work, including xauth, xhost, $DISPLAY, firewall rules, etc., and there are numerous opportunities for minor misconfigurations to break the whole kit and caboodle.

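For what it's worth, a minimal sketch of how one might capture the state of those pieces on a given host is below. The function name `x11_snapshot` and the output layout are purely illustrative assumptions; only `uname`, `xauth list`, and `xhost` are standard commands, and the script does not attempt to check firewall rules.

```shell
#!/bin/sh
# Hypothetical helper (not part of Warewulf or OHPC): print the pieces of
# X11 state that have to agree for forwarding to work on this host.
x11_snapshot() {
    printf 'host: %s\n' "$(uname -n)"
    printf 'DISPLAY: %s\n' "${DISPLAY:-<unset>}"

    # Cookies currently stored in the user's Xauthority file.
    if command -v xauth >/dev/null 2>&1; then
        echo '--- xauth list ---'
        xauth list 2>&1 || echo 'xauth: list failed'
    else
        echo 'xauth: not installed'
    fi

    # Host-based access control list; this needs a reachable display.
    if command -v xhost >/dev/null 2>&1; then
        echo '--- xhost ---'
        xhost 2>&1 || echo 'xhost: failed (no usable display?)'
    else
        echo 'xhost: not installed'
    fi
}

x11_snapshot
```

Saving the output once per host (for example, redirected to a file named after the host) makes it easy to diff a working session against a broken one.
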
To troubleshoot, I recommend examining the values of $DISPLAY and the results of `xauth list` and `xhost` under both working and non-working conditions, and looking for a pattern. Also make sure `ssh -Y` is being used at every hop along the way, not just `ssh -X`.

Our solution at LANL uses a 130-line Perl script that does proper NFS-based locking of the user's ~/.Xauthority file, forcibly resets their $DISPLAY to the correct value, and adds the correct entry to ~/.Xauthority using `xauth add`. Our experience has been that that's the only way to correctly handle all cases. (And no, unfortunately I can't share it, but it's not a difficult thing to write.)

Michael

--
Michael E. Jennings <m...@lanl.gov>
HPC Systems Team, Los Alamos National Laboratory
Bldg. 03-2327, Rm. 2341     W: +1 (505) 606-0605