Re: [pve-devel] [RFC cluster 0/2] fix #4886: improve SSH handling

Esi Y via pve-devel Thu, 11 Jan 2024 04:23:27 -0800

--- Begin Message ---

On Tue, Jan 09, 2024 at 09:57:30AM +0100, Fabian Grünbichler wrote:
> > Esi Y via pve-devel <pve-devel@lists.proxmox.com> hat am 09.01.2024 06:01 
> > CET geschrieben:
> > On Thu, Dec 21, 2023 at 10:53:11AM +0100, Fabian Grünbichler wrote:
> > > RFC since this would be a bigger change in how we approach intra-cluster
> > > SSH access.
> > > 
> > > there are still a few parts that currently don't use SSHInfo, but
> > > would need to be switched over if we want to pursue this approach:
> > > 
> > > - get_vnc_connection_info in PVE::API2::Nodes
> > > - 'upload' API endpoint in PVE::API2::Storage::Status
> > > - SSH proxy in pvesh
> > > 
> > > these changes would need to happen coordinated with the patches from
> > > this RFC series!
> > 
> > Not necessarily.
> 
> if we do the unmerge as well, then yes - else a node with unmerged known host 
> would fail to connect to nodes that joined the cluster after unmerging.


First of all, I have just seen the barrage of related patches re the helper, so 
let me say I think it's good to refactor and bring into consistency invoking 
ssh by that means.

I apologise for late reply, it belongs to here though and I believe is still 
relevant with your currently pursued solution as well.

PVE (and unsuspected users)  would not need to depend on the helper if it had 
not depended on options such as -o HostKeyAlias. For the sake of brewity I 
filed bug report #5170 [1] re details of that, here I would just pitch that 
dropping in a config file into /etc/ssh/ssh_config.d/ would be much better than 
including options into each and every cluster SSH, a config file from which 
user connections would benefit as well and which would not run the risk of 
keeping stale entries in "remote" known_hosts partial blocks. The solution is 
to allow GlobalKnownHostsFile and UserKnownHostsFile as-is, but to add 
KnownHostsCommand to the client config. This allows for:
1. All entries to be considered alongside with cluster ones;
2. Check what is being sought for with %H and adapt approprietaly - no more 
worries about all the configs being in sync;
3. Log whenever something when wrong;
3. User never worrying about getting different ssh conn from a command-line and 
botching any one particulars machine local ones.

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=5170

>   
> > > next steps afterwards:
> > > - unmerge known hosts in `pvecm updatecerts`, instead of merging
> > > -- to disentangle regular ssh from intra-cluster SSH
> > 
> > Both of these could be accomplished with codebase/complexity reduction in 
> > an approach to:
> 
> I am not sure what "both" means here? there is only a single thing quoted ;)

Fair enough, I expected there will still be some merging necessary, now those 
are just snippets. The current name could be confusing to a user, e.g. would be 
better called self_known_hosts or such, but see above, no need for that at all.
 
> > 1.  eliminate shared ssh config and key files, i.e. completely remove any 
> > need for symlinks; and
> 
> that's the evaluate part further below.

This is for another thread apparently, the issue is that this proposed 
known_hosts fix e.g. still does not do anything for the duplicates in 
authorized_keys.
 
> > 2.  instead initialise each node at join (first one at cluster creation) 
> > into their respective node-local files with ssh certs; whilst
> 
> versus just copying the host key, which is far simpler?

I will leave the certs discussion out of this thread to not have it convoluted, 
the issue with certs in current PVE architectural model is that the CA resides 
on all nodes anyhow. Initialising here meant nothing else than "HostCertificate 
/etc/ssh/clusterhostkey-cert.pub" > /etc/ssh/sshd_config.d/clustercerts.conf.

> > 3.  keeping the setup maintenance-free, since any single key 
> > addition/refresh does not need to propagate any individual keys around the 
> > cluster; meanwhile
> 
> yes it does, changing the key would entail revoking the old one (and 
> distributing that!) and signing the new one.

This is not fair counterpoint due to this fix still not addressing bug report 
#4670 [2] re authorized_keys. Revoking is an append to a file rather than 
deleting entries from existing, so simpler.

[2] https://bugzilla.proxmox.com/show_bug.cgi?id=4670

> > 4.  no requirement for additional -o options;
> 
> we already need -o options anyway, so there is no downside to adding 
> additional ones

See above detailed in bug #5170 [1], I just wish this does not exacetbate it, 
as it is an opportunity to fix 2in1.

> > 5.  no requirement for sshd config appends, just drop-ins;
> 
> there's no need for sshd config changes either with the patches here?

The approach as originally proposed (without KnownHostsCommand) is changing 
existing functionality - the keys are gone from the machine and it requires 
non-intuitive manual -o's. So no config changes, but behaviour change that 
would need documenting at least in the release notes.

> > 6.  existing X509 CA key can be reused for ssh PKI as well.
> 
> that might no actually be the best of ideas ;)

Out of scope of this thread, but I take this as a snide remark as in this 
particular case it's literally format conversion of what's already there with 
no security implications.

openssl x509 -in /etc/pve/pve-root-ca.pem -inform pem -pubkey -noout | 
ssh-keygen -f /dev/stdin -i -m PKCS8 > /etc/pve/pve-root-ca.pub

> > > -- to allow `ssh-keygen -f .. -R ..` to work properly again
> > 
> > Will always work with local-only files. In the other approach, the -R will 
> > still not work with a file stored on pmxcfs.
> 
> yes, the -R will work with a file stored on pmxcfs. just not with a symlink, 
> no matter where it points. which is why the next step above lists unmerging 
> the known hosts to get rid of the symlink.

-R does not work if fs does not support hardlinks [3], but it can be also 
addressed with the KnownHostsCommand, otherwise if you make the snippet files 
multiline we are back to the same problem should someone need to run it on 
pmxcfs

[3] 
https://github.com/openssh/openssh-portable/blob/64e0600f23c6dec36c3875392ac95b8a9100c2d6/ssh-keygen.c#L1368-L1382

> > > -- existing keys would still be preserved for not-yet-upgraded nodes, so 
> > > this
> > >    should be do-able without waiting for a major release..
> > 
> > With the ssh certs, the old (non-cert) keys could be safely left behind in 
> > the pmxcfs location, upgraded nodes would append the previously shared 
> > content into local files, old nodes would not make use of the new keys 
> > until upgraded, but will keep working with the old to the extent they used 
> > to work. Universal remedy for any legacy issues would be to upgrade the 
> > node.
> 
> the same is true for the patches here:
> - updated nodes publish their own key, and use published keys if available
> - non-updated nodes will still have the symlink and use the "old" merged file

Yes, this was mostly meant as the certs are not any worse or more disruptive 
than iterative approach.

> > 
> > > - evaluate whether we want to split out
> > > -- the client config (we currently force a cipher order there)
> > > -- the client key (could live in /etc/pve/priv instead?)
> > > -- or even the sshd instance altogether (would allow not touching the
> > >    regular sshd config at all)
> > 
> > Non-issue in the ssh certs approach.
> 
> on the contrary, all of the above are also valid for cert-based auth..

I will keep this out of scope here, but happy to discuss this if interested - I 
find it to be adding complexity where it is not needed.

> > What's the counter-argument to ssh certs, given the simlicity in comparison 
> > with both the old approach and the intially suggested one here?
> 
> the one sentence summary - it doesn't get us closer to where we want to end 
> up (either getting rid of SSH entirely, or full disentangling PVE-internal 
> SSH use from the regular, default sshd instance), but adds more complexity 
> instead.

I could have provided POC which would be equivalent to ~6 shell lines that need 
to be executed upon cluster join addressing both known_hosts and 
authorized_keys with no mandatory argv[], but I suspect the patches are well on 
the way now and unless it disrupts the components that now make use of the new 
helper - everyone prefers what they know and what has worked so far. I happen 
to known ssh certs, so hence my original take. I believe the certs are also 
future proof (without extra effort) when it comes to e.g. keys rotation 
(expiries built in), something PVE does not take into account as of today and 
tracked separately as bug report #5174 [4].

[4] https://bugzilla.proxmox.com/show_bug.cgi?id=5174



ONE FINAL POINT:

The original merging, juggling, accounting for hashed known_hosts, etc. 
elaborate codepath - I humbly believe - was all done in the interest of making 
SSH transparent to the user within the cluster. The proposed approach here is 
much cleaner, but it takes out that original objective off the feature list. I 
suspect it was important enough then to make it worth of all the efforts going 
into this. It would be a pity if that is lost when it does not have to be. The 
certs have that built-in, but the KnownHostsCommand also retains that without 
any additional risk.

--- End Message ---

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

Re: [pve-devel] [RFC cluster 0/2] fix #4886: improve SSH handling

Reply via email to