--- Begin Message ---
On Tue, Jan 09, 2024 at 09:57:30AM +0100, Fabian Grünbichler wrote:
> > Esi Y via pve-devel <pve-devel@lists.proxmox.com> hat am 09.01.2024 06:01
> > CET geschrieben:
> > On Thu, Dec 21, 2023 at 10:53:11AM +0100, Fabian Grünbichler wrote:
> > > RFC since this would be a bigger change in how we approach intra-cluster
> > > SSH access.
> > >
> > > there are still a few parts that currently don't use SSHInfo, but
> > > would need to be switched over if we want to pursue this approach:
> > >
> > > - get_vnc_connection_info in PVE::API2::Nodes
> > > - 'upload' API endpoint in PVE::API2::Storage::Status
> > > - SSH proxy in pvesh
> > >
> > > these changes would need to happen coordinated with the patches from
> > > this RFC series!
> >
> > Not necessarily.
>
> if we do the unmerge as well, then yes - else a node with unmerged known host
> would fail to connect to nodes that joined the cluster after unmerging.
First of all, I have just seen the barrage of related patches re the helper, so
let me say I think it's good to refactor and bring into consistency invoking
ssh by that means.
I apologise for late reply, it belongs to here though and I believe is still
relevant with your currently pursued solution as well.
PVE (and unsuspected users) would not need to depend on the helper if it had
not depended on options such as -o HostKeyAlias. For the sake of brewity I
filed bug report #5170 [1] re details of that, here I would just pitch that
dropping in a config file into /etc/ssh/ssh_config.d/ would be much better than
including options into each and every cluster SSH, a config file from which
user connections would benefit as well and which would not run the risk of
keeping stale entries in "remote" known_hosts partial blocks. The solution is
to allow GlobalKnownHostsFile and UserKnownHostsFile as-is, but to add
KnownHostsCommand to the client config. This allows for:
1. All entries to be considered alongside with cluster ones;
2. Check what is being sought for with %H and adapt approprietaly - no more
worries about all the configs being in sync;
3. Log whenever something when wrong;
3. User never worrying about getting different ssh conn from a command-line and
botching any one particulars machine local ones.
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=5170
>
> > > next steps afterwards:
> > > - unmerge known hosts in `pvecm updatecerts`, instead of merging
> > > -- to disentangle regular ssh from intra-cluster SSH
> >
> > Both of these could be accomplished with codebase/complexity reduction in
> > an approach to:
>
> I am not sure what "both" means here? there is only a single thing quoted ;)
Fair enough, I expected there will still be some merging necessary, now those
are just snippets. The current name could be confusing to a user, e.g. would be
better called self_known_hosts or such, but see above, no need for that at all.
> > 1. eliminate shared ssh config and key files, i.e. completely remove any
> > need for symlinks; and
>
> that's the evaluate part further below.
This is for another thread apparently, the issue is that this proposed
known_hosts fix e.g. still does not do anything for the duplicates in
authorized_keys.
> > 2. instead initialise each node at join (first one at cluster creation)
> > into their respective node-local files with ssh certs; whilst
>
> versus just copying the host key, which is far simpler?
I will leave the certs discussion out of this thread to not have it convoluted,
the issue with certs in current PVE architectural model is that the CA resides
on all nodes anyhow. Initialising here meant nothing else than "HostCertificate
/etc/ssh/clusterhostkey-cert.pub" > /etc/ssh/sshd_config.d/clustercerts.conf.
> > 3. keeping the setup maintenance-free, since any single key
> > addition/refresh does not need to propagate any individual keys around the
> > cluster; meanwhile
>
> yes it does, changing the key would entail revoking the old one (and
> distributing that!) and signing the new one.
This is not fair counterpoint due to this fix still not addressing bug report
#4670 [2] re authorized_keys. Revoking is an append to a file rather than
deleting entries from existing, so simpler.
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=4670
> > 4. no requirement for additional -o options;
>
> we already need -o options anyway, so there is no downside to adding
> additional ones
See above detailed in bug #5170 [1], I just wish this does not exacetbate it,
as it is an opportunity to fix 2in1.
> > 5. no requirement for sshd config appends, just drop-ins;
>
> there's no need for sshd config changes either with the patches here?
The approach as originally proposed (without KnownHostsCommand) is changing
existing functionality - the keys are gone from the machine and it requires
non-intuitive manual -o's. So no config changes, but behaviour change that
would need documenting at least in the release notes.
> > 6. existing X509 CA key can be reused for ssh PKI as well.
>
> that might no actually be the best of ideas ;)
Out of scope of this thread, but I take this as a snide remark as in this
particular case it's literally format conversion of what's already there with
no security implications.
openssl x509 -in /etc/pve/pve-root-ca.pem -inform pem -pubkey -noout |
ssh-keygen -f /dev/stdin -i -m PKCS8 > /etc/pve/pve-root-ca.pub
> > > -- to allow `ssh-keygen -f .. -R ..` to work properly again
> >
> > Will always work with local-only files. In the other approach, the -R will
> > still not work with a file stored on pmxcfs.
>
> yes, the -R will work with a file stored on pmxcfs. just not with a symlink,
> no matter where it points. which is why the next step above lists unmerging
> the known hosts to get rid of the symlink.
-R does not work if fs does not support hardlinks [3], but it can be also
addressed with the KnownHostsCommand, otherwise if you make the snippet files
multiline we are back to the same problem should someone need to run it on
pmxcfs
[3]
https://github.com/openssh/openssh-portable/blob/64e0600f23c6dec36c3875392ac95b8a9100c2d6/ssh-keygen.c#L1368-L1382
> > > -- existing keys would still be preserved for not-yet-upgraded nodes, so
> > > this
> > > should be do-able without waiting for a major release..
> >
> > With the ssh certs, the old (non-cert) keys could be safely left behind in
> > the pmxcfs location, upgraded nodes would append the previously shared
> > content into local files, old nodes would not make use of the new keys
> > until upgraded, but will keep working with the old to the extent they used
> > to work. Universal remedy for any legacy issues would be to upgrade the
> > node.
>
> the same is true for the patches here:
> - updated nodes publish their own key, and use published keys if available
> - non-updated nodes will still have the symlink and use the "old" merged file
Yes, this was mostly meant as the certs are not any worse or more disruptive
than iterative approach.
> >
> > > - evaluate whether we want to split out
> > > -- the client config (we currently force a cipher order there)
> > > -- the client key (could live in /etc/pve/priv instead?)
> > > -- or even the sshd instance altogether (would allow not touching the
> > > regular sshd config at all)
> >
> > Non-issue in the ssh certs approach.
>
> on the contrary, all of the above are also valid for cert-based auth..
I will keep this out of scope here, but happy to discuss this if interested - I
find it to be adding complexity where it is not needed.
> > What's the counter-argument to ssh certs, given the simlicity in comparison
> > with both the old approach and the intially suggested one here?
>
> the one sentence summary - it doesn't get us closer to where we want to end
> up (either getting rid of SSH entirely, or full disentangling PVE-internal
> SSH use from the regular, default sshd instance), but adds more complexity
> instead.
I could have provided POC which would be equivalent to ~6 shell lines that need
to be executed upon cluster join addressing both known_hosts and
authorized_keys with no mandatory argv[], but I suspect the patches are well on
the way now and unless it disrupts the components that now make use of the new
helper - everyone prefers what they know and what has worked so far. I happen
to known ssh certs, so hence my original take. I believe the certs are also
future proof (without extra effort) when it comes to e.g. keys rotation
(expiries built in), something PVE does not take into account as of today and
tracked separately as bug report #5174 [4].
[4] https://bugzilla.proxmox.com/show_bug.cgi?id=5174
ONE FINAL POINT:
The original merging, juggling, accounting for hashed known_hosts, etc.
elaborate codepath - I humbly believe - was all done in the interest of making
SSH transparent to the user within the cluster. The proposed approach here is
much cleaner, but it takes out that original objective off the feature list. I
suspect it was important enough then to make it worth of all the efforts going
into this. It would be a pity if that is lost when it does not have to be. The
certs have that built-in, but the KnownHostsCommand also retains that without
any additional risk.
--- End Message ---