Ian Jackson <[email protected]> [13/Mar  8:45pm GMT] wrote:
> 1. Extraneous copy of the key in tag2upload-builder-01:~tag2upload-builder
>
> The image rebuilder script copies the public key from its ~
> into ~builder in the image.
>
> AFAICT there is no other reason for ~tag2upload-builder to contain
> a copy of the public key.  It would probably be better if the
> image rebuilder script got the key from somewhere else, and we
> deleted the copy in ~tag2upload-builder.

ACK, sounds good.  Also we'd want to update the instructions saying to
put a copy of the key there.

> 2. We did not detect this impending breakage.
>
> In my experience, this kind of lossage is very common in systems
> involving expiry times.  It's all very well having a cron job detect
> the need for renewal, and a human process that is supposed to sort the
> thing out, but IME that is usually not sufficient.  Such things are
> IME prone to various kinds of failure.  A backstop is needed.
>
> IMO we should have separate checks for each place a copy of the key
> exists, that alert for any failures or omissions of the key expiry
> arrangements.
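
As an illustration only, a per-location check of this kind might look like the minimal Python sketch below. It is not part of the existing tag2upload tooling. It assumes the location's copy of the key is available as a file, that GnuPG 2.2.8 or later is installed (for `--show-keys`), and a 21-day warning threshold. The script and its name are hypothetical. Run daily from cron, it prints nothing when all is well. Any output, including a traceback if gpg itself fails, is what cron normally mails to the job's owner.

```python
#!/usr/bin/env python3
"""Hypothetical per-location check: complain if any OpenPGP key in a key
file has expired or will expire within WARN_DAYS.  Meant to be run daily
from cron, which mails any output to the job's owner."""
import subprocess
import sys
import time

WARN_DAYS = 21  # assumption: the "(say) 21 days" suggested in the thread
DAY = 86400


def expiring_keys(colon_output, now, warn_days=WARN_DAYS):
    """Return (record_type, key_id, days_left) for each pub/sub key in
    `gpg --with-colons` output that expires within warn_days of `now`
    (days_left is negative if the key has already expired)."""
    found = []
    for line in colon_output.splitlines():
        fields = line.split(":")
        if fields[0] not in ("pub", "sub") or len(fields) < 7:
            continue
        if not fields[6]:
            continue  # field 7 empty: this key has no expiry date
        days_left = (int(fields[6]) - now) / DAY
        if days_left < warn_days:
            found.append((fields[0], fields[4], days_left))
    return found


def main(path):
    # --show-keys (GnuPG 2.2.8+) lists a key file without importing it.
    # If gpg fails, check=True raises and cron mails the traceback.
    out = subprocess.run(
        ["gpg", "--batch", "--with-colons", "--show-keys", path],
        check=True, capture_output=True, text=True,
    ).stdout
    if not any(line.startswith("pub:") for line in out.splitlines()):
        print(f"{path}: no public key found")
        return 1
    found = expiring_keys(out, time.time())
    for kind, key_id, days_left in found:
        print(f"{path}: {kind} {key_id} expires in {days_left:.1f} days")
    return 1 if found else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))  # usage: check-key-expiry KEYFILE
```

Treating "no public key found" as a problem covers the omission case, e.g. a location whose copy of the key was deleted or never put in place.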

I don't mind more cron jobs/e-mails so long as they are deployed cleanly
and are consistent with each other.

> Based on my experience maintaining systems where things can expire
> (eg, Let's Encrypt certificates, domain names) I suggest:
>
> * Every location which has a copy of the key, that anything relies on,
>   should be checked daily by cron.  I think the locations where
>   an expired key would break the system, or break downstreams, are
>        - tag2upload builder VM (~builder)
>        - oracle (~tag2upload-oracle)
>        - Debian archive tag2upload keyring .deb
>        - dak
>        - dgit-repos
>        - the copy on the wiki
>
> * There should be one central cron job which is responsible for
>   sending an email at least once a day if any of these are going to
>   expire within the next (say) 21 days.
>
> * The site-specific information could be collected by push or by pull.
>   For example, supposing the central copy is on the manager, oracle
>   and builder could ssh to the manager daily to deposit copies of
>   their keys.  A cron job on the manager could wget the wiki.
>
> * Arguably the cron job which sends the emails should *not* run on the
>   manager, since the manager already has the normal cron job that is
>   supposed to prompt us to do the updates.  If for some reason cron
>   jobs on the manager don't run or can't email us, we'd miss the memo.
>
>   The daily "thing is wrong" cron job could however *retrieve* the
>   information from the manager (over public https) and check it.
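
The central check described above could be sketched along these lines, again in Python and purely as an illustration. The JSON summary format, the location identifiers, the URL passed on the command line, and the two-day staleness limit are all assumptions. The point of the sketch is that missing or stale reports are flagged alongside keys nearing expiry, so a broken push or pull job is itself reported rather than silently hiding a problem.

```python
#!/usr/bin/env python3
"""Hypothetical central expiry check, run daily from cron on a host other
than the manager.  It fetches a JSON summary of per-location reports (an
assumed format: {"location": {"reported": epoch, "expires": epoch}}) and
prints one line per problem; cron mails any output."""
import json
import sys
import time
import urllib.request

DAY = 86400
WARN_DAYS = 21      # the "(say) 21 days" from the thread
MAX_REPORT_AGE = 2  # assumption: an older report means a push/pull job broke

# Illustrative identifiers for the locations listed in the thread.
EXPECTED = ["builder", "oracle", "keyring-deb", "dak", "dgit-repos", "wiki"]


def problems(reports, now):
    """Return a human-readable line for every missing report, stale report,
    or key that expires within WARN_DAYS of `now`."""
    out = []
    for loc in EXPECTED:
        rep = reports.get(loc)
        if rep is None:
            out.append(f"{loc}: no report received")
            continue
        age = (now - rep["reported"]) / DAY
        if age > MAX_REPORT_AGE:
            out.append(f"{loc}: last report is {age:.1f} days old")
        days_left = (rep["expires"] - now) / DAY
        if days_left < WARN_DAYS:
            out.append(f"{loc}: key expires in {days_left:.1f} days")
    return out


def main(url):
    # A failed fetch raises, so cron mails the traceback: an alert, not silence.
    with urllib.request.urlopen(url, timeout=60) as resp:
        reports = json.load(resp)
    found = problems(reports, time.time())
    for line in found:
        print(line)
    return 1 if found else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))  # usage: check-all-keys SUMMARY_URL
```

Running this on a host other than the manager, with the summary fetched over https, matches the suggestion of not depending on the manager's own cron jobs to deliver the warning.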

Tbh I think this is overengineering.  Why not just add separate cron
jobs for all of the places?

-- 
Sean Whitton
