On 11-Mar-19 03:52, Mark Andrews wrote:
> Because you removed the key from disk before it was removed from the zone.
> Presumably named was logging other error messages before you removed the
> key from disk or the machine was off for a period or you mismanaged the
> key roll and named keep the key alive.
>
> Named’s re-signing strategy is different to when you are signing the
> whole zone at once as you are signing it incrementally. You should be
> allowing most of the sig-validity interval before you delete the DNSKEY
> after you inactive it. One should check that there are no RRSIGs still
> present in the zone before deleting the DNSKEY from the zone.
> Inactivating it stops the DNSKEY being used to generate new signatures
> but it needs to stay around until all those RRSIGs have expired from
> caches which only happens after new replacement signatures have been
> generated.
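As a rough illustration of the timing described above, here is a minimal Python sketch of a conservative "earliest safe removal" date for an inactivated DNSKEY. It assumes that, with incremental re-signing, RRSIGs made by the old key can remain in the zone for up to one sig-validity interval after inactivation, and that resolvers may then cache the last of them for up to the zone's maximum TTL. The function name and the simple additive bound are hypothetical, not how named computes anything, and the result is no substitute for checking that no RRSIGs made by the key are still present in the zone.

```python
from datetime import datetime, timedelta


def earliest_dnskey_removal(inactive: datetime,
                            sig_validity: timedelta,
                            max_ttl: timedelta) -> datetime:
    """Conservative earliest time to delete an inactivated DNSKEY from the zone.

    Assumption (hypothetical model): the old key's RRSIGs may survive in the
    zone for up to one sig-validity interval while named replaces them
    incrementally, and a cache may hold the last of them for up to max_ttl.
    """
    return inactive + sig_validity + max_ttl


# Example: key inactivated 1-Mar-2019, 30-day sig-validity interval, 1-day max TTL.
print(earliest_dnskey_removal(datetime(2019, 3, 1), timedelta(days=30), timedelta(days=1)))
```

With those example values the key should stay in the zone until at least 1-Apr-2019.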
There are a lot of these "administrator should know" events and timeouts in DNSSEC. One could argue that these complexities are one of the barriers to adoption. It seems worth considering ways to make life easier, for administrators and automation alike.

A few thoughts come immediately to mind; no doubt there are more:

- Rather than documenting "wait for n TTLs (or sig-validity interval)", have BIND log events that require or enable administrator actions (at non-debug levels), such as:

    "key (keyid) /foo/bar/.. no longer required and can be removed"

  issued at inactivation + the max TTL of any RRSIG it signed. This allows an admin (or script) to know when it's safe, rather than requiring research and/or math.

    "key (keyid) /foo/baz... is now signing zone(s) example.net,example.org. It expires on <> and will be removed on <>"

- Provide an "obsolete-keys" directory, and have named move keys that are no longer required there. (Or delete the files. But emptying obsolete-keys, like emptying /tmp, can be automated, and deleting a key might be a problem if forensics or audits are required.) The key idea is that an admin never removes a file from "keys", which should prevent mistakes.

- Rather than relying on the keys directory for signing, use it only to import/update keys. Once named starts using a key, put a copy of it (or move it) into ".active-keys", or a database file, that persists as long as the protocol requires it. If the file in the keys directory is updated with new dates, generate the appropriate events, but work from .active-keys. If the file disappears from "keys" before it should, use .active-keys to restore it, and add a comment explaining why:

    # Restored by named at 1-apr-2411: sig-validity interval for lost.example.net (internal) extends to 15-may-2412

- Provide an rndc show class command (or stats channel output) that explains the status/fate of each signing key.
  Perhaps a table:

    Key                      Zone         View      State      Created     Publish     Active      Deactivate   Remove      Next event
    key (keyid) /foo/baz...  example.net  external  Published  1-Jan-2000  1-Jun-2000  1-Jul-2000  31-Dec-2000  1-Feb-2001  activate 1-Jun-2000
    key (keyid) /foo/baz...  example.org  external  Published  1-Jan-2000  1-Jun-2000  1-Jul-2000  31-Dec-2000  1-Feb-2001  activate 1-Jun-2000

    # Assumes today is 11-Mar-2000. The second row is the same key in a different zone.

- Think more about what admins want to do, rather than how named (and the protocols) do it. E.g. "sign a zone", "roll key now|every month", "use latest|specified|safest signature algorithm | key length", "enable/disable nsec|nsec3", "unsign zone"... Provide scripts and/or named primitives that do this.

  "dnssec-settime -xyz" doesn't do a good job of specifying intent: one has to do a lot of math, and the intent isn't logged, just the date change.

  I'm aware of the dnssec-keymgr effort; it's still more oriented to timeouts and, e.g., coverage periods than to what one wants to accomplish. (As far as I can tell, it also doesn't support multiple views, which makes it unusable for me. I don't think this is an unusual configuration...)

  If you look at validate() in policy.py.in, there are 6 different errors for conditions involving timer relationships. [And the errors are reported in seconds, not even as something vaguely human, such as 57w2d1h30m12s.] Why not (by default) adjust the timers and log the result?

  I'm sure someone will opine that for every case, there's a choice between shrinking one timer and extending the other. This is undoubtedly true. But it's better to pick a strategy that is consistent with safe practice than to kick each error back to an admin. An admin who has particular requirements can read the log. But for those who "just want things to work", I suspect that we can identify a driver (I nominate key lifetime) and adjust everything else to fit...

I'm sure there are some challenges in the details, but I hope the message is clear.
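To make the status table and the human-readable timer values suggested above a little more concrete, here is a minimal Python sketch. It is not based on named's or dnssec-keymgr's code: the event names, state labels, and the functions format_duration and key_status are hypothetical, and it assumes each key's timing metadata (publish, activate, inactivate, delete) has already been read into datetimes for one zone/view.

```python
from datetime import datetime

# Unit suffixes similar to those BIND accepts in TTL values, largest first.
_UNITS = [("w", 7 * 86400), ("d", 86400), ("h", 3600), ("m", 60), ("s", 1)]


def format_duration(seconds: int) -> str:
    """Render seconds as e.g. '57w2d1h30m12s' rather than a raw count."""
    if seconds < 0:
        return "-" + format_duration(-seconds)
    if seconds == 0:
        return "0s"
    parts = []
    for suffix, size in _UNITS:
        count, seconds = divmod(seconds, size)
        if count:
            parts.append(f"{count}{suffix}")
    return "".join(parts)


# Hypothetical lifecycle order and state names; not named's internal model.
EVENTS = ["publish", "activate", "inactivate", "delete"]
STATE_AFTER = {
    None: "Created",
    "publish": "Published",
    "activate": "Active",
    "inactivate": "Inactive",
    "delete": "Removed",
}


def key_status(timings: dict, now: datetime):
    """Return (state, next_event, time_until_next) for one key in one zone/view.

    `timings` maps event names from EVENTS to datetimes; missing events are
    skipped. Once every scheduled event is in the past, next_event is None.
    """
    last_done = None
    for event in EVENTS:
        when = timings.get(event)
        if when is None:
            continue
        if when <= now:
            last_done = event
        else:
            wait = int((when - now).total_seconds())
            return STATE_AFTER[last_done], event, format_duration(wait)
    return STATE_AFTER[last_done], None, None
```

For example, with hypothetical timings publish=1-Jan-2000, activate=1-Jun-2000, inactivate=31-Dec-2000, delete=1-Feb-2001 and "today" 11-Mar-2000, key_status returns ("Published", "activate", "11w5d"), and format_duration(34651812) gives "57w2d1h30m12s". One row per (key, zone, view) is what an rndc or stats-channel report could print.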
Avoid blaming the admin for trying to make things work. Instead, package actions at admin-oriented levels of abstraction. Guard the data that named needs, and avoid having the admin manipulate live files (where mistakes can be made).

I do want to acknowledge the considerable efforts already made to make DNSSEC more usable. They have helped, but as evidenced by the exchange that precipitated this note, the level of abstraction is still too low.