On 2022-06-11 14:05, Havard Eidnes via Opendnssec-user wrote:
I had occasion to do a base OS upgrade on my long-running
OpenDNSSEC signer host. This was a minor kernel version upgrade,
so should be uneventful. For the OS itself, that turned out to
be true, but for OpenDNSSEC, "not so much":
Jun 11 12:24:35 signer-host ods-signerd: [hsm] unable to get key: key
e9128a1b42447b79b3ea79883fd14e71 not found
Jun 11 12:24:35 signer-host ods-signerd: [hsm] hsm_get_dnskey(): Got
NULL key
Jun 11 12:24:35 signer-host ods-signerd: [hsm] unable to get key: hsm
failed to create dnskey
Jun 11 12:24:35 signer-host ods-signerd: [zone] unable to prepare
signing keys for zone 0.37.158.in-addr.arpa: error getting dnskey
Jun 11 12:24:35 signer-host ods-signerd: [worker[3]] CRITICAL: failed
to sign zone 0.37.158.in-addr.arpa: General error
What exactly might be the cause of this error?
These errors are generic to not being able to load the keys from the
HSM,
so we only need to look at the OpenDNSSEC connection to SoftHSM and
downwards.
My /var/db/softhsm/ae4e912f-a382-8ebc-9d53-7710d137beac appears
to be intact and contains some 3531 files. (Yes, using the
"files" backend.) About half of them are .lock files (is that
normal?)
Files backend is fine, and having that many files and lock files
is a pretty sane setup. The lock files keep being present even
if there is no lock and the keys are distributed over many files.
However, there doesn't appear to exist any CLI tools in
OpenDNSSEC or SoftHSM2 which might assist me in closing in on why
exactly ods-signerd can't find this key anymore. I've found
softhsm2-dump-file, but that utility doesn't make it possible to
identify the Key ID and compare it to what OpenDNSSEC logs.
There isn't a specific tool for SoftHSM, though other PKCS#11
tools will work. But there is ods-hsmutil which does not rely
on the inner workings and only used the conf.xml file and is
able to list all keys (regardless of status and whether for OpenDNSSEC
or not). Try:
ods-hsmutil list
Or better something like:
ods-hsmutil list | grep 84830d6a2a949f7e633924375b612d7d
It should output a line with the key. If it outputs a line with the
key there's something wrong with OpenDNSSEC. If it outputs
nothing there's something wrong with either the setup/reference
to SoftHSM, a permissions problem or a deep problem in SoftHSM.
But there could also be a line stating:
hsm_session_init(): PKCS#11 module load failed:
Indicating that the SoftHSMv2 isn't loading properly anymore.
From the next message I don't think that is the case anymore.
This zone earlier used RSA keys, but has been rolled over to use
ECDSA keys.
However, the file in
/var/opendnssec/tmp/5.39.158.in-addr.arpa.backup2 contains this
ID:
5.39.158.in-addr.arpa.backup2:;;Key: locator
9215be91f53b9559ba585e59c28f89a2 algorithm 13 flags 256 publish 0 ksk
0 zsk 0 keytag 53312
While these 'tmp' files are temporary, I suspect the "locator"
actually comes from somewhere else. signconf, perhaps? Yep, the
signconf/5.39.158.in-addr.arpa.xml file contains
Correct there tmp reference are based on the signconfs.
but that doesn't jive with what "ods-enforcer key list" above
gave me(!)
So there has been a change over time in how long which keys
are left behind as a reference in the zone. This could lead
to issues as also tackled by issue OPENDNSSEC-956. This probably
isn't the case here (as per your next item), but ods-enforcer key
list isn't necessarily 100% in sync with the signconf.
So ... what might the actual root cause of this error be?
Hm, I found "ods-enforcer signconf", and did that and restarted
OpenDNSSEC, and the xml file above now contains
and there are no new CRITICAL log messages.
(Sigh of releif!)
So ... why does it look like OpenDNSSEC doesn't keep its key list
(via its signconf) up to date? I didn't think "ods-enforcer
signconf" was among those things the operator needs to ensure is
performed periodically?
After writing my initial response I saw at the bottom that your
running OpenDNSSEC 2.1.8, whilst in version 2.1.10 there has been
a fix to an issus that suspiciously looks like the thing your
running into. https://issues.opendnssec.org/browse/SUPPORT-265
That issue also can lead to signconf not being updated.
If the issue re-appears after upgrading, please let me know but
for now I'm assuming you have run into that issue. It occurs
on some edge (but existing) cases where keys are being removed
while also the signconf needed updating. This issue did not
present itself on much earlier issues due to some other changes.
So ... the only new log messages on the console are
Jun 11 13:34:59 signer-host ods-signerd: [backup] bad ixfr journal:
trailing RRs after final SOA
Jun 11 13:35:32 signer-host syslogd[197]: last message repeated 180
times
I'll just note that the only one who writes that ixfr journal is
OpenDNSSEC itself, so this is obviously a bug in OpenDNSSEC, in
that it either can't properly write the log or read it in again.
Additionally, the message doesn't mention which zone this error
is logged for, so it is practically useless for the operator.
So that will take me a bit more time to look into you, and I'd agree
with you that this isn't useful not to know which zone this references.
At least that should change, not the bad journal could be a result
from the earlier problem (still no good, but not that bad..).
Jun 11 13:47:05 signer-host ods-enforcerd:
[hsm_key_factory_delete_key] looking for keys to purge from HSM
Why is that logged with so high severity? Only "err" or higher
is logged to the console (via my syslog.conf).
I'll change that. Probably someone did that because it is such
a irreversible thing to do, removing keys from an HSM.
This is with OpenDNSSEC 2.1.8 and SoftHSM2 2.6.1. (I see I am
a couple of versions behind the curent OpenDNSSEC.)
Which distribution are you on? I am moving to add binary packaged
to some distributions.
Best regards,
Berry van Halderen
_______________________________________________
Opendnssec-user mailing list
Opendnssec-user@lists.opendnssec.org
https://lists.opendnssec.org/mailman/listinfo/opendnssec-user