Hi, I had occasion to do a base OS upgrade on my long-running OpenDNSSEC signer host. This was a minor kernel version upgrade, so should be uneventful. For the OS itself, that turned out to be true, but for OpenDNSSEC, "not so much":
Jun 11 12:24:34 signer-host ods-signerd: [zone] zone 0.39.128.in-addr.arpa set soa serial to 2022061106 Jun 11 12:24:35 signer-host ods-signerd: [hsm] unable to get key: key e9128a1b42447b79b3ea79883fd14e71 not found Jun 11 12:24:35 signer-host ods-signerd: [hsm] hsm_get_dnskey(): Got NULL key Jun 11 12:24:35 signer-host ods-signerd: [hsm] unable to get key: hsm failed to create dnskey Jun 11 12:24:35 signer-host ods-signerd: [zone] unable to prepare signing keys for zone 0.37.158.in-addr.arpa: error getting dnskey Jun 11 12:24:35 signer-host ods-signerd: [worker[3]] CRITICAL: failed to sign zone 0.37.158.in-addr.arpa: General error What? This appears to be repeated for a lot of my other zones. What exactly might be the cause of this error? My /var/db/softhsm/ae4e912f-a382-8ebc-9d53-7710d137beac appears to be intact and contains some 3531 files. (Yes, using the "files" backend.) About half of them are .lock files (is that normal?) However, there doesn't appear to exist any CLI tools in OpenDNSSEC or SoftHSM2 which might assist me in closing in on why exactly ods-signerd can't find this key anymore. I've found softhsm2-dump-file, but that utility doesn't make it possible to identify the Key ID and compare it to what OpenDNSSEC logs. I've done a system call trace of ods-signerd, and it appears to open each and every .object file in the storage directory, as well as the "generation" file, but other than that I wasn't able to narrow in on eaxtly what the error might be. Sigh! The generation file contains ods @ signer-host: {17} od -c generation 0000000 \0 \0 \0 \0 \0 \0 \0 001 0000010 ods @ signer-host: {18} Another example: Jun 11 13:10:55 signer-host ods-signerd: [hsm] unable to get key: key 9215be91f53b9559ba585e59c28f89a2 not found Jun 11 13:10:55 signer-host ods-signerd: [hsm] unable to get key: hsm failed to create dnskey Jun 11 13:10:55 signer-host ods-signerd: [worker[4]] CRITICAL: failed to sign zone 5.39.158.in-addr.arpa: General error However, that key "CKA_ID" isn't visible in the output for "ods-enforcer key list -v" for that zone (or in general): ods @ signer-host: {17} ods-enforcer key list -v -z 5.39.158.in-addr.arpa Keys: Zone: Keytype: State: Date of next transition: Size: Algorithm: CKA_ID: Repository: KeyTag: 5.39.158.in-addr.arpa KSK active 2022-06-11 15:00:02 2048 13 8e803f96e9eb78352ae987216b00edb6 SoftHSM 53679 5.39.158.in-addr.arpa ZSK active 2022-06-11 15:00:02 1536 13 84830d6a2a949f7e633924375b612d7d SoftHSM 61387 ods @ signer-host: {18} ods @ signer-host: {18} ods-enforcer key list -v | grep 9215be91f53b9559ba585e59c28f89a2 ods @ signer-host: {19} This zone earlier used RSA keys, but has been rolled over to use ECDSA keys. However, the file in /var/opendnssec/tmp/5.39.158.in-addr.arpa.backup2 contains this ID: 5.39.158.in-addr.arpa.backup2:;;Key: locator 9215be91f53b9559ba585e59c28f89a2 algorithm 13 flags 256 publish 0 ksk 0 zsk 0 keytag 53312 While these 'tmp' files are temporary, I suspect the "locator" actually comes from somewhere else. signconf, perhaps? Yep, the signconf/5.39.158.in-addr.arpa.xml file contains <Keys> <TTL>PT1H</TTL> <Key> <Flags>256</Flags> <Algorithm>13</Algorithm> <Locator>9215be91f53b9559ba585e59c28f89a2</Locator> </Key> <Key> <Flags>257</Flags> <Algorithm>13</Algorithm> <Locator>8e803f96e9eb78352ae987216b00edb6</Locator> <KSK/> <Publish/> </Key> <Key> <Flags>256</Flags> <Algorithm>13</Algorithm> <Locator>84830d6a2a949f7e633924375b612d7d</Locator> <ZSK/> <Publish/> </Key> </Keys> but that doesn't jive with what "ods-enforcer key list" above gave me(!) So ... what might the actual root cause of this error be? Hm, I found "ods-enforcer signconf", and did that and restarted OpenDNSSEC, and the xml file above now contains <Keys> <TTL>PT1H</TTL> <Key> <Flags>257</Flags> <Algorithm>13</Algorithm> <Locator>8e803f96e9eb78352ae987216b00edb6</Locator> <KSK/> <Publish/> </Key> <Key> <Flags>256</Flags> <Algorithm>13</Algorithm> <Locator>84830d6a2a949f7e633924375b612d7d</Locator> <ZSK/> <Publish/> </Key> </Keys> and there are no new CRITICAL log messages. (Sigh of releif!) So ... why does it look like OpenDNSSEC doesn't keep its key list (via its signconf) up to date? I didn't think "ods-enforcer signconf" was among those things the operator needs to ensure is performed periodically? So ... the only new log messages on the console are Jun 11 13:34:59 signer-host ods-signerd: [backup] bad ixfr journal: trailing RRs after final SOA Jun 11 13:35:32 signer-host syslogd[197]: last message repeated 180 times I'll just note that the only one who writes that ixfr journal is OpenDNSSEC itself, so this is obviously a bug in OpenDNSSEC, in that it either can't properly write the log or read it in again. Additionally, the message doesn't mention which zone this error is logged for, so it is practically useless for the operator. ...and the occasional Jun 11 13:47:05 signer-host ods-enforcerd: [hsm_key_factory_delete_key] looking for keys to purge from HSM Why is that logged with so high severity? Only "err" or higher is logged to the console (via my syslog.conf). This is with OpenDNSSEC 2.1.8 and SoftHSM2 2.6.1. (I see I am a couple of versions behind the curent OpenDNSSEC.) Regards, - HÃ¥vard _______________________________________________ Opendnssec-user mailing list Opendnssec-user@lists.opendnssec.org https://lists.opendnssec.org/mailman/listinfo/opendnssec-user