Hi!

We use Bind with inline-signing as a "bump-in-the-wire" signer. We started with Bind 9.9, then used several 9.10 versions, and recently switched to 9.11.0-P2.

All of them showed the same 2 problems:

1. Bind gets into a signing loop and consumes memory until it is killed by the Linux OOM killer.
2. Bind produces broken zones (signatures not updated, invalid signatures, missing RRSIGs, ...).

Problem 1 was already reported in detail to bind-b...@isc.org but we never received an answer.

So I will describe the problems in more detail below. It would be great if you could give us some advice on how to track this down.

ad 1) Bind endlessly re-signs a zone. In the logs this shows up as repeated "sending notifies" messages caused by the ever-increasing SOA serial, followed by the slaves fetching the zone. Bind itself slaves the zone from a hidden master, but the zone on the hidden master is not being updated:

20:38:09 named[3374]: zone klaus-dev.dnssec-signiert.at/IN (signed): sending notifies (serial 5691271)
20:38:10 named[3374]: client @0x7fe570031500 11.22.34.27#53632 (klaus-dev.dnssec-signiert.at): transfer of 'klaus-dev.dnssec-signiert.at/IN': AXFR started (serial 5691289)
20:38:10 named[3374]: client @0x7fe570031500 11.22.34.27#53632 (klaus-dev.dnssec-signiert.at): transfer of 'klaus-dev.dnssec-signiert.at/IN': AXFR ended
20:38:10 named[3374]: client @0x7fe5780cb530 11.22.34.29#57629 (klaus-dev.dnssec-signiert.at): transfer of 'klaus-dev.dnssec-signiert.at/IN': AXFR started (serial 5691302)
20:38:10 named[3374]: client @0x7fe5780cb530 11.22.34.29#57629 (klaus-dev.dnssec-signiert.at): transfer of 'klaus-dev.dnssec-signiert.at/IN': AXFR ended
20:38:14 named[3374]: zone klaus-dev.dnssec-signiert.at/IN (signed): sending notifies (serial 5691381)
20:38:15 named[3374]: client @0x7fe578496d60 11.22.34.27#36770 (klaus-dev.dnssec-signiert.at): transfer of 'klaus-dev.dnssec-signiert.at/IN': AXFR started (serial 5691416)
20:38:15 named[3374]: client @0x7fe578496d60 11.22.34.27#36770 (klaus-dev.dnssec-signiert.at): transfer of 'klaus-dev.dnssec-signiert.at/IN': AXFR ended
20:38:15 named[3374]: client @0x7fe570031500 11.22.34.29#45449 (klaus-dev.dnssec-signiert.at): transfer of 'klaus-dev.dnssec-signiert.at/IN': AXFR started (serial 5691421)
20:38:15 named[3374]: client @0x7fe570031500 11.22.34.29#45449 (klaus-dev.dnssec-signiert.at): transfer of 'klaus-dev.dnssec-signiert.at/IN': AXFR ended
20:38:19 named[3374]: zone klaus-dev.dnssec-signiert.at/IN (signed): sending notifies (serial 5691509)
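
To catch such a loop before the OOM killer does, the log excerpt above could be watched by a small script. The following is only a rough sketch: the function names and the threshold of 5 serials per second are made up for illustration, the regular expression is derived solely from the "sending notifies" lines shown here, and timestamps that wrap past midnight are not handled.

```python
import re
from datetime import datetime

# Derived from the "sending notifies" lines in the excerpt; other BIND
# versions or logging setups may format these lines differently.
NOTIFY_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) named\[\d+\]: zone (?P<zone>\S+) "
    r"\(signed\): sending notifies \(serial (?P<serial>\d+)\)"
)


def notify_events(lines):
    """Yield (zone, time, serial) for each signed-zone NOTIFY log line."""
    for line in lines:
        m = NOTIFY_RE.match(line.strip())
        if m:
            t = datetime.strptime(m["time"], "%H:%M:%S")
            yield m["zone"], t, int(m["serial"])


def resign_loop_suspects(lines, max_serials_per_second=5.0):
    """Map zone -> serial growth rate for zones above the threshold.

    The threshold is an arbitrary illustration value, not a BIND constant.
    """
    first, last = {}, {}
    for zone, t, serial in notify_events(lines):
        first.setdefault(zone, (t, serial))
        last[zone] = (t, serial)
    suspects = {}
    for zone, (t0, s0) in first.items():
        t1, s1 = last[zone]
        seconds = (t1 - t0).total_seconds()
        if seconds > 0 and (s1 - s0) / seconds > max_serials_per_second:
            suspects[zone] = (s1 - s0) / seconds
    return suspects


# The three NOTIFY lines from the excerpt above.
sample = [
    "20:38:09 named[3374]: zone klaus-dev.dnssec-signiert.at/IN (signed): sending notifies (serial 5691271)",
    "20:38:14 named[3374]: zone klaus-dev.dnssec-signiert.at/IN (signed): sending notifies (serial 5691381)",
    "20:38:19 named[3374]: zone klaus-dev.dnssec-signiert.at/IN (signed): sending notifies (serial 5691509)",
]
print(resign_loop_suspects(sample))
```

For the excerpt, the serial grows by 238 within 10 seconds, so the zone is reported with a rate of about 23.8 serials per second.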

While doing this, Bind consumes more and more memory until it is killed by the OOM killer. After a restart, Bind runs fine again.

On our production server we see this issue every 2 or 3 months. On our development server we see it every second day. The difference is in the ZSK rollover timings:
prod: ZSK rollover every 90 days, sig-validity-interval=30days, ~350 zones
dev: ZSK rollover every 2 days, sig-validity-interval=1day, ~10 zones, dnssec-dnskey-kskonly

On the dev system we have multiple published and active keys, which is certainly an atypical setup, but Bind should nevertheless not re-sign the zone endlessly.


ad 2) Before we deploy the signed zone on the public name servers, we verify it with validns, dnssec-verify and ldns-verify. When we receive a NOTIFY from Bind, we AXFR the zone and then let the tools inspect it. About once a month we get a broken zone (reported identically by all 3 tools). Typical errors are (taken from the validns reports):
no corresponding NSEC3 found for ...
NSEC3 mentions RRSIG, but no such record found for ...
NSEC3 without a corresponding record (or empty non-terminal)
bad SHA-256 hash length
broken NSEC3 chain, expected ... but found ...
NSEC3 mentions NSEC3PARAM, but no such record found for
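
For reference, the gate described above (publish only if all three tools accept the transferred zone) can be sketched roughly as follows. This is not our actual script: the function names are made up, the sketch assumes the ldns tool is installed as ldns-verify-zone, that the AXFR output was already written to a file, that every tool exits non-zero on a validation error, and the exact options each tool needs may differ per installation.

```python
import subprocess


def build_checks(zone, zone_file):
    """Return one command line per verifier for a transferred zone file.

    Options are kept minimal on purpose; treat them as assumptions.
    """
    return [
        ["validns", zone_file],
        ["dnssec-verify", "-o", zone, zone_file],
        ["ldns-verify-zone", zone_file],
    ]


def zone_is_publishable(zone, zone_file, run=subprocess.run):
    """Run all verifiers; return (ok, failures).

    `run` is injectable so the gate logic can be exercised without the
    tools being installed. A non-zero exit status counts as a failure.
    """
    failures = []
    for cmd in build_checks(zone, zone_file):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((cmd[0], result.stderr.strip()))
    return (not failures), failures
```

A caller would deploy the zone only when the first return value is true, and otherwise archive the zone file together with the collected error output.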

Sometimes we are lucky and can solve the problem with "rndc sign ..." or "rndc retransfer ...". Most of the time none of these tricks work, and even a Bind restart does not help. In such a case we have to stop Bind, delete the zone file and the journal file, and then start Bind again (causing a fresh incoming AXFR and a complete re-signing). We have archived these broken zone files for inspection.
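
The cleanup step of this last-resort procedure could look roughly like the sketch below. It is an illustration only: the function names are hypothetical, the suffix list is an assumption about which files an inline-signing slave zone keeps next to its zone file (it goes beyond the two files mentioned above), and Bind has to be stopped before anything is removed. By default the function only lists what it would delete.

```python
from pathlib import Path

# Assumed companion files of an inline-signing zone; verify against the
# actual working directory before relying on this list.
STATE_SUFFIXES = ("", ".jnl", ".jbk", ".signed", ".signed.jnl")


def zone_state_files(zone_file):
    """Return the candidate state files for one zone file path."""
    return [Path(str(zone_file) + suffix) for suffix in STATE_SUFFIXES]


def reset_zone_state(zone_file, dry_run=True):
    """Remove existing state files (or only list them when dry_run is set).

    Must only be called while Bind is stopped. Returns the affected paths.
    """
    affected = []
    for path in zone_state_files(zone_file):
        if path.exists():
            if not dry_run:
                path.unlink()
            affected.append(path)
    return affected
```

Running it with dry_run=True first makes it possible to archive the listed files for later inspection before they are deleted.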


We are willing to spend time debugging these issues (when they happen again) if you can give us some advice on what we should check when an error occurs.

Thanks
Klaus





_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
