To our users:

Last week, reacting to reports from several users concerning assertion failures in BIND 9.10.4, we took the unusual step of deprecating that release while we investigated the problem: internal checks were detecting a state in the cache data structure that should have been impossible.

Thanks to several users who shared their crash data with us, our developers have identified a problem. In the April 28 maintenance releases, the internal representation and packing of the 'node' structure used in the BIND cache were changed to reduce memory usage and increase performance. The packing change caused some single-bit flag values that were protected by one lock to share the same word in physical memory with flag values protected by a different lock. This creates the potential for a race condition: two threads holding different locks can modify the same word simultaneously, so one thread's flag update can be overwritten by the other's, leading to the inconsistent state that triggers the assertion failures.

Though this flaw can occur with any compiler, it is substantially more likely to lead to a crash when BIND is compiled on the x86_64 platform using the 'clang' compiler, and a difference in the node structure between BIND 9.9 and 9.10 makes the failure more likely to occur in BIND 9.10. Even so, operators who are running any of the affected versions (BIND 9.9.9, BIND 9.10.4, or BIND 9.9.9-S1) should replace those versions as soon as updated releases are available.

Having identified what we believe to be the root cause, we are currently testing a candidate fix, with the help of some volunteers who were previously experiencing crashes in their operational environments, and the results so far are good. If no further failures occur, we expect to issue patch releases for all of the April 28 releases (BIND 9.9.9, BIND 9.10.4, and BIND 9.9.9-S1).

If you're wondering how this affects you, we hope this summary may help:

+ Nothing we have seen so far suggests that this issue is a deliberately exploitable security vulnerability.

+ Completely authoritative servers are at extremely low risk (approaching zero) from this defect. Only recursive servers are at significant risk.
  If you are operating an authoritative server which does not perform recursion for clients, you can probably safely wait for replacement versions to be released and upgrade when convenient.

+ We have only received reports of INSIST exceptions in BIND 9.10.4.

+ The change which exposed the race condition exists in BIND 9.9.9 and BIND 9.9.9-S1 as well, but we have received no reports of INSIST errors occurring in those versions. They are possible but have a much lower probability of occurrence.

+ If you are running a recursive resolver on an affected version of BIND, you are at moderate risk unless you are running BIND 9.10.4 and your named binaries have been compiled with clang, in which case you are at higher risk. You have several options, including:

  - revert to BIND 9.9.8-P4, 9.10.3-P4, or 9.9.8-S6 until the replacement versions are officially released

  - retrieve and compile the current 9_9 or 9_10 branch from the ISC public git repository, which will contain the candidate fix which we expect to release next week; or, if you are a customer with a support contract, contact ISC Support for assistance with a patch

  - use a watchdog process to manage 'named' and restart it if it exits; upgrade when replacement versions are released.

We'd like to once again thank the users who helped us to track this down, and to apologize for the inconvenience this has caused our users.

Michael McNally
ISC Support
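
For readers who want to see the class of bug described above, the following is a minimal sketch in C. It is not BIND's code: the flag names, the two imagined locks, and the function names are all hypothetical, chosen only for illustration. Rather than relying on real threads (where the failure would be intermittent), it replays one unlucky interleaving by hand. Each "thread" holds its own lock, but because both flags live in the same word, each update is a read-modify-write of that whole word, and the later store can erase the earlier one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; these are NOT the names used in BIND. */
#define FLAG_A 0x1u /* imagined to be protected by lock A */
#define FLAG_B 0x2u /* imagined to be protected by lock B */

/*
 * Both flags share one word. Replay this interleaving step by step:
 * thread A (holding lock A) and thread B (holding lock B) each load
 * the word, then each stores its modified copy. The locks differ, so
 * neither thread excludes the other.
 */
uint32_t lost_update_interleaving(void) {
    uint32_t word = 0;

    uint32_t seen_by_a = word;  /* A: load the shared word             */
    uint32_t seen_by_b = word;  /* B: load the same word, still 0      */
    word = seen_by_a | FLAG_A;  /* A: store, FLAG_A is now set         */
    word = seen_by_b | FLAG_B;  /* B: store stale copy, FLAG_A is lost */

    return word;
}

/*
 * The same interleaving with each flag in its own word: the two
 * read-modify-write sequences no longer touch the same memory, so
 * neither update can overwrite the other.
 */
uint32_t separate_words(void) {
    uint32_t word_a = 0;
    uint32_t word_b = 0;

    uint32_t seen_by_a = word_a; /* A: load its own word */
    uint32_t seen_by_b = word_b; /* B: load its own word */
    word_a = seen_by_a | FLAG_A; /* A: store             */
    word_b = seen_by_b | FLAG_B; /* B: store             */

    return word_a | word_b;      /* combined view of both flags */
}

/* Self-check of the two outcomes described in the comments above. */
void check_sketch(void) {
    assert(lost_update_interleaving() == FLAG_B);
    assert(separate_words() == (FLAG_A | FLAG_B));
}
```

In the shared-word case only FLAG_B survives, which is the sort of "impossible" state an internal consistency check would trip over. Keeping flags that are guarded by different locks in different words is one way to avoid this; whether that matches the fix in the replacement releases is not something this sketch claims.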