In message <4de9045c.2050...@anl.gov>, Barry Finkel writes:
> I have a problem with BIND 9.7.x on Ubuntu.
> I have two servers that are running 9.7.3.
> They slave 332 zones, and they also master 213,750
> malware/spyware zones that we have defined to reroute these
> domains to a local machine.
>
> When I was upgrading the BIND to 9.7.3-P1 yesterday, an
>
>      ./rndc stop
>
> command ran over 8 minutes, and named did not stop.
> A "kill" command did not work; I had to revert to a
> "kill -9" command.  What was BIND doing?  Gracefully
> closing all of the zones?

Most probably.  "rndc stop" ensures that master files are up to date
before exiting.  "rndc halt" does not try to flush master files before
exiting.  There could also have been a reference leak causing named to
not stop.

> BIND 9.7.3-P1 came up fine, but there are two things that concern me:
>
> 1) After BIND began responding to queries, it was using
>    100% of the CPU for about three minutes.  I am not sure what
>    BIND was doing.  This is not major because BIND was handling
>    customer queries, and after the three minutes the CPU usage
>    dropped to a normal 1%.
>
> 2) Two zones reported serial number decreases.  This is bad.
>
> I did some research on the two zones - both Microsoft
> Active Directory zones (one _tcp and one _udp) that are mastered
> on a Windows Domain Controller and slaved on my BIND boxes.
> I have around 44 AD zones I slave, and only these two reported
> problems - on my two internal Ubuntu slaves and my two Solaris 10
> slaves.  The two Solaris 10 slaves do not run the spyware zones,
> so I had no problem with "./rndc stop".  I therefore am not sure
> that the serial number problems are due to the "kill -9".

They shouldn't be.  The handling of master files and journals is
designed to allow the power to be pulled at any time, provided the
filesystem supports atomic replacement of files.

> I looked at the serial number issue on these two zones in detail;
> I capture the serial numbers on all the AD zones each morning at
> 6:10.  Here is information for the _tcp zone:
>
> Date         Zone   Mast  Slav  Slav
> 20 Oct 2010  _tcp.  1233  1233  1233
> 21 Oct 2010  _tcp.  1239  1239  1239  The master incremented the serial.
> ...
> 09 Nov 2010  _tcp.  1239  1239  1239
> 10 Nov 2010  _tcp.  1238  1239  1239  Master decreased due to MS patch
> 11 Nov 2010  _tcp.  1238  1238  1238
> ...
> 03 Dec 2010  _tcp.  1238  1238  1238
> 04 Dec 2010  _tcp.  1238  1238  1239  ??
> 05 Dec 2010  _tcp.  1238  1239  1238  ??
> 06 Dec 2010  _tcp.  1238  1238  1238
> ...
> 09 Dec 2010  _tcp.  1238  1238  1238
> 10 Dec 2010  _tcp.  1238  1238  1239  ??
> 11 Dec 2010  _tcp.  1238  1239  1238  ??
> 12 Dec 2010  _tcp.  1238  1238  1238
> ...
> 05 Jan 2011  _tcp.  1238  1238  1238
> 06 Jan 2011  _tcp.  1238  1239  1239  ??
> 07 Jan 2011  _tcp.  1238  1238  1238
> ...
> 02 Mar 2011  _tcp.  1238  1238  1238  Upgrade 9.7.2-P3 to 9.7.3
> 03 Mar 2011  _tcp.  1238  1239  1239
> 04 Mar 2011  _tcp.  1238  1238  1238
> ...
> 16 Apr 2011  _tcp.  1238  1238  1238
> 17 Apr 2011  _tcp.  1238  1238  1238  1238  1238  Two Sol10 slaves added.
> ...
> 02 Jun 2011  _tcp.  1238  1238  1238  1238  1238  Upgrade 9.7.3 to 9.7.3-P1
> 03 Jun 2011  _tcp.  1238  1239  1239  1239  1239
>
> Both Ubuntu slaves have been up for 149 days (reboot around Jan 15).
> The zone serial was 1239 until a MS patch run on the Domain
> Controller decreased the serial by one on the evening of Nov 9.
> I did nothing to correct the problem; I waited for the two zones
> to expire, and then new zones were transferred from the Windows
> master server.  The serial number was 1238 on the master and
> slaves.  On a few days, the serial on the slaves increased
> by one, and I am not sure what happened on those days.
>
> On Mar 02 I upgraded BIND from 9.7.2-P3 to 9.7.3, and the
> serial numbers on the two upgraded BIND slaves reverted to the
> higher 1239 serial.  Again, I did no fixup, and on Mar 04
> the serials were the same at the lower value.  I think that the
> serial number decrease was temporary during the patch run.
> On Apr 17 I added the two Solaris 10 slaves to my morning report, and
> all five serials were constant at 1238 until I upgraded BIND Tuesday (on
> the Solaris 10 boxes) and yesterday (on the Ubuntu boxes).  Immediately
> after the upgrade BIND reported the serial number problem on these two
> zones.  The other AD zones have had no serial number problems.
>
> I have no idea why BIND would remember the increased 1239
> serial number, when the serial number for the zone has been constant
> at 1238 since Mar 04.
> I have to assume that between Mar 04 and
> Jun 03 BIND would have written the zone to disk, either in the
> base zone file or a .jnl file.
>
> --
> ----------------------------------------------------------------------
> Barry S. Finkel
> Computing and Information Systems Division
> Argonne National Laboratory          Phone:    +1 (630) 252-7277
> 9700 South Cass Avenue               Facsimile:+1 (630) 252-4601
> Building 240, Room 5.B.8             Internet: bsfin...@anl.gov
> Argonne, IL   60439-4828             IBMMAIL:  I1004994
> _______________________________________________
> bind-users mailing list
> bind-users@lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: ma...@isc.org
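
On the remark above that safe handling of master files depends on the filesystem supporting atomic replacement of files: the general pattern is to write the new contents to a temporary file and then rename it over the old one. Below is a minimal Python sketch of that pattern. It is not named's actual code; the function name `atomic_write` and the `.tmp-` prefix are made up for illustration. The point is that anyone opening the path sees either the complete old file or the complete new file, even if the process is killed partway through.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` using a temp file and a rename.

    Hypothetical helper for illustration only; not taken from BIND.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one
    # filesystem, which is what makes the final replace step atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the contents to disk before renaming
        os.replace(tmp_path, path)  # old file is swapped for the new one in one step
    except BaseException:
        # If anything failed before the rename, remove the leftover temp file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process dies before `os.replace`, the original file is untouched and only a stray temporary file is left behind.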
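
On the serial number decreases reported above: SOA serials are compared using the serial number arithmetic of RFC 1982, so a master going from 1239 back to 1238 counts as a step backwards, and a slave still holding 1239 does not see 1238 as newer. A minimal Python sketch of that comparison follows. The function name `serial_gt` is made up here, and this is an illustration of the RFC's rule, not BIND's code.

```python
SERIAL_BITS = 32
HALF = 1 << (SERIAL_BITS - 1)  # 2**31, half of the serial number space


def serial_gt(s1, s2):
    """Return True if serial s1 is newer than s2 under RFC 1982 arithmetic."""
    diff = (s1 - s2) % (1 << SERIAL_BITS)
    # s1 is newer when it is ahead of s2 by less than half the number space.
    # A difference of exactly 2**31 is undefined in the RFC; treated as False.
    return 0 < diff < HALF


print(serial_gt(1239, 1238))        # slave's 1239 vs master's 1238
print(serial_gt(1238, 1239))        # master's 1238 vs slave's 1239
print(serial_gt(1, 4294967295))     # wraparound case
```

With the values from the table, `serial_gt(1238, 1239)` is false, which is consistent with the slaves keeping 1239 until the zone expired and was transferred afresh.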