> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Vinny Abello
> Sent: Friday, September 05, 2008 5:20 PM
> To: [EMAIL PROTECTED]
> Cc: bind-users@isc.org
> Subject: RE: BIND 9.4.2-P2-W1 stops responding
>
> > -----Original Message-----
> > From: Danny Mayer [mailto:[EMAIL PROTECTED]
> > Sent: Friday, September 05, 2008 4:18 PM
> > To: Vinny Abello
> > Cc: bind-users@isc.org
> > Subject: Re: BIND 9.4.2-P2-W1 stops responding
> >
> > Vinny Abello wrote:
> > > OK, this happened again. This time I noticed that BIND was not
> > > responding on the primary IP bound to the server that it usually
> > > would previously respond on. It kept answering queries on a
> > > secondary IP bound to the NIC however. Again, nothing in the logs
> > > indicating any type of problem that I can see. Perhaps this is
> > > related to having multiple IP's bound to the machine. I restarted
> > > the service and it started working again on both IP addresses.
> > >
> > > Any ideas?
> > >
> > >> -----Original Message-----
> > >> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Vinny Abello
> > >> Sent: Friday, September 05, 2008 1:33 PM
> > >> To: bind-users@isc.org
> > >> Subject: BIND 9.4.2-P2-W1 stops responding
> > >>
> > >> I just upgraded from BIND 9.4.2 to BIND 9.4.2-P2-W1 on Windows
> > >> Server 2003. The service no longer crashes like it did in P1 and
> > >> P2, however after about 12 hours of load, named just stops
> > >> responding to queries completely. The service appears that it is
> > >> still running but will not respond to any type of query. I've
> > >> restarted it and it came back to life again. I'm going to watch it
> > >> more carefully to look for any other types of symptoms. I checked
> > >> the log files and nothing out of the ordinary was in the logs. In
> > >> fact, according to the logs, it appears that zone transfers were
> > >> still happily taking place while it was not responding to queries.
> > >>
> > >> I don't know if these have anything to do with the issue, but
> > >> there are a few odd errors I noted after starting it back up that
> > >> are appearing in the logs. They are:
> > >>
> > >> 05-Sep-2008 13:19:26.827 dispatch: dispatch 03E25098: shutting down due to TCP receive error: <unknown address, family 48830>: network unreachable
> > >>
> > >> 05-Sep-2008 13:20:38.171 general: .\socket.c:2340: unexpected error:
> > >> 05-Sep-2008 13:20:38.171 general: unable to convert errno to isc_result: 121: The semaphore timeout period has expired.
> > >>
> > >> 05-Sep-2008 13:21:14.733 dispatch: dispatch 03E288B0: shutting down due to TCP receive error: <unknown address, family 48830>: network unreachable
> > >>
> > >> 05-Sep-2008 13:21:44.122 general: .\socket.c:2340: unexpected error:
> > >> 05-Sep-2008 13:21:44.122 general: unable to convert errno to isc_result: 121: The semaphore timeout period has expired.
> > >>
> > >> 05-Sep-2008 13:23:35.351 general: .\socket.c:2340: unexpected error:
> > >> 05-Sep-2008 13:23:35.351 general: unable to convert errno to isc_result: 121: The semaphore timeout period has expired.
> > >>
> > >> 05-Sep-2008 13:24:41.300 general: .\socket.c:2340: unexpected error:
> > >> 05-Sep-2008 13:24:41.300 general: unable to convert errno to isc_result: 121: The semaphore timeout period has expired.
> > >>
> > >> There are other normal messages in between those errors. I just
> > >> picked them out.
> > >>
> > >> Some possible information that might help with this server's
> > >> configuration. This server has multiple IPv4 IP addresses bound to
> > >> the same network and same NIC. There is no IPv6 stack installed on
> > >> the server. This server currently does recursion and also hosts
> > >> some secondary zones as well.
> > >>
> > >> -Vinny
> >
> > Try setting max-cache and see if that helps with the queries. Don't
> > worry about those other error messages. They're harmless.
> >
> > Danny
>
> OK, I've added the avoid-v4-udp-ports to my named.conf with all the UDP
> ports I could identify were being used by other applications including
> my RADIUS service. I've restarted and I'll see if this helps at all.

Well, that had no effect. Still seems to die pretty frequently. I can't easily catch and restart BIND every 30 minutes, so I'm going to have to replace this server with a different one running an operating system that behaves better with BIND. I already did this on my other two name servers.

If you have any other ideas or reasons I shouldn't abandon BIND on Windows, let me know while I can still test it.

-Vinny
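
Since the hang showed up on one bound IP while another kept answering, a per-address probe might help catch it sooner than waiting for complaints. The following is only a rough sketch, not something from this thread: it hand-builds a plain UDP A-record query with the Python standard library and reports whether a given address replies. The function names (build_query, is_reply_to, probe), the probe name "example.com", and the 3-second timeout are all assumptions chosen for illustration.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (0x0100 = RD), 1 question, 0 answer/authority/additional
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A, QCLASS 1 = IN
    return header + qname + struct.pack("!HH", 1, 1)


def is_reply_to(query: bytes, reply: bytes) -> bool:
    """True if reply carries the query's ID and has the QR (response) bit set."""
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)


def probe(server_ip: str, name: str = "example.com", timeout: float = 3.0) -> bool:
    """Send one UDP query to server_ip port 53; False on timeout or socket error."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # includes socket timeouts
            return False
    return is_reply_to(query, reply)
```

Calling probe() once for each IP bound to the NIC from a scheduled task would show which address has gone quiet. The restart step is deliberately left out, since the service name and restart method depend on the particular install.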