On 08/05/13 19:15, Tom Sommer wrote:
>
> On 5/8/13 12:25 PM, Cathy Almond wrote:
>> On 08/05/13 08:26, Tom Sommer wrote:
>>> Hi,
>>>
>>> I have a problem with one of 3 slave servers, all set up the exact same
>>> way, with the exact same bind version and configuration.
>>>
>>> One slave has a problem transferring zones from the master.
>>>
>>> The logfiles are flooded with "received notify for zone" .. "refresh in
>>> progress, refresh check queued" lines and "rndc status" returns a
>>> constant high number of "soa queries in progress".
>>> After a few hours the zones are transferred, so the connection to the
>>> master is working, but there is a major delay. I tried resetting the
>>> slave and transferring ALL slave zones again, which worked fine
>>> instantly. The problem still appeared again after a few hours though.
>>>
>>> The master has three network-paths, one on external IP, one on internal
>>> IP and one on IPv6. All 3 paths work fine, because the transfers happen
>>> after an hour or so.
>>>
>>> There are no hints in the master's log.
>>> The other two slaves are running perfectly, no errors or delays
>>> whatsoever.
>>>
>>> Bind version 9.9.2-P2 (recently upgraded to).
>>>
>>> Any hints would be appreciated, as I feel like I've exhausted most
>>> options.
>>>
>>> Thank you.
>> Have a look at this KB article (you'll need to register to view - but
>> registration is open to all):
>>
>> https://kb.isc.org/article/AA-00726/30/Tuning-your-BIND-configuration-effectively-for-zone-transfers-particularly-with-many-frequently-updated-zones.html
>>
>>
>> Also - and this isn't covered in that article (yet) - if you're using
>> views, then use-alt-transfer-source defaults to 'yes'. You might want
>> to set it explicitly to 'no' or to define alt-transfer-source
>> and/or alt-transfer-source-v6.
>>
> Thank you, great resource. I think I solved it with raising
> serial-query-limit, it's just odd that it's not required on the other
> two servers.
>
> Another issue has arisen now though, the logfile is filled with lots of
> named[5596]: zone example.com/IN: refresh: failure trying master
> 1.2.3.4#53 (source 0.0.0.0#0): operation canceled
>
> But if I do a "dig example.com @1.2.3.4" it's working just fine. Same
> server as with the previous issue.
>
> Any thoughts? Thank you.
>
> // Tom

I don't think you solved the problem - I think you moved it (or made it
happen faster...).

The refresh errors indicate that the master isn't responding to your
slave for some reason. That's what you'll need to investigate.

I would suggest auditing the differences between this slave and the
others in their named configurations as well as their configured IP
interfaces and routing tables. A pair of network packet traces (slave
and the non-responding auth server) might also point you in the right
direction.

Cathy
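
As a possible starting point for the configuration audit, here is a minimal Python sketch that diffs the named configuration of a working slave against the problem slave. The helper names (normalize, config_diff) and the default labels are hypothetical and not part of BIND. The sketch only drops blank lines and whole-line // or # comments, it does not understand /* ... */ blocks or include statements, and it says nothing about interfaces or routing, which still need checking separately.

```python
import difflib


def normalize(config_text):
    """Return the config as a list of stripped lines, skipping blank lines
    and whole-line // or # comments. Block comments are left untouched."""
    lines = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//") or line.startswith("#"):
            continue
        lines.append(line)
    return lines


def config_diff(good_text, bad_text, good_name="good-slave", bad_name="bad-slave"):
    """Return a unified diff (as a list of lines) between two configs.
    The default labels are placeholders for illustration only."""
    return list(
        difflib.unified_diff(
            normalize(good_text),
            normalize(bad_text),
            fromfile=good_name,
            tofile=bad_name,
            lineterm="",
        )
    )
```

Reading each server's named.conf into a string and printing the lines returned by config_diff should surface any option that differs between the two slaves, such as a transfer-source or use-alt-transfer-source setting. An empty result means the two files are identical apart from whitespace and comments.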