2010/3/22 Cathy Almond <cat...@isc.org>

> Fabien Seisen wrote:
> > yes, max-cache-size 512M but named process takes ~900MB
>
> The extra memory is for keeping track of recursive clients (i.e.
> in-progress client queries).

ok

> This doesn't sound like a hugely loaded server,

Exactly. In my own tests (with "real life" queries), the server can handle ~70000 queries/s with a response time of ~1 ms at 70% CPU and no packet loss.

> else it's somewhat throttled (not particularly large cache and probably
> default limit on recursive clients). What kind of query rates do you have?
> Do you get any logging that suggests resource problems? If so, you might
> need to increase some of the limits.

We have a pool of several more or less identical servers with a load balancer in front. On average, each server gets 1800 queries/s, and 4000 at peak. The problem occurs every few weeks and never on all servers at the same time.

The recursive clients config is not modified (rndc status: recursive clients: 188/2900/3000) and we have:
- on average: 200 recursive clients
- at peak: 600

> It's intriguing that you're seeing the same issues on two bind versions
> and two OS (and that other people's experience is different from yours)

Only Solaris 10:
- Solaris 10 U6 with bind 9.5.1-P3 with threads, compiled with SUNSpro 12
- Solaris 10 U6 with bind 9.6.2 with threads, compiled with gcc

> - it suggests to me that it's specific to your configuration or client
> base/queries or your environment.

We get real-life queries from customers (evil?).

A simple "rndc flush" revives named. Perhaps a badly formatted packet freezes named or creates a deadlock in the cache. Can something go wrong in the cache?

I am not fluent with core files, but I have got one in my pocket.

> For troubleshooting I'd start by looking at the logging output - if
> you've got any categories going to null, un-suppress them temporarily;
> and add query-errors (see 9.6.2 ARM). Then perhaps do some sampling of
> network traffic (perhaps there's a UDP message size/fragmentation issue)
> to see what's happening (or not).

All categories go to non-null channels, and we do not use any 9.6.2-specific configuration.
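
A minimal sketch of the query-errors logging suggested above, for the servers running 9.6.2. The channel name, file name, rotation settings, and debug level are illustrative assumptions and are not part of the configuration in this thread:

```
// Goes inside the existing logging { ... } statement.
// Hypothetical channel: its name, file name, and rotation are assumptions.
channel query_errors_log {
    file "query-errors.log" versions 5 size 20m;
    severity debug 2;   // query-errors messages are logged at debug levels
    print-time yes;     // timestamps help line entries up with a hang
};

// Route the query-errors category (see the 9.6.2 ARM) to that channel.
category query-errors { query_errors_log; };
```

The debug level here is a guess at a useful starting point; the 9.6.2 ARM describes what each level records for this category.
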
I did not notice any weird log messages (besides the regular "shutting down due to TCP receive error: 202.96.209.6#53: connection reset").

Here is our log config:

category client { client.log; };
category config { config.log; default_syslog; };
category database { database.log; default_syslog; };
category default { default.log; default_syslog; };
category delegation-only { delegation-only.log; };
category dispatch { dispatch.log; };
category general { default.log; };
category lame-servers { lamers.log; };
category network { network.log; };
category notify { notify.log; default_syslog; };
category queries { queries.log; };
category resolver { resolver.log; };
category security { security; };
category unmatched { unmatched.log; };
category update { update.log; };
category xfer-in { xfer-in.log; default_syslog; };
category xfer-out { xfer-out.log; default_syslog; };

-- 
Fabien
_______________________________________________
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users