Hi Stas,

I've raised a bug ticket (#21479) with your report below. In general, for a problem like this, if it hasn't already been raised and explained on bind-users, please send email to bind9-bugs to report it.

9.5 introduced an LRU cache; this is most likely why you are seeing a difference between 9.4 and 9.6.

We'll be in touch via the bug ticket.

Kind regards,
Cathy

Stas Pirogov wrote:
> Hello,
>
> first let me apologize for the length of this message.
> I will try to be as short as I can.
>
> Today we have around 20 servers running bind 9.4 and 9.6 (latest versions)
> on CentOS 5.x (between 5.2 and 5.5) with 2.6 64bit kernel.
>
> Our servers have around 35000 zones with overall of 250M of disk space
> used for them.
>
> On load bind takes around 900M of memory.
>
> For bind 9.4 we used 1000M max-cache which allowed us having named grow
> to up to 2.3G of resources in memory.
>
> Since bind 9.6 (I didn't try this on 9.5) we have trouble managing amount
> of memory that bind will use. Even having max-cache of default 2M will
> eventually bring named to more than 3G of resources and at this point
> strange things begin to happen:
>
> 1. With non-multithreaded bind the 'rndc flush' (which we run once a day)
> will crash bind and produce following log entry:
>
> 05-Jun-2010 05:10:03.684 general: info: received control channel command 'flush'
> 05-Jun-2010 05:10:03.684 general: critical: cache.c:978: fatal error:
> 05-Jun-2010 05:10:03.684 general: critical: RUNTIME_CHECK(((*((&cache->cleaner.lock)))++ == 0 ? 0 : 34) == 0) failed
> 05-Jun-2010 05:10:03.684 general: critical: exiting (due to fatal error in library)
>
> This is from bind 9.7.0-P2. The cache.c line 978 contains:
>
> LOCK(&cache->cleaner.lock);
>
> 2. With threaded bind the 'rndc flush' will create situation at which the
> named is still running, but there's no service.
>
> Here are some outputs of such hanging process from bind 9.6.2-P1:
>
> ps auxww:
>
> root 2248 25.3 74.8 3153312 3029568 ? Ssl May17 7918:25 /usr/local/sbin/named -4 -n 2
> root 15281 0.0 0.0 39292 1456 ? Ssl 05:09 0:00 /usr/local/sbin/rndc flush
>
> pstack:
>
> Thread 5 (Thread 0x41206940 (LWP 2249)):
> #0 0x000000377fc0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1 0x0000000000560c2a in run ()
> #2 0x000000377fc0673d in start_thread () from /lib64/libpthread.so.0
> #3 0x000000377f4d3d1d in clone () from /lib64/libc.so.6
> Thread 4 (Thread 0x41c07940 (LWP 2250)):
> #0 0x000000377fc0d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x000000377fc08e1a in _L_lock_1034 () from /lib64/libpthread.so.0
> #2 0x000000377fc08cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3 0x00000000004564d7 in water ()
> #4 0x0000000000554820 in isc__mem_get ()
> #5 0x0000000000493a8b in createiterator ()
> #6 0x000000000045633a in dns_cache_flush ()
> #7 0x000000000050698d in dns_view_flushcache ()
> #8 0x000000000041e1bf in ns_server_flushcache ()
> #9 0x000000000040b720 in ns_control_docommand ()
> #10 0x000000000040e718 in control_recvmessage ()
> #11 0x0000000000560d9c in run ()
> #12 0x000000377fc0673d in start_thread () from /lib64/libpthread.so.0
> #13 0x000000377f4d3d1d in clone () from /lib64/libc.so.6
> Thread 3 (Thread 0x42711940 (LWP 2251)):
> #0 0x000000377fc0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1 0x0000000000573c00 in isc_condition_waituntil ()
> #2 0x0000000000562df9 in run ()
> #3 0x000000377fc0673d in start_thread () from /lib64/libpthread.so.0
> #4 0x000000377f4d3d1d in clone () from /lib64/libc.so.6
> Thread 2 (Thread 0x43112940 (LWP 2252)):
> #0 0x000000377f4d4108 in epoll_wait () from /lib64/libc.so.6
> #1 0x0000000000570b8d in watcher ()
> #2 0x000000377fc0673d in start_thread () from /lib64/libpthread.so.0
> #3 0x000000377f4d3d1d in clone () from /lib64/libc.so.6
> Thread 1 (Thread 0x2ae3ff041530 (LWP 2248)):
> #0 0x000000377f4307bf in sigsuspend () from /lib64/libc.so.6
> #1 0x000000000056426e in isc_app_run ()
> #2 0x00000000004124eb in main ()
>
> pmap:
>
> 2248: /usr/local/sbin/named -4 -n 2
> Address Kbytes RSS Dirty Mode Mapping
> 0000000000400000 1860 1428 0 r-x-- named
> 00000000007d0000 56 48 28 rw--- named
> 00000000007de000 8 8 8 rw--- [ anon ]
> 0000000012b40000 750300 750084 750084 rw--- [ anon ]
> 0000000040806000 4 0 0 ----- [ anon ]
> 0000000040807000 10240 36 36 rw--- [ anon ]
> 0000000041207000 4 0 0 ----- [ anon ]
> 0000000041208000 10240 36 36 rw--- [ anon ]
> 0000000041d11000 4 0 0 ----- [ anon ]
> 0000000041d12000 10240 8 8 rw--- [ anon ]
> 0000000042712000 4 0 0 ----- [ anon ]
> 0000000042713000 10240 8 8 rw--- [ anon ]
> 000000377f000000 112 48 0 r-x-- ld-2.5.so
> 000000377f21b000 4 4 4 r---- ld-2.5.so
> 000000377f21c000 4 4 4 rw--- ld-2.5.so
> 000000377f400000 1336 432 0 r-x-- libc-2.5.so
> 000000377f54e000 2044 0 0 ----- libc-2.5.so
> 000000377f74d000 16 16 8 r---- libc-2.5.so
> 000000377f751000 4 4 4 rw--- libc-2.5.so
> 000000377f752000 20 16 16 rw--- [ anon ]
> 000000377f800000 8 0 0 r-x-- libdl-2.5.so
> 000000377f802000 2048 0 0 ----- libdl-2.5.so
> 000000377fa02000 4 4 4 r---- libdl-2.5.so
> 000000377fa03000 4 4 4 rw--- libdl-2.5.so
> 000000377fc00000 88 64 0 r-x-- libpthread-2.5.so
> 000000377fc16000 2044 0 0 ----- libpthread-2.5.so
> 000000377fe15000 4 4 4 r---- libpthread-2.5.so
> 000000377fe16000 4 4 4 rw--- libpthread-2.5.so
> 000000377fe17000 16 4 4 rw--- [ anon ]
> 0000003780000000 520 8 0 r-x-- libm-2.5.so
> 0000003780082000 2044 0 0 ----- libm-2.5.so
> 0000003780281000 4 4 4 r---- libm-2.5.so
> 0000003780282000 4 4 4 rw--- libm-2.5.so
> 0000003780400000 80 4 0 r-x-- libz.so.1.2.3
> 0000003780414000 2044 0 0 ----- libz.so.1.2.3
> 0000003780613000 4 4 4 rw--- libz.so.1.2.3
> 0000003781c00000 1228 12 0 r-x-- libxml2.so.2.6.26
> 0000003781d33000 2048 0 0 ----- libxml2.so.2.6.26
> 0000003781f33000 36 20 16 rw--- libxml2.so.2.6.26
> 0000003781f3c000 4 0 0 rw--- [ anon ]
> 0000003782c00000 12 4 0 r-x-- libcap.so.1.10
> 0000003782c03000 2048 0 0 ----- libcap.so.1.10
> 0000003782e03000 4 4 4 rw--- libcap.so.1.10
> 00002aaaaaacc000 188 188 188 rw--- [ anon ]
> 00002aaaaaafc000 85540 85540 85540 rw--- [ anon ]
> 00002aaaafe86000 18460 18444 18444 rw--- [ anon ]
> 00002aaab10f6000 264 260 260 rw--- [ anon ]
> 00002aaab11a1000 260 260 260 rw--- [ anon ]
> 00002aaab11e3000 11440 11440 11440 rw--- [ anon ]
> 00002aaab1d10000 3120 3112 3112 rw--- [ anon ]
> 00002aaab201d000 13260 13232 13232 rw--- [ anon ]
> 00002aaab2d11000 6760 6744 6744 rw--- [ anon ]
> 00002aaab33ac000 5200 5196 5196 rw--- [ anon ]
> 00002aaab38c1000 3380 3372 3372 rw--- [ anon ]
> 00002aaab3c0f000 5200 5156 5156 rw--- [ anon ]
> 00002aaab4124000 13260 13184 13184 rw--- [ anon ]
> 00002aaab4e18000 520 520 520 rw--- [ anon ]
> 00002aaab4e9b000 23660 23656 23656 rw--- [ anon ]
> 00002aaab65b7000 7280 7276 7276 rw--- [ anon ]
> 00002aaab6cd4000 780 780 780 rw--- [ anon ]
> 00002aaab6d98000 7280 7276 7276 rw--- [ anon ]
> 00002aaab74b5000 2860 2852 2852 rw--- [ anon ]
> 00002aaab7781000 1820 1820 1820 rw--- [ anon ]
> 00002aaab7949000 8320 8320 8320 rw--- [ anon ]
> 00002aaab816a000 22880 22868 22868 rw--- [ anon ]
> 00002aaab97c3000 1820 1820 1820 rw--- [ anon ]
> 00002aaab998b000 10660 10656 10656 rw--- [ anon ]
> 00002aaaba3f5000 10400 10400 10400 rw--- [ anon ]
> 00002aaabae1e000 23660 23644 23644 rw--- [ anon ]
> 00002aaabc53a000 1040 1036 1036 rw--- [ anon ]
> 00002aaabc63f000 780 776 776 rw--- [ anon ]
> 00002aaabc703000 780 776 776 rw--- [ anon ]
> 00002aaabc7c7000 1560 1556 1556 rw--- [ anon ]
> 00002aaabc94e000 2600 2584 2584 rw--- [ anon ]
> 00002aaabcbd9000 1040 1032 1032 rw--- [ anon ]
> 00002aaabccde000 1560 1556 1556 rw--- [ anon ]
> 00002aaabce65000 780 776 776 rw--- [ anon ]
> 00002aaabcf29000 780 776 776 rw--- [ anon ]
> 00002aaabcfed000 780 776 776 rw--- [ anon ]
> 00002aaabd0b1000 520 516 516 rw--- [ anon ]
> 00002aaabd134000 780 772 772 rw--- [ anon ]
> 00002aaabd1f8000 520 520 520 rw--- [ anon ]
> 00002aaabd27b000 1560 1560 1560 rw--- [ anon ]
> 00002aaabd402000 780 780 780 rw--- [ anon ]
> 00002aaabd4c6000 1040 1040 1040 rw--- [ anon ]
> 00002aaabd5cb000 780 776 776 rw--- [ anon ]
> 00002aaabd68f000 1820 1816 1816 rw--- [ anon ]
> 00002aaabd857000 780 780 780 rw--- [ anon ]
> 00002aaabd91b000 780 776 776 rw--- [ anon ]
> 00002aaabd9df000 1040 1036 1036 rw--- [ anon ]
> 00002aaabdae4000 1040 1032 1032 rw--- [ anon ]
> 00002aaabdbe9000 1300 1300 1300 rw--- [ anon ]
> 00002aaabdd2f000 780 776 776 rw--- [ anon ]
> 00002aaabddf3000 520 520 520 rw--- [ anon ]
> 00002aaabde76000 1820 1812 1812 rw--- [ anon ]
> 00002aaabe03e000 780 776 776 rw--- [ anon ]
> 00002aaabe102000 1300 1292 1292 rw--- [ anon ]
> 00002aaabe248000 520 520 520 rw--- [ anon ]
> 00002aaabe2cb000 1300 1300 1300 rw--- [ anon ]
> 00002aaabe411000 780 780 780 rw--- [ anon ]
> 00002aaabe4d5000 1300 1296 1296 rw--- [ anon ]
> 00002aaabe61b000 780 776 776 rw--- [ anon ]
> 00002aaabe6df000 1560 1560 1560 rw--- [ anon ]
> 00002aaabe866000 520 520 520 rw--- [ anon ]
> 00002aaabe8e9000 520 520 520 rw--- [ anon ]
> 00002aaabe96c000 520 516 516 rw--- [ anon ]
> 00002aaabe9ef000 780 780 780 rw--- [ anon ]
> 00002aaabeab3000 780 776 776 rw--- [ anon ]
> 00002aaabeb77000 1560 1556 1556 rw--- [ anon ]
> 00002aaabecfe000 520 520 520 rw--- [ anon ]
> 00002aaabed81000 520 520 520 rw--- [ anon ]
> 00002aaabee04000 1300 1288 1288 rw--- [ anon ]
> 00002aaabef4a000 1040 1036 1036 rw--- [ anon ]
> 00002aaabf04f000 1040 1040 1040 rw--- [ anon ]
> 00002aaabf154000 2340 2328 2328 rw--- [ anon ]
> 00002aaabf39e000 2860 2852 2852 rw--- [ anon ]
> 00002aaabf66a000 1820 1808 1808 rw--- [ anon ]
> 00002aaabf832000 520 520 520 rw--- [ anon ]
> 00002aaabf8b5000 1820 1816 1816 rw--- [ anon ]
> 00002aaabfa7d000 1040 1036 1036 rw--- [ anon ]
> 00002aaabfb82000 1820 1820 1820 rw--- [ anon ]
> 00002aaabfd4a000 780 780 780 rw--- [ anon ]
> 00002aaabfe0e000 1820 1808 1808 rw--- [ anon ]
> 00002aaabffd6000 2080 2060 2060 rw--- [ anon ]
> 00002aaac01df000 520 520 520 rw--- [ anon ]
> 00002aaac0262000 520 520 520 rw--- [ anon ]
> 00002aaac02e5000 1300 1292 1292 rw--- [ anon ]
> 00002aaac042b000 520 516 516 rw--- [ anon ]
> 00002aaac04ae000 780 776 776 rw--- [ anon ]
> 00002aaac0572000 520 520 520 rw--- [ anon ]
> 00002aaac05f5000 1300 1296 1296 rw--- [ anon ]
> 00002aaac073b000 1040 1036 1036 rw--- [ anon ]
> 00002aaac0840000 780 772 772 rw--- [ anon ]
> 00002aaac0904000 1560 1556 1556 rw--- [ anon ]
> 00002aaac0a8b000 780 780 780 rw--- [ anon ]
> 00002aaac0b4f000 1300 1296 1296 rw--- [ anon ]
> 00002aaac0c95000 1560 1552 1552 rw--- [ anon ]
> 00002aaac0e1c000 520 516 516 rw--- [ anon ]
> 00002aaac0e9f000 780 780 780 rw--- [ anon ]
> 00002aaac0f63000 2080 2072 2072 rw--- [ anon ]
> 00002aaac116c000 780 776 776 rw--- [ anon ]
> 00002aaac1230000 520 516 516 rw--- [ anon ]
> 00002aaac12b3000 2340 2324 2324 rw--- [ anon ]
> 00002aaac14fd000 780 780 780 rw--- [ anon ]
> 00002aaac15c1000 1560 1560 1560 rw--- [ anon ]
> 00002aaac1748000 780 776 776 rw--- [ anon ]
> 00002aaac180c000 1820 1816 1816 rw--- [ anon ]
> 00002aaac19d4000 1820 1812 1812 rw--- [ anon ]
> 00002aaac1b9c000 1040 1040 1040 rw--- [ anon ]
> 00002aaac1ca1000 2600 2600 2600 rw--- [ anon ]
> 00002aaac1f2c000 10660 10608 10608 rw--- [ anon ]
> 00002aaac29f1000 20280 20168 20168 rw--- [ anon ]
> 00002aaac3ed1000 1024 1024 1024 rw--- [ anon ]
> 00002aaac3fe3000 65056 65056 65056 rw--- [ anon ]
> 00002aaac8000000 65508 65340 65340 rw--- [ anon ]
> 00002aaacbff9000 28 0 0 ----- [ anon ]
> 00002aaacc000000 65480 65480 65480 rw--- [ anon ]
> 00002aaacfff2000 56 0 0 ----- [ anon ]
> 00002aaad0000000 63504 63504 63504 rw--- [ anon ]
> 00002aaad4000000 65332 65332 65332 rw--- [ anon ]
> 00002aaad7fcd000 204 0 0 ----- [ anon ]
> 00002aaad8000000 65356 65356 65356 rw--- [ anon ]
> 00002aaadbfd3000 180 0 0 ----- [ anon ]
> 00002aaadc000000 65420 65420 65420 rw--- [ anon ]
> 00002aaadffe3000 116 0 0 ----- [ anon ]
> 00002aaae0000000 61648 61648 61648 rw--- [ anon ]
> 00002aaae4000000 65500 65500 65500 rw--- [ anon ]
> 00002aaae7ff7000 36 0 0 ----- [ anon ]
> 00002aaae8000000 64428 64428 64428 rw--- [ anon ]
> 00002aaaebeeb000 1108 0 0 ----- [ anon ]
> 00002aaaec000000 64156 64156 64156 rw--- [ anon ]
> 00002aaaefea7000 1380 0 0 ----- [ anon ]
> 00002aaaf0000000 64896 64896 64896 rw--- [ anon ]
> 00002aaaf4000000 65340 64116 64116 rw--- [ anon ]
> 00002aaaf7fcf000 196 0 0 ----- [ anon ]
> 00002aaaf8000000 64840 64840 64840 rw--- [ anon ]
> 00002aaafc000000 64820 64820 64820 rw--- [ anon ]
> 00002aaafff4d000 716 0 0 ----- [ anon ]
> 00002aab00000000 64780 64780 64780 rw--- [ anon ]
> 00002aab04000000 65292 65292 65292 rw--- [ anon ]
> 00002aab07fc3000 244 0 0 ----- [ anon ]
> 00002aab08000000 65452 65452 65452 rw--- [ anon ]
> 00002aab0bfeb000 84 0 0 ----- [ anon ]
> 00002aab0c000000 65252 65252 65252 rw--- [ anon ]
> 00002aab0ffb9000 284 0 0 ----- [ anon ]
> 00002aab10000000 63060 62212 62212 rw--- [ anon ]
> 00002aab13d95000 2476 0 0 ----- [ anon ]
> 00002aab14000000 65488 64704 64704 rw--- [ anon ]
> 00002aab17ff4000 48 0 0 ----- [ anon ]
> 00002aab18000000 256296 256036 256036 rw--- [ anon ]
> 00002aab28000000 65372 65372 65372 rw--- [ anon ]
> 00002aab2bfd7000 164 0 0 ----- [ anon ]
> 00002aab2c000000 61408 61152 61152 rw--- [ anon ]
> 00002aab30000000 63468 63468 63468 rw--- [ anon ]
> 00002aab33dfb000 2068 0 0 ----- [ anon ]
> 00002aab34000000 47816 47540 47540 rw--- [ anon ]
> 00002aab38000000 33204 14516 14516 rw--- [ anon ]
> 00002aab3a06d000 32332 0 0 ----- [ anon ]
> 00002ae3ff02d000 4 4 4 rw--- [ anon ]
> 00002ae3ff03e000 276 276 276 rw--- [ anon ]
> 00007fff8fa4a000 84 20 20 rw--- [ stack ]
> ffffffffff600000 8192 0 0 ----- [ anon ]
> ---------------- ------ ------ ------
> total kB 3161504 3029568 3027536
>
> strace -fp:
>
> Process 2248 attached with 5 threads - interrupt to quit
> [pid 2252] epoll_wait(7, <unfinished ...>
> [pid 2251] clock_gettime(CLOCK_REALTIME, <unfinished ...>
> [pid 2250] futex(0x2aaab104c088, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 2249] futex(0x2ae3ff047084, FUTEX_WAIT_PRIVATE, 4239917443, NULL <unfinished ...>
> [pid 2248] rt_sigsuspend([] <unfinished ...>
> [pid 2251] <... clock_gettime resumed> {1275976570, 97551000}) = 0
> [pid 2251] futex(0x2ae3ff048074, FUTEX_WAIT_PRIVATE, 93569205, {0, 301251000}) = -1 ETIMEDOUT (Connection timed out)
> [pid 2251] futex(0x2ae3ff048020, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 2251] clock_gettime(CLOCK_REALTIME, {1275976570, 400051000}) = 0
> [pid 2251] futex(0x2ae3ff048074, FUTEX_WAIT_PRIVATE, 93569207, {0, 252521000}) = -1 ETIMEDOUT (Connection timed out)
> [pid 2251] futex(0x2ae3ff048020, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 2251] clock_gettime(CLOCK_REALTIME, {1275976570, 654023000}) = 0
> [pid 2251] futex(0x2ae3ff048074, FUTEX_WAIT_PRIVATE, 93569209, {0, 75751000}) = -1 ETIMEDOUT (Connection timed out)
> [pid 2251] futex(0x2ae3ff048020, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 2251] clock_gettime(CLOCK_REALTIME, {1275976570, 731031000}) = 0
> [pid 2251] futex(0x2ae3ff048074, FUTEX_WAIT_PRIVATE, 93569211, {0, 155742000}) = -1 ETIMEDOUT (Connection timed out)
> [pid 2251] futex(0x2ae3ff048020, FUTEX_WAKE_PRIVATE, 1) = 0
>
> From what I can understand the threads are hanging waiting for lock and
> nothing happens afterwards.
>
> Without running 'rndc flush' the bind will eventually reach 4G and crash
> with some other error which I currently don't have.
>
> Up to now we tried different max-cache settings and threaded/non-threaded
> compilations without much difference.
>
> In all situations the named is 64-bit executable.
>
> The problem never happens with bind 9.4.3-P5 that we run (nor with older
> version of 9.4), so it seems that from 9.6 (maybe even 9.5) the memory
> management changed. I also tried tests with 9.7.0-P1/P2 with same outcome.
>
> Any help on the issue will be greatly appreciated. I'm open to any
> suggestions.
>
> Thanks in advance.
>
> Stas Pirogov
> 013 Netvision
> _______________________________________________
> bind-users mailing list
> bind-users@lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users
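
A note on the RUNTIME_CHECK expression in the quoted log: the expanded text ((*lock)++ == 0 ? 0 : 34) suggests that, in the non-threaded build, LOCK() is just a counter increment that succeeds only when the counter was previously zero. The sketch below reproduces that shape in isolation to show why a second lock attempt on an already-held lock yields 34 and trips the check. All names here (sketch_mutex_t, sketch_lock, demo_relock, the SKETCH_R_* constants) are hypothetical stand-ins for illustration and are not BIND's actual definitions; whether cleaner.lock was really being re-acquired during 'rndc flush' is an assumption, not something the log alone confirms.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration only; not BIND's headers. */
typedef int sketch_mutex_t;      /* a non-threaded "mutex" is a plain counter */
#define SKETCH_R_SUCCESS    0
#define SKETCH_R_UNEXPECTED 34   /* the value 34 that appears in the logged expression */

/* Same shape as the logged expression: post-increment the counter and
 * succeed only if it was 0 beforehand. */
int sketch_lock(sketch_mutex_t *m) {
    return ((*m)++ == 0) ? SKETCH_R_SUCCESS : SKETCH_R_UNEXPECTED;
}

/* Take the same lock twice without releasing it and return the result
 * of the second attempt. */
int demo_relock(void) {
    sketch_mutex_t lock = 0;
    int first = sketch_lock(&lock);   /* counter 0 -> 1, succeeds */
    assert(first == SKETCH_R_SUCCESS);
    return sketch_lock(&lock);        /* counter 1 -> 2, fails */
}
```

Here demo_relock() returns 34, which is the condition the RUNTIME_CHECK treats as fatal. With a real pthread mutex, the analogous second acquisition would block instead of failing, which would be consistent with Thread 4 in the pstack output sitting in pthread_mutex_lock() underneath dns_cache_flush(); that reading is an inference from the trace and would need confirming against the source.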