I have a mail and Samba PDC server that, after a couple of days, runs out of RAM then swap then freaks out with oom killer kicking in, at which point it becomes very unresponsive and needs rebooting.
Does anyone have any ideas of what could be causing this please? The server has a Pentium 4 era Intel Xeon CPU with 2GB RAM and 2GB swap. It runs Debian 4.0 Etch with volatile updates. The issue happens with Debian 4.0 Etch kernel 2.6.18 or EtchnHalf kernel 2.6.24. This server has operated fine for the past 2 years on Debian 3.1 Sarge, only having been upgraded to Etch 2 months ago. The mail server is setup as per http://workaround.org/articles/ispmail-etch/; the Samba setup is as per http://thegoldenear.org/toolbox/unices/samba-3-pdc-print-server-debian-etch.html. The symptom is that all available physical RAM is used up (as reported by free), then swap is used up, then the server freaks out, giving out of memory errors continually to the screen (see the actual message below). The server seems to operate at its maximum capacity of physical RAM for a day or two before eating into swap. Initially when I was seeing this, much of the time, but not all of the time, there were a _lot_ of 'imap' instances running. But now I don't see that at all and yet it crashes all the same. There are around 10 concurrent users, each with up to 5 mailboxes open in Thunderbird 2. Thunderbird's concurrent 'Maximum number of server connections to cache' is set to 1 (previously 5 but I've gotten this turned down recently, though, according to http://kb.mozillazine.org/IMAP_servers, should be OK with dovecot). I would have thought 2GB of RAM would be plenty, is that true? I ran memtest86+ and the memory tested OK. Here's the output of various tools and syslog, from back when I was getting lots of instances of imap: #top: 11598 vmail 18 0 4700 352 132 R 6 0.0 1:16.50 imap 11309 vmail 18 0 4696 316 116 R 6 0.0 1:19.11 imap 16184 vmail 18 0 63144 500 276 D 4 0.0 1:05.17 imap 23445 vmail 18 0 63144 7112 276 R 4 0.3 0:42.32 imap 27232 vmail 18 0 63144 7004 112 D 4 0.3 0:28.84 imap 20573 vmail 18 0 63144 524 276 D 3 0.0 0:45.58 imap 11865 vmail 18 0 4696 240 48 R 3 0.0 1:17.85 imap 13167 vmail 18 0 5868 520 276 D 3 0.0 1:15.14 imap 14976 vmail 18 0 63320 544 280 D 3 0.0 1:11.70 imap 16272 vmail 18 0 63148 520 276 D 3 0.0 1:09.01 imap 25847 vmail 18 0 63144 7172 276 D 3 0.3 0:31.96 imap 5260 root 16 0 2892 1316 456 R 3 0.1 0:02.92 top 4534 vmail 18 0 4824 452 276 D 3 0.0 2:04.29 imap 4633 vmail 18 0 4824 460 276 D 3 0.0 1:57.42 imap 13094 vmail 18 0 4700 480 276 D 3 0.0 1:11.33 imap 23244 vmail 18 0 63148 488 276 D 3 0.0 0:45.67 imap 24505 vmail 18 0 63148 7240 276 D 3 0.3 0:40.54 imap 26400 vmail 18 0 63148 7156 276 D 3 0.3 0:32.45 imap 27212 vmail 18 0 63144 7184 276 D 3 0.3 2:18.90 imap 27226 vmail 18 0 63148 7232 276 D 3 0.3 0:28.54 imap 2724 amavis 18 0 55196 10m 44 R 2 0.5 6:32.14 amavisd-new 3751 root 18 0 9668 516 136 D 2 0.0 7:02.11 miniserv.pl 6401 vmail 18 0 4828 576 276 D 2 0.0 1:46.62 imap 23341 vmail 18 0 63148 7068 276 D 2 0.3 0:39.53 imap 28622 vmail 18 0 63148 7080 112 D 2 0.3 0:26.98 imap 4563 vmail 18 0 4824 288 112 D 2 0.0 1:56.91 imap 9561 vmail 18 0 4700 308 112 D 2 0.0 1:31.07 imap 21819 vmail 18 0 63144 364 112 D 2 0.0 0:46.23 imap 25949 vmail 18 0 63148 6924 112 D 2 0.3 0:33.42 imap 9885 vmail 18 0 4700 324 112 D 1 0.0 1:26.72 imap 10127 vmail 18 0 4700 308 112 D 1 0.0 1:23.44 imap # free -m total used free shared buffers cached Mem: 2027 1976 50 0 226 612 -/+ buffers/cache: 1138 889 Swap: 2572 0 2572 # vmstat -s 2076388 total memory 2023448 used memory 1395056 active memory 444268 inactive memory 52940 free memory 241696 buffer memory 500268 swap cache 2634620 total swap 56 used swap 2634564 free swap 33797 non-nice user cpu ticks 5477 nice user cpu ticks 731695 system cpu ticks 19954325 idle cpu ticks 378484 IO-wait cpu ticks 1368 IRQ cpu ticks 4493 softirq cpu ticks 0 stolen cpu ticks 3196946 pages paged in 6119050 pages paged out 0 pages swapped in 14 pages swapped out 17992684 interrupts 10515665 CPU context switches 1237926809 boot time 25277 forks >From /var/log/syslog: Mar 14 12:58:20 localhost kernel: oom-killer: gfp_mask=0x201d2, order=0 Mar 14 12:58:20 localhost kernel: [<c01444e3>] out_of_memory+0x25/0x13a Mar 14 12:58:20 localhost kernel: [<c014594c>] __alloc_pages+0x1f5/0x275 Mar 14 12:58:20 localhost kernel: [<c0146afb>] __do_page_cache_readahead+0xc8/0x1e8 Mar 14 12:58:20 localhost kernel: [<c0141674>] find_get_page+0x18/0x38 Mar 14 12:58:20 localhost kernel: [<c015bb3b>] __find_get_block_slow+0xfb/0x105 Mar 14 12:58:20 localhost kernel: [<c015be64>] __find_get_block+0x15c/0x166 Mar 14 12:58:20 localhost kernel: [<c0146c61>] blockable_page_cache_readahead+0x46/0x99 Mar 14 12:58:20 localhost kernel: [<c0146df3>] page_cache_readahead+0xb2/0x177 Mar 14 12:58:20 localhost kernel: [<c0141d09>] do_generic_mapping_read+0x137/0x42a Mar 14 12:58:20 localhost kernel: [<c014283b>] __generic_file_aio_read+0x16b/0x1b2 Mar 14 12:58:20 localhost kernel: [<c0141524>] file_read_actor+0x0/0xca Mar 14 12:58:20 localhost kernel: [<c01428bd>] generic_file_aio_read+0x3b/0x42 Mar 14 12:58:20 localhost kernel: [<c0159e7f>] do_sync_read+0xb6/0xf1 Mar 14 12:58:20 localhost kernel: [<c012d8b1>] autoremove_wake_function+0x0/0x2d Mar 14 12:58:20 localhost kernel: [<c0159dc9>] do_sync_read+0x0/0xf1 Mar 14 12:58:20 localhost kernel: [<c015a788>] vfs_read+0x9f/0x141 Mar 14 12:58:20 localhost kernel: [<c015aca2>] sys_pread64+0x44/0x5a Mar 14 12:58:20 localhost kernel: [<c0102c11>] sysenter_past_esp+0x56/0x79 Mar 14 12:58:20 localhost kernel: Mem-info: Mar 14 12:58:20 localhost kernel: DMA per-cpu: Mar 14 12:58:20 localhost kernel: cpu 0 hot: high 0, batch 1 used:0 Mar 14 12:58:20 localhost kernel: cpu 0 cold: high 0, batch 1 used:0 Mar 14 12:58:20 localhost kernel: cpu 1 hot: high 0, batch 1 used:0 Mar 14 12:58:20 localhost kernel: cpu 1 cold: high 0, batch 1 used:0 Mar 14 12:58:20 localhost kernel: cpu 2 hot: high 0, batch 1 used:0 Mar 14 12:58:20 localhost kernel: cpu 2 cold: high 0, batch 1 used:0 Mar 14 12:58:20 localhost kernel: cpu 3 hot: high 0, batch 1 used:0 Mar 14 12:58:20 localhost kernel: cpu 3 cold: high 0, batch 1 used:0 Mar 14 12:58:20 localhost kernel: DMA32 per-cpu: empty Mar 14 12:58:20 localhost kernel: Normal per-cpu: Mar 14 12:58:20 localhost kernel: cpu 0 hot: high 186, batch 31 used:0 Mar 14 12:58:20 localhost kernel: cpu 0 cold: high 62, batch 15 used:54 Mar 14 12:58:20 localhost kernel: cpu 1 hot: high 186, batch 31 used:27 Mar 14 12:58:20 localhost kernel: cpu 1 cold: high 62, batch 15 used:61 Mar 14 12:58:20 localhost kernel: cpu 2 hot: high 186, batch 31 used:49 Mar 14 12:58:20 localhost kernel: cpu 2 cold: high 62, batch 15 used:59 Mar 14 12:58:20 localhost kernel: cpu 3 hot: high 186, batch 31 used:8 Mar 14 12:58:20 localhost kernel: cpu 3 cold: high 62, batch 15 used:19 Mar 14 12:58:20 localhost kernel: HighMem per-cpu: Mar 14 12:58:20 localhost kernel: cpu 0 hot: high 186, batch 31 used:69 Mar 14 12:58:20 localhost kernel: cpu 0 cold: high 62, batch 15 used:48 Mar 14 12:58:20 localhost kernel: cpu 1 hot: high 186, batch 31 used:170 Mar 14 12:58:20 localhost kernel: cpu 1 cold: high 62, batch 15 used:15 Mar 14 12:58:20 localhost kernel: cpu 2 hot: high 186, batch 31 used:162 Mar 14 12:58:20 localhost kernel: cpu 2 cold: high 62, batch 15 used:12 Mar 14 12:58:20 localhost kernel: cpu 3 hot: high 186, batch 31 used:174 Mar 14 12:58:20 localhost kernel: cpu 3 cold: high 62, batch 15 used:49 Mar 14 12:58:20 localhost kernel: Free pages: 45488kB (496kB HighMem) Mar 14 12:58:20 localhost kernel: Active:263348 inactive:216164 dirty:0 writeback:0 unstable:0 free:11372 slab:14025 mapped:144 pagetables:11530 Mar 14 12:58:20 localhost kernel: DMA free:8192kB min:68kB low:84kB high:100kB active:2968kB inactive:2160kB present:16384kB pages_scanned:6530 all_unreclaimable? yes Mar 14 12:58:20 localhost kernel: lowmem_reserve[]: 0 0 880 2031 Mar 14 12:58:20 localhost kernel: DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no Mar 14 12:58:20 localhost kernel: lowmem_reserve[]: 0 0 880 2031 Mar 14 12:58:20 localhost kernel: Normal free:36800kB min:3756kB low:4692kB high:5632kB active:402344kB inactive:338216kB present:901120kB pages_scanned:806676 all_unreclaimable? yes Mar 14 12:58:20 localhost kernel: lowmem_reserve[]: 0 0 0 9212 Mar 14 12:58:20 localhost kernel: HighMem free:496kB min:512kB low:1740kB high:2968kB active:648080kB inactive:524280kB present:1179136kB pages_scanned:1275911 all_unreclaimable? yes Mar 14 12:58:20 localhost kernel: lowmem_reserve[]: 0 0 0 0 Mar 14 12:58:20 localhost kernel: DMA: 6*4kB 219*8kB 123*16kB 23*32kB 2*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 8192kB Mar 14 12:58:20 localhost kernel: DMA32: empty Mar 14 12:58:20 localhost kernel: Normal: 2772*4kB 296*8kB 1151*16kB 22*32kB 2*64kB 2*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 36800kB Mar 14 12:58:20 localhost kernel: HighMem: 0*4kB 14*8kB 0*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 496kB Mar 14 12:58:20 localhost kernel: Swap cache: add 948264, delete 948263, find 97950/132027, race 0+1030 Mar 14 12:58:20 localhost kernel: Free swap = 0kB Mar 14 12:58:20 localhost kernel: Total swap = 2634620kB Mar 14 12:58:20 localhost kernel: Free swap: 0kB Mar 14 12:58:20 localhost kernel: 524160 pages of RAM Mar 14 12:58:20 localhost kernel: 294784 pages of HIGHMEM Mar 14 12:58:20 localhost kernel: 5063 reserved pages Mar 14 12:58:20 localhost kernel: 22710 pages shared Mar 14 12:58:20 localhost kernel: 1 pages swap cached Mar 14 12:58:20 localhost kernel: 0 pages dirty Mar 14 12:58:20 localhost kernel: 0 pages writeback Mar 14 12:58:20 localhost kernel: 144 pages mapped Mar 14 12:58:20 localhost kernel: 14025 pages slab Mar 14 12:58:20 localhost kernel: 11530 pages pagetables Mar 14 12:58:20 localhost kernel: 524160 pages of RAM Mar 14 12:58:20 localhost kernel: 294784 pages of HIGHMEM Mar 14 12:58:20 localhost kernel: 5063 reserved pages Mar 14 12:58:20 localhost kernel: 22715 pages shared Mar 14 12:58:20 localhost kernel: 1 pages swap cached Mar 14 12:58:20 localhost kernel: 0 pages dirty Mar 14 12:58:20 localhost kernel: 0 pages writeback Mar 14 12:58:20 localhost kernel: 144 pages mapped Mar 14 12:58:20 localhost kernel: 14025 pages slab Mar 14 12:58:20 localhost kernel: 11530 pages pagetables Mar 14 12:58:20 localhost kernel: Out of Memory: Kill process 29845 (mysqld) score 16080 and children. Mar 14 12:58:20 localhost kernel: Out of memory: Killed process 29845 (mysqld). Mar 14 12:58:20 localhost kernel: Out of Memory: Kill process 3410 (dovecot) score 8662 and children. Mar 14 12:58:20 localhost kernel: Out of memory: Killed process 3511 (pop3-login). Mar 14 12:58:20 localhost kernel: Out of Memory: Kill process 3410 (dovecot) score 8661 and children. Mar 14 12:58:20 localhost kernel: Out of memory: Killed process 2925 (imap). Mar 14 12:58:22 localhost mysqld_safe[5237]: Number of processes running now: 0 Mar 14 12:58:22 localhost mysqld_safe[5239]: restarted Mar 14 12:58:33 localhost mysqld[5242]: 090314 12:58:32 InnoDB: Database was not shut down normally! Mar 14 12:58:33 localhost mysqld[5242]: InnoDB: Starting crash recovery. Mar 14 12:58:33 localhost mysqld[5242]: InnoDB: Reading tablespace information from the .ibd files... Mar 14 12:58:33 localhost mysqld[5242]: InnoDB: Restoring possible half-written data pages from the doublewrite Mar 14 12:58:33 localhost mysqld[5242]: InnoDB: buffer... Mar 14 12:58:37 localhost fetchmail[3695]: awakened at Sat 14 Mar 2009 12:58:34 GMT Mar 14 12:58:38 localhost mysqld[5242]: 090314 12:58:36 InnoDB: Starting log scan based on checkpoint at Mar 14 12:58:38 localhost mysqld[5242]: InnoDB: log sequence number 0 68717. Mar 14 12:58:48 localhost mysqld[5242]: InnoDB: Doing recovery: scanned up to log sequence number 0 68717 qMar 14 12:58:56 localhost dovecot: auth-worker(default): mysql: Connect failed to 127.0.0.1 (mailserver): Can't connect to MySQL server on '127.0.0.1' (111) - waiting for 1 seconds before retry Mar 14 12:58:56 localhost dovecot: auth-worker(default): sql(lett...@domain.org,IPADDRESS.111): Password query failed: Not connected to database Mar 14 12:58:57 localhost fetchmail[3695]: 145 messages (144 seen) for i...@domain.org at mail.MAILHOST.co.uk (154739969 octets). Mar 14 12:58:57 localhost dovecot: auth-worker(default): sql(lett...@domain.org,IPADDRESS.111): Password query failed: Not connected to database Mar 14 12:58:59 localhost fetchmail[3695]: reading message i...@domain.org@mail.MAILHOST.co.uk:145 of 145 (2501 octets) (log message incomplete) Mar 14 12:58:59 localhost fetchmail[3695]: connection to localhost:smtp [127.0.0.1/25] failed: Connection refused. Mar 14 12:58:59 localhost fetchmail[3695]: SMTP connect to localhost failed Mar 14 12:58:59 localhost fetchmail[3695]: SMTP transaction error while fetching from i...@domain.org@mail.MAILHOST.co.uk and delivering to SMTP host localhost Mar 14 12:59:00 localhost fetchmail[3695]: Query status=10 (SMTP) Mar 14 12:59:00 localhost fetchmail[3695]: sleeping at Sat 14 Mar 2009 12:59:00 GMT for 60 seconds Mar 14 12:59:00 localhost mysqld[5242]: InnoDB: Last MySQL binlog file position 0 232, file name /var/log/mysql/mysql-bin.000902 Mar 14 12:59:01 localhost mysqld[5242]: 090314 12:59:01 InnoDB: Started; log sequence number 0 68717 Mar 14 12:59:09 localhost mysqld[5242]: 090314 12:59:04 [Note] Recovering after a crash using /var/log/mysql/mysql-bin Mar 14 12:59:10 localhost mysqld[5242]: 090314 12:59:08 [Note] Starting crash recovery... Mar 14 12:59:10 localhost mysqld[5242]: 090314 12:59:08 [Note] Crash recovery finished. Mar 14 12:59:26 localhost dovecot: auth-worker(default): mysql: Connect failed to 127.0.0.1 (mailserver): Can't connect to MySQL server on '127.0.0.1' (111) - waiting for 75 seconds before retry -- Pete Boyd Open Plan IT - http://openplanit.co.uk The Golden Ear - http://thegoldenear.org -- To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org