Hello,

I am running a fileserver with Ubuntu 12.04 Server amd64 with 3.2.0-25-generic on an Intel Xeon E5520 with 6GB RAM. Installed services include nfs-kernel-server, samba, drbd, and corosync/pacemaker. The DRBD device and corosync/pacemaker configuration are running but not actively used right now (planned for a future upgrade). The fileserver serves files over NFS and CIFS from a hardware RAID5 array. The / partition is on a separate hardware RAID1 array.

This morning the server's load was very high, around 15-20, yet all processes in userspace totaled around 40MB RAM used with practically no load on the CPU. The output of free revealed that most of the RAM (5670MB out of 5960MB) was in fact used:
# free -m
             total       used       free     shared    buffers     cached
Mem:          5960       5795        165          0         57         67
-/+ buffers/cache:       5670        290
Swap:        11659        136      11523

Looking at slabtop reveals that idr_layer_cache appears to be consuming most of this memory:

# slabtop -s C -o
 Active / Total Objects (% used)    : 10649155 / 10684888 (99.7%)
 Active / Total Slabs (% used)      : 351256 / 351256 (100.0%)
 Active / Total Caches (% used)     : 72 / 108 (66.7%)
 Active / Total Size (% used)       : 5437600.23K / 5448341.04K (99.8%)
 Minimum / Average / Maximum Object : 0.01K / 0.51K / 8.00K

    OBJS   ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
10077291 10077103  99%    0.53K 338718       30   5419488K idr_layer_cache
   71936    71936 100%    0.02K    281      256      1124K kmalloc-16
   70958    70066  98%    0.12K   2087       34      8348K fsnotify_event
   47794    47544  99%    0.17K   1039       46      8312K vm_area_struct
   47515    47515 100%    0.05K    559       85      2236K shared_policy_node
   46950    46835  99%    0.13K   1565       30      6260K ext4_allocation_context
   36352    33791  92%    0.01K     71      512       284K kmalloc-8
   32682    29590  90%    0.10K    838       39      3352K buffer_head

Am I correct that the total usage of idr_layer_cache can be calculated as 10077103 * 0.5KB? If so, the total is roughly 4900MB, the majority of the 5670MB used as reported by free. For a period of time there appeared to be a lot of disk I/O on the / partition, but then it returned to normal with the RAM usage still remaining this high.
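To double-check the idr_layer_cache arithmetic above, here is a minimal Python sketch that uses only the figures from the slabtop row and the free output. It assumes that slabtop's "K" means KiB and that free -m reports MiB; the variable names are my own and purely illustrative.

```python
# Values copied from the idr_layer_cache row of the slabtop output above.
ACTIVE_OBJS = 10077103      # ACTIVE column
OBJ_SIZE_KIB = 0.53         # OBJ SIZE column (rounded by slabtop)
SLABS = 338718              # SLABS column
CACHE_SIZE_KIB = 5419488    # CACHE SIZE column
USED_MIB = 5670             # "used" on the -/+ buffers/cache line of free -m
                            # (assumption: free -m units are MiB)


def kib_to_mib(kib):
    """Convert KiB to MiB."""
    return kib / 1024


# Estimate from the object count: active objects times the rounded object size.
by_objects_mib = kib_to_mib(ACTIVE_OBJS * OBJ_SIZE_KIB)

# Estimate from slabtop's own total, which accounts for whole slabs.
by_cache_mib = kib_to_mib(CACHE_SIZE_KIB)

# Size of one slab implied by the CACHE SIZE and SLABS columns.
slab_kib = CACHE_SIZE_KIB / SLABS

if __name__ == "__main__":
    print(f"objects x object size: {by_objects_mib:.0f} MiB")
    print(f"CACHE SIZE column:     {by_cache_mib:.0f} MiB")
    print(f"implied slab size:     {slab_kib:.0f} KiB")
    print(f"share of used memory:  {by_cache_mib / USED_MIB:.0%}")
```

Both estimates come out above 5000 MiB, somewhat higher than the rough 0.5KB calculation, because the object size is closer to 0.53K and the CACHE SIZE column counts whole slabs rather than individual objects. Either way, this one cache accounts for most of the used memory.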
/var/log/kern.log contains the following type of messages repeated constantly:

kernel: [2513998.894176] Mem-Info:
kernel: [2513998.894178] Node 0 DMA per-cpu:
kernel: [2513998.894180] CPU 0: hi: 0, btch: 1 usd: 0
kernel: [2513998.894181] CPU 1: hi: 0, btch: 1 usd: 0
kernel: [2513998.894186] CPU 2: hi: 0, btch: 1 usd: 0
kernel: [2513998.894188] CPU 3: hi: 0, btch: 1 usd: 0
kernel: [2513998.894191] CPU 4: hi: 0, btch: 1 usd: 0
kernel: [2513998.894193] CPU 5: hi: 0, btch: 1 usd: 0
kernel: [2513998.894195] CPU 6: hi: 0, btch: 1 usd: 0
kernel: [2513998.894197] CPU 7: hi: 0, btch: 1 usd: 0
kernel: [2513998.894198] Node 0 DMA32 per-cpu:
kernel: [2513998.894201] CPU 0: hi: 186, btch: 31 usd: 55
kernel: [2513998.894203] CPU 1: hi: 186, btch: 31 usd: 0
kernel: [2513998.894205] CPU 2: hi: 186, btch: 31 usd: 0
kernel: [2513998.894207] CPU 3: hi: 186, btch: 31 usd: 0
kernel: [2513998.894209] CPU 4: hi: 186, btch: 31 usd: 0
kernel: [2513998.894211] CPU 5: hi: 186, btch: 31 usd: 0
kernel: [2513998.894213] CPU 6: hi: 186, btch: 31 usd: 0
kernel: [2513998.894215] CPU 7: hi: 186, btch: 31 usd: 0
kernel: [2513998.894216] Node 0 Normal per-cpu:
kernel: [2513998.894221] CPU 0: hi: 186, btch: 31 usd: 0
kernel: [2513998.894223] CPU 1: hi: 186, btch: 31 usd: 0
kernel: [2513998.894224] CPU 2: hi: 186, btch: 31 usd: 0
kernel: [2513998.894226] CPU 3: hi: 186, btch: 31 usd: 0
kernel: [2513998.894228] CPU 4: hi: 186, btch: 31 usd: 0
kernel: [2513998.894229] CPU 5: hi: 186, btch: 31 usd: 30
kernel: [2513998.894231] CPU 6: hi: 186, btch: 31 usd: 0
kernel: [2513998.894233] CPU 7: hi: 186, btch: 31 usd: 0
kernel: [2513998.894236] active_anon:128809 inactive_anon:31335 isolated_anon:0
kernel: [2513998.894237] active_file:17654 inactive_file:115777 isolated_file:0
kernel: [2513998.894238] unevictable:0 dirty:4783 writeback:4009 unstable:0
kernel: [2513998.894239] free:39852 slab_reclaimable:13595 slab_unreclaimable:1055766
kernel: [2513998.894240] mapped:103026 shmem:2673 pagetables:15245 bounce:0
kernel: [2513998.894242] Node 0 DMA free:15896kB min:168kB low:208kB high:252kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15640kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
kernel: [2513998.894250] lowmem_reserve[]: 0 3495 6015 6015
kernel: [2513998.894253] Node 0 DMA32 free:107828kB min:39172kB low:48964kB high:58756kB active_anon:333064kB inactive_anon:68768kB active_file:8720kB inactive_file:399040kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3579648kB mlocked:0kB dirty:16756kB writeback:15676kB mapped:303640kB shmem:20kB slab_reclaimable:33512kB slab_unreclaimable:2345264kB kernel_stack:1376kB pagetables:11076kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
kernel: [2513998.894262] lowmem_reserve[]: 0 0 2520 2520
kernel: [2513998.894264] Node 0 Normal free:35684kB min:28236kB low:35292kB high:42352kB active_anon:182172kB inactive_anon:56572kB active_file:61896kB inactive_file:64068kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2580480kB mlocked:0kB dirty:2376kB writeback:360kB mapped:108464kB shmem:10672kB slab_reclaimable:20868kB slab_unreclaimable:1877800kB kernel_stack:1856kB pagetables:49904kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
kernel: [2513998.894273] lowmem_reserve[]: 0 0 0 0
kernel: [2513998.894276] Node 0 DMA: 0*4kB 1*8kB 1*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15896kB
kernel: [2513998.894283] Node 0 DMA32: 26131*4kB 0*8kB 0*16kB 2*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 107852kB
kernel: [2513998.894291] Node 0 Normal: 7770*4kB 138*8kB 3*16kB 4*32kB 3*64kB 2*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 36136kB
kernel: [2513998.894298] 144474 total pagecache pages
kernel: [2513998.894299] 8391 pages in swap cache
kernel: [2513998.894301] Swap cache stats: add 1024199, delete 1015808, find 3527748/3678208
kernel: [2513998.894303] Free swap = 11457968kB
kernel: [2513998.894304] Total swap = 11939836kB
kernel: [2513998.910512] 1572848 pages RAM
kernel: [2513998.910514] 46845 pages reserved
kernel: [2513998.910516] 200778 pages shared
kernel: [2513998.910517] 1343875 pages non-shared
kernel: [2513998.950135] kworker/3:1: page allocation failure: order:2, mode:0x4020
kernel: [2513998.950141] Pid: 28366, comm: kworker/3:1 Not tainted 3.2.0-25-generic #40-Ubuntu
kernel: [2513998.950144] Call Trace:
kernel: [2513998.950146] <IRQ> [<ffffffff8111cf16>] warn_alloc_failed+0xf6/0x150
kernel: [2513998.950160] [<ffffffff81120d7b>] __alloc_pages_nodemask+0x64b/0x820
kernel: [2513998.950176] [<ffffffffa0022ee4>] ? ixgbe_xmit_frame+0x24/0x30 [ixgbe]
kernel: [2513998.950182] [<ffffffff8165d62e>] ? _raw_spin_lock+0xe/0x20
kernel: [2513998.950187] [<ffffffff81648e63>] kmalloc_large_node+0x57/0x85
kernel: [2513998.950193] [<ffffffff81165bd5>] __kmalloc_node_track_caller+0x195/0x1e0
kernel: [2513998.950199] [<ffffffff8153298b>] ? __alloc_skb+0x4b/0x240
kernel: [2513998.950203] [<ffffffff81533004>] ? __netdev_alloc_skb+0x24/0x50
kernel: [2513998.950207] [<ffffffff815329b8>] __alloc_skb+0x78/0x240
kernel: [2513998.950212] [<ffffffff81533004>] __netdev_alloc_skb+0x24/0x50
kernel: [2513998.950219] [<ffffffffa001e909>] ixgbe_alloc_rx_buffers+0x289/0x350 [ixgbe]
kernel: [2513998.950223] [<ffffffff815333a6>] ? __kfree_skb+0x26/0x30
kernel: [2513998.950228] [<ffffffff815333ed>] ? consume_skb+0x3d/0xb0
kernel: [2513998.950234] [<ffffffffa001f1bb>] ixgbe_clean_rx_irq+0x7eb/0x8a0 [ixgbe]
kernel: [2513998.950242] [<ffffffffa001f9ee>] ixgbe_poll+0xae/0x1a0 [ixgbe]
kernel: [2513998.950247] [<ffffffff815417d4>] net_rx_action+0x134/0x290
kernel: [2513998.950254] [<ffffffff8106ea58>] __do_softirq+0xa8/0x210
kernel: [2513998.950260] [<ffffffff8165d62e>] ? _raw_spin_lock+0xe/0x20
kernel: [2513998.950264] [<ffffffff81667eac>] call_softirq+0x1c/0x30
kernel: [2513998.950268] [<ffffffff81015305>] do_softirq+0x65/0xa0
kernel: [2513998.950271] [<ffffffff8106ee3e>] irq_exit+0x8e/0xb0
kernel: [2513998.950274] [<ffffffff81668763>] do_IRQ+0x63/0xe0
kernel: [2513998.950276] [<ffffffff8165daee>] common_interrupt+0x6e/0x6e
kernel: [2513998.950278] <EOI> [<ffffffff814fdb18>] ? cpufreq_notify_transition+0x88/0x1c0
kernel: [2513998.950284] [<ffffffff815038b0>] ? cpufreq_get_measured_perf+0xa0/0xa0
kernel: [2513998.950287] [<ffffffff815043e1>] acpi_cpufreq_target+0x121/0x2a0
kernel: [2513998.950290] [<ffffffff814fd2d2>] __cpufreq_driver_target+0x42/0x50
kernel: [2513998.950292] [<ffffffff81501445>] dbs_check_cpu+0x2f5/0x330
kernel: [2513998.950295] [<ffffffff81501480>] ? dbs_check_cpu+0x330/0x330
kernel: [2513998.950298] [<ffffffff81501521>] do_dbs_timer+0xa1/0x110
kernel: [2513998.950301] [<ffffffff81084f9a>] process_one_work+0x11a/0x480
kernel: [2513998.950304] [<ffffffff81085d44>] worker_thread+0x164/0x370
kernel: [2513998.950306] [<ffffffff81085be0>] ? manage_workers.isra.29+0x130/0x130
kernel: [2513998.950309] [<ffffffff8108a59c>] kthread+0x8c/0xa0
kernel: [2513998.950312] [<ffffffff81667db4>] kernel_thread_helper+0x4/0x10
kernel: [2513998.950314] [<ffffffff8108a510>] ? flush_kthread_worker+0xa0/0xa0
kernel: [2513998.950317] [<ffffffff81667db0>] ? gs_change+0x13/0x13

I have stopped DRBD and restarted nfs-kernel-server and samba to no improvement. What can I try to correct this problem and bring memory usage under control?

Thanks,

Andrew Martin