Hi,

As we had quite a few "missing data" messages, I was excited to see that 1.7.0 makes ZeroMQ config very easy, since we were still running with the home-grown circular buffer.
So last week I upgraded our systems from pmacct 1.6.1 to 1.7.0. Since then I have experienced issues on 2 out of 7 systems: the oom-killer hits one of the pmacct MySQL plugins:

[Mon Nov 6 09:54:38 2017] pmacctd invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[Mon Nov 6 09:54:38 2017] pmacctd cpuset=/ mems_allowed=0-1
[Mon Nov 6 09:54:38 2017] CPU: 15 PID: 44049 Comm: pmacctd Tainted: G OE ------------ 3.10.0-693.5.2.el7.x86_64 #1
[Mon Nov 6 09:54:38 2017] Hardware name: Dell Inc. PowerEdge R630, BIOS 2.4.3 01/17/2017
[Mon Nov 6 09:54:38 2017] ffff88103dd4dee0 0000000019a0de94 ffff880036adf5f0 ffffffff816a3e51
[Mon Nov 6 09:54:38 2017] ffff880036adf680 ffffffff8169f246 ffff880036adf688 ffffffff812b7d1b
[Mon Nov 6 09:54:38 2017] ffff88203c336e68 0000000000000202 ffffffff00000202 fffeefff00000000
[Mon Nov 6 09:54:38 2017] Call Trace:
[Mon Nov 6 09:54:38 2017] [<ffffffff816a3e51>] dump_stack+0x19/0x1b
[Mon Nov 6 09:54:38 2017] [<ffffffff8169f246>] dump_header+0x90/0x229
[Mon Nov 6 09:54:38 2017] [<ffffffff812b7d1b>] ? cred_has_capability+0x6b/0x120
[Mon Nov 6 09:54:38 2017] [<ffffffff811863a4>] oom_kill_process+0x254/0x3d0
[Mon Nov 6 09:54:38 2017] [<ffffffff812b7efe>] ? selinux_capable+0x2e/0x40
[Mon Nov 6 09:54:38 2017] [<ffffffff81186be6>] out_of_memory+0x4b6/0x4f0
[Mon Nov 6 09:54:38 2017] [<ffffffff8169fd4a>] __alloc_pages_slowpath+0x5d6/0x724
[Mon Nov 6 09:54:38 2017] [<ffffffff8118cdb5>] __alloc_pages_nodemask+0x405/0x420
[Mon Nov 6 09:54:38 2017] [<ffffffff811d40a5>] alloc_pages_vma+0xb5/0x200
[Mon Nov 6 09:54:38 2017] [<ffffffff811b2350>] handle_mm_fault+0xb60/0xfa0
[Mon Nov 6 09:54:38 2017] [<ffffffff810c8f28>] ? __enqueue_entity+0x78/0x80
[Mon Nov 6 09:54:38 2017] [<ffffffff816b0074>] __do_page_fault+0x154/0x450
[Mon Nov 6 09:54:38 2017] [<ffffffff816b03a5>] do_page_fault+0x35/0x90
[Mon Nov 6 09:54:38 2017] [<ffffffff816ac5c8>] page_fault+0x28/0x30
[Mon Nov 6 09:54:38 2017] [<ffffffff81330379>] ? copy_user_enhanced_fast_string+0x9/0x20
[Mon Nov 6 09:54:38 2017] [<ffffffff81336a4a>] ? memcpy_toiovec+0x4a/0x90
[Mon Nov 6 09:54:38 2017] [<ffffffff815796e8>] skb_copy_datagram_iovec+0x128/0x280
[Mon Nov 6 09:54:38 2017] [<ffffffff815d88aa>] tcp_recvmsg+0x24a/0xb50
[Mon Nov 6 09:54:38 2017] [<ffffffff81606aea>] inet_recvmsg+0x7a/0xa0
[Mon Nov 6 09:54:38 2017] [<ffffffff8156a88f>] sock_recvmsg+0xbf/0x100
[Mon Nov 6 09:54:38 2017] [<ffffffff815da029>] ? tcp_poll+0x219/0x230
[Mon Nov 6 09:54:38 2017] [<ffffffff8124b859>] ? ep_scan_ready_list.isra.7+0x1b9/0x1f0
[Mon Nov 6 09:54:38 2017] [<ffffffff8156aa08>] SYSC_recvfrom+0xe8/0x160
[Mon Nov 6 09:54:38 2017] [<ffffffff8156b2fe>] SyS_recvfrom+0xe/0x10
[Mon Nov 6 09:54:38 2017] [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
[Mon Nov 6 09:54:38 2017] Mem-Info:
[Mon Nov 6 09:54:38 2017] active_anon:31102734 inactive_anon:1375631 isolated_anon:64 active_file:61 inactive_file:0 isolated_file:0 unevictable:0 dirty:0 writeback:200 unstable:0 slab_reclaimable:10481 slab_unreclaimable:34483 mapped:10650 shmem:9528 pagetables:66634 bounce:0 free:88657 free_pcp:30 free_cma:0
[Mon Nov 6 09:54:38 2017] Node 0 DMA free:15864kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Mon Nov 6 09:54:38 2017] lowmem_reserve[]: 0 1690 64141 64141
[Mon Nov 6 09:54:38 2017] Node 0 DMA32 free:250920kB min:1184kB low:1480kB high:1776kB active_anon:1096172kB inactive_anon:365444kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1985264kB managed:1733112kB mlocked:0kB dirty:0kB writeback:0kB mapped:484kB shmem:488kB slab_reclaimable:456kB slab_unreclaimable:3256kB kernel_stack:224kB pagetables:2600kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Mon Nov 6 09:54:38 2017] lowmem_reserve[]: 0 0 62451 62451
[Mon Nov 6 09:54:38 2017] Node 0 Normal free:42772kB min:43740kB low:54672kB high:65608kB active_anon:60618136kB inactive_anon:2525396kB active_file:264kB inactive_file:0kB unevictable:0kB isolated(anon):128kB isolated(file):0kB present:65011712kB managed:63949968kB mlocked:0kB dirty:0kB writeback:376kB mapped:21788kB shmem:21608kB slab_reclaimable:20352kB slab_unreclaimable:65620kB kernel_stack:5856kB pagetables:129356kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:403 all_unreclaimable? yes
[Mon Nov 6 09:54:38 2017] lowmem_reserve[]: 0 0 0 0
[Mon Nov 6 09:54:38 2017] Node 1 Normal free:45072kB min:45172kB low:56464kB high:67756kB active_anon:62696628kB inactive_anon:2611684kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):128kB isolated(file):0kB present:67108864kB managed:66046872kB mlocked:0kB dirty:0kB writeback:424kB mapped:20328kB shmem:16016kB slab_reclaimable:21116kB slab_unreclaimable:69024kB kernel_stack:4976kB pagetables:134580kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1984 all_unreclaimable? yes
[Mon Nov 6 09:54:38 2017] lowmem_reserve[]: 0 0 0 0
[Mon Nov 6 09:54:38 2017] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15864kB
[Mon Nov 6 09:54:38 2017] Node 0 DMA32: 220*4kB (UE) 253*8kB (UE) 182*16kB (UEM) 64*32kB (UEM) 76*64kB (UEM) 45*128kB (UEM) 38*256kB (UEM) 39*512kB (UEM) 32*1024kB (UE) 29*2048kB (U) 27*4096kB (UM) = 250936kB
[Mon Nov 6 09:54:38 2017] Node 0 Normal: 2520*4kB (UE) 1980*8kB (UEM) 788*16kB (UEM) 82*32kB (UEM) 24*64kB (UEM) 8*128kB (UEM) 1*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 43968kB
[Mon Nov 6 09:54:38 2017] Node 1 Normal: 2140*4kB (UE) 4637*8kB (UM) 82*16kB (UM) 1*32kB (M) 2*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 47128kB
[Mon Nov 6 09:54:38 2017] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Mon Nov 6 09:54:38 2017] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Mon Nov 6 09:54:38 2017] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Mon Nov 6 09:54:38 2017] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Mon Nov 6 09:54:38 2017] 13517 total pagecache pages
[Mon Nov 6 09:54:38 2017] 4044 pages in swap cache
[Mon Nov 6 09:54:38 2017] Swap cache stats: add 4249745, delete 4245701, find 35957085/35970537
[Mon Nov 6 09:54:38 2017] Free swap = 0kB
[Mon Nov 6 09:54:38 2017] Total swap = 4194300kB
[Mon Nov 6 09:54:38 2017] 33530455 pages RAM
[Mon Nov 6 09:54:38 2017] 0 pages HighMem/MovableOnly
[Mon Nov 6 09:54:38 2017] 593993 pages reserved
[Mon Nov 6 09:54:38 2017] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Mon Nov 6 09:54:38 2017] [ 771] 0 771 17483 7670 41 51 0 systemd-journal
[Mon Nov 6 09:54:38 2017] [ 802] 0 802 11959 25 24 735 -1000 systemd-udevd
[Mon Nov 6 09:54:38 2017] [ 2444] 0 2444 13863 10 26 101 -1000 auditd
[Mon Nov 6 09:54:38 2017] [ 2463] 0 2463 5468 89 15 80 0 irqbalance
[Mon Nov 6 09:54:38 2017] [ 2467] 81 2467 8153 62 18 49 -900 dbus-daemon
[Mon Nov 6 09:54:38 2017] [ 2482] 0 2482 6051 43 17 32 0 systemd-logind
[Mon Nov 6 09:54:38 2017] [ 2483] 998 2483 133561 96 58 1532 0 polkitd
[Mon Nov 6 09:54:38 2017] [ 2484] 0 2484 75472 3998 66 835 0 rsyslogd
[Mon Nov 6 09:54:38 2017] [ 2554] 0 2554 31558 26 18 132 0 crond
[Mon Nov 6 09:54:38 2017] [ 2578] 0 2578 27511 1 10 31 0 agetty
[Mon Nov 6 09:54:38 2017] [ 2585] 997 2585 25108 30 20 62 0 chronyd
[Mon Nov 6 09:54:38 2017] [ 3055] 0 3055 26499 13 55 232 -1000 sshd
[Mon Nov 6 09:54:38 2017] [ 3057] 0 3057 140598 106 88 2614 0 tuned
[Mon Nov 6 09:54:38 2017] [ 3524] 0 3524 22504 4 44 275 0 master
[Mon Nov 6 09:54:38 2017] [ 3526] 89 3526 22547 14 45 260 0 qmgr
[Mon Nov 6 09:54:38 2017] [ 3730] 0 3730 247949 257 67 4361 0 dsm_sa_datamgrd
[Mon Nov 6 09:54:38 2017] [ 3803] 0 3803 75246 92 40 126 0 dsm_sa_eventmgr
[Mon Nov 6 09:54:38 2017] [ 3828] 0 3828 111461 494 51 879 0 dsm_sa_snmpd
[Mon Nov 6 09:54:38 2017] [ 3834] 0 3834 180364 6 59 4326 0 dsm_sa_datamgrd
[Mon Nov 6 09:54:38 2017] [ 3877] 0 3877 158222 21 41 672 0 dsm_om_shrsvcd
[Mon Nov 6 09:54:38 2017] [44029] 0 44029 96018 3183 46 547 0 pmacctd
[Mon Nov 6 09:54:38 2017] [44030] 0 44030 3861827 3769206 7400 287 0 pmacctd
[Mon Nov 6 09:54:38 2017] [44037] 0 44037 29678700 28615469 57953 997921 0 pmacctd
[Mon Nov 6 09:54:38 2017] [44038] 0 44038 112024 72059 184 4966 0 pmacctd
[Mon Nov 6 09:54:38 2017] [44045] 0 44045 46356 2918 49 6122 0 pmacctd
[Mon Nov 6 09:54:38 2017] [44046] 0 44046 46389 3147 49 5705 0 pmacctd
[Mon Nov 6 09:54:38 2017] [58219] 89 58219 22530 272 44 0 0 pickup
[Mon Nov 6 09:54:38 2017] [59874] 0 59874 47222 3116 51 6044 0 pmacctd
[Mon Nov 6 09:54:38 2017] [59875] 0 59875 47225 3665 53 5494 0 pmacctd
[Mon Nov 6 09:54:38 2017] Out of memory: Kill process 44037 (pmacctd) score 846 or sacrifice child
[Mon Nov 6 09:54:38 2017] Killed process 44037 (pmacctd) total-vm:118714800kB, anon-rss:114461768kB, file-rss:104kB, shmem-rss:4kB

All 7 systems are identical in terms of config; they only receive different traffic and have slightly different hardware (CPU, R630 vs R720, 64G - 128G memory).

Each system runs CentOS 7.4.1708 64-bit, fully updated, with a dual-port Intel X520 10G NIC, and runs one pmacctd instance for 10G NIC1 and one for 10G NIC2. Per instance, traffic is split over an IPv4 MySQL plugin and an IPv6 MySQL plugin. Data is stored on an external MySQL / Percona server.

As CentOS 7.x EPEL comes with ZeroMQ 4.1, and pmacct likes >= 4.2, I installed ZeroMQ 4.2.2 from the ZeroMQ yum repository. Eager as I was, I also installed PF_RING 7.0.0 (non-ZC to start with) from the ntop repository in the same change.

After some time running, I observed the oom-killer issue on the two machines. I suspected PF_RING at first, which was running with the following config:

options pf_ring enable_tx_capture=0 quick_mode=1

Then I reduced that on those 2 machines to:

options pf_ring enable_tx_capture=0

That seems to work so far on 1 of them, but on the other .... no change. Then I removed PF_RING completely from that system, recompiled pmacct, and made sure pmacctd was linked against libpcap.so.1 again, and no longer against libpfring.so.1.

This morning, another crash..... so it does not seem (fully) related to PF_RING or its config.

So the only other change for 1.6.1 <-> 1.7.0 on these machines is that pmacct is now compiled with the additional option "--enable-zmq", and in the config I replaced plugin_buffer_size & plugin_pipe_size with:

plugin_pipe_zmq: true
plugin_pipe_zmq_profile: large

I will probably recompile once more without ZeroMQ, revert the config change, and see how that goes. But it would be nice to get to a stable system with all features enabled, so if anyone has good hints on what to check, please share.
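To make the process table in the log above easier to read, here is a minimal sketch that converts the rss and swapents columns into GiB per process. It assumes 4 kB pages (the x86_64 default; this is consistent with the "Killed process" line, where anon-rss + file-rss + shmem-rss equals the rss page count of pid 44037 times 4 kB). The function name parse_oom_table and the SAMPLE string are just illustrative; the rows are copied from the log.

```python
import re

# Assumption: 4 kB pages; the rss and swapents columns count pages.
PAGE_KIB = 4

# One row of the oom-killer task dump:
# [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\S+)"
)


def parse_oom_table(text):
    """Return (pid, name, rss_kib, swap_kib) for every task row found in text."""
    found = []
    for m in ROW.finditer(text):
        pid, _uid, _tgid, _vm, rss, _ptes, swapents, _adj, name = m.groups()
        found.append((int(pid), name, int(rss) * PAGE_KIB, int(swapents) * PAGE_KIB))
    return found


# A few rows copied from the log; timestamps and the header are skipped
# because the pattern only accepts brackets that contain digits.
SAMPLE = """
[Mon Nov 6 09:54:38 2017] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Mon Nov 6 09:54:38 2017] [44029] 0 44029 96018 3183 46 547 0 pmacctd
[Mon Nov 6 09:54:38 2017] [44030] 0 44030 3861827 3769206 7400 287 0 pmacctd
[Mon Nov 6 09:54:38 2017] [44037] 0 44037 29678700 28615469 57953 997921 0 pmacctd
[Mon Nov 6 09:54:38 2017] [44038] 0 44038 112024 72059 184 4966 0 pmacctd
"""

# Largest resident set first.
rows = sorted(parse_oom_table(SAMPLE), key=lambda r: r[2], reverse=True)
for pid, name, rss_kib, swap_kib in rows:
    print(f"{pid:>6} {name:<10} rss={rss_kib / 1024**2:8.2f} GiB "
          f"swap={swap_kib / 1024**2:6.2f} GiB")
```

Under that page-size assumption, pid 44037 alone holds about 109 GiB resident plus most of the 4 GiB swap, and pid 44030 roughly another 14 GiB, so the growth is concentrated in two of the pmacctd processes.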
Any help/insight is appreciated :)

Regards,
Wouter

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
