Logs attached.

Jim

On 10/14/2016 03:44 PM, Alfredo Cardigliano wrote:
> Uhm, hard to say, could you provide also dmesg?
>
> Alfredo
>
>> On 14 Oct 2016, at 18:07, Jim Hranicky <[email protected]> wrote:
>>
>> And one more, sorry. I tried to stop zbalance_ipc to move to
>> 32 queues and am getting this error:
>>
>> Message from syslogd@host at Oct 14 12:05:23 ...
>> kernel:BUG: soft lockup - CPU#17 stuck for 22s! [migration/17:237]
>>
>> Message from syslogd@host at Oct 14 12:05:23 ...
>> kernel:BUG: soft lockup - CPU#34 stuck for 22s! [zbalance_ipc:6496]
>>
>> Message from syslogd@host at Oct 14 12:05:26 ...
>> kernel:BUG: soft lockup - CPU#1 stuck for 23s! [migration/1:157]
>>
>> Message from syslogd@host at Oct 14 12:05:27 ...
>> kernel:BUG: soft lockup - CPU#13 stuck for 23s! [migration/13:217]
>>
>> kill -9 has no effect. Is this a result of using too many queues?
>>
>> Jim
>>
>> On 10/14/2016 03:53 AM, Alfredo Cardigliano wrote:
>>> Hi Jim
>>> please note that when using distribution to multiple applications (using a
>>> comma-separated list in -n), the fan-out API is used, which supports up to
>>> 32 egress queues total. In your case you are using 73 queues, thus I guess
>>> only the first 32 instances are receiving traffic (and maybe duplicated
>>> traffic due to a wrong egress mask). I will add a check for this in
>>> zbalance_ipc to avoid this kind of misconfiguration.
>>>
>>> Alfredo
>>>
>>>> On 13 Oct 2016, at 22:35, Jim Hranicky <[email protected]> wrote:
>>>>
>>>> I'm testing out a new server (36 cores, 72 with HT) using
>>>> zbalance_ipc, and it seems occasionally some packets are
>>>> getting sent to multiple processes.
>>>>
>>>> I'm currently running zbalance_ipc like so:
>>>>
>>>> /usr/local/pf/bin/zbalance_ipc -i zc:ens5f0 -m 4 -n 72,1 -c 99 -g 0 -S 1
>>>>
>>>> with 72 snorts like so:
>>>>
>>>> /usr/sbin/snort -D -i zc:99@$i --daq-dir=/usr/lib64/daq \
>>>> --daq-var clusterid=99 --daq-var bindcpu=$i --daq pfring_zc \
>>>> -c /etc/snort/ufirt-snort-pf-ewan.conf -l /var/log/snort69 -R ($i + 1)
>>>>
>>>> I've got a custom HTTP rule to catch GETs with a particular
>>>> user-agent. I run 100 GETs, and each GET request has the run
>>>> number and timestamp in the url. (GET /1/<ts>, GET /2/<ts>, etc)
>>>> and this is what I end up getting when I check the GETs :
>>>>
>>>>   1 GET /11
>>>>   1 GET /2
>>>>   1 GET /30
>>>>   1 GET /34
>>>>   1 GET /37
>>>>   1 GET /5
>>>>   1 GET /59
>>>>   1 GET /62
>>>>   1 GET /70
>>>>   1 GET /8
>>>>   1 GET /83
>>>>   1 GET /84
>>>>   1 GET /9
>>>>   1 GET /90
>>>>   1 GET /94
>>>>   1 GET /95
>>>>  16 GET /97
>>>>  20 GET /12
>>>>  20 GET /38
>>>>
>>>> Obviously I'm still running into packet loss, but several of the
>>>> GETs are getting sent to multiple processes:
>>>>
>>>> ens5f0.33 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.53 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.42 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.44 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.46 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.35 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.67 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.34 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.36 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.62 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.70 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.65 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.57 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.63 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.68 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.38 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.49 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.61 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.32 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>> ens5f0.72 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>
>>>> Is this an issue with the zbalance_ipc hash? I tried using
>>>>
>>>> -m 1
>>>>
>>>> but it seemed like I ended up dropping even more packets.
>>>>
>>>> Any advice/pointers appreciated.
>>>>
>>>> --
>>>> Jim Hranicky
>>>> Data Security Specialist
>>>> UF Information Technology
>>>> 105 NW 16TH ST Room #104 GAINESVILLE FL 32603-1826
>>>> 352-273-1341
>>>> _______________________________________________
>>>> Ntop-misc mailing list
>>>> [email protected]
>>>> http://listgateway.unipi.it/mailman/listinfo/ntop-misc

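For reference, the constraint Alfredo describes in the quoted thread (a comma-separated -n list selects the fan-out API, which supports up to 32 egress queues in total) can be sketched as a small pre-flight check. This is only an illustration written for this thread, not the check that was later added to zbalance_ipc: the function names are hypothetical, the limit of 32 is taken from Alfredo's message, and treating a single -n value as exempt from the limit is an assumption based on his wording.

```python
# Limit quoted in the thread for the fan-out API (assumed to apply only
# when -n is a comma-separated list, per Alfredo's description).
MAX_FANOUT_QUEUES = 32


def parse_queue_counts(n_arg):
    """Split a -n style value such as '72,1' into per-application queue counts."""
    return [int(part) for part in n_arg.split(",") if part.strip()]


def check_fanout(n_arg, limit=MAX_FANOUT_QUEUES):
    """Return (total_queues, ok) for a -n style value.

    ok is False only when several applications are listed (fan-out mode)
    and the total number of egress queues exceeds the limit.
    """
    counts = parse_queue_counts(n_arg)
    total = sum(counts)
    uses_fanout = len(counts) > 1
    ok = (not uses_fanout) or total <= limit
    return total, ok


if __name__ == "__main__":
    for value in ("72,1", "31,1"):
        total, ok = check_fanout(value)
        status = "ok" if ok else "exceeds fan-out limit"
        print(f"-n {value}: {total} queues, {status}")
```

With the command line from the original report (-n 72,1) the total is 73 queues, which is over the limit, while a split such as -n 31,1 stays within it.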
Oct 14 12:01:23 ewansens1 kernel: BUG: soft lockup - CPU#34 stuck for 22s! [zbalance_ipc:6496]
Oct 14 12:01:23 ewansens1 kernel: Modules linked in: ixgbe(OE) pf_ring(OE) dca nfnetlink_queue nfnetlink_log nfnetlink bluetooth rfkill ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter vfat fat ext4 mbcache jbd2 intel_powerclamp coretemp intel_rapl kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper sb_edac cryptd iTCO_wdt iTCO_vendor_support ipmi_devintf cdc_ether pcspkr mxm_wmi i2c_i801 usbnet lpc_ich mii ipmi_ssif
Oct 14 12:01:23 ewansens1 kernel: mfd_core sg edac_core shpchp mei_me ipmi_si acpi_pad wmi mei ipmi_msghandler acpi_power_meter ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic sr_mod cdrom mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper crct10dif_pclmul crct10dif_common crc32c_intel ttm ahci drm libahci tg3 libata i2c_core ptp pps_core megaraid_sas dm_mirror dm_region_hash dm_log dm_mod vxlan ip6_udp_tunnel udp_tunnel [last unloaded: pf_ring]
Oct 14 12:01:23 ewansens1 kernel: CPU: 34 PID: 6496 Comm: zbalance_ipc Tainted: G OE ------------ 3.10.0-327.36.1.el7.x86_64 #1
Oct 14 12:01:23 ewansens1 kernel: Hardware name: LENOVO System x3650 M5: -[5462AC1]-/01GR451, BIOS -[TCE122WUS-2.01]- 04/27/2016
Oct 14 12:01:23 ewansens1 kernel: task: ffff881018b95c00 ti: ffff881010470000 task.ti: ffff881010470000
Oct 14 12:01:23 ewansens1 kernel: RIP: 0010:[<ffffffff81301a2c>] [<ffffffff81301a2c>] __write_lock_failed+0xc/0x20
Oct 14 12:01:23 ewansens1 kernel: RSP: 0018:ffff881010473da0 EFLAGS: 00000216
Oct 14 12:01:23 ewansens1 kernel: RAX: 0000000000000001 RBX: 00007f4f081c6000 RCX: 0000000000000000
Oct 14 12:01:23 ewansens1 kernel: RDX: 0000000000000000 RSI: 00007ffe7625f308 RDI: ffff88202476c8c4
Oct 14 12:01:23 ewansens1 kernel: RBP: ffff881010473da0 R08: 0000000000000018 R09: 0000000000000000
Oct 14 12:01:23 ewansens1 kernel: R10: 00000000000008b4 R11: 0000000000000000 R12: ffffea0000000000
Oct 14 12:01:23 ewansens1 kernel: R13: ffff881018b95c00 R14: ffff881010473d70 R15: 00007f4f081c5fff
Oct 14 12:01:23 ewansens1 kernel: FS: 00007f4f097d4740(0000) GS:ffff88203f000000(0000) knlGS:0000000000000000
Oct 14 12:01:23 ewansens1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 14 12:01:23 ewansens1 kernel: CR2: 00000000082d8c40 CR3: 000000201457b000 CR4: 00000000001407e0
Oct 14 12:01:23 ewansens1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 14 12:01:23 ewansens1 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 14 12:01:23 ewansens1 kernel: Stack:
Oct 14 12:01:23 ewansens1 kernel: ffff881010473db0 ffffffff8163d9f7 ffff881010473dd8 ffffffffa078c21e
Oct 14 12:01:23 ewansens1 kernel: 00007ffe7625f2f0 ffff882024360f00 ffff881010473e60 ffff881010473f30
Oct 14 12:01:23 ewansens1 kernel: ffffffffa079afe1 ffffffff81632be5 ffff881010473e10 0000000181193525
Oct 14 12:01:23 ewansens1 kernel: Call Trace:
Oct 14 12:01:23 ewansens1 kernel: [<ffffffff8163d9f7>] _raw_write_lock+0x17/0x20
Oct 14 12:01:23 ewansens1 kernel: [<ffffffffa078c21e>] pfring_release_zc_dev+0x3e/0x1d0 [pf_ring]
Oct 14 12:01:23 ewansens1 kernel: [<ffffffffa079afe1>] ring_setsockopt+0x1861/0x2870 [pf_ring]
Oct 14 12:01:23 ewansens1 kernel: [<ffffffff81632be5>] ? __slab_free+0x10e/0x277
Oct 14 12:01:23 ewansens1 kernel: [<ffffffff8119b7a2>] ? unmap_region+0xe2/0x130
Oct 14 12:01:23 ewansens1 kernel: [<ffffffff81288a75>] ? sock_has_perm+0x75/0x90
Oct 14 12:01:23 ewansens1 kernel: [<ffffffff811c0d02>] ? kmem_cache_free+0xd2/0x1e0
Oct 14 12:01:23 ewansens1 kernel: [<ffffffff81289d70>] ? selinux_socket_setsockopt+0x40/0x50
Oct 14 12:01:23 ewansens1 kernel: [<ffffffff81512370>] SyS_setsockopt+0x80/0xf0
Oct 14 12:01:23 ewansens1 kernel: [<ffffffff81646a09>] system_call_fastpath+0x16/0x1b
Oct 14 12:01:23 ewansens1 kernel: Code: 89 01 31 c0 66 66 90 c3 b8 f2 ff ff ff 66 66 90 c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 f0 ff 07 f3 90 83 3f 01 <75> f9 f0 ff 0f 75 f1 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 55
