Hi,
And now the second file:
- 1.log = the scenario from your point 1, including top output, zbalance
output, and nprobe stats.
On Wed, Jun 27, 2018 at 17:41, David Notivol (<[email protected]>) wrote:
> Hi Alfredo,
>
> Sorry, I forgot to attach the files as you said. I sent them a while ago,
> but it seems the mail size is over the limit and it got held for approval.
> I'm trying again now, deleting some info from my first email and attaching
> one file at a time.
>
> - 0.log = top output for the scenario in my first email.
>
>
>
> On Wed, Jun 27, 2018 at 14:30, Alfredo Cardigliano (<[email protected]>) wrote:
>
>> Hi David
>>
>> On 27 Jun 2018, at 14:20, David Notivol <[email protected]> wrote:
>>
>> Hi Alfredo,
>> Thanks for your recommendations.
>>
>> I tested setting core affinity as you suggested, and the input drops in
>> zbalance disappeared. The output drops persist, but the absolute drops
>> are lower than before.
>> Actually, I had already tested core affinity, but I hadn't taken the
>> physical cores into account. Now I have zbalance pinned to one physical
>> core, and the 10 nprobe instances not sharing that physical core with
>> zbalance.
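>>
>> For reference, this is how I checked which logical CPUs are hyperthread
>> siblings before pinning (lscpu --extended shows the same mapping in its
>> CORE column):
>>
>> # list the logical CPUs sharing a physical core with CPU 0
>> cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list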
>>
>> About your point 2: when using the zc drivers, how could I run several
>> nprobe instances to share the load? I'm testing with one instance: -i
>> zc:p2p1,zc:p2p2
>>
>>
>> You can keep using zbalance_ipc (-i zc:p2p1,zc:p2p2), or you can use RSS
>> (running nprobe on -i zc:p2p1@<id>,zc:p2p2@<id>)
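>>
>> With RSS it would look roughly like this (a sketch: RSS= is the module
>> parameter of the PF_RING ZC drivers that sets the number of hardware
>> queues per port):
>>
>> # reload the ZC driver with one RSS queue per nprobe instance (e.g. 10 per port)
>> insmod i40e.ko RSS=10,10
>> # then run one nprobe per queue id, with no zbalance_ipc in between
>> nprobe -i zc:p2p1@0,zc:p2p2@0 ...
>> nprobe -i zc:p2p1@1,zc:p2p2@1 ...
>> # ... and so on up to zc:p2p1@9,zc:p2p2@9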
>>
>> Attached you can find:
>> - 0.log = top output for the scenario in my previous email.
>> - 1.log = scenario in your point 1, including top, zbalance output, and
>> nprobe stats.
>>
>>
>>
>> I do not see the attachments, did you forget to attach them?
>>
>> Alfredo
>>
>>
>> On Wed, Jun 27, 2018 at 12:13, Alfredo Cardigliano (<[email protected]>) wrote:
>>
>>> Hi David
>>> it seems that you have packet loss on both zbalance and nprobe.
>>> I recommend that you:
>>> 1. set the core affinity for both zbalance_ipc and the nprobe instances,
>>> trying to use a different core for each (at the very least, do not share
>>> the zbalance_ipc physical core with nprobe instances); see the sketch
>>> after this list.
>>> 2. did you try using the zc drivers for capturing traffic from the
>>> interfaces? (zc:p2p1,zc:p2p2)
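>>>
>>> For point 1, a minimal sketch (zbalance_ipc takes -g <core id>; for
>>> nprobe I believe there is a --cpu-affinity option, please double-check
>>> with nprobe -h):
>>>
>>> # pin zbalance_ipc to a core whose hyperthread sibling is left free
>>> zbalance_ipc -i p2p1,p2p2 -c 1 -n 16 -m 4 -a -p -g 11
>>> # pin each nprobe instance to its own core, away from zbalance_ipc
>>> nprobe --interface=zc:1@0 --cpu-affinity 0 ...
>>> nprobe --interface=zc:1@1 --cpu-affinity 1 ...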
>>> Please also provide the top output (press 1 to see all cores) with the
>>> current configuration; I guess the kernel is using some of the available
>>> CPUs with this configuration.
>>>
>>> Alfredo
>>>
>>> On 26 Jun 2018, at 16:31, David Notivol <[email protected]> wrote:
>>>
>>> Hi Alfredo,
>>> Thanks for replying.
>>> This is an excerpt of the zbalance and nprobe statistics:
>>>
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:265] =========================
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:266] Absolute Stats: Recv 1'285'430'239 pkts (1'116'181'903 drops) - Forwarded 1'266'272'285 pkts (19'157'949 drops)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:305] p2p1,p2p2 RX 1285430267 pkts Dropped 1116181981 pkts (46.5 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 0 RX 77050882 pkts Dropped 1127883 pkts (1.4 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 1 RX 70722562 pkts Dropped 756409 pkts (1.1 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 2 RX 76092418 pkts Dropped 1017335 pkts (1.3 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 3 RX 75088386 pkts Dropped 896678 pkts (1.2 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 4 RX 91991042 pkts Dropped 2114739 pkts (2.2 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 5 RX 81384450 pkts Dropped 1269385 pkts (1.5 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 6 RX 84310018 pkts Dropped 1801848 pkts (2.1 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 7 RX 84554242 pkts Dropped 1487329 pkts (1.7 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 8 RX 84090370 pkts Dropped 1482864 pkts (1.7 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 9 RX 73642498 pkts Dropped 732237 pkts (1.0 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 10 RX 76481026 pkts Dropped 1000496 pkts (1.3 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 11 RX 72496642 pkts Dropped 929049 pkts (1.3 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 12 RX 79386626 pkts Dropped 1122169 pkts (1.4 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 13 RX 79418370 pkts Dropped 1187172 pkts (1.5 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 14 RX 80284162 pkts Dropped 1195559 pkts (1.5 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 15 RX 79143426 pkts Dropped 1036797 pkts (1.3 %)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:338] Actual Stats: Recv 369'127.51 pps (555'069.74 drops) - Forwarded 369'129.51 pps (0.00 drops)
>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:348] =========================
>>>
>>>
>>> # cat /proc/net/pf_ring/stats/*
>>> ClusterId: 1
>>> TotQueues: 16
>>> Applications: 1
>>> App0Queues: 16
>>> Duration: 0:00:41:18:386
>>> Packets: 1191477340
>>> Forwarded: 1174033613
>>> Processed: 1173893301
>>> IFPackets: 1191477364
>>> IFDropped: 1036448041
>>>
>>> Duration: 0:00:41:15:587
>>> Bytes: 42626434538
>>> Packets: 71510530
>>> Dropped: 845465
>>>
>>> [removed to make the mail smaller]
>
>>
>>> On Tue, Jun 26, 2018 at 16:25, Alfredo Cardigliano (<[email protected]>) wrote:
>>>
>>>> Hi David
>>>> please also provide statistics from zbalance_ipc (output or log file)
>>>> and nprobe (you can get live stats from /proc/net/pf_ring/stats/)
>>>>
>>>> Thank you
>>>> Alfredo
>>>>
>>>> On 26 Jun 2018, at 15:32, David Notivol <[email protected]> wrote:
>>>>
>>>> Hello list,
>>>>
>>>> We're using nProbe to export flow information to Kafka. We're
>>>> listening on two 10Gb interfaces that we merge with zbalance_ipc, and we
>>>> split the traffic into 16 queues feeding 16 nprobe instances.
>>>>
>>>> The problem is that we're seeing about 40% packet drops reported by
>>>> zbalance_ipc, so it looks like nprobe is not capable of reading and
>>>> processing all the traffic. The CPU usage is really high, and the load
>>>> average is over 25-30.
>>>>
>>>> Merging both interfaces, we're getting up to 5.5 Gbps and 1.2 million
>>>> packets per second; we're using the i40e_zc driver.
>>>>
>>>> Do you have any advice to try to improve this performance?
>>>> Does it make sense that we're seeing packet drops with this amount of
>>>> traffic, i.e. are we reaching the server's limits? Or is there any
>>>> configuration we could tune to improve it?
>>>>
>>>> Thanks in advance.
>>>>
>>>>
>>>>
>>>> -- System:
>>>>
>>>> nProbe: nProbe v.8.5.180625 (r6185)
>>>> System RAM: 64GB
>>>> System CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 12 logical
>>>> cores (6 physical cores, 2 threads per core)
>>>> System OS: CentOS Linux release 7.4.1708 (Core)
>>>> Linux Kernel: 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58
>>>> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> -- zbalance configuration:
>>>>
>>>> zbalance_ipc -i p2p1,p2p2 -c 1 -n 16 -m 4 -a -p -l
>>>> /var/tmp/zbalance.log -v -w
>>>>
>>>> -- nProbe configuration:
>>>>
>>>> --interface=zc:1@0
>>>> --pid-file=/var/run/nprobe-zc1-00.pid
>>>> --dump-stats=/var/log/nprobe/zc1-00_flows_stats.txt
>>>> --kafka "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092;topic"
>>>> --collector=none
>>>> --idle-timeout=60
>>>> --snaplen=128
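>>>>
>>>> The other 15 instances are identical except for the queue id and the
>>>> file names; roughly, as a hypothetical launcher sketch:
>>>>
>>>> # one nprobe per zbalance_ipc queue of cluster 1
>>>> for q in $(seq 0 15); do
>>>>   nprobe --interface=zc:1@${q} \
>>>>          --pid-file=/var/run/nprobe-zc1-${q}.pid \
>>>>          --dump-stats=/var/log/nprobe/zc1-${q}_flows_stats.txt \
>>>>          --kafka "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092;topic" \
>>>>          --collector=none --idle-timeout=60 --snaplen=128 &
>>>> done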
>>>>
>>>> [removed to make the mail smaller]
>
>
--
Regards,
David Notivol
[email protected]
zbalance_ipc -i p2p1,p2p2 -c 1 -n 10 -m 4 -a -p -l /var/tmp/zbalance.log -v -w -g 11
27/Jun/2018 14:33:04 [zbalance_ipc.c:265] =========================
27/Jun/2018 14:33:04 [zbalance_ipc.c:266] Absolute Stats: Recv 312'755'138 pkts (0 drops) - Forwarded 282'359'020 pkts (30'396'118 drops)
27/Jun/2018 14:33:04 [zbalance_ipc.c:305] p2p1,p2p2 RX 312755198 pkts Dropped 0 pkts (0.0 %)
27/Jun/2018 14:33:04 [zbalance_ipc.c:319] Q 0 RX 32957769 pkts Dropped 3196523 pkts (8.8 %)
27/Jun/2018 14:33:04 [zbalance_ipc.c:319] Q 1 RX 25135005 pkts Dropped 2637145 pkts (9.5 %)
27/Jun/2018 14:33:04 [zbalance_ipc.c:319] Q 2 RX 27692523 pkts Dropped 1118341 pkts (3.9 %)
27/Jun/2018 14:33:04 [zbalance_ipc.c:319] Q 3 RX 28231687 pkts Dropped 1627780 pkts (5.5 %)
27/Jun/2018 14:33:04 [zbalance_ipc.c:319] Q 4 RX 24935314 pkts Dropped 4488152 pkts (15.3 %)
27/Jun/2018 14:33:04 [zbalance_ipc.c:319] Q 5 RX 27020231 pkts Dropped 2552923 pkts (8.6 %)
27/Jun/2018 14:33:04 [zbalance_ipc.c:319] Q 6 RX 29509239 pkts Dropped 2140307 pkts (6.8 %)
27/Jun/2018 14:33:04 [zbalance_ipc.c:319] Q 7 RX 28360107 pkts Dropped 4398785 pkts (13.4 %)
27/Jun/2018 14:33:04 [zbalance_ipc.c:319] Q 8 RX 30865560 pkts Dropped 3363963 pkts (9.8 %)
27/Jun/2018 14:33:04 [zbalance_ipc.c:319] Q 9 RX 27494887 pkts Dropped 4872206 pkts (15.1 %)
27/Jun/2018 14:33:04 [zbalance_ipc.c:338] Actual Stats: Recv 720'151.39 pps (0.00 drops) - Forwarded 616'671.81 pps (103'479.57 drops)
# cat /proc/net/pf_ring/stats/*
ClusterId: 1
TotQueues: 10
Applications: 1
App0Queues: 10
Duration: 0:00:08:25:166
Packets: 369098748
Forwarded: 330339871
Processed: 330182962
IFPackets: 369098784
IFDropped: 0
Duration: 0:00:08:22:039
Bytes: 17997633542
Packets: 38547352
Dropped: 3761565
Duration: 0:00:08:22:038
Bytes: 16185679150
Packets: 29351325
Dropped: 3118442
Duration: 0:00:08:22:036
Bytes: 17899818060
Packets: 32552395
Dropped: 1096040
Duration: 0:00:08:22:040
Bytes: 18407430325
Packets: 33280461
Dropped: 1803284
Duration: 0:00:08:22:040
Bytes: 15666471866
Packets: 28972181
Dropped: 5478400
Duration: 0:00:08:22:037
Bytes: 16521468829
Packets: 31688718
Dropped: 3115865
Duration: 0:00:08:22:044
Bytes: 17928871020
Packets: 34605306
Dropped: 2310576
Duration: 0:00:08:22:041
Bytes: 16629480283
Packets: 33113942
Dropped: 5446218
Duration: 0:00:08:22:036
Bytes: 18271343576
Packets: 35954325
Dropped: 4057233
Duration: 0:00:08:21:040
Bytes: 16909249023
Packets: 32021813
Dropped: 5913375
top - 14:35:00 up 7 days, 22:08, 4 users, load average: 19.10, 17.22, 17.44
Tasks: 209 total, 3 running, 206 sleeping, 0 stopped, 0 zombie
%Cpu0 : 67.8 us, 16.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 15.6 si, 0.0 st
%Cpu1 : 60.8 us, 15.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 23.6 si, 0.0 st
%Cpu2 : 65.8 us, 15.9 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 18.3 si, 0.0 st
%Cpu3 : 65.4 us, 17.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 16.9 si, 0.0 st
%Cpu4 : 60.9 us, 14.9 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 24.2 si, 0.0 st
%Cpu5 : 0.7 us, 1.0 sy, 0.0 ni, 83.7 id, 0.0 wa, 0.0 hi, 14.7 si, 0.0 st
%Cpu6 : 66.9 us, 16.9 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 16.2 si, 0.0 st
%Cpu7 : 61.1 us, 15.9 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 22.9 si, 0.0 st
%Cpu8 : 66.7 us, 15.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 18.0 si, 0.0 st
%Cpu9 : 60.7 us, 15.5 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 23.8 si, 0.0 st
%Cpu10 : 65.2 us, 15.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 19.2 si, 0.0 st
%Cpu11 : 83.8 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 16.2 si, 0.0 st
KiB Mem : 65688384 total, 14779708 free, 29991244 used, 20917432 buff/cache
KiB Swap: 6142972 total, 5945500 free, 197472 used. 34990588 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
193784 root 20 0 2220756 39464 37052 S 100.0 0.1 9:03.59 zbalance_ipc
193949 nobody 20 0 5674896 2.588g 6440 S 100.0 4.1 8:34.59 nprobe
194181 nobody 20 0 5543824 2.430g 6308 S 100.0 3.9 8:44.91 nprobe
194225 nobody 20 0 5609360 2.572g 6312 S 100.0 4.1 8:37.56 nprobe
193877 nobody 20 0 5609360 2.561g 6440 S 99.7 4.1 8:38.03 nprobe
193909 nobody 20 0 5543824 2.458g 6308 S 99.7 3.9 8:42.40 nprobe
194040 nobody 20 0 5543824 2.424g 6440 S 99.7 3.9 8:42.01 nprobe
194137 nobody 20 0 5609360 2.548g 6440 S 99.7 4.1 8:33.62 nprobe
193993 nobody 20 0 5674896 2.601g 6312 S 99.3 4.2 8:38.84 nprobe
194087 nobody 20 0 5609360 2.537g 6440 S 99.3 4.0 8:37.60 nprobe
194283 nobody 20 0 5543824 2.421g 6440 S 99.3 3.9 8:42.17 nprobe
1534 root 20 0 6472 484 484 R 2.3 0.0 223:50.90 rngd
33 root 20 0 0 0 0 S 0.3 0.0 31:15.92 ksoftirqd/5
38 root 20 0 0 0 0 S 0.3 0.0 33:01.42 ksoftirqd/6
43 root 20 0 0 0 0 S 0.3 0.0 29:18.97 ksoftirqd/7
58 root 20 0 0 0 0 S 0.3 0.0 27:49.71 ksoftirqd/10
2449 root 20 0 224096 3804 3208 S 0.3 0.0 11:54.68 snmpd
2658 root 20 0 5964404 503956 8160 S 0.3 0.8 10:52.23 java
186085 root 20 0 165076 11196 2476 S 0.3 0.0 0:18.57 perl
1 root 20 0 52424 3544 2160 S 0.0 0.0 0:36.06 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:05.01 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 33:06.79 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root rt 0 0 0 0 S 0.0 0.0 0:13.38 migration/0
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 root 20 0 0 0 0 R 0.0 0.0 25:40.99 rcu_sched
10 root rt 0 0 0 0 S 0.0 0.0 0:01.78 watchdog/0
11 root rt 0 0 0 0 S 0.0 0.0 0:01.72 watchdog/1
12 root rt 0 0 0 0 S 0.0 0.0 0:17.13 migration/1
13 root 20 0 0 0 0 S 0.0 0.0 25:11.78 ksoftirqd/1
15 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/1:0H
16 root rt 0 0 0 0 S 0.0 0.0 0:01.99 watchdog/2
17 root rt 0 0 0 0 S 0.0 0.0 0:20.30 migration/2
18 root 20 0 0 0 0 S 0.0 0.0 25:24.07 ksoftirqd/2
20 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/2:0H
21 root rt 0 0 0 0 S 0.0 0.0 0:01.70 watchdog/3
22 root rt 0 0 0 0 S 0.0 0.0 0:20.62 migration/3
_______________________________________________
Ntop-misc mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-misc