Hi Luca,

Sorry for the delay replying to the list, and thanks for your advice.

I tested reducing the template to just the minimal fields, and the performance was much better. For now I've left the template at ~50 fields (instead of ~150), and I'm keeping zbalance on a dedicated core (not sharing threads with any nProbe instance).
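To give an idea of what "minimal" means here, a cut-down template would be something along these lines (a sketch only: these are standard nprobe/NetFlow v9 template element names, not my exact 50-field list):

  --flow-templ "%IPV4_SRC_ADDR %IPV4_DST_ADDR %L4_SRC_PORT %L4_DST_PORT %PROTOCOL %IN_BYTES %IN_PKTS %FIRST_SWITCHED %LAST_SWITCHED"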
In this scenario, zbalance reports around 1-2% packet loss. When I remove the Kafka output (the brokers are on a different server) and export over TCP instead, or run with no export at all, packet loss stays at 0%. However, I don't see any error in the logs saying the buffers are full, or anything pointing to a Kafka bottleneck (I'm pasting stats from one of the instances below). Maybe I'm misreading the stats... should the problem be reflected in them?

By the way, I'm testing this on a server without any license yet. When I export over TCP, it stops working after 25000 flows, as expected; but this doesn't happen when exporting to Kafka: it keeps working indefinitely.

11/Jul/2018 03:54:35 [nprobe.c:3177] ---------------------------------
11/Jul/2018 03:54:35 [nprobe.c:3181] Average traffic: [73.50 K pps][All Traffic 352.69 Mb/sec][IP Traffic 309.88 Mb/sec][ratio 0.88]
11/Jul/2018 03:54:35 [nprobe.c:3189] Current traffic: [19.48 K pps][82.17 Mb/sec]
11/Jul/2018 03:54:35 [nprobe.c:3196] Current flow export rate: [1100.4 flows/sec]
11/Jul/2018 03:54:35 [nprobe.c:3199] Flow drops: [export queue too long=0][too many flows=0][ELK queue flow drops=0]
11/Jul/2018 03:54:35 [nprobe.c:3204] Export Queue: 0/1024000 [0.0 %]
11/Jul/2018 03:54:35 [nprobe.c:3209] Flow Buckets: [active=88328][allocated=88328][toBeExported=0]
11/Jul/2018 03:54:35 [nprobe.c:3218] Kafka producer #0 [msgs produced: 1937609474][msgs delivered: 1937609474][bytes delivered: -481499953][msgs failed: 0][msgs/s: 2579][MB/s: 1.58][produce failures: 0][queue len: 2]
11/Jul/2018 03:54:35 [nprobe.c:3026] Processed packets: 55205507976 (max bucket search: 4)
11/Jul/2018 03:54:35 [nprobe.c:3009] Fragment queue length: 0
11/Jul/2018 03:54:35 [nprobe.c:3035] Flow export stats: [0 bytes/0 pkts][0 flows/0 pkts sent]
11/Jul/2018 03:54:35 [nprobe.c:3045] Flow drop stats: [4021444583 bytes/127743259 pkts][0 flows]
11/Jul/2018 03:54:35 [nprobe.c:3050] Total flow stats: [4021444583 bytes/127743259 pkts][0 flows/0 pkts sent]
11/Jul/2018 03:54:35 [nprobe.c:3064] Kafka producer #0 [msgs produced: 1937609474][msgs delivered: 1937609474][bytes delivered: -481499953][msgs failed: 0][msgs/s: 2579][MB/s: 1.58][produce failures: 0][queue len: 2][1 msg == 1 flow]
11/Jul/2018 03:55:35 [nprobe.c:3177] ---------------------------------
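In case it's relevant, this is roughly how I'm keeping an eye on the producer queue between these periodic dumps: just grepping the "Kafka producer" lines shown above out of the instance log (the log path is made up for the example):

  watch -n 60 'grep "Kafka producer" /var/log/nprobe/zc1-00.log | tail -n 2'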
Regards,
David.

On Wed, 27 Jun 2018 at 18:16, Luca Deri (<[email protected]>) wrote:

> Hi David
> your template is huge. Can you please omit (just for troubleshooting)
> "--flow-templ…." and report if you see changes in load?
>
> Thanks Luca
>
> On 27 Jun 2018, at 08:43, David Notivol <[email protected]> wrote:
>
> Hi,
> And now:
> - 1.log = scenario in your point 1, including top, zbalance output, and
> nprobe stats.
>
> On Wed, 27 Jun 2018 at 17:41, David Notivol (<[email protected]>) wrote:
>
>> Hi Alfredo,
>>
>> Sorry, I forgot to attach the files as you said. I sent them a while ago,
>> but it seems the mail size is over the limit and they got held for
>> approval. I'm trying again now, removing some info from my first email
>> and pasting one file at a time.
>>
>> - 0.log = top output for the scenario in my first email.
>>
>> On Wed, 27 Jun 2018 at 14:30, Alfredo Cardigliano (<[email protected]>) wrote:
>>
>>> Hi David
>>>
>>> On 27 Jun 2018, at 14:20, David Notivol <[email protected]> wrote:
>>>
>>> Hi Alfredo,
>>> Thanks for your recommendations.
>>>
>>> I tested using core affinity as you suggested, and the input drops in
>>> zbalance disappeared. The output drops persist, but in absolute terms
>>> they're lower than before.
>>> Actually, I had already tested core affinity, but I hadn't taken the
>>> physical cores into account. Now I've put zbalance on one physical core,
>>> and 10 nprobe instances that don't share zbalance's physical core.
>>>
>>> About your point 2: using the zc drivers, how could I run several nprobe
>>> instances to share the load? I'm testing with one instance: -i
>>> zc:p2p1,zc:p2p2
>>>
>>> You can keep using zbalance_ipc (-i zc:p2p1,zc:p2p2), or you can use RSS
>>> (running nprobe on -i zc:p2p1@<id>,zc:p2p2@<id>)
>>>
>>> Attached you can find:
>>> - 0.log = top output for the scenario in my previous email.
>>> - 1.log = scenario in your point 1, including top, zbalance output, and
>>> nprobe stats.
>>>
>>> I do not see the attachments, did you forget to enclose them?
>>>
>>> Alfredo
>>>
>>> On Wed, 27 Jun 2018 at 12:13, Alfredo Cardigliano (<[email protected]>) wrote:
>>>
>>>> Hi David
>>>> it seems that you have packet loss on both zbalance and nprobe.
>>>> I recommend that you:
>>>> 1. Set the core affinity for both zbalance_ipc and the nprobe
>>>> instances, trying to use a different core for each (at the very least,
>>>> do not share the zbalance_ipc physical core with nprobe instances).
>>>> 2. Did you try using ZC drivers for capturing traffic from the
>>>> interfaces? (zc:p2p1,zc:p2p2)
>>>> Please also provide the top output (press 1 to see all cores) with the
>>>> current configuration; I guess the kernel is using some of the
>>>> available CPU with this configuration.
>>>>
>>>> Alfredo
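(For reference, the kind of pinning I tested for point 1 looks roughly like this; the core IDs are only an example for a 6-core/12-thread CPU, and taskset is just one generic way to set affinity, not necessarily what Alfredo had in mind:)

  # zbalance_ipc alone on physical core 0
  taskset -c 0 zbalance_ipc -i p2p1,p2p2 -c 1 -n 16 -m 4 -a -p -l /var/tmp/zbalance.log -v -w
  # each nprobe on its own core, away from core 0 (other instance flags as in my config)
  taskset -c 1 nprobe --interface=zc:1@0 --collector=none --idle-timeout=60 --snaplen=128
  taskset -c 2 nprobe --interface=zc:1@1 --collector=none --idle-timeout=60 --snaplen=128
  # or, per point 2, RSS instead of zbalance_ipc: one instance per RSS queue id
  # nprobe --interface=zc:p2p1@0,zc:p2p2@0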
>>>>
>>>> On 26 Jun 2018, at 16:31, David Notivol <[email protected]> wrote:
>>>>
>>>> Hi Alfredo,
>>>> Thanks for replying.
>>>> This is an excerpt of the zbalance and nprobe statistics:
>>>>
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:265] =========================
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:266] Absolute Stats: Recv 1'285'430'239 pkts (1'116'181'903 drops) - Forwarded 1'266'272'285 pkts (19'157'949 drops)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:305] p2p1,p2p2 RX 1285430267 pkts Dropped 1116181981 pkts (46.5 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 0 RX 77050882 pkts Dropped 1127883 pkts (1.4 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 1 RX 70722562 pkts Dropped 756409 pkts (1.1 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 2 RX 76092418 pkts Dropped 1017335 pkts (1.3 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 3 RX 75088386 pkts Dropped 896678 pkts (1.2 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 4 RX 91991042 pkts Dropped 2114739 pkts (2.2 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 5 RX 81384450 pkts Dropped 1269385 pkts (1.5 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 6 RX 84310018 pkts Dropped 1801848 pkts (2.1 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 7 RX 84554242 pkts Dropped 1487329 pkts (1.7 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 8 RX 84090370 pkts Dropped 1482864 pkts (1.7 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 9 RX 73642498 pkts Dropped 732237 pkts (1.0 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 10 RX 76481026 pkts Dropped 1000496 pkts (1.3 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 11 RX 72496642 pkts Dropped 929049 pkts (1.3 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 12 RX 79386626 pkts Dropped 1122169 pkts (1.4 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 13 RX 79418370 pkts Dropped 1187172 pkts (1.5 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 14 RX 80284162 pkts Dropped 1195559 pkts (1.5 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 15 RX 79143426 pkts Dropped 1036797 pkts (1.3 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:338] Actual Stats: Recv 369'127.51 pps (555'069.74 drops) - Forwarded 369'129.51 pps (0.00 drops)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:348] =========================
>>>>
>>>> # cat /proc/net/pf_ring/stats/*
>>>> ClusterId: 1
>>>> TotQueues: 16
>>>> Applications: 1
>>>> App0Queues: 16
>>>> Duration: 0:00:41:18:386
>>>> Packets: 1191477340
>>>> Forwarded: 1174033613
>>>> Processed: 1173893301
>>>> IFPackets: 1191477364
>>>> IFDropped: 1036448041
>>>>
>>>> Duration: 0:00:41:15:587
>>>> Bytes: 42626434538
>>>> Packets: 71510530
>>>> Dropped: 845465
>>>>
>>>> [removed to make the mail smaller]
>>>>
>>>> On Tue, 26 Jun 2018 at 16:25, Alfredo Cardigliano (<[email protected]>) wrote:
>>>>
>>>>> Hi David
>>>>> please also provide statistics from zbalance_ipc (output or log file)
>>>>> and nprobe (you can get live stats from /proc/net/pf_ring/stats/)
>>>>>
>>>>> Thank you
>>>>> Alfredo
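(Side note on those /proc counters: a simple way to sample them over time, using nothing beyond standard tools and the per-queue files shown in my excerpt above, which each contain a Dropped: line:)

  while true; do date; grep -H "Dropped" /proc/net/pf_ring/stats/*; sleep 5; done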
>>>>>
>>>>> On 26 Jun 2018, at 15:32, David Notivol <[email protected]> wrote:
>>>>>
>>>>> Hello list,
>>>>>
>>>>> We're using nProbe to export flow information to Kafka. We're
>>>>> listening on two 10Gb interfaces that we merge with zbalance_ipc and
>>>>> split into 16 queues, feeding 16 nprobe instances.
>>>>>
>>>>> The problem is that we're seeing about 40% packet drops reported by
>>>>> zbalance_ipc, so it looks like nprobe is not capable of reading and
>>>>> processing all the traffic. The CPU usage is really high, and the
>>>>> load average is over 25-30.
>>>>>
>>>>> Merging both interfaces we're getting up to 5.5 Gbps and 1.2 million
>>>>> packets/second; we're using the i40e_zc driver.
>>>>>
>>>>> Do you have any advice to improve this performance?
>>>>> Does it make sense that we're seeing packet drops with this amount of
>>>>> traffic, i.e. are we reaching the server's limits? Or is there any
>>>>> configuration we could tune to improve it?
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> -- System:
>>>>>
>>>>> nProbe: nProbe v.8.5.180625 (r6185)
>>>>> System RAM: 64GB
>>>>> System CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 12 cores (6
>>>>> physical cores, 2 threads per core)
>>>>> System OS: CentOS Linux release 7.4.1708 (Core)
>>>>> Linux Kernel: 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58
>>>>> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>>>>>
>>>>> -- zbalance configuration:
>>>>>
>>>>> zbalance_ipc -i p2p1,p2p2 -c 1 -n 16 -m 4 -a -p -l
>>>>> /var/tmp/zbalance.log -v -w
>>>>>
>>>>> -- nProbe configuration:
>>>>>
>>>>> --interface=zc:1@0
>>>>> --pid-file=/var/run/nprobe-zc1-00.pid
>>>>> --dump-stats=/var/log/nprobe/zc1-00_flows_stats.txt
>>>>> --kafka "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092;topic"
>>>>> --collector=none
>>>>> --idle-timeout=60
>>>>> --snaplen=128
>>>>>
>>>>> [removed to make the mail smaller]
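(To make the fan-out concrete: with zbalance_ipc started with -c 1 -n 16 as above, each probe attaches to one queue of cluster 1. A sketch of spawning all 16 instances, assuming command-line flags equivalent to the config file above; the zc1-NN naming simply extends what's visible in my config and is not copied from the real setup:)

  for q in $(seq 0 15); do
    id=$(printf "%02d" "$q")
    nprobe --interface="zc:1@$q" \
           --pid-file="/var/run/nprobe-zc1-$id.pid" \
           --dump-stats="/var/log/nprobe/zc1-${id}_flows_stats.txt" \
           --kafka "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092;topic" \
           --collector=none --idle-timeout=60 --snaplen=128 &
  done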
> --
> Regards,
> David Notivol
> [email protected]
> <1.log>

--
Regards,
David Notivol
[email protected]

_______________________________________________
Ntop-misc mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-misc
