Hi Luca,

Sorry for the delay replying to the list, and thanks for your advice.

I tested reducing the template to just the minimal fields, and the performance was much better. For now I've left the template at ~50 fields (instead of ~150), and I'm keeping zbalance on a dedicated core (not sharing threads with any nProbe instance).
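To give an idea of what "minimal" means here, a cut-down template would be something along these lines (a sketch only: these are standard nprobe/NetFlow v9 template element names, not my exact 50-field list):

  --flow-templ "%IPV4_SRC_ADDR %IPV4_DST_ADDR %L4_SRC_PORT %L4_DST_PORT %PROTOCOL %IN_BYTES %IN_PKTS %FIRST_SWITCHED %LAST_SWITCHED"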
In this scenario, zbalance reports around 1-2% packet loss. When I remove the Kafka output (the brokers are on a different server) and export over TCP instead, or run with no export at all, packet loss stays at 0%. However, I don't see any error in the logs saying the buffers are full, or anything pointing to a Kafka bottleneck (I'm pasting stats from one of the instances below). Maybe I'm misreading the stats... should the problem be reflected in them?

By the way, I'm testing this on a server without any license yet. When I export over TCP, it stops working after 25000 flows, as expected; but this doesn't happen when exporting to Kafka: it keeps working indefinitely.

11/Jul/2018 03:54:35 [nprobe.c:3177] ---------------------------------
11/Jul/2018 03:54:35 [nprobe.c:3181] Average traffic: [73.50 K pps][All Traffic 352.69 Mb/sec][IP Traffic 309.88 Mb/sec][ratio 0.88]
11/Jul/2018 03:54:35 [nprobe.c:3189] Current traffic: [19.48 K pps][82.17 Mb/sec]
11/Jul/2018 03:54:35 [nprobe.c:3196] Current flow export rate: [1100.4 flows/sec]
11/Jul/2018 03:54:35 [nprobe.c:3199] Flow drops: [export queue too long=0][too many flows=0][ELK queue flow drops=0]
11/Jul/2018 03:54:35 [nprobe.c:3204] Export Queue: 0/1024000 [0.0 %]
11/Jul/2018 03:54:35 [nprobe.c:3209] Flow Buckets: [active=88328][allocated=88328][toBeExported=0]
11/Jul/2018 03:54:35 [nprobe.c:3218] Kafka producer #0 [msgs produced: 1937609474][msgs delivered: 1937609474][bytes delivered: -481499953][msgs failed: 0][msgs/s: 2579][MB/s: 1.58][produce failures: 0][queue len: 2]
11/Jul/2018 03:54:35 [nprobe.c:3026] Processed packets: 55205507976 (max bucket search: 4)
11/Jul/2018 03:54:35 [nprobe.c:3009] Fragment queue length: 0
11/Jul/2018 03:54:35 [nprobe.c:3035] Flow export stats: [0 bytes/0 pkts][0 flows/0 pkts sent]
11/Jul/2018 03:54:35 [nprobe.c:3045] Flow drop stats: [4021444583 bytes/127743259 pkts][0 flows]
11/Jul/2018 03:54:35 [nprobe.c:3050] Total flow stats: [4021444583 bytes/127743259 pkts][0 flows/0 pkts sent]
11/Jul/2018 03:54:35 [nprobe.c:3064] Kafka producer #0 [msgs produced: 1937609474][msgs delivered: 1937609474][bytes delivered: -481499953][msgs failed: 0][msgs/s: 2579][MB/s: 1.58][produce failures: 0][queue len: 2][1 msg == 1 flow]
11/Jul/2018 03:55:35 [nprobe.c:3177] ---------------------------------
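In case it's relevant, this is roughly how I'm keeping an eye on the producer queue between these periodic dumps: just grepping the "Kafka producer" lines shown above out of the instance log (the log path is made up for the example):

  watch -n 60 'grep "Kafka producer" /var/log/nprobe/zc1-00.log | tail -n 2'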
Regards,
David.

On Wed, 27 Jun 2018 at 18:16, Luca Deri (<[email protected]>) wrote:

> Hi David
> your template is huge. Can you please omit (just for troubleshooting)
> "--flow-templ…." and report if you see changes in load?
>
> Thanks Luca
>
> On 27 Jun 2018, at 08:43, David Notivol <[email protected]> wrote:
>
> Hi,
> And now:
> - 1.log = scenario in your point 1, including top, zbalance output, and
> nprobe stats.
>
> On Wed, 27 Jun 2018 at 17:41, David Notivol (<[email protected]>) wrote:
>
>> Hi Alfredo,
>>
>> Sorry, I forgot to attach the files as you said. I sent them a while ago,
>> but it seems the mail size is over the limit and they got held for
>> approval. I'm trying again now, removing some info from my first email
>> and pasting one file at a time.
>>
>> - 0.log = top output for the scenario in my first email.
>>
>> On Wed, 27 Jun 2018 at 14:30, Alfredo Cardigliano (<[email protected]>) wrote:
>>
>>> Hi David
>>>
>>> On 27 Jun 2018, at 14:20, David Notivol <[email protected]> wrote:
>>>
>>> Hi Alfredo,
>>> Thanks for your recommendations.
>>>
>>> I tested using core affinity as you suggested, and the input drops in
>>> zbalance disappeared. The output drops persist, but in absolute terms
>>> they're lower than before.
>>> Actually, I had already tested core affinity, but I hadn't taken the
>>> physical cores into account. Now I've put zbalance on one physical core,
>>> and 10 nprobe instances that don't share zbalance's physical core.
>>>
>>> About your point 2: using the zc drivers, how could I run several nprobe
>>> instances to share the load? I'm testing with one instance: -i
>>> zc:p2p1,zc:p2p2
>>>
>>> You can keep using zbalance_ipc (-i zc:p2p1,zc:p2p2), or you can use RSS
>>> (running nprobe on -i zc:p2p1@<id>,zc:p2p2@<id>)
>>>
>>> Attached you can find:
>>> - 0.log = top output for the scenario in my previous email.
>>> - 1.log = scenario in your point 1, including top, zbalance output, and
>>> nprobe stats.
>>>
>>> I do not see the attachments, did you forget to enclose them?
>>>
>>> Alfredo
>>>
>>> On Wed, 27 Jun 2018 at 12:13, Alfredo Cardigliano (<[email protected]>) wrote:
>>>
>>>> Hi David
>>>> it seems that you have packet loss on both zbalance and nprobe.
>>>> I recommend that you:
>>>> 1. Set the core affinity for both zbalance_ipc and the nprobe
>>>> instances, trying to use a different core for each (at the very least,
>>>> do not share the zbalance_ipc physical core with nprobe instances).
>>>> 2. Did you try using ZC drivers for capturing traffic from the
>>>> interfaces? (zc:p2p1,zc:p2p2)
>>>> Please also provide the top output (press 1 to see all cores) with the
>>>> current configuration; I guess the kernel is using some of the
>>>> available CPU with this configuration.
>>>>
>>>> Alfredo
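(For reference, the kind of pinning I tested for point 1 looks roughly like this; the core IDs are only an example for a 6-core/12-thread CPU, and taskset is just one generic way to set affinity, not necessarily what Alfredo had in mind:)

  # zbalance_ipc alone on physical core 0
  taskset -c 0 zbalance_ipc -i p2p1,p2p2 -c 1 -n 16 -m 4 -a -p -l /var/tmp/zbalance.log -v -w
  # each nprobe on its own core, away from core 0 (other instance flags as in my config)
  taskset -c 1 nprobe --interface=zc:1@0 --collector=none --idle-timeout=60 --snaplen=128
  taskset -c 2 nprobe --interface=zc:1@1 --collector=none --idle-timeout=60 --snaplen=128
  # or, per point 2, RSS instead of zbalance_ipc: one instance per RSS queue id
  # nprobe --interface=zc:p2p1@0,zc:p2p2@0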
>>>>
>>>> On 26 Jun 2018, at 16:31, David Notivol <[email protected]> wrote:
>>>>
>>>> Hi Alfredo,
>>>> Thanks for replying.
>>>> This is an excerpt of the zbalance and nprobe statistics:
>>>>
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:265] =========================
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:266] Absolute Stats: Recv 1'285'430'239 pkts (1'116'181'903 drops) - Forwarded 1'266'272'285 pkts (19'157'949 drops)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:305] p2p1,p2p2 RX 1285430267 pkts Dropped 1116181981 pkts (46.5 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 0 RX 77050882 pkts Dropped 1127883 pkts (1.4 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 1 RX 70722562 pkts Dropped 756409 pkts (1.1 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 2 RX 76092418 pkts Dropped 1017335 pkts (1.3 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 3 RX 75088386 pkts Dropped 896678 pkts (1.2 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 4 RX 91991042 pkts Dropped 2114739 pkts (2.2 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 5 RX 81384450 pkts Dropped 1269385 pkts (1.5 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 6 RX 84310018 pkts Dropped 1801848 pkts (2.1 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 7 RX 84554242 pkts Dropped 1487329 pkts (1.7 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 8 RX 84090370 pkts Dropped 1482864 pkts (1.7 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 9 RX 73642498 pkts Dropped 732237 pkts (1.0 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 10 RX 76481026 pkts Dropped 1000496 pkts (1.3 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 11 RX 72496642 pkts Dropped 929049 pkts (1.3 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 12 RX 79386626 pkts Dropped 1122169 pkts (1.4 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 13 RX 79418370 pkts Dropped 1187172 pkts (1.5 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 14 RX 80284162 pkts Dropped 1195559 pkts (1.5 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 15 RX 79143426 pkts Dropped 1036797 pkts (1.3 %)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:338] Actual Stats: Recv 369'127.51 pps (555'069.74 drops) - Forwarded 369'129.51 pps (0.00 drops)
>>>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:348] =========================
>>>>
>>>> # cat /proc/net/pf_ring/stats/*
>>>> ClusterId: 1
>>>> TotQueues: 16
>>>> Applications: 1
>>>> App0Queues: 16
>>>> Duration: 0:00:41:18:386
>>>> Packets: 1191477340
>>>> Forwarded: 1174033613
>>>> Processed: 1173893301
>>>> IFPackets: 1191477364
>>>> IFDropped: 1036448041
>>>>
>>>> Duration: 0:00:41:15:587
>>>> Bytes: 42626434538
>>>> Packets: 71510530
>>>> Dropped: 845465
>>>>
>>>> [removed to make the mail smaller]
>>>>
>>>> On Tue, 26 Jun 2018 at 16:25, Alfredo Cardigliano (<[email protected]>) wrote:
>>>>
>>>>> Hi David
>>>>> please also provide statistics from zbalance_ipc (output or log file)
>>>>> and nprobe (you can get live stats from /proc/net/pf_ring/stats/)
>>>>>
>>>>> Thank you
>>>>> Alfredo
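(Side note on those /proc counters: a simple way to sample them over time, using nothing beyond standard tools and the per-queue files shown in my excerpt above, which each contain a Dropped: line:)

  while true; do date; grep -H "Dropped" /proc/net/pf_ring/stats/*; sleep 5; done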
>>>>>
>>>>> On 26 Jun 2018, at 15:32, David Notivol <[email protected]> wrote:
>>>>>
>>>>> Hello list,
>>>>>
>>>>> We're using nProbe to export flow information to Kafka. We're
>>>>> listening on two 10Gb interfaces that we merge with zbalance_ipc and
>>>>> split into 16 queues, feeding 16 nprobe instances.
>>>>>
>>>>> The problem is that we're seeing about 40% packet drops reported by
>>>>> zbalance_ipc, so it looks like nprobe is not capable of reading and
>>>>> processing all the traffic. The CPU usage is really high, and the
>>>>> load average is over 25-30.
>>>>>
>>>>> Merging both interfaces we're getting up to 5.5 Gbps and 1.2 million
>>>>> packets/second; we're using the i40e_zc driver.
>>>>>
>>>>> Do you have any advice to improve this performance?
>>>>> Does it make sense that we're seeing packet drops with this amount of
>>>>> traffic, i.e. are we reaching the server's limits? Or is there any
>>>>> configuration we could tune to improve it?
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> -- System:
>>>>>
>>>>> nProbe: nProbe v.8.5.180625 (r6185)
>>>>> System RAM: 64GB
>>>>> System CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 12 cores (6
>>>>> physical cores, 2 threads per core)
>>>>> System OS: CentOS Linux release 7.4.1708 (Core)
>>>>> Linux Kernel: 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58
>>>>> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>>>>>
>>>>> -- zbalance configuration:
>>>>>
>>>>> zbalance_ipc -i p2p1,p2p2 -c 1 -n 16 -m 4 -a -p -l
>>>>> /var/tmp/zbalance.log -v -w
>>>>>
>>>>> -- nProbe configuration:
>>>>>
>>>>> --interface=zc:1@0
>>>>> --pid-file=/var/run/nprobe-zc1-00.pid
>>>>> --dump-stats=/var/log/nprobe/zc1-00_flows_stats.txt
>>>>> --kafka "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092;topic"
>>>>> --collector=none
>>>>> --idle-timeout=60
>>>>> --snaplen=128
>>>>>
>>>>> [removed to make the mail smaller]
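(To make the fan-out concrete: with zbalance_ipc started with -c 1 -n 16 as above, each probe attaches to one queue of cluster 1. A sketch of spawning all 16 instances, assuming command-line flags equivalent to the config file above; the zc1-NN naming simply extends what's visible in my config and is not copied from the real setup:)

  for q in $(seq 0 15); do
    id=$(printf "%02d" "$q")
    nprobe --interface="zc:1@$q" \
           --pid-file="/var/run/nprobe-zc1-$id.pid" \
           --dump-stats="/var/log/nprobe/zc1-${id}_flows_stats.txt" \
           --kafka "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092;topic" \
           --collector=none --idle-timeout=60 --snaplen=128 &
  done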
> --
> Regards,
> David Notivol
> [email protected]
> <1.log>

--
Regards,
David Notivol
[email protected]

_______________________________________________
Ntop-misc mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-misc
