Dear Rubina,

Thanks for checking it!
Yeah, actually that patch was leaking the sessions in the session reuse
path. I got the setup going in the lab locally yesterday and am working
on a better way to do it... Will get back to you when I am happy with
the way the code works..

--a

On 5/29/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
> Dear Andrew,
>
> I cleaned everything and created new deb packages with your patch once
> again. With your patch I never see the deadlock again, but I still have
> a throughput problem in my scenario.
>
> -Per port stats table
>      ports |             0 |             1
> -------------------------------------------
>   opackets |     474826597 |     452028770
>     obytes |  207843848531 |  199591809555
>   ipackets |      71010677 |      72028456
>     ibytes |   31441646551 |   31687562468
>    ierrors |             0 |             0
>    oerrors |             0 |             0
>      Tx Bw |     9.56 Gbps |     9.16 Gbps
>
> -Global stats enabled
> Cpu Utilization : 88.4 %  7.1 Gb/core
> Platform_factor : 1.0
> Total-Tx        : 18.72 Gbps
> Total-Rx        : 59.30 Mbps
> Total-PPS       : 5.31 Mpps
> Total-CPS       : 79.79 Kcps
>
> Expected-PPS    : 9.02 Mpps
> Expected-CPS    : 135.31 Kcps
> Expected-BPS    : 31.77 Gbps
>
> Active-flows    : 88837      Clients : 252    Socket-util : 0.5598 %
> Open-flows      : 14708455   Servers : 65532  Socket : 88837
> Socket/Clients  : 352.5
> Total_queue_full : 328355248
> drop-rate       : 18.66 Gbps
> current time    : 180.9 sec
> test duration   : 99819.1 sec
>
> In the best case (4 interfaces on one NUMA node, with ACLs on only 2 of
> them) my device (HP DL380 G9) reaches its maximum throughput (18.72
> Gbps), but in the worst case (4 interfaces on one NUMA node, all of
> them with ACLs) throughput drops from that maximum to around 60 Mbps.
> So the patch only prevents the deadlock in my case; throughput is the
> same as before.
>
> ________________________________
> From: Andrew 👽 Yourtchenko <ayour...@gmail.com>
> Sent: Tuesday, May 29, 2018 10:11 AM
> To: Rubina Bianchi
> Cc: vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] Rx stuck to 0 after a while
>
> Dear Rubina,
>
> thank you for quickly checking it!
>
> Judging by the logs the VPP quits, so I would say there should be a
> core file, could you check?
>
> If you find it (double-check by the timestamps that it is indeed the
> fresh one), you can load it in gdb (using gdb 'path-to-vpp-binary'
> 'path-to-core') and then get the backtrace using 'bt'; this will give
> more of an idea of what is going on.
>
> --a
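(For reference, a minimal gdb session along the lines Andrew describes;
the paths below are only examples, adjust them to your installation and
core file location:)

    # find the freshest core file (the location depends on your core_pattern)
    ls -lt /var/crash /tmp | head
    # load it together with the matching vpp binary
    gdb /usr/bin/vpp /path/to/core
    (gdb) bt                     # backtrace of the crashing thread
    (gdb) thread apply all bt    # backtraces of all threads, workers included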
> On 5/29/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
>> Dear Andrew,
>>
>> I tested your patch and my problem still exists, but my service status
>> changed and now there isn't any information about the deadlock
>> problem. Do you have any idea how I can provide you more information?
>>
>> root@MYRB:~# service vpp status
>> * vpp.service - vector packet processing engine
>>    Loaded: loaded (/lib/systemd/system/vpp.service; disabled; vendor preset: enabled)
>>    Active: inactive (dead)
>>
>> May 29 09:27:06 MYRB /usr/bin/vpp[30805]: load_one_vat_plugin:67: Loaded plugin: udp_ping_test_plugin.so
>> May 29 09:27:06 MYRB /usr/bin/vpp[30805]: load_one_vat_plugin:67: Loaded plugin: stn_test_plugin.so
>> May 29 09:27:06 MYRB vpp[30805]: /usr/bin/vpp[30805]: dpdk: EAL init args: -c 1ff -n 4 --huge-dir /run/vpp/hugepages --file-prefix vpp -w 0000:08:00.0 -w 0000:08:00.1 -w 0000:08
>> May 29 09:27:06 MYRB /usr/bin/vpp[30805]: dpdk: EAL init args: -c 1ff -n 4 --huge-dir /run/vpp/hugepages --file-prefix vpp -w 0000:08:00.0 -w 0000:08:00.1 -w 0000:08:00.2 -w 000
>> May 29 09:27:07 MYRB vnet[30805]: dpdk_ipsec_process:1012: not enough DPDK crypto resources, default to OpenSSL
>> May 29 09:27:13 MYRB vnet[30805]: unix_signal_handler:124: received signal SIGCONT, PC 0x7fa535dfbac0
>> May 29 09:27:13 MYRB vnet[30805]: received SIGTERM, exiting...
>> May 29 09:27:13 MYRB systemd[1]: Stopping vector packet processing engine...
>> May 29 09:27:13 MYRB vnet[30805]: unix_signal_handler:124: received signal SIGTERM, PC 0x7fa534121867
>> May 29 09:27:13 MYRB systemd[1]: Stopped vector packet processing engine.
>>
>> ________________________________
>> From: Andrew 👽 Yourtchenko <ayour...@gmail.com>
>> Sent: Monday, May 28, 2018 5:58 PM
>> To: Rubina Bianchi
>> Cc: vpp-dev@lists.fd.io
>> Subject: Re: [vpp-dev] Rx stuck to 0 after a while
>>
>> Dear Rubina,
>>
>> Thanks for catching and reporting this!
>>
>> I suspect what might be happening is that my recent change of using
>> two unidirectional sessions in the bihash vs. the single one triggered
>> a race, whereby as the owning worker is deleting the session, the
>> non-owning worker is trying to update it. That would logically explain
>> the "BUG: ..." line (since you don't change the interfaces nor move
>> the traffic around, the 5-tuples should not collide), as well as the
>> later stop.
>>
>> To take care of this issue, I think I will split the deletion of the
>> session into two stages:
>> 1) deactivation of the bihash entries that steer the traffic
>> 2) freeing up the per-worker session structure
>>
>> and have a little pause time in between these two so that the
>> workers-in-progress can finish updating the structures.
>>
>> The below gerrit is the first cut:
>>
>> https://gerrit.fd.io/r/#/c/12770/
>>
>> It passes the make test right now, but I did not kick its tires too
>> much yet; will do tomorrow.
>>
>> You can try this change out in your test setup as well and tell me how
>> it feels.
>>
>> --a
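(For illustration, a compilable C sketch of the two-stage deletion idea
described above; the names and the grace period are hypothetical and are
not the actual acl-plugin code, which lives in the gerrit linked above:)

    #include <stdbool.h>
    #include <time.h>

    typedef struct session {
        bool in_bihash;          /* stage-1 flag: still reachable via lookup? */
        time_t deactivated_at;   /* when stage 1 happened */
        struct session *next_free;
    } session_t;

    #define GRACE_SECONDS 2      /* hypothetical pause between the stages */

    /* Stage 1: remove the session from the lookup structures so no new
     * packet can be steered to it; a worker that already holds a pointer
     * may still update it during the grace period. */
    void session_deactivate(session_t *s)
    {
        s->in_bihash = false;            /* stands in for the bihash deletes */
        s->deactivated_at = time(NULL);
    }

    /* Stage 2: return the per-worker structure to the free list only once
     * the grace period has elapsed, so that in-flight updates from the
     * non-owning workers have finished. */
    bool session_try_free(session_t *s, session_t **free_list)
    {
        if (s->in_bihash || time(NULL) - s->deactivated_at < GRACE_SECONDS)
            return false;                /* too early, retry later */
        s->next_free = *free_list;
        *free_list = s;
        return true;
    }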
>> On 5/28/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
>>> Hi,
>>>
>>> I ran vpp v18.07-rc0~237-g525c9d0f with only 2 interfaces in a
>>> stateful acl (permit+reflect) and generated sfr traffic using trex
>>> v2.27. My rx drops to 0 after a short while, about 300 sec on my
>>> machine. Here is the vpp status:
>>>
>>> root@MYRB:~# service vpp status
>>> * vpp.service - vector packet processing engine
>>>    Loaded: loaded (/lib/systemd/system/vpp.service; disabled; vendor preset: enabled)
>>>    Active: failed (Result: signal) since Mon 2018-05-28 11:35:03 +0130; 37s ago
>>>   Process: 32838 ExecStopPost=/bin/rm -f /dev/shm/db /dev/shm/global_vm /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
>>>   Process: 31754 ExecStart=/usr/bin/vpp -c /etc/vpp/startup.conf (code=killed, signal=ABRT)
>>>   Process: 31750 ExecStartPre=/sbin/modprobe uio_pci_generic (code=exited, status=0/SUCCESS)
>>>   Process: 31747 ExecStartPre=/bin/rm -f /dev/shm/db /dev/shm/global_vm /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
>>>  Main PID: 31754 (code=killed, signal=ABRT)
>>>
>>> May 28 16:32:47 MYRB vnet[31754]: acl_fa_node_fn:210: BUG: session LSB16(sw_if_index) and 5-tuple collision!
>>> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received signal SIGCONT, PC 0x7f1fb591cac0
>>> May 28 16:35:02 MYRB vnet[31754]: received SIGTERM, exiting...
>>> May 28 16:35:02 MYRB systemd[1]: Stopping vector packet processing engine...
>>> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received signal SIGTERM, PC 0x7f1fb3c40867
>>> May 28 16:35:03 MYRB vpp[31754]: vlib_worker_thread_barrier_sync_int: worker thread deadlock
>>> May 28 16:35:03 MYRB systemd[1]: vpp.service: Main process exited, code=killed, status=6/ABRT
>>> May 28 16:35:03 MYRB systemd[1]: Stopped vector packet processing engine.
>>> May 28 16:35:03 MYRB systemd[1]: vpp.service: Unit entered failed state.
>>> May 28 16:35:03 MYRB systemd[1]: vpp.service: Failed with result 'signal'.
>>>
>>> I am attaching my vpp configs to this email. I also ran this test
>>> with the same config but with 4 interfaces instead of two; in that
>>> case nothing happened to vpp and it stayed functional for a long
>>> time.
>>>
>>> Thanks,
>>> RB
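(The attached configs are not preserved in the archive; for context, a
/etc/vpp/startup.conf along these lines would produce EAL init args like
the ones in the logs above. The PCI addresses are taken from the logs;
the cpu settings are illustrative:)

    cpu {
        main-core 0
        corelist-workers 1-8     # illustrative; consistent with the "-c 1ff" mask
    }
    dpdk {
        dev 0000:08:00.0
        dev 0000:08:00.1
        dev 0000:08:00.2
        # further "dev" lines for the remaining ports (truncated in the log)
    }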