Dear Rubina,

okay, I think I am reasonably happy with the change in
https://gerrit.fd.io/r/#/c/12770/ - I have also rebased it onto the
latest master so that it is ready to commit if it works for you.

Please give it a shot and let me know. Note that you might need to
adjust the bihash memory, as I am now storing the forward and reverse
entries explicitly (rather than calculating them per packet).
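To illustrate - a minimal sketch of the idea (this is not the actual
acl-plugin code; the 5-tuple layout and all names here are made up for
illustration):

#include <vppinfra/bihash_40_8.h>

/* Hypothetical 5-tuple, padded to exactly 40 bytes to match the
 * clib_bihash_kv_40_8_t key. */
typedef struct
{
  u64 src_addr, dst_addr;	/* illustrative packing of the addresses */
  u32 src_port, dst_port;
  u64 proto_and_flags;
  u64 pad;
} tuple5_t;

static void
add_session_both_directions (clib_bihash_40_8_t * h, tuple5_t * fwd,
			     u64 session_index)
{
  clib_bihash_kv_40_8_t kv;
  tuple5_t rev = *fwd;

  /* Build the reverse key once, at session-creation time, instead of
   * recomputing it for every packet in the datapath. */
  rev.src_addr = fwd->dst_addr;
  rev.dst_addr = fwd->src_addr;
  rev.src_port = fwd->dst_port;
  rev.dst_port = fwd->src_port;

  /* Both entries carry the same session index, so a hit in either
   * direction resolves to the same session. */
  clib_memcpy (kv.key, fwd, sizeof (kv.key));
  kv.value = session_index;
  clib_bihash_add_del_40_8 (h, &kv, 1 /* is_add */ );

  clib_memcpy (kv.key, &rev, sizeof (kv.key));
  clib_bihash_add_del_40_8 (h, &kv, 1 /* is_add */ );
}

Since each session now contributes a bihash entry per direction, the
session table holds more entries than before - presumably why its
memory may need a bump.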
Please let me know how it works in your test setup.

thanks,
andrew

On 5/30/18, Andrew Yourtchenko <ayour...@gmail.com> wrote:
> Dear Rubina,
>
> Thanks for checking it!
>
> Yeah, actually that patch was leaking the sessions in the
> session-reuse path. I got the setup running in the lab locally
> yesterday and am working on a better way to do it...
>
> Will get back to you when I am happy with the way the code works.
>
> --a
>
> On 5/29/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
>> Dear Andrew
>>
>> I cleaned everything and built new deb packages from your patch once
>> again. With your patch I never see the deadlock again, but I still
>> have a throughput problem in my scenario.
>>
>> -Per port stats table
>>      ports |            0 |            1
>>  ------------------------------------------
>>   opackets |    474826597 |    452028770
>>     obytes | 207843848531 | 199591809555
>>   ipackets |     71010677 |     72028456
>>     ibytes |  31441646551 |  31687562468
>>    ierrors |            0 |            0
>>    oerrors |            0 |            0
>>      Tx Bw |    9.56 Gbps |    9.16 Gbps
>>
>> -Global stats enabled
>>  Cpu Utilization : 88.4 %  7.1 Gb/core
>>  Platform_factor : 1.0
>>  Total-Tx  : 18.72 Gbps
>>  Total-Rx  : 59.30 Mbps
>>  Total-PPS :  5.31 Mpps
>>  Total-CPS : 79.79 Kcps
>>
>>  Expected-PPS :   9.02 Mpps
>>  Expected-CPS : 135.31 Kcps
>>  Expected-BPS :  31.77 Gbps
>>
>>  Active-flows :    88837  Clients :   252  Socket-util : 0.5598 %
>>  Open-flows   : 14708455  Servers : 65532  Socket : 88837
>>  Socket/Clients   : 352.5
>>  Total_queue_full : 328355248
>>  drop-rate    : 18.66 Gbps
>>  current time : 180.9 sec
>>  test duration : 99819.1 sec
>>
>> In the best case (4 interfaces on one NUMA node, with ACLs on only 2
>> of them) my device (HP DL380 G9) reaches its maximum throughput
>> (18.72 Gbps), but in the worst case (4 interfaces on one NUMA node,
>> with ACLs on all of them) the throughput drops from the maximum to
>> around 60 Mbps. So the patch only prevents the deadlock in my case;
>> the throughput is the same as before.
>>
>> ________________________________
>> From: Andrew 👽 Yourtchenko <ayour...@gmail.com>
>> Sent: Tuesday, May 29, 2018 10:11 AM
>> To: Rubina Bianchi
>> Cc: vpp-dev@lists.fd.io
>> Subject: Re: [vpp-dev] Rx stuck to 0 after a while
>>
>> Dear Rubina,
>>
>> thank you for quickly checking it!
>>
>> Judging by the logs, VPP quits, so I would say there should be a core
>> file - could you check?
>>
>> If you find it (double-check by the timestamps that it is indeed the
>> fresh one), you can load it in gdb (using gdb 'path-to-vpp-binary'
>> 'path-to-core') and then get the backtrace using 'bt'; this will give
>> a better idea of what is going on.
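>> For example (an illustrative session - substitute the actual paths
>> of your vpp binary and core file):
>>
>>   $ gdb /usr/bin/vpp /path/to/core
>>   (gdb) bt                    # backtrace of the crashing thread
>>   (gdb) info threads          # list all threads
>>   (gdb) thread apply all bt   # per-thread backtraces - useful when
>>                               # hunting a worker-thread deadlock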
>> --a
>>
>> On 5/29/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
>>> Dear Andrew
>>>
>>> I tested your patch and my problem still exists, but my service
>>> status changed and now there isn't any information about the
>>> deadlock problem. Do you have any idea how I can provide you more
>>> information?
>>>
>>> root@MYRB:~# service vpp status
>>> * vpp.service - vector packet processing engine
>>>    Loaded: loaded (/lib/systemd/system/vpp.service; disabled;
>>>            vendor preset: enabled)
>>>    Active: inactive (dead)
>>>
>>> May 29 09:27:06 MYRB /usr/bin/vpp[30805]: load_one_vat_plugin:67:
>>> Loaded plugin: udp_ping_test_plugin.so
>>> May 29 09:27:06 MYRB /usr/bin/vpp[30805]: load_one_vat_plugin:67:
>>> Loaded plugin: stn_test_plugin.so
>>> May 29 09:27:06 MYRB vpp[30805]: /usr/bin/vpp[30805]: dpdk: EAL init
>>> args: -c 1ff -n 4 --huge-dir /run/vpp/hugepages --file-prefix vpp
>>> -w 0000:08:00.0 -w 0000:08:00.1 -w 0000:08
>>> May 29 09:27:06 MYRB /usr/bin/vpp[30805]: dpdk: EAL init args:
>>> -c 1ff -n 4 --huge-dir /run/vpp/hugepages --file-prefix vpp
>>> -w 0000:08:00.0 -w 0000:08:00.1 -w 0000:08:00.2 -w 000
>>> May 29 09:27:07 MYRB vnet[30805]: dpdk_ipsec_process:1012: not
>>> enough DPDK crypto resources, default to OpenSSL
>>> May 29 09:27:13 MYRB vnet[30805]: unix_signal_handler:124: received
>>> signal SIGCONT, PC 0x7fa535dfbac0
>>> May 29 09:27:13 MYRB vnet[30805]: received SIGTERM, exiting...
>>> May 29 09:27:13 MYRB systemd[1]: Stopping vector packet processing
>>> engine...
>>> May 29 09:27:13 MYRB vnet[30805]: unix_signal_handler:124: received
>>> signal SIGTERM, PC 0x7fa534121867
>>> May 29 09:27:13 MYRB systemd[1]: Stopped vector packet processing
>>> engine.
>>>
>>> ________________________________
>>> From: Andrew 👽 Yourtchenko <ayour...@gmail.com>
>>> Sent: Monday, May 28, 2018 5:58 PM
>>> To: Rubina Bianchi
>>> Cc: vpp-dev@lists.fd.io
>>> Subject: Re: [vpp-dev] Rx stuck to 0 after a while
>>>
>>> Dear Rubina,
>>>
>>> Thanks for catching and reporting this!
>>>
>>> I suspect what might be happening is that my recent change of using
>>> two unidirectional sessions in the bihash vs. the single one
>>> triggered a race, whereby as the owning worker is deleting the
>>> session, the non-owning worker is trying to update it. That would
>>> logically explain the "BUG: ..." line (since you don't change the
>>> interfaces nor move the traffic around, the 5-tuples should not
>>> collide), as well as the later stop.
>>>
>>> To take care of this issue, I think I will split the deletion of the
>>> session into two stages:
>>> 1) deactivation of the bihash entries that steer the traffic
>>> 2) freeing up the per-worker session structure
>>>
>>> and have a little pause in between these two, so that the
>>> workers-in-progress can finish updating the structures.
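>>> In sketch form (illustrative only - not the actual code in the
>>> change; the types, fields, and the linger constant are hypothetical,
>>> and tuple5_t is as in the earlier sketch):
>>>
>>> #include <vppinfra/pool.h>
>>>
>>> /* Hypothetical session bookkeeping, for illustration only. */
>>> typedef struct
>>> {
>>>   tuple5_t fwd_key, rev_key;	/* both directions, stored explicitly */
>>>   f64 deactivated_at;
>>> } session_t;
>>>
>>> typedef struct
>>> {
>>>   session_t *sessions;		/* per-worker pool of sessions */
>>> } worker_data_t;
>>>
>>> #define SESSION_LINGER_TIME 10.0 /* grace period, seconds (arbitrary) */
>>>
>>> /* Stage 1: delete the bihash entries that steer packets to the
>>>  * session, so no new lookups can reach it; the session structure
>>>  * itself stays allocated for now. */
>>> static void
>>> session_deactivate (clib_bihash_40_8_t * h, session_t * sess, f64 now)
>>> {
>>>   clib_bihash_kv_40_8_t kv;
>>>
>>>   clib_memcpy (kv.key, &sess->fwd_key, sizeof (kv.key));
>>>   clib_bihash_add_del_40_8 (h, &kv, 0 /* is_add = 0: delete */ );
>>>   clib_memcpy (kv.key, &sess->rev_key, sizeof (kv.key));
>>>   clib_bihash_add_del_40_8 (h, &kv, 0);
>>>
>>>   sess->deactivated_at = now;	/* grace period starts here */
>>> }
>>>
>>> /* Stage 2: free the per-worker session structure, but only once the
>>>  * grace period has passed, so any worker that looked the session up
>>>  * before stage 1 has finished updating it. */
>>> static void
>>> session_free_if_expired (worker_data_t * wd, session_t * sess, f64 now)
>>> {
>>>   if (now - sess->deactivated_at > SESSION_LINGER_TIME)
>>>     pool_put (wd->sessions, sess);
>>> }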
>>>
>>> The below gerrit is the first cut:
>>>
>>> https://gerrit.fd.io/r/#/c/12770/
>>>
>>> It passes the make test right now, but I have not kicked its tires
>>> too much yet; will do tomorrow.
>>>
>>> You can try this change out in your test setup as well and tell me
>>> how it feels.
>>>
>>> --a
>>>
>>> On 5/28/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
>>>> Hi
>>>>
>>>> I ran vpp v18.07-rc0~237-g525c9d0f with only 2 interfaces in
>>>> stateful ACL (permit+reflect) and generated SFR traffic using trex
>>>> v2.27. My rx drops to 0 after a short while, about 300 sec on my
>>>> machine. Here is the vpp status:
>>>>
>>>> root@MYRB:~# service vpp status
>>>> * vpp.service - vector packet processing engine
>>>>    Loaded: loaded (/lib/systemd/system/vpp.service; disabled;
>>>>            vendor preset: enabled)
>>>>    Active: failed (Result: signal) since Mon 2018-05-28 11:35:03
>>>>            +0130; 37s ago
>>>>   Process: 32838 ExecStopPost=/bin/rm -f /dev/shm/db
>>>>            /dev/shm/global_vm /dev/shm/vpe-api (code=exited,
>>>>            status=0/SUCCESS)
>>>>   Process: 31754 ExecStart=/usr/bin/vpp -c /etc/vpp/startup.conf
>>>>            (code=killed, signal=ABRT)
>>>>   Process: 31750 ExecStartPre=/sbin/modprobe uio_pci_generic
>>>>            (code=exited, status=0/SUCCESS)
>>>>   Process: 31747 ExecStartPre=/bin/rm -f /dev/shm/db
>>>>            /dev/shm/global_vm /dev/shm/vpe-api (code=exited,
>>>>            status=0/SUCCESS)
>>>>  Main PID: 31754 (code=killed, signal=ABRT)
>>>>
>>>> May 28 16:32:47 MYRB vnet[31754]: acl_fa_node_fn:210: BUG: session
>>>> LSB16(sw_if_index) and 5-tuple collision!
>>>> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received
>>>> signal SIGCONT, PC 0x7f1fb591cac0
>>>> May 28 16:35:02 MYRB vnet[31754]: received SIGTERM, exiting...
>>>> May 28 16:35:02 MYRB systemd[1]: Stopping vector packet processing
>>>> engine...
>>>> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received
>>>> signal SIGTERM, PC 0x7f1fb3c40867
>>>> May 28 16:35:03 MYRB vpp[31754]: vlib_worker_thread_barrier_sync_int:
>>>> worker thread deadlock
>>>> May 28 16:35:03 MYRB systemd[1]: vpp.service: Main process exited,
>>>> code=killed, status=6/ABRT
>>>> May 28 16:35:03 MYRB systemd[1]: Stopped vector packet processing
>>>> engine.
>>>> May 28 16:35:03 MYRB systemd[1]: vpp.service: Unit entered failed
>>>> state.
>>>> May 28 16:35:03 MYRB systemd[1]: vpp.service: Failed with result
>>>> 'signal'.
>>>>
>>>> I attach my vpp configs to this email. I also ran this test with
>>>> the same config but with 4 interfaces instead of two; in that case
>>>> nothing happened to vpp and it stayed functional for a long time.
>>>>
>>>> Thanks,
>>>> RB