Dear Rubina,

Okay, I think I am reasonably happy with the change in
https://gerrit.fd.io/r/#/c/12770/ -
I have also rebased it onto the latest master so that it is ready to
commit if it works for you.

Please give it a shot and let me know. Note that you might need to
adjust the bihash memory, since I am now storing the forward and
reverse entries explicitly (rather than calculating them per packet).
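
For example, something along these lines in startup.conf should give
the connection bihash more headroom. This is only a sketch - I am
quoting the acl-plugin parameter names from memory, so please
double-check them against your VPP version, and the values are purely
illustrative:

acl-plugin {
  # each session now installs two bihash entries (forward and reverse),
  # so plan for roughly double the memory you needed before
  connection hash memory 1073741824
  connection count max 500000
}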

Please let me know how it works in your test setup.

thanks,
andrew

On 5/30/18, Andrew Yourtchenko <ayour...@gmail.com> wrote:
> Dear Rubina,
>
> Thanks for checking it!
>
> Yeah, actually that patch was leaking sessions in the session-reuse
> path. I got the setup in the lab locally yesterday and am working on
> a better way to do it...
>
> Will get back to you when I am happy with the way the code works...
>
> --a
>
>
>
> On 5/29/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
>> Dear Andrew
>>
>> I cleaned everything and created new deb packages with your patch once
>> again. With your patch I no longer see the deadlock, but I still have a
>> throughput problem in my scenario.
>>
>> -Per port stats table
>>       ports |               0 |               1
>> -----------------------------------------------------------------------------------------
>>    opackets |       474826597 |       452028770
>>      obytes |    207843848531 |    199591809555
>>    ipackets |        71010677 |        72028456
>>      ibytes |     31441646551 |     31687562468
>>     ierrors |               0 |               0
>>     oerrors |               0 |               0
>>       Tx Bw |       9.56 Gbps |       9.16 Gbps
>>
>> -Global stats enabled
>>  Cpu Utilization : 88.4  %  7.1 Gb/core
>>  Platform_factor : 1.0
>>  Total-Tx        :      18.72 Gbps
>>  Total-Rx        :      59.30 Mbps
>>  Total-PPS       :       5.31 Mpps
>>  Total-CPS       :      79.79 Kcps
>>
>>  Expected-PPS    :       9.02 Mpps
>>  Expected-CPS    :     135.31 Kcps
>>  Expected-BPS    :      31.77 Gbps
>>
>>  Active-flows    :    88837  Clients :      252   Socket-util : 0.5598 %
>>  Open-flows      : 14708455  Servers :    65532   Socket :    88837  Socket/Clients :  352.5
>>  Total_queue_full : 328355248
>>  drop-rate       :      18.66 Gbps
>>  current time    : 180.9 sec
>>  test duration   : 99819.1 sec
>>
>> In the best case (4 interfaces on one NUMA node, only 2 of them with
>> ACLs) my device (HP DL380 G9) reaches its maximum throughput (18.72
>> Gbps), but in the worst case (4 interfaces on one NUMA node, all of
>> them with ACLs) the throughput drops from that maximum to around 60
>> Mbps. So the patch just prevents the deadlock in my case, but the
>> throughput is the same as before.
>>
>> ________________________________
>> From: Andrew 👽 Yourtchenko <ayour...@gmail.com>
>> Sent: Tuesday, May 29, 2018 10:11 AM
>> To: Rubina Bianchi
>> Cc: vpp-dev@lists.fd.io
>> Subject: Re: [vpp-dev] Rx stuck to 0 after a while
>>
>> Dear Rubina,
>>
>> thank you for quickly checking it!
>>
>> Judging by the logs, VPP quits, so I would say there should be a core
>> file - could you check?
>>
>> If you find it (double-check by the timestamps that it is indeed a
>> fresh one), you can load it in gdb (using gdb 'path-to-vpp-binary'
>> 'path-to-core') and then get the backtrace using 'bt'; this will give
>> a better idea of what is going on.
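>>
>> For example (the core file path here is illustrative - check where
>> your system's core_pattern actually puts them):
>>
>>   $ gdb /usr/bin/vpp /path/to/core
>>   (gdb) bt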
>>
>> --a
>>
>> On 5/29/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
>>> Dear Andrew
>>>
>>> I tested your patch and my problem still exists, but my service status
>>> changed and now there isn't any information about the deadlock problem.
>>> Do you have any idea how I can provide you more information?
>>>
>>> root@MYRB:~# service vpp status
>>> * vpp.service - vector packet processing engine
>>>    Loaded: loaded (/lib/systemd/system/vpp.service; disabled; vendor
>>> preset:
>>> enabled)
>>>    Active: inactive (dead)
>>>
>>> May 29 09:27:06 MYRB /usr/bin/vpp[30805]: load_one_vat_plugin:67: Loaded
>>> plugin: udp_ping_test_plugin.so
>>> May 29 09:27:06 MYRB /usr/bin/vpp[30805]: load_one_vat_plugin:67: Loaded
>>> plugin: stn_test_plugin.so
>>> May 29 09:27:06 MYRB vpp[30805]: /usr/bin/vpp[30805]: dpdk: EAL init
>>> args:
>>> -c 1ff -n 4 --huge-dir /run/vpp/hugepages --file-prefix vpp -w
>>> 0000:08:00.0
>>> -w 0000:08:00.1 -w 0000:08
>>> May 29 09:27:06 MYRB /usr/bin/vpp[30805]: dpdk: EAL init args: -c 1ff -n
>>> 4
>>> --huge-dir /run/vpp/hugepages --file-prefix vpp -w 0000:08:00.0 -w
>>> 0000:08:00.1 -w 0000:08:00.2 -w 000
>>> May 29 09:27:07 MYRB vnet[30805]: dpdk_ipsec_process:1012: not enough
>>> DPDK
>>> crypto resources, default to OpenSSL
>>> May 29 09:27:13 MYRB vnet[30805]: unix_signal_handler:124: received
>>> signal
>>> SIGCONT, PC 0x7fa535dfbac0
>>> May 29 09:27:13 MYRB vnet[30805]: received SIGTERM, exiting...
>>> May 29 09:27:13 MYRB systemd[1]: Stopping vector packet processing
>>> engine...
>>> May 29 09:27:13 MYRB vnet[30805]: unix_signal_handler:124: received
>>> signal
>>> SIGTERM, PC 0x7fa534121867
>>> May 29 09:27:13 MYRB systemd[1]: Stopped vector packet processing
>>> engine.
>>>
>>>
>>> ________________________________
>>> From: Andrew 👽 Yourtchenko <ayour...@gmail.com>
>>> Sent: Monday, May 28, 2018 5:58 PM
>>> To: Rubina Bianchi
>>> Cc: vpp-dev@lists.fd.io
>>> Subject: Re: [vpp-dev] Rx stuck to 0 after a while
>>>
>>> Dear Rubina,
>>>
>>> Thanks for catching and reporting this!
>>>
>>> I suspect what might be happening is that my recent change to use two
>>> unidirectional sessions in the bihash instead of a single one triggered
>>> a race, whereby as the owning worker is deleting the session, the
>>> non-owning worker is trying to update it. That would logically explain
>>> the "BUG: .." line (since you don't change the interfaces nor move the
>>> traffic around, the 5-tuples should not collide), as well as the later
>>> stop.
>>>
>>> To take care of this issue, I think I will split the deletion of the
>>> session into two stages:
>>> 1) deactivation of the bihash entries that steer the traffic
>>> 2) freeing up the per-worker session structure
>>>
>>> and have a little pause in between these two so that the
>>> workers-in-progress can finish updating the structures.
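>>>
>>> To illustrate the idea only (hypothetical names and a made-up grace
>>> period - this is not the actual code in the gerrit change), a tiny
>>> self-contained C model of the two stages:
>>>
>>> #include <stdio.h>
>>> #include <stdbool.h>
>>>
>>> /* Hypothetical model of the two-stage delete. */
>>> typedef struct {
>>>   bool steering;         /* fwd/rev bihash entries still in place? */
>>>   double deactivated_at; /* when stage 1 ran */
>>>   bool freed;
>>> } session_t;
>>>
>>> #define GRACE_SEC 0.2 /* illustrative pause between the two stages */
>>>
>>> /* Stage 1: pull the bihash entries so no new packets reach the
>>>  * session. */
>>> static void deactivate (session_t *s, double now) {
>>>   s->steering = false;
>>>   s->deactivated_at = now;
>>> }
>>>
>>> /* Stage 2: free the per-worker structure only after the grace
>>>  * period, so workers-in-progress can finish touching it. */
>>> static void maybe_free (session_t *s, double now) {
>>>   if (!s->steering && !s->freed
>>>       && now - s->deactivated_at >= GRACE_SEC)
>>>     s->freed = true;
>>> }
>>>
>>> int main (void) {
>>>   session_t s = { .steering = true };
>>>   deactivate (&s, 0.0);
>>>   maybe_free (&s, 0.1); /* too early: still within the grace period */
>>>   maybe_free (&s, 0.3); /* past the grace period: safe to free */
>>>   printf ("freed: %s\n", s.freed ? "yes" : "no");
>>>   return 0;
>>> }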
>>>
>>> The below gerrit is the first cut:
>>>
>>> https://gerrit.fd.io/r/#/c/12770/
>>>
>>> It passes 'make test' right now, but I have not kicked its tires too
>>> much yet; I will do that tomorrow.
>>>
>>> You can try this change out in your test setup as well and tell me how
>>> it
>>> feels.
>>>
>>> --a
>>>
>>> On 5/28/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
>>>> Hi
>>>>
>>>> I ran vpp v18.07-rc0~237-g525c9d0f with only 2 interfaces in stateful
>>>> ACL (permit+reflect) and generated SFR traffic using trex v2.27. My rx
>>>> drops to 0 after a short while, about 300 sec on my machine. Here is
>>>> the vpp status:
>>>>
>>>> root@MYRB:~# service vpp status
>>>> * vpp.service - vector packet processing engine
>>>>    Loaded: loaded (/lib/systemd/system/vpp.service; disabled; vendor
>>>> preset:
>>>> enabled)
>>>>    Active: failed (Result: signal) since Mon 2018-05-28 11:35:03 +0130;
>>>> 37s
>>>> ago
>>>>   Process: 32838 ExecStopPost=/bin/rm -f /dev/shm/db /dev/shm/global_vm
>>>> /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
>>>>   Process: 31754 ExecStart=/usr/bin/vpp -c /etc/vpp/startup.conf
>>>> (code=killed, signal=ABRT)
>>>>   Process: 31750 ExecStartPre=/sbin/modprobe uio_pci_generic
>>>> (code=exited,
>>>> status=0/SUCCESS)
>>>>   Process: 31747 ExecStartPre=/bin/rm -f /dev/shm/db /dev/shm/global_vm
>>>> /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
>>>>  Main PID: 31754 (code=killed, signal=ABRT)
>>>>
>>>> May 28 16:32:47 MYRB vnet[31754]: acl_fa_node_fn:210: BUG: session
>>>> LSB16(sw_if_index) and 5-tuple collision!
>>>> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received
>>>> signal
>>>> SIGCONT, PC 0x7f1fb591cac0
>>>> May 28 16:35:02 MYRB vnet[31754]: received SIGTERM, exiting...
>>>> May 28 16:35:02 MYRB systemd[1]: Stopping vector packet processing
>>>> engine...
>>>> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received
>>>> signal
>>>> SIGTERM, PC 0x7f1fb3c40867
>>>> May 28 16:35:03 MYRB vpp[31754]: vlib_worker_thread_barrier_sync_int:
>>>> worker
>>>> thread deadlock
>>>> May 28 16:35:03 MYRB systemd[1]: vpp.service: Main process exited,
>>>> code=killed, status=6/ABRT
>>>> May 28 16:35:03 MYRB systemd[1]: Stopped vector packet processing
>>>> engine.
>>>> May 28 16:35:03 MYRB systemd[1]: vpp.service: Unit entered failed
>>>> state.
>>>> May 28 16:35:03 MYRB systemd[1]: vpp.service: Failed with result
>>>> 'signal'.
>>>>
>>>> I have attached my vpp configs to this email. I also ran this test
>>>> with the same config but with 4 interfaces instead of two. In that
>>>> case nothing happened to vpp and it stayed functional for a long
>>>> time.
>>>>
>>>> Thanks,
>>>> RB
>>>>
>>>
>>
>
> 
>
>
