Dear Rubina,

On 8/20/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
> Hi dear Andrew
>
> What we talked about before was the "Worker Thread Deadlock".

We had that discussion in March or May. :-) The one I had in mind was
another thread, starting with your mail on January 30. I forwarded it to
you unicast. :-)

> I tried to test the scenario as you explained, starting with 1M entries
> and doubling the size at each run.
> When I tested with a 4M entry size, I logged two things:
> 1. ps aux | grep vpp
> 2. The first 5 lines of "vppctl show acl-plugin session"
>
> First, I ran VPP and configured it with the script that I attached to
> the previous email.
> After that I ran my logger script.
> Finally I ran Trex with this command:
> ./t-rex-64 --cfg cfg/trex_config.yaml -f cap2/sfr.yaml -m 50 -c 3 -d 10000 -p
> After tracing the VPP logs I found some signs of a leak. In the VPP logs,
> RSS (the 6th column of the ps aux output) increases continuously
> (sometimes more, sometimes less), while at the same time the Trex
> Total-Rx decreases.
> After about 3000 seconds, I stopped Trex and waited until the session
> table was cleared, but RSS did not change.
> Then I ran Trex again without any change, and again I saw RSS increase
> while the Trex Total-Rx decreased.

Based on the counters, in this test we are continuously churning through
the half-open sessions, because we are hitting the maximum session limit.
Session creation is quite expensive (at least for now; I have not
optimized that code much yet).

> This is my RAM status when vpp is stopped:
> root@debian-hp:~# free -m
>              total       used       free     shared    buffers     cached
> Mem:        129135       3414     125721         12         99        591
> -/+ buffers/cache:       2723     126412
> Swap:         2518          0       2518
>
> I also attached my logs to this email. These logs are gathered every 20
> seconds.
>
> With a 40M entry size I saw this behavior too, but it happens much faster
> than with the 4M entry size.

Yes, because you create more sessions and use more buckets, I think
(though this is speculation at this point, since we don't have the memory
outputs).
What is the maximum number of simultaneous sessions on the Trex side, and
what is the connections-per-second rate?

> I also have a question about your phrase "Using this method you can
> arrive at the maximum number of connections that your memory
> configuration can support".
> Is there any formula to configure init.conf efficiently? VPP didn't
> return any error about a misconfiguration.

No, unfortunately there is no formula, which is also why I cannot print an
error about a misconfiguration. You can use "show acl-plugin memory", as I
described in the other mail, to see the memory usage in the session bihash
and the number of active elements. Could you have a look at doing that?

--a

> Thanks,
> Sincerely
>
> ________________________________
> From: Andrew 👽 Yourtchenko <ayour...@gmail.com>
> Sent: Sunday, August 19, 2018 8:28 AM
> To: Rubina Bianchi
> Cc: vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] VPP Memory usage
>
> Dear Rubina,
>
> The ACL plugin does all the necessary allocations at startup for all data
> structures except the connection bihash.
>
> You would need to check the current number of connections as your test
> progresses. I believe we had a communication a while ago regarding the
> gradual growth of background memory usage within the bihash data
> structure as you churn through random addresses. Since then there have
> been some changes aimed at addressing this. Please verify what the current
> total session count looks like in “show acl-plugin sessions” as your test
> progresses; based on what you described, I think it continuously
> increases.
>
> If the bihash memory requirement for active connections exceeds what is
> available from the OS, then there is no feedback to the user code (the
> ACL plugin) other than a full crash.
>
> The only safeguard I could come up with against this situation is the
> maximum connection count, which is checked before attempting to insert an
> entry into the bihash.
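A minimal Python sketch of the safeguard described in the paragraph above: the entry count is compared against a configured maximum before every insert, so the table cannot grow past the limit, but nothing in that check looks at how much memory the hash buckets actually consume. This is an illustration only, not the ACL plugin's C code; `BoundedSessionTable`, `try_add` and `expire` are hypothetical names, and a plain dict stands in for the bihash.

```python
from typing import Dict, Hashable


class BoundedSessionTable:
    """Hypothetical stand-in for a connection table with a max-entries cap.

    A plain dict replaces the bihash; only the check-then-insert ordering
    is meant to mirror the description in the mail.
    """

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self.sessions: Dict[Hashable, float] = {}  # key -> last-seen time
        self.rejected = 0  # new sessions refused because the table was full

    def try_add(self, key: Hashable, now: float) -> bool:
        # An existing session is only refreshed, never counted twice.
        if key in self.sessions:
            self.sessions[key] = now
            return True
        # The limit is checked *before* inserting, so the table never holds
        # more than max_entries, regardless of how much memory is left.
        if len(self.sessions) >= self.max_entries:
            self.rejected += 1
            return False
        self.sessions[key] = now
        return True

    def expire(self, now: float, idle_timeout: float) -> int:
        # Drop sessions idle for longer than idle_timeout; return the count.
        stale = [k for k, t in self.sessions.items() if now - t > idle_timeout]
        for k in stale:
            del self.sessions[k]
        return len(stale)
```

For instance, with `max_entries=2` a third distinct flow is refused in this sketch (and counted in `rejected`) until an idle session expires. The count stays bounded, yet the memory behind that count is a separate question, which is why the heap size still has to be sized for the chosen limit.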
>
> Your current value is 40 million, which is quite a lot, while the hash
> table heap size is 17 gigabytes. This might not be enough to hold all 40
> million entries as the churn progresses and you need to create more
> buckets.
>
> I suggest you keep all the other parameters as they are, set the maximum
> connections value to 1 million, rerun the test, and monitor the memory
> usage within the ACL plugin heap (“show acl-plugin memory”). It should
> stabilize over time at some value and there should be no crash. The exact
> usage will depend on the distribution of session entries over buckets
> (note that in the worst case you may have one entry per bucket, which may
> give a lot of overhead). Note that value.
>
> If you stop the traffic, the memory should get released as the session
> count goes down to zero.
>
> Then double the max conn count and recheck the behavior as above; the
> usage would probably be about double the previous one.
>
> Using this method you can arrive at the maximum number of connections
> that your memory configuration can support, and get a gauge of how much
> memory you would need for the target number of connections.
>
> If in the initial test iteration you observe the memory usage never
> stabilizing, or you see that the memory is not being released as the
> connection count goes down to zero, then it would be a bug, which we will
> need to troubleshoot further. From your description so far, though, it
> seems more a case of tuning the parameters. So please apply the method
> above and let me know how it goes! Thanks!
>
> --a
>
> On 19 Aug 2018, at 07:26, Rubina Bianchi <r_bian...@outlook.com> wrote:
>
> > Hi dear VPP
> >
> > I configured vpp stable/1807 and added permit+reflect ACLs on the input
> > and output of my network interfaces. I configured vpp with 9 CPUs (1 main
> > and 8 worker CPUs).
> > My init.conf is:
> >
> > vppctl>
> >
> > set acl-plugin session table max-entries 40000000
> > set acl-plugin session table hash-table-buckets 1000000
> > set acl-plugin session table hash-table-memory 17179869184
> > set acl-plugin session timeout udp idle 20
> > set acl-plugin session timeout tcp idle 120
> > set acl-plugin session timeout tcp transient 30
> >
> > vpp_api_test>
> >
> > acl_add_replace permit
> > acl_add_replace permit+reflect
> >
> > acl_interface_add_del TenGigabitEthernet3/0/0 add output acl 1
> > acl_interface_add_del TenGigabitEthernet3/0/1 add output acl 1
> > acl_interface_add_del TenGigabitEthernet3/0/0 add input acl 1
> > acl_interface_add_del TenGigabitEthernet3/0/1 add input acl 1
> >
> > exec set interface l2 bridge TenGigabitEthernet3/0/0 1
> > exec set interface l2 bridge TenGigabitEthernet3/0/1 1
> > exec set int state TenGigabitEthernet3/0/0 up
> > exec set int state TenGigabitEthernet3/0/1 up
> >
> > My startup.conf is pasted at this link:
> > https://paste.ubuntu.com/p/MhQDyqF6Xd/
> >
> > I used Trex as the traffic generator as follows:
> >
> > ./t-rex-64 --cfg cfg/trex_config.yaml -f cap2/sfr.yaml -m 50 -c 3 -d 3600 -p
> >
> > During my test, Total-rx decreased continuously and after a while it
> > reached 0. I checked the vpp status and it had received a SIGKILL from
> > the OS.
> >
> > I monitored vpp memory and it kept increasing until the crash.
> >
> > Does the acl_plugin session management have a memory leak?
> >
> > Regards,
> >
> > Rubina
> >
> > -=-=-=-=-=-=-=-=-=-=-=-
> > Links: You receive all messages sent to this group.
> >
> > View/Reply Online (#10213): https://lists.fd.io/g/vpp-dev/message/10213
> > Mute This Topic: https://lists.fd.io/mt/24729023/675608
> > Group Owner: vpp-dev+ow...@lists.fd.io
> > Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub
> > [ayour...@gmail.com]
> > -=-=-=-=-=-=-=-=-=-=-=-
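As a rough cross-check of the session-table numbers in the init.conf quoted above, and of the doubling method from the 19 Aug reply, here is some plain arithmetic in Python. The per-entry figure is only the average budget the configured heap leaves for each of the 40 million sessions, not the real size of a bihash entry, and the linear extrapolation assumes memory grows in proportion to max-entries, which is what the doubling runs should confirm first. The function names are hypothetical; the stabilized values would be read by hand from “show acl-plugin memory”.

```python
from typing import List, Tuple

# Session-table values taken from the quoted init.conf.
HASH_TABLE_MEMORY = 17_179_869_184  # bytes, exactly 16 GiB
MAX_ENTRIES = 40_000_000
HASH_TABLE_BUCKETS = 1_000_000


def bytes_per_entry(heap_bytes: int, max_entries: int) -> float:
    """Average heap budget per session if the table were completely full."""
    return heap_bytes / max_entries


def growth_ratios(runs: List[Tuple[int, int]]) -> List[float]:
    """Ratio of stabilized memory between consecutive doubling runs.

    `runs` holds (max_entries, stabilized_bytes) pairs; values near 2.0
    mean memory roughly doubles when max-entries doubles.
    """
    ordered = sorted(runs)
    return [b2 / b1 for (_, b1), (_, b2) in zip(ordered, ordered[1:])]


def estimate_memory(runs: List[Tuple[int, int]], target_entries: int) -> float:
    """Linearly extrapolate memory use to target_entries from the largest run.

    Assumes usage grows roughly in proportion to max-entries, which the
    growth ratios are meant to confirm first.
    """
    entries, used = max(runs)
    return used * target_entries / entries


if __name__ == "__main__":
    print(f"heap size: {HASH_TABLE_MEMORY / 2**30:.0f} GiB")
    print(f"budget per entry at {MAX_ENTRIES} entries: "
          f"{bytes_per_entry(HASH_TABLE_MEMORY, MAX_ENTRIES):.1f} bytes")
    print(f"entries per bucket at full load: {MAX_ENTRIES / HASH_TABLE_BUCKETS:.0f}")
```

Run as a script, it prints a heap size of 16 GiB, a budget of about 429.5 bytes per entry at 40 million entries, and 40 entries per bucket at full load. Feeding it the stabilized values from the 1M, 2M, 4M (and so on) runs gives the growth ratios; ratios that stay near 2 make the extrapolation to a target max-entries value reasonable, while a value that never stabilizes is the case the reply treats as a likely bug.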

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#10224): https://lists.fd.io/g/vpp-dev/message/10224
-=-=-=-=-=-=-=-=-=-=-=-