I used the usual:

1. start traffic
2. clear run
3. wait n seconds (e.g. n == 10)
4. show run
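For reference, a minimal shell version of steps 2-4, assuming vppctl is used to reach the VPP CLI ("clear run" and "show run" being the abbreviated forms of "clear runtime" and "show runtime"):

    # with traffic already running:
    vppctl clear runtime    # step 2: reset the per-node runtime counters
    sleep 10                # step 3: wait n seconds (n == 10 here)
    vppctl show runtime     # step 4: per-node, per-thread stats, incl. vectors per call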
Klement

> On 13 Nov 2020, at 18:21, Marcos - Mgiga <mar...@mgiga.com.br> wrote:
>
> Understood. And what path did you take in order to analyse and monitor
> vector rates? Is there some specific command or log?
>
> Thanks
>
> Marcos
>
> -----Original Message-----
> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On behalf of ksekera via []
> Sent: Friday, 13 November 2020 14:02
> To: Marcos - Mgiga <mar...@mgiga.com.br>
> Cc: Elias Rudberg <elias.rudb...@bahnhof.net>; vpp-dev@lists.fd.io
> Subject: Re: RES: [vpp-dev] Increasing NAT worker handoff frame queue size
> NAT_FQ_NELTS to avoid congestion drops?
>
> Not completely idle, more like medium load. The vector rates at which I saw
> congestion drops were roughly 40 for the thread doing no work (just
> handoffs - I hardcoded it this way for test purposes), and roughly 100 for
> the thread picking up the packets and doing the NAT.
>
> What got me into investigating the infra was the fact that once I was
> hitting vector rates around 255, I did see packet drops, but no congestion
> drops.
>
> HTH,
> Klement
>
>> On 13 Nov 2020, at 17:51, Marcos - Mgiga <mar...@mgiga.com.br> wrote:
>>
>> So you mean that this situation (congestion drops) is more likely to occur
>> when the system is mostly idle than when it is processing a large amount
>> of traffic?
>>
>> Best Regards
>>
>> Marcos
>>
>> -----Original Message-----
>> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On behalf of Klement
>> Sekera via lists.fd.io
>> Sent: Friday, 13 November 2020 12:15
>> To: Elias Rudberg <elias.rudb...@bahnhof.net>
>> Cc: vpp-dev@lists.fd.io
>> Subject: Re: [vpp-dev] Increasing NAT worker handoff frame queue size
>> NAT_FQ_NELTS to avoid congestion drops?
>>
>> Hi Elias,
>>
>> I’ve already debugged this and came to the conclusion that it’s the infra
>> which is the weak link. I was seeing congestion drops at mild load, but
>> not at full load. The issue is that with handoff, the workload is uneven.
>> For simplicity’s sake, just consider thread 1 handing off all the traffic
>> to thread 2. What happens is that thread 1 has the much easier job: it
>> just does some ip4 parsing and then hands the packet to thread 2, which
>> actually does the heavy lifting of hash inserts/lookups/translation etc.
>> A 64-element queue can hold 64 frames: one extreme is 64 1-packet frames,
>> totalling 64 packets; the other extreme is 64 255-packet frames,
>> totalling ~16k packets. What happens is this: thread 1 is mostly idle,
>> just picking a few packets from the NIC, and every one of these small
>> frames creates an entry in the handoff queue. Thread 2 then picks one
>> element from the handoff queue and deals with it before picking another
>> one. If the queue has only 3-packet or 10-packet elements, then thread 2
>> can never really get into what VPP excels at - bulk processing.
>>
>> Q: Why doesn’t it pick as many packets as possible from the handoff queue?
>> A: It’s not implemented.
>>
>> I already wrote a patch for it, which made all the congestion drops which
>> I saw (in the above synthetic test case) disappear. The mentioned patch
>> https://gerrit.fd.io/r/c/vpp/+/28980 is sitting in gerrit.
>>
>> Would you like to give it a try and see if it helps your issue? We
>> shouldn’t need big queues under mild loads anyway …
>>
>> Regards,
>> Klement
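To make the queue-capacity arithmetic in Klement's reply above concrete, here is a small illustrative C snippet. It is not VPP code; it only restates the numbers quoted above (a 64-element handoff queue whose frames hold anywhere from 1 to 255 packets):

    #include <stdio.h>

    int main (void)
    {
      const int queue_elts     = 64;  /* handoff queue size (NAT_FQ_NELTS default) */
      const int min_frame_pkts = 1;   /* mostly idle worker: tiny frames */
      const int max_frame_pkts = 255; /* fully loaded worker: full frames */

      /* packets the handoff queue can buffer in the two extremes */
      printf ("min: %d packets\n", queue_elts * min_frame_pkts); /* 64 */
      printf ("max: %d packets\n", queue_elts * max_frame_pkts); /* 16320, i.e. ~16k */

      /* the mild-load case from the discussion: small 3-packet frames */
      printf ("3-pkt frames: %d packets\n", queue_elts * 3);     /* 192 */
      return 0;
    }

The point is that under mild load the same 64 slots buffer only a couple of hundred packets, so a short burst can overflow the queue and show up as congestion drops even though the system is far from saturated.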
>>> On 13 Nov 2020, at 16:03, Elias Rudberg <elias.rudb...@bahnhof.net> wrote:
>>>
>>> Hello VPP experts,
>>>
>>> We are using VPP for NAT44 and we get some "congestion drops", in a
>>> situation where we think VPP is far from overloaded in general. So we
>>> started to investigate whether it would help to use a larger handoff
>>> frame queue size. In theory at least, allowing a longer queue could help
>>> avoid drops in case of short spikes of traffic, or if some worker thread
>>> happens to be temporarily busy for whatever reason.
>>>
>>> The NAT worker handoff frame queue size is hard-coded in the NAT_FQ_NELTS
>>> macro in src/plugins/nat/nat.h, where the current value is 64. The idea
>>> is that putting a larger value there could help.
>>>
>>> We have run some tests where we changed the NAT_FQ_NELTS value from 64 to
>>> a range of other values, each time rebuilding VPP and running an
>>> identical test, a test case that tries to mimic our real traffic to some
>>> extent, although it is of course simplified. The test runs many iperf3
>>> tests simultaneously using TCP, combined with some UDP traffic chosen to
>>> trigger VPP to create more new sessions (to make the NAT "slowpath"
>>> happen more often).
>>>
>>> The following NAT_FQ_NELTS values were tested:
>>> 16
>>> 32
>>> 64 <-- current value
>>> 128
>>> 256
>>> 512
>>> 1024
>>> 2048 <-- best performance in our tests
>>> 4096
>>> 8192
>>> 16384
>>> 32768
>>> 65536
>>> 131072
>>>
>>> In those tests, performance was very bad for the smallest NAT_FQ_NELTS
>>> values of 16 and 32, while values larger than 64 gave improved
>>> performance. The best results in terms of throughput were seen for
>>> NAT_FQ_NELTS=2048. For even larger values, we got reduced performance
>>> compared to the 2048 case.
>>>
>>> The tests were done for VPP 20.05 running on an Ubuntu 18.04 server with
>>> a 12-core Intel Xeon CPU and two Mellanox mlx5 network cards. The number
>>> of NAT threads was 8 in some of the tests and 4 in others.
>>>
>>> According to these tests, the effect of changing NAT_FQ_NELTS can be
>>> quite large. For example, for one test case chosen such that congestion
>>> drops were a significant problem, the throughput increased from about 43
>>> to 90 Gbit/s, with the number of congestion drops per second reduced to
>>> about one third. In another kind of test, throughput increased by about
>>> 20% with congestion drops reduced to zero. Of course, such results depend
>>> a lot on how the tests are constructed. But anyway, it seems clear that
>>> the choice of NAT_FQ_NELTS value can be important and that increasing it
>>> would be good, at least for the kind of usage we have tested now.
>>>
>>> Based on the above, we are considering changing NAT_FQ_NELTS from 64 to a
>>> larger value and starting to try that in our production environment (so
>>> far we have only tried it in a test environment).
>>>
>>> Were there specific reasons for setting NAT_FQ_NELTS to 64?
>>>
>>> Are there some potential drawbacks or dangers of changing it to a larger
>>> value?
>>>
>>> Would you consider changing to a larger value in the official VPP code?
>>>
>>> Best regards,
>>> Elias
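For concreteness, a minimal sketch of the change Elias describes, assuming NAT_FQ_NELTS is a plain #define in src/plugins/nat/nat.h (the thread only states the macro name, file and current value; 2048 is the value that performed best in the tests above, and VPP has to be rebuilt for the change to take effect):

    /* src/plugins/nat/nat.h */
    -#define NAT_FQ_NELTS 64      /* current hard-coded handoff queue size */
    +#define NAT_FQ_NELTS 2048    /* best-performing value in the tests above */

Note that Klement's patch (https://gerrit.fd.io/r/c/vpp/+/28980) attacks the same symptom from the other side, by letting the consumer thread drain more of the handoff queue per dispatch, so a larger queue may become less important once that change is merged.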