Just checked out the patch; you are compressing the frames on the receiving thread side. I hadn't realized (i.e., hadn't looked) that the code was copying the buffer indexes into a new frame anyway; as it is, I think this is a good fix!
Thanks,
Chris.

> On Nov 16, 2020, at 4:20 AM, Klement Sekera -X (ksekera - PANTHEON TECH SRO at Cisco) <ksek...@cisco.com> wrote:
>
> That’s exactly what my patch improves. Coalescing small groups of packets waiting in the handoff queue into a full(er) frame allows the downstream node to do more “V” and achieve better performance. And that’s also what I’ve seen when testing the patch.
>
> Thanks,
> Klement
>
> ps. in case you missed the link: https://gerrit.fd.io/r/c/vpp/+/28980
>
>> On 13 Nov 2020, at 22:47, Christian Hopps <cho...@chopps.org> wrote:
>>
>> FWIW, I too have hit this issue. Basically VPP is designed to process a packet from rx to tx in the same thread. When downstream nodes run slower, the upstream rx node doesn't run, so the vector size in each frame naturally increases, and the downstream nodes can then benefit from "V" (i.e., processing multiple packets in one go).
>>
>> This back-pressure from downstream does not occur when you hand off from a fast thread to a slower thread, so you end up with many single-packet frames and fill your hand-off queue.
>>
>> The quick fix one tries then is to increase the queue size; however, this is not a great solution b/c you are still not taking advantage of the "V" in VPP. To really fit this back into the original design, one needs to somehow still be creating larger vectors in the hand-off frames.
>>
>> TBH I think the right solution here is to not hand off frames, and instead switch to packet queues; then on the handed-off side, the frames would get constructed from the packet queues (basically creating another polling input node, but on the new thread).
>>
>> Thanks,
>> Chris.
>>
>>> On Nov 13, 2020, at 12:21 PM, Marcos - Mgiga <mar...@mgiga.com.br> wrote:
>>>
>>> Understood. And what path did you take in order to analyse and monitor vector rates? Is there some specific command or log?
>>>
>>> Thanks
>>>
>>> Marcos
>>>
>>> -----Original message-----
>>> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On behalf of ksekera via []
>>> Sent: Friday, 13 November 2020 14:02
>>> To: Marcos - Mgiga <mar...@mgiga.com.br>
>>> Cc: Elias Rudberg <elias.rudb...@bahnhof.net>; vpp-dev@lists.fd.io
>>> Subject: Re: RES: [vpp-dev] Increasing NAT worker handoff frame queue size NAT_FQ_NELTS to avoid congestion drops?
>>>
>>> Not completely idle, more like medium load. Vector rates at which I saw congestion drops were roughly 40 for the thread doing no work (just handoffs - I hardcoded it this way for test purposes), and roughly 100 for the thread picking up the packets and doing NAT.
>>>
>>> What got me into the infra investigation was the fact that once I was hitting vector rates around 255, I did see packet drops, but no congestion drops.
>>>
>>> HTH,
>>> Klement
>>>
>>>> On 13 Nov 2020, at 17:51, Marcos - Mgiga <mar...@mgiga.com.br> wrote:
>>>>
>>>> So you mean that this situation (congestion drops) is more likely to occur when the system in general is idle than when it is processing a large amount of traffic?
>>>>
>>>> Best Regards
>>>>
>>>> Marcos
>>>>
>>>> -----Original message-----
>>>> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On behalf of Klement Sekera via lists.fd.io
>>>> Sent: Friday, 13 November 2020 12:15
>>>> To: Elias Rudberg <elias.rudb...@bahnhof.net>
>>>> Cc: vpp-dev@lists.fd.io
>>>> Subject: Re: [vpp-dev] Increasing NAT worker handoff frame queue size NAT_FQ_NELTS to avoid congestion drops?
>>>>
>>>> Hi Elias,
>>>>
>>>> I’ve already debugged this and came to the conclusion that it’s the infra which is the weak link. I was seeing congestion drops at mild load, but not at full load. The issue is that with handoff, there is an uneven workload. For simplicity’s sake, just consider thread 1 handing off all the traffic to thread 2.
>>>> What happens is that for thread 1 the job is much easier: it just does some ip4 parsing and then hands the packet to thread 2, which actually does the heavy lifting of hash inserts/lookups/translation etc. A 64-element queue can hold 64 frames; one extreme is 64 1-packet frames, totalling 64 packets, the other extreme is 64 255-packet frames, totalling ~16k packets. What happens is this: thread 1 is mostly idle, just picking a few packets from the NIC, and every one of these small frames creates an entry in the handoff queue. Now thread 2 picks one element from the handoff queue and deals with it before picking another one. If the queue has only 3-packet or 10-packet elements, then thread 2 can never really get into what VPP excels at - bulk processing.
>>>>
>>>> Q: Why doesn’t it pick as many packets as possible from the handoff queue?
>>>> A: It’s not implemented.
>>>>
>>>> I already wrote a patch for it, which made all the congestion drops which I saw (in the above synthetic test case) disappear. The mentioned patch https://gerrit.fd.io/r/c/vpp/+/28980 is sitting in gerrit.
>>>>
>>>> Would you like to give it a try and see if it helps your issue? We shouldn’t need big queues under mild loads anyway …
>>>>
>>>> Regards,
>>>> Klement
>>>>
>>>>> On 13 Nov 2020, at 16:03, Elias Rudberg <elias.rudb...@bahnhof.net> wrote:
>>>>>
>>>>> Hello VPP experts,
>>>>>
>>>>> We are using VPP for NAT44 and we get some "congestion drops", in a situation where we think VPP is far from overloaded in general. So we started to investigate whether it would help to use a larger handoff frame queue size. In theory at least, allowing a longer queue could help avoid drops in case of short spikes of traffic, or if some worker thread happens to be temporarily busy for whatever reason.
>>>>>
>>>>> The NAT worker handoff frame queue size is hard-coded in the NAT_FQ_NELTS macro in src/plugins/nat/nat.h, where the current value is 64. The idea is that putting a larger value there could help.
>>>>>
>>>>> We have run some tests where we changed the NAT_FQ_NELTS value from 64 to a range of other values, each time rebuilding VPP and running an identical test - a test case that to some extent tries to mimic our real traffic, although of course it is simplified. The test runs many iperf3 tests simultaneously using TCP, combined with some UDP traffic chosen to trigger VPP to create more new sessions (to make the NAT "slowpath" happen more).
>>>>>
>>>>> The following NAT_FQ_NELTS values were tested:
>>>>> 16
>>>>> 32
>>>>> 64 <-- current value
>>>>> 128
>>>>> 256
>>>>> 512
>>>>> 1024
>>>>> 2048 <-- best performance in our tests
>>>>> 4096
>>>>> 8192
>>>>> 16384
>>>>> 32768
>>>>> 65536
>>>>> 131072
>>>>>
>>>>> In those tests, performance was very bad for the smallest NAT_FQ_NELTS values of 16 and 32, while values larger than 64 gave improved performance. The best results in terms of throughput were seen for NAT_FQ_NELTS=2048. For even larger values, we got reduced performance compared to the 2048 case.
>>>>>
>>>>> The tests were done for VPP 20.05 running on an Ubuntu 18.04 server with a 12-core Intel Xeon CPU and two Mellanox mlx5 network cards. The number of NAT threads was 8 in some of the tests and 4 in others.
>>>>>
>>>>> According to these tests, the effect of changing NAT_FQ_NELTS can be quite large. For example, for one test case chosen such that congestion drops were a significant problem, the throughput increased from about 43 to 90 Gbit/s, with the number of congestion drops per second reduced to about one third.
>>>>> In another kind of test, throughput increased by about 20% with congestion drops reduced to zero. Of course such results depend a lot on how the tests are constructed, but anyway, it seems clear that the choice of NAT_FQ_NELTS value can be important and that increasing it would be good, at least for the kind of usage we have tested now.
>>>>>
>>>>> Based on the above, we are considering changing NAT_FQ_NELTS from 64 to a larger value and starting to try that in our production environment (so far we have only tried it in a test environment).
>>>>>
>>>>> Were there specific reasons for setting NAT_FQ_NELTS to 64?
>>>>>
>>>>> Are there some potential drawbacks or dangers of changing it to a larger value?
>>>>>
>>>>> Would you consider changing to a larger value in the official VPP code?
>>>>>
>>>>> Best regards,
>>>>> Elias
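For reference, the change Elias describes is a one-line compile-time edit (file and current value as stated in his message), which is why each tested value required rebuilding VPP. Note that 2048 was simply the best value in his particular tests, not an endorsed default, and as discussed above the coalescing patch may remove the need for large queues altogether:

```c
/* src/plugins/nat/nat.h (VPP 20.05) - handoff frame queue depth.
 * Hard-coded, so trying another value means editing and rebuilding. */
#define NAT_FQ_NELTS 2048 /* was 64; 2048 performed best in Elias's tests */
```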
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#18041): https://lists.fd.io/g/vpp-dev/message/18041
Mute This Topic: https://lists.fd.io/mt/78240090/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-