A lookup is performed based on a 6-tuple (fib, saddr, daddr, sport, dport, proto); the value contains the thread index and the pool index. If the packet arrived on the wrong thread, a handoff is performed. See nat44_ed_classify.c.
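For illustration only - this is a simplified standalone sketch, not the actual data structures or functions used in nat44_ed_classify.c - the idea is that the 6-tuple key maps to a value encoding the owning thread and the session pool index, and a packet that arrives on a different thread is handed off:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical 6-tuple key and lookup result, for illustration only. */
typedef struct
{
  uint32_t fib_index;
  uint32_t saddr, daddr;	/* IPv4 addresses */
  uint16_t sport, dport;
  uint8_t proto;
} six_tuple_key_t;

typedef struct
{
  uint32_t thread_index;	/* worker that owns the session */
  uint32_t session_index;	/* index into that worker's session pool */
} six_tuple_value_t;

/* Stub standing in for the real shared hash table lookup. */
static int
lookup_six_tuple (const six_tuple_key_t *key, six_tuple_value_t *value)
{
  (void) key;
  value->thread_index = 1;	/* pretend the session lives on worker 1 */
  value->session_index = 42;
  return 0;			/* 0 = found */
}

static void
classify_packet (uint32_t current_thread, const six_tuple_key_t *key)
{
  six_tuple_value_t v;
  if (lookup_six_tuple (key, &v) == 0 && v.thread_index != current_thread)
    printf ("handoff to worker %u (session index %u)\n",
	    (unsigned) v.thread_index, (unsigned) v.session_index);
  else
    printf ("process locally on worker %u\n", (unsigned) current_thread);
}

int
main (void)
{
  six_tuple_key_t k = { .fib_index = 0, .saddr = 0x0a000001,
			.daddr = 0x0a000002, .sport = 1234, .dport = 80,
			.proto = 6 /* TCP */ };
  classify_packet (0, &k);	/* arrives on worker 0, owned by worker 1 */
  return 0;
}

In the real plugin the handoff would of course enqueue the buffer onto the owning worker's frame queue rather than print anything.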
The extreme case with buffer shortage I mentioned can occur if the total number of buffers available to VPP is close to or less than the number of buffers that can be stuck in frame queues. If you dedicate lots of memory to VPP, this won't occur.

I don't have an answer for the increasing congestion drop, but I would read through the infra code which handles frame queues. A while ago (>1 year) the algorithm wasn't optimal - the frames would enter the queue as they were, causing stalls and congestion drop if the consuming thread was slower than the producing thread. The producing thread would pick up a couple of packets from the NIC at a time (say 1 or 2) and create lots of small frames for the consuming thread. But because the consuming thread would then only process 1 or 2 packets at a time, performance would be bad, since the vectorisation savings would not kick in. My fix at the time (https://gerrit.fd.io/r/c/vpp/+/28980) was to add coalescing to the infra so that small frames would be merged into bigger ones. This fixed my test scenario where one worker would hand off 100% of the traffic to another worker. That patch was never merged because Damjan did a rework of that part of the code, and if I remember correctly it was no longer needed, as the new algorithm didn't suffer from that issue.

Maybe it's worth taking a look at what actually happens. You would need to add some debug counters, or maybe prints (careful with those - IO messes up performance and can change the system dynamics; my solution was to print stats once per second or so); I don't think there is a tool for that. A minimal sketch of such a once-per-second print follows below.

Regards,
Klement
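As an illustration of the rate-limited stats print suggested above, here is a minimal, self-contained C sketch. The counter names are hypothetical and this is plain C rather than actual VPP code; in a real investigation you would bump equivalent (ideally per-thread) counters from the frame queue / handoff code paths:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical counters bumped from the code path under investigation. */
static uint64_t frames_enqueued;
static uint64_t frames_coalesced;
static uint64_t congestion_drops;

/* Print accumulated stats at most once per second so the IO itself
 * does not distort the measurement. */
static void
maybe_print_stats (void)
{
  static time_t last_print;
  time_t now = time (NULL);

  if (now == last_print)
    return;
  last_print = now;

  printf ("enq=%llu coalesced=%llu cong_drop=%llu\n",
	  (unsigned long long) frames_enqueued,
	  (unsigned long long) frames_coalesced,
	  (unsigned long long) congestion_drops);
}

int
main (void)
{
  /* Stand-in loop for the real datapath. */
  for (int i = 0; i < 1000000; i++)
    {
      frames_enqueued++;
      if (i % 100 == 0)
	congestion_drops++;
      maybe_print_stats ();
    }
  return 0;
}

Rate-limiting the output to roughly one line per second keeps the IO overhead low enough that it should not change the system dynamics being measured.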
On 17 Mar 2022, at 09:51, Pan Yueyang <yueyang....@epfl.ch> wrote:

Hi Klement, thanks for your prompt reply. I am not clear about how VPP makes this split choice. In which file does it make such a split, and what are the factors that influence this split decision?

Also, I can still see a case in which there is more congestion drop with a larger queue size. I did not see VPP complaining about buffer allocation, so I think the sizes are valid (I tested 32 to 2048). I tested with the same settings except for NAT_FQ_NELTS and with the same traffic. I think the latency of processing a packet should be similar in my case, so the processing speed of the part after the handoff should be similar (similar mean and variance). Then with a larger queue, I expect the congestion drop to be no more than with a small queue, because it could absorb more variance. Even if the incoming speed is simply larger than the processing speed, I expect that with the larger queue the congestion drop will at least not increase heavily (assuming the handoff rate is almost constant). Thus I still don't understand why the congestion drop would increase beyond some queue size, even if the packets pile up in the handoff queue. Could you please elaborate more on this?

Best Wishes
Pan

________________________________
From: Klement Sekera <klem...@graphiant.com>
Sent: 17 March 2022 14:59:58
To: Pan Yueyang
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Meanings of different vector rates and questions about NAT44 handoff queue size

Hey,

I can provide insight into the NAT bit.

On 17 Mar 2022, at 06:06, Yueyang Pan via lists.fd.io <yueyang.pan=epfl...@lists.fd.io> wrote:

Also I noticed the size of the handoff queue matters a lot for NAT44 performance, but I was wondering why the congestion drop in NAT handoff increases once the handoff queue size (NAT_FQ_NELTS) is larger than a certain value (in my case 512). Does anyone else experience this and have any ideas?

Handoff queue size is in frames, so a size of 64 can hold up to 64*256 packets, but there is rarely a full frame there, because these frames are the result of an incoming frame being split between workers. Say you have 2 workers and a full frame of 256 packets hits worker #1, which sifts through the packets and decides that 156 packets are for #1 and 100 are for #2. In this case, 156 packets continue processing on worker #1, and a new frame is created with the 100 packets and put into a queue to be processed on worker #2.

If you are hitting congestion drop, it means your system is very close to or simply cannot handle such packet rates. Increasing the frame queue size might help absorb some slight variances, but ultimately makes the situation worse, as packets begin to pile up within VPP waiting to be processed. If you continued increasing this size, eventually you'd hit a situation where the NIC driver would have problems allocating new vlib_buffers, because all the buffers would be stuck in queues waiting.

Regards,
Klement
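To make the capacity point above concrete, here is a back-of-the-envelope sketch in standalone C. The 256-packets-per-frame figure comes from the message above; the worker count, the total buffer count, and the simplification of one handoff queue per worker are illustrative assumptions only, not values read from any particular configuration:

#include <stdio.h>

/* Rough worst-case arithmetic for buffers that can sit in NAT handoff
 * frame queues. The numbers below are illustrative assumptions. */
#define VLIB_FRAME_SIZE 256	/* max packets per frame in a default build */

int
main (void)
{
  unsigned n_workers = 2;	  /* assumed worker count */
  unsigned total_buffers = 16384; /* assumed buffers available to VPP */

  for (unsigned nelts = 32; nelts <= 2048; nelts *= 2)
    {
      /* Assume (roughly) one handoff queue per worker, each holding up to
       * nelts frames of up to VLIB_FRAME_SIZE packets. In practice the
       * frames are much smaller, since they come from splitting input
       * frames between workers. */
      unsigned long worst_case =
	(unsigned long) n_workers * nelts * VLIB_FRAME_SIZE;
      printf ("NAT_FQ_NELTS=%4u -> up to %8lu buffers parked in queues"
	      " (%s the %u total)\n", nelts, worst_case,
	      worst_case >= total_buffers ? "at or above" : "below",
	      total_buffers);
    }
  return 0;
}

Once the worst-case number of packets parked in handoff queues approaches the total buffer count, the NIC driver can no longer allocate fresh vlib_buffers, which is the failure mode described above.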