A lookup is performed based on a 6-tuple (fib, saddr, daddr, sport, dport, proto); 
the value found contains a thread index + pool index. If the packet arrived on the 
wrong thread, a handoff is performed. See nat44_ed_classify.c.
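
To make that concrete, here is a minimal sketch of the idea. The struct names and 
field layout below are my own assumptions for illustration, not the actual 
structures in nat44_ed_classify.c:

/* Illustrative sketch only - names and layout are assumptions, not the
 * actual VPP NAT44-ED structures.  It shows the idea: the value found for
 * the 6-tuple carries both the owning thread index and the per-thread
 * session pool index, and a mismatch triggers a handoff. */
#include <stdint.h>
#include <stdbool.h>

typedef struct
{
  uint32_t fib_index;
  uint32_t saddr, daddr;   /* IPv4 addresses, network order */
  uint16_t sport, dport;
  uint8_t proto;
} nat_6tuple_key_t;        /* hypothetical key layout */

typedef struct
{
  uint32_t thread_index;   /* worker that owns the session */
  uint32_t session_index;  /* index into that worker's session pool */
} nat_lookup_value_t;      /* hypothetical value layout */

/* Decide what to do with a packet after the 6-tuple lookup. */
static inline bool
needs_handoff (const nat_lookup_value_t *v, uint32_t current_thread)
{
  /* If the session lives on another worker, the packet must be handed
   * off to that worker's frame queue instead of being processed here. */
  return v->thread_index != current_thread;
}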

The extreme case I mentioned, with buffer shortage, can occur if the total number 
of buffers available to VPP is close to or less than the number of buffers that 
could be stuck in frame queues. If you dedicate lots of memory to VPP, then this 
won’t occur.
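
To make the condition concrete, a rough back-of-the-envelope check could look like 
this. It is a sketch only: 256 packets per frame matches a full vlib frame, while 
the worker count, queue length and buffer pool size are made-up example numbers:

/* Back-of-the-envelope check: can the frame queues, in the worst case,
 * pin more buffers than the pool holds?  Example numbers are made up. */
#include <stdio.h>

int
main (void)
{
  const unsigned frame_size = 256;       /* packets per full vlib frame */
  const unsigned queue_nelts = 2048;     /* NAT_FQ_NELTS-like queue length */
  const unsigned n_workers = 4;          /* workers receiving handoffs */
  const unsigned buffers_total = 250000; /* example buffer pool size */

  /* Worst case: every element of every per-worker handoff queue holds a
   * full frame, and every packet in them still owns a buffer. */
  unsigned long long worst_case_stuck =
    (unsigned long long) n_workers * queue_nelts * frame_size;

  printf ("worst-case buffers stuck in frame queues: %llu\n", worst_case_stuck);
  printf ("buffers available to VPP:                 %u\n", buffers_total);
  if (worst_case_stuck >= buffers_total)
    printf ("-> NIC rx may starve for buffers under sustained congestion\n");
  return 0;
}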

I don’t have an answer for the increasing congestion drop, but I would read through 
the infra code which handles frame queues. A while ago (>1 year) the algorithm 
wasn’t optimal: frames would enter the queue as they were, causing stalls and 
congestion drops if the consuming thread was slower than the producing thread. The 
producing thread would pick up a couple of packets from the NIC at a time (say 1 or 
2) and create lots of small frames for the consuming thread. But because the 
consuming thread would then only process 1 or 2 packets at a time, performance 
would be bad, since the savings from vectorisation never kicked in. My fix at the 
time (https://gerrit.fd.io/r/c/vpp/+/28980) was to add coalescing to the infra so 
that small frames would be merged into bigger ones. This fixed my test scenario, 
where one worker handed off 100% of its traffic to another worker. The patch was 
never merged because Damjan reworked that part of the code, and if I remember 
correctly it was no longer needed, as the new algorithm didn’t suffer from that 
issue. It may be worth taking a look at what actually happens. You would probably 
need to add some debug counters, or maybe prints (careful with those - I/O messes 
up performance and can change the system dynamics; my solution was to print stats 
once per second or so). I don’t think there is a tool for that.
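
For reference, the coalescing idea can be sketched roughly like this. This is 
purely illustrative, not the code from the gerrit change above; the ring and 
element types are simplified stand-ins for the real frame queue structures:

/* Illustrative sketch of frame coalescing on the consumer side: instead of
 * dispatching each small handed-off frame on its own, drain several queue
 * elements into one big vector so the node runs on a full-sized batch and
 * the vectorisation savings kick in.  Types and names are simplified
 * stand-ins, not the actual vlib frame-queue API. */
#include <string.h>
#include <stdint.h>

#define FRAME_SIZE 256 /* max packets per dispatched frame */

typedef struct
{
  uint32_t buffer_indices[FRAME_SIZE];
  uint32_t n_packets;
} fq_elt_t;

/* Pop up to FRAME_SIZE buffer indices from the ring, merging as many
 * small elements as fit.  Returns the number of packets collected. */
static uint32_t
coalesce_dequeue (fq_elt_t *ring, uint32_t *head, uint32_t tail,
                  uint32_t ring_mask, uint32_t *out)
{
  uint32_t n = 0;

  while (*head != tail)
    {
      fq_elt_t *e = &ring[*head & ring_mask];
      if (n + e->n_packets > FRAME_SIZE)
        break; /* next element would overflow the batch; dispatch what we have */
      memcpy (out + n, e->buffer_indices, e->n_packets * sizeof (uint32_t));
      n += e->n_packets;
      (*head)++;
    }
  return n;
}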

Regards,
Klement

On 17 Mar 2022, at 09:51, Pan Yueyang <yueyang....@epfl.ch> wrote:


Hi Klement, thanks for your prompt reply. I am not clear about how VPP makes this 
split choice. In which file does it make such a split, and what factors influence 
the split decision?

Also, I can still see a case in which there is more congestion drop with a larger 
queue size. I did not see VPP complaining about buffer allocation, so I think the 
sizes are valid (I tested 32 to 2048). I tested with the same settings except for 
NAT_FQ_NELTS and with the same traffic. I think the latency of processing a packet 
should be similar in my case, so the processing speed of the part after the handoff 
should be similar (similar mean and variance). Then with a larger queue, I expect 
the congestion drop to be no more than with a small queue, because it could absorb 
more variance. If the incoming rate is simply larger than the processing rate 
itself, then I expect that with the larger queue the congestion drop will at least 
not increase heavily (assuming the handoff rate is almost constant). Thus I still 
don’t understand why the congestion drop would increase beyond some queue size, 
even if packets pile up in the handoff queue. Could you please elaborate more on 
this?
       Best Wishes
       Pan


________________________________
From: Klement Sekera <klem...@graphiant.com>
Sent: 17 March 2022 14:59:58
To: Pan Yueyang
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Meanings of different vector rates and questions about NAT44 
handoff queue size

Hey,

I can provide insight into the NAT bit.

On 17 Mar 2022, at 06:06, Yueyang Pan via lists.fd.io 
<yueyang.pan=epfl...@lists.fd.io> wrote:

Also, I noticed that the size of the handoff queue matters a lot for NAT44 
performance, but I was wondering why the congestion drop in the NAT handoff would 
increase once the handoff queue size (NAT_FQ_NELTS) is larger than a certain value 
(in my case 512). Has anyone else seen this, and do you have any ideas?

Handoff queue size is in frames, so a size of 64 can hold up to 64*256 packets, but 
there is rarely a full frame there, because these frames are the result of an 
incoming frame being split between workers. Say you have 2 workers and a full frame 
of 256 packets hits worker #1, which sifts through the packets and decides that 156 
packets are for #1 and 100 are for #2. In this case, the 156 packets continue 
processing on worker #1, and a new frame is created with the 100 packets and put 
into a queue to be processed on worker #2.
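
A rough illustration of that split follows; the types are simplified stand-ins and 
this is not the actual VPP handoff code. Each packet gets a destination thread, 
packets staying local are kept, the rest are grouped into a new frame for the other 
worker, and if that worker's queue is already full, the packets are counted as 
congestion drops:

/* Simplified illustration of how an incoming frame gets split per worker.
 * Stand-in types only; this is not the actual VPP handoff code. */
#include <stdint.h>

#define FRAME_SIZE 256

typedef struct
{
  uint32_t buffers[FRAME_SIZE];
  uint32_t n;
} frame_t;

/* Split 'in' between the local worker and one remote worker.  Packets for
 * the remote worker go into 'remote'; if its queue has no room they are
 * dropped and counted as congestion drops. */
static uint32_t
split_frame (const frame_t *in, const uint16_t *thread_index,
             uint32_t local_thread, frame_t *local, frame_t *remote,
             int remote_queue_has_room)
{
  uint32_t congestion_drops = 0;

  local->n = remote->n = 0;
  for (uint32_t i = 0; i < in->n; i++)
    {
      if (thread_index[i] == local_thread)
        local->buffers[local->n++] = in->buffers[i];   /* stays on this worker */
      else if (remote_queue_has_room)
        remote->buffers[remote->n++] = in->buffers[i]; /* goes into the new frame */
      else
        congestion_drops++;                            /* queue full: congestion drop */
    }
  return congestion_drops;
}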

If you are hitting congestion drops, it means your system is very close to, or 
simply cannot, handle such packet rates. Increasing the frame queue size might help 
absorb some slight variance, but ultimately it makes the situation worse, as 
packets begin to pile up within VPP waiting to be processed. If you continued 
increasing this size, you would eventually hit a situation where the NIC driver 
would have problems allocating new vlib_buffers, because all the buffers would be 
stuck in queues waiting.

Regards,
Klement
