Hello VPP experts,

There seems to be a problem with the way the port number is selected for NAT: sometimes the selected port number maps to a different thread index for out2in packets, making that session useless. I think this applies to the current master branch as well as the latest stable branches.
Here is the story as I understand it; please correct me if I have misunderstood something.

Each NAT thread has a range of port numbers that it can use, and when a new session is created, a port number is picked at random from within that range. That happens when an in2out packet is NATed. Later, when a response arrives as an out2in packet, VPP needs to make sure it is handled by the correct thread, that is, the same thread that created the session.

The port number to use for a new session is selected in nat_alloc_addr_and_port_default() like this:

    portnum = (port_per_thread * snat_thread_index) + snat_random_port(1, port_per_thread) + 1024;

where port_per_thread is the number of ports each thread is allowed to use, and snat_random_port() returns a random number in the given range. This means that the smallest possible portnum is 1025, which happens when snat_thread_index is zero and snat_random_port() returns 1.

The corresponding calculation to get the thread index back from the port number is essentially this:

    (portnum - 1024) / port_per_thread

This works in all cases except when snat_random_port() returns the largest possible value, port_per_thread. In that case (portnum - 1024) equals port_per_thread * (snat_thread_index + 1), so the division gives snat_thread_index + 1 and we end up with the wrong thread index. Out2in packets arriving for that session are then handed off to another thread. That thread is unaware of the session, so all out2in packets for the session are dropped.

Since each thread has thousands of port numbers to choose from and the problem only appears for one particular choice, only a small fraction of all sessions are affected. In my tests there were 8 NAT threads, so the port_per_thread value was about 8000, and roughly 1 in 8000 sessions (about 0.0125%) failed.

The test I used was simply to run many separate ping commands with the "-c 1" option. All of them should give the normal result "1 packets transmitted, 1 received, 0% packet loss", but due to this problem some of the pings fail.
Note that these need to be separate ping commands so that VPP creates a new session for each of them. Provided that you test a large enough number of sessions, it is straightforward to reproduce the problem.

It could be fixed in different ways. One way is to simply shift the arguments to snat_random_port() down by one:

    snat_random_port(1, port_per_thread) --> snat_random_port(0, port_per_thread - 1)

I pushed such a change to gerrit here: https://gerrit.fd.io/r/c/vpp/+/27786

The smallest port number used then becomes 1024 instead of 1025 as it has been so far. I suppose that should be OK, since it is the "well-known ports" from 0 to 1023 that should be avoided; port 1024 should be okay to use.

What do you think, does it make sense to fix it in this way?

Best regards,
Elias
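
For anyone who wants to check the arithmetic without a running VPP instance, here is a minimal standalone sketch in C. It is not the VPP code: port_for(), thread_for() and count_mismatches() are hypothetical helper names made up for illustration, the parameter r stands in for whatever snat_random_port() would return, and 8 threads with port_per_thread = 8000 are round numbers chosen to resemble the test setup described in this message.

```c
#include <stdint.h>

#define PORT_BASE 1024u /* first port above the well-known range 0..1023 */

/* Forward mapping: port chosen for a thread when the random draw is r.
   r is a stand-in for the return value of snat_random_port(). */
static uint32_t
port_for (uint32_t port_per_thread, uint32_t thread_index, uint32_t r)
{
  return port_per_thread * thread_index + r + PORT_BASE;
}

/* Reverse mapping: thread index recovered from the port number,
   as done for out2in packets. */
static uint32_t
thread_for (uint32_t port_per_thread, uint32_t port)
{
  return (port - PORT_BASE) / port_per_thread;
}

/* Try every thread and every draw r in [lo, hi] (inclusive) and count
   how many (thread, r) pairs map back to a different thread index. */
static uint32_t
count_mismatches (uint32_t port_per_thread, uint32_t n_threads,
                  uint32_t lo, uint32_t hi)
{
  uint32_t bad = 0;
  for (uint32_t t = 0; t < n_threads; t++)
    for (uint32_t r = lo; r <= hi; r++)
      if (thread_for (port_per_thread, port_for (port_per_thread, t, r)) != t)
        bad++;
  return bad;
}
```

With these assumed numbers, count_mismatches(8000, 8, 1, 8000) comes out as 8, that is, one bad draw per thread (the draw r = port_per_thread), while count_mismatches(8000, 8, 0, 7999) comes out as 0, which matches the effect of shifting the range down by one.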