Hello VPP experts,

For our NAT44 usage of VPP we have encountered a problem with VPP running out of memory. After much headache and many out-of-memory crashes over the past several months, this has turned out to be caused by an infinite loop where VPP gets stuck repeating the three nodes ip4-lookup, ip4-local and nat44-hairpinning. A single packet gets passed around and around between those three nodes, consuming more and more memory, which causes that worker thread to get stuck and VPP to run out of memory after a few seconds. (Earlier we speculated that it was due to a memory leak, but now it seems it was not.)
This concerns the current master branch as well as the stable/2009 branch and earlier VPP versions.

One scenario where this happens is when a UDP (or TCP) packet is sent from a client on the inside to a destination IP address that matches an existing static NAT mapping which maps that IP address on the inside to the same IP address on the outside. The problem can then be triggered, for example, by doing this from a client on the inside, where DESTINATION_IP is the IP address of such a static mapping:

echo hello > /dev/udp/$DESTINATION_IP/33333

Here is the packet trace for the thread that receives the packet at rdma-input:

----------
Packet 42

00:03:07:636840: rdma-input
  rdma: Interface179 (4) next-node bond-input
  l2-ok l3-ok l4-ok ip4 udp
00:03:07:636841: bond-input
  src d4:6a:35:52:30:db, dst 02:fe:8d:23:60:a7, Interface179 -> BondEthernet0
00:03:07:636843: ethernet-input
  IP4: d4:6a:35:52:30:db -> 02:fe:8d:23:60:a7 802.1q vlan 1013
00:03:07:636844: ip4-input
  UDP: SOURCE_IP_INSIDE -> DESTINATION_IP
    tos 0x00, ttl 63, length 34, checksum 0xe7e3 dscp CS0 ecn NON_ECN
    fragment id 0x50fe, flags DONT_FRAGMENT
  UDP: 48824 -> 33333
    length 14, checksum 0x781e
00:03:07:636846: ip4-sv-reassembly-feature
  [not-fragmented]
00:03:07:636847: nat44-in2out-worker-handoff
  NAT44_IN2OUT_WORKER_HANDOFF : next-worker 8 trace index 41
----------

So it is doing handoff to thread 8 with trace index 41. Nothing wrong so far, I think. Here is the beginning of the corresponding packet trace for the receiving thread:

----------
Packet 57

00:03:07:636850: handoff_trace
  HANDED-OFF: from thread 7 trace index 41
00:03:07:636850: nat44-in2out
  NAT44_IN2OUT_FAST_PATH: sw_if_index 6, next index 3, session -1
00:03:07:636855: nat44-in2out-slowpath
  NAT44_IN2OUT_SLOW_PATH: sw_if_index 6, next index 0, session 11
00:03:07:636927: ip4-lookup
  fib 0 dpo-idx 577 flow hash: 0x00000000
  UDP: SOURCE_IP_OUTSIDE -> DESTINATION_IP
    tos 0x00, ttl 63, length 34, checksum 0x5eee dscp CS0 ecn NON_ECN
    fragment id 0x50fe, flags DONT_FRAGMENT
  UDP: 63957 -> 33333
    length 14, checksum 0xb40b
00:03:07:636930: ip4-local
  UDP: SOURCE_IP_OUTSIDE -> DESTINATION_IP
    tos 0x00, ttl 63, length 34, checksum 0x5eee dscp CS0 ecn NON_ECN
    fragment id 0x50fe, flags DONT_FRAGMENT
  UDP: 63957 -> 33333
    length 14, checksum 0xb40b
00:03:07:636932: nat44-hairpinning
  new dst addr DESTINATION_IP port 33333 fib-index 0 is-static-mapping
00:03:07:636934: ip4-lookup
  fib 0 dpo-idx 577 flow hash: 0x00000000
  UDP: SOURCE_IP_OUTSIDE -> DESTINATION_IP
    tos 0x00, ttl 63, length 34, checksum 0x5eee dscp CS0 ecn NON_ECN
    fragment id 0x50fe, flags DONT_FRAGMENT
  UDP: 63957 -> 33333
    length 14, checksum 0xb40b
00:03:07:636936: ip4-local
  UDP: SOURCE_IP_OUTSIDE -> DESTINATION_IP
    tos 0x00, ttl 63, length 34, checksum 0x5eee dscp CS0 ecn NON_ECN
    fragment id 0x50fe, flags DONT_FRAGMENT
  UDP: 63957 -> 33333
    length 14, checksum 0xb40b
00:03:07:636937: nat44-hairpinning
  new dst addr DESTINATION_IP port 33333 fib-index 0 is-static-mapping
00:03:07:636937: ip4-lookup
  fib 0 dpo-idx 577 flow hash: 0x00000000
  UDP: SOURCE_IP_OUTSIDE -> DESTINATION_IP
    tos 0x00, ttl 63, length 34, checksum 0x5eee dscp CS0 ecn NON_ECN
    fragment id 0x50fe, flags DONT_FRAGMENT
  UDP: 63957 -> 33333
    length 14, checksum 0xb40b
00:03:07:636939: ip4-local
  UDP: SOURCE_IP_OUTSIDE -> DESTINATION_IP
    tos 0x00, ttl 63, length 34, checksum 0x5eee dscp CS0 ecn NON_ECN
    fragment id 0x50fe, flags DONT_FRAGMENT
  UDP: 63957 -> 33333
    length 14, checksum 0xb40b
00:03:07:636940: nat44-hairpinning
  new dst addr DESTINATION_IP port 33333 fib-index 0 is-static-mapping
  ...

... and so on. In principle it never ends. To get this trace I had added a hack in nat44-hairpinning to stop when my added debug counter exceeded a few thousand. Without that, it seems to loop forever and that worker thread gets stuck.

What happens seems to be that the nat44-hairpinning node determines that there is an existing session and decides the packet should go to the ip4-lookup node, followed by ip4-local, followed by nat44-hairpinning again, which makes the same decision again, so it just goes round and round like that. Inside the snat_hairpinning() function it always reaches the "Destination is behind the same NAT, use internal address and port" part and returns 1 there, which causes it to choose ip4-lookup as the next node even though nothing actually changed.

I have a fix that involves changing the snat_hairpinning() function so that it checks whether anything actually changed and, if nothing did, returns 0, effectively breaking the otherwise infinite loop. I am not convinced that this is a good solution; it feels a bit like an ugly fix, even though it seems to solve the problem in practice.
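In case it is useful, here is roughly what I mean, as a simplified self-contained sketch rather than the actual patch: the endpoint_t type and the hairpin_rewrite_dst() helper are made up for illustration, and the real snat_hairpinning() of course works on the vlib buffer and also updates the IP and L4 checksums.

#include <stdint.h>

typedef struct
{
  uint32_t addr;   /* IPv4 address, network byte order */
  uint16_t port;   /* L4 port, network byte order */
} endpoint_t;

/* Returns 1 if the destination was rewritten (i.e. the packet should be
 * sent back to ip4-lookup), 0 if the destination already equals the
 * session's internal endpoint so there is nothing to do.  The early
 * "return 0" in the nothing-changed case is what breaks the
 * ip4-lookup -> ip4-local -> nat44-hairpinning loop. */
static int
hairpin_rewrite_dst (endpoint_t *pkt_dst, const endpoint_t *session_internal)
{
  if (pkt_dst->addr == session_internal->addr
      && pkt_dst->port == session_internal->port)
    return 0;

  pkt_dst->addr = session_internal->addr;
  pkt_dst->port = session_internal->port;
  /* (the real code also fixes up the IPv4 and L4 checksums here) */
  return 1;
}

The point is only the early return 0 when the destination already equals the internal address and port; everything else stays as it is today.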