Hi, I am not sure whether my input is helpful, but we hit the same issue and resolved it in a different way. In your packet trace, the packet goes through the VPP node ip4-sv-reassembly-feature. I would suggest first enabling reassembly on the interface:
set interface reassembly <interface-name> on

Some experts may say that reassembly is enabled on the interface by default, but it is not. Please try it once and let us know whether you are still facing the problem.

Regards,
Mrityunjay Kumar
Mobile: +91-9731528504

On Wed, Dec 2, 2020 at 9:10 PM Elias Rudberg <elias.rudb...@bahnhof.net> wrote:

> Hello VPP experts,
>
> For our NAT44 usage of VPP we have encountered a problem with VPP
> running out of memory, which now, after much headache and many
> out-of-memory crashes over the past several months, has turned out to
> be caused by an infinite loop where VPP gets stuck repeating the three
> nodes ip4-lookup, ip4-local and nat44-hairpinning. A single packet
> gets passed around and around between those three nodes, eating more
> and more memory, which causes that worker thread to get stuck and VPP
> to run out of memory after a few seconds. (Earlier we speculated that
> it was due to a memory leak but now it seems it was not.)
>
> This concerns the current master branch as well as the stable/2009
> branch, and earlier VPP versions as well.
>
> One scenario when this happens is when a UDP (or TCP) packet is sent
> from a client on the inside with a destination IP address that matches
> an existing static NAT mapping that maps that IP address on the inside
> to the same IP address on the outside.
>
> Then, the problem can be triggered for example by doing this from a
> client on the inside, where DESTINATION_IP is the IP address of such a
> static mapping:
>
> echo hello > /dev/udp/$DESTINATION_IP/33333
>
> Here is the packet trace for the thread that receives the packet at
> rdma-input:
>
> ----------
>
> Packet 42
>
> 00:03:07:636840: rdma-input
>   rdma: Interface179 (4) next-node bond-input l2-ok l3-ok l4-ok ip4 udp
> 00:03:07:636841: bond-input
>   src d4:6a:35:52:30:db, dst 02:fe:8d:23:60:a7, Interface179 ->
>   BondEthernet0
> 00:03:07:636843: ethernet-input
>   IP4: d4:6a:35:52:30:db -> 02:fe:8d:23:60:a7 802.1q vlan 1013
> 00:03:07:636844: ip4-input
>   UDP: SOURCE_IP_INSIDE -> DESTINATION_IP
>     tos 0x00, ttl 63, length 34, checksum 0xe7e3 dscp CS0 ecn NON_ECN
>     fragment id 0x50fe, flags DONT_FRAGMENT
>   UDP: 48824 -> 33333
>     length 14, checksum 0x781e
> 00:03:07:636846: ip4-sv-reassembly-feature
>   [not-fragmented]
> 00:03:07:636847: nat44-in2out-worker-handoff
>   NAT44_IN2OUT_WORKER_HANDOFF : next-worker 8 trace index 41
>
> ----------
>
> So it is doing handoff to thread 8 with trace index 41. Nothing wrong
> so far, I think.
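For anyone trying to reproduce or dig further: per-thread traces like the ones in this mail can be captured with the standard vppctl debug CLI, and the reassembly toggle suggested above is a one-liner. A rough sketch of the commands involved (the interface name is a placeholder, and the static-mapping line is only a guess at how a mapping matching Elias's scenario might look, not taken from his configuration):

  vppctl set interface reassembly <interface-name> on
  vppctl nat44 add static mapping local $DESTINATION_IP external $DESTINATION_IP
  vppctl clear trace
  vppctl trace add rdma-input 50
  vppctl show trace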
>
> Here is the beginning of the corresponding packet trace for the
> receiving thread:
>
> ----------
>
> Packet 57
>
> 00:03:07:636850: handoff_trace
>   HANDED-OFF: from thread 7 trace index 41
> 00:03:07:636850: nat44-in2out
>   NAT44_IN2OUT_FAST_PATH: sw_if_index 6, next index 3, session -1
> 00:03:07:636855: nat44-in2out-slowpath
>   NAT44_IN2OUT_SLOW_PATH: sw_if_index 6, next index 0, session 11
> 00:03:07:636927: ip4-lookup
>   fib 0 dpo-idx 577 flow hash: 0x00000000
>   UDP: SOURCE_IP_OUTSIDE -> DESTINATION_IP
>     tos 0x00, ttl 63, length 34, checksum 0x5eee dscp CS0 ecn NON_ECN
>     fragment id 0x50fe, flags DONT_FRAGMENT
>   UDP: 63957 -> 33333
>     length 14, checksum 0xb40b
> 00:03:07:636930: ip4-local
>   UDP: SOURCE_IP_OUTSIDE -> DESTINATION_IP
>     tos 0x00, ttl 63, length 34, checksum 0x5eee dscp CS0 ecn NON_ECN
>     fragment id 0x50fe, flags DONT_FRAGMENT
>   UDP: 63957 -> 33333
>     length 14, checksum 0xb40b
> 00:03:07:636932: nat44-hairpinning
>   new dst addr DESTINATION_IP port 33333 fib-index 0 is-static-mapping
> 00:03:07:636934: ip4-lookup
>   fib 0 dpo-idx 577 flow hash: 0x00000000
>   UDP: SOURCE_IP_OUTSIDE -> DESTINATION_IP
>     tos 0x00, ttl 63, length 34, checksum 0x5eee dscp CS0 ecn NON_ECN
>     fragment id 0x50fe, flags DONT_FRAGMENT
>   UDP: 63957 -> 33333
>     length 14, checksum 0xb40b
> 00:03:07:636936: ip4-local
>   UDP: SOURCE_IP_OUTSIDE -> DESTINATION_IP
>     tos 0x00, ttl 63, length 34, checksum 0x5eee dscp CS0 ecn NON_ECN
>     fragment id 0x50fe, flags DONT_FRAGMENT
>   UDP: 63957 -> 33333
>     length 14, checksum 0xb40b
> 00:03:07:636937: nat44-hairpinning
>   new dst addr DESTINATION_IP port 33333 fib-index 0 is-static-mapping
> 00:03:07:636937: ip4-lookup
>   fib 0 dpo-idx 577 flow hash: 0x00000000
>   UDP: SOURCE_IP_OUTSIDE -> DESTINATION_IP
>     tos 0x00, ttl 63, length 34, checksum 0x5eee dscp CS0 ecn NON_ECN
>     fragment id 0x50fe, flags DONT_FRAGMENT
>   UDP: 63957 -> 33333
>     length 14, checksum 0xb40b
> 00:03:07:636939: ip4-local
>   UDP: SOURCE_IP_OUTSIDE -> DESTINATION_IP
>     tos 0x00, ttl 63, length 34, checksum 0x5eee dscp CS0 ecn NON_ECN
>     fragment id 0x50fe, flags DONT_FRAGMENT
>   UDP: 63957 -> 33333
>     length 14, checksum 0xb40b
> 00:03:07:636940: nat44-hairpinning
>   new dst addr DESTINATION_IP port 33333 fib-index 0 is-static-mapping
>
> ...
>
> ... and so on. In principle it never ends. To get this trace I had
> added a hack in nat44-hairpinning to stop when my added debug counter
> exceeded a few thousand. Without that, it seems to loop forever and
> that worker thread gets stuck.
>
> What happens seems to be that the nat44-hairpinning node determines
> that there is an existing session and then decides the packet should
> go to the ip4-lookup node, followed by ip4-local, followed by the
> nat44-hairpinning node which makes the same decision again, so it
> just goes round and round like that. Inside the snat_hairpinning()
> function it always comes to the "Destination is behind the same NAT,
> use internal address and port" part and returns 1 there, which causes
> it to choose the ip4-lookup node as next node, even if nothing
> actually changed.
>
> I have a fix that involves changing the snat_hairpinning() function so
> that it checks if nothing has changed and in that case returns 0,
> effectively breaking the otherwise infinite loop, but I am not
> convinced that this is a good solution; it feels a bit like an ugly
> fix even though it seems to solve the problem in practice.
>
> Two different questions related to this:
>
> (1) The specific NAT hairpinning issue, what should be done to handle
> it properly?
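For what it is worth, here is a minimal standalone sketch of the kind of guard the fix described above amounts to: check whether the hairpin rewrite would actually change anything and, if not, report "nothing changed" so the packet is not sent back to ip4-lookup. The types and the function below are simplified stand-ins invented for illustration; this is not the real snat_hairpinning() code.

/* Standalone illustration only -- not the actual VPP NAT44 code. */
#include <stdint.h>

typedef struct { uint32_t dst_address; } ip4_hdr_sketch_t; /* stand-in for ip4 header */
typedef struct { uint16_t dst_port; } udp_hdr_sketch_t;    /* stand-in for udp header */

/* Returns 1 if the packet was rewritten and should revisit ip4-lookup,
 * 0 if the destination already is the internal address/port, which is
 * exactly the case that otherwise keeps the packet looping. */
static int
hairpin_rewrite_sketch (ip4_hdr_sketch_t *ip, udp_hdr_sketch_t *udp,
                        uint32_t internal_addr, uint16_t internal_port)
{
  if (ip->dst_address == internal_addr && udp->dst_port == internal_port)
    return 0;                      /* nothing to change: do not recirculate */

  ip->dst_address = internal_addr; /* "use internal address and port" */
  udp->dst_port = internal_port;
  /* the real code would also fix up IP/UDP checksums and session state here */
  return 1;
}

int
main (void)
{
  /* Example values only. In Elias's scenario the inside and outside
   * addresses are the same, so even the first pass finds nothing to
   * change and returns 0 immediately. */
  ip4_hdr_sketch_t ip = { .dst_address = 0x0a000001 };
  udp_hdr_sketch_t udp = { .dst_port = 33333 };
  int pass1 = hairpin_rewrite_sketch (&ip, &udp, 0xc0a80001, 33333);
  int pass2 = hairpin_rewrite_sketch (&ip, &udp, 0xc0a80001, 33333);
  return (pass1 == 1 && pass2 == 0) ? 0 : 1;
}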
>
> (2) A more general question about detecting if VPP gets into an
> infinite loop where the same packet gets passed around a ridiculously
> large number of times among different nodes: maybe it would be a good
> idea to try to detect when that happens and give an error message
> about it instead of hanging or crashing?
>
> Best regards,
> Elias
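Regarding question (2), a rough illustration of the kind of safety valve that could be meant: a per-packet recirculation counter kept in buffer metadata, bumped every time a node feeds the packet back into the graph, with the packet dropped (and an error counter incremented) once a bound is exceeded. The field name and threshold below are invented for this sketch and do not correspond to an existing VPP mechanism.

/* Sketch only -- not an existing VPP facility. */
#include <stdint.h>
#include <stdio.h>

#define MAX_RECIRCULATIONS 64   /* arbitrary illustrative upper bound */

typedef struct
{
  uint16_t recirc_count;        /* stand-in for a spare per-buffer metadata field */
} buffer_meta_sketch_t;

/* Returns 1 when the packet should be dropped (with an error reported)
 * instead of being recirculated through the graph one more time. */
static int
recirculation_guard (buffer_meta_sketch_t *meta)
{
  if (++meta->recirc_count > MAX_RECIRCULATIONS)
    {
      fprintf (stderr, "packet recirculated %d times, dropping\n",
               meta->recirc_count);
      return 1;
    }
  return 0;
}

int
main (void)
{
  buffer_meta_sketch_t meta = { 0 };
  int dropped = 0;
  /* Simulate a packet that would otherwise loop forever. */
  while (!dropped)
    dropped = recirculation_guard (&meta);
  printf ("loop broken after %d revisits\n", meta.recirc_count);
  return 0;
}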