Dear VPP hackers,

We are spawning and destroying GTPU tunnels at a high rate. Only 10K tunnels 
ever exist simultaneously in our test.

With default settings, we hit an out-of-memory error in load_balance_create 
after approximately 0.5M tunnel create commands. Apparently, load balancers 
are leaking.

As far as I understand, a load_balance is first created in fib_entry_track, 
to get notifications about route changes. It is only created once per unique 
DIP, and its refcount is correctly decremented once the last subscription 
ceases.

The refcount is also bumped in gtpu_tunnel_restack_dpo, when next_dpo is 
updated. Since the latter is never reset, the refcount never drops to zero.

This is straightforward to exercise in the CLI: create and immediately destroy 
several GTPU tunnels, then compare `show dpo memory` output before and after.
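For reference, a repro along those lines might look like the following 
session (addresses and TEIDs are made up, and I'm assuming the stock 
`create gtpu tunnel ... [del]` debug CLI; the exact parameters don't matter 
as long as creates and deletes are paired):

```
vppctl show dpo memory
for i in $(seq 1 100); do
  vppctl create gtpu tunnel src 10.0.0.1 dst 10.0.0.2 teid $i
  vppctl create gtpu tunnel src 10.0.0.1 dst 10.0.0.2 teid $i del
done
vppctl show dpo memory
```

The load-balance count in the second `show dpo memory` should be higher than 
in the first, despite no tunnels remaining.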

It looks like other tunnel types, namely geneve, vxlan, vxlan-gpe and 
vxlan-gbp, are also susceptible.

My fix was to add a call to dpo_reset on the add_del_tunnel delete path. 
Please take a look at the patch: https://gerrit.fd.io/r/c/vpp/+/26617

Note: I was unable to write a test case for vxlan and vxlan-gbp, since their 
next_dpo does not point at a load balancer but at a DPO picked from a bucket 
in the load balancer.

Best,
N
View/Reply Online (#16130): https://lists.fd.io/g/vpp-dev/message/16130