Hi all!
We've noticed that each refork causes VPP to leak every frame allocated at
each worker. It happens at src/vlib/threads.c:916, where the vector of next
frames is freed without freeing the frames it references.
In turn that causes a fairly significant memory leak in the main heap (up to
200MiB per day in our production setup) that makes running VPP under real
traffic load with frequent reforks unsustainable.
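To make the leak pattern concrete, here is a minimal standalone sketch. It does not use the real vlib types: `toy_frame_t`, `toy_next_frame_t`, the `live_frames` counter and all function names are hypothetical stand-ins, and plain `calloc`/`free` replace the VPP heap. It only shows that freeing a container of records which own allocations, without freeing those allocations first, leaves them unreachable.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for illustration only; these are NOT the real vlib types. */
typedef struct { int n_vectors; } toy_frame_t;
typedef struct { toy_frame_t *frame; } toy_next_frame_t;

/* Number of frames allocated and not yet freed. */
static int live_frames;

/* Allocate n next-frame records, each owning one heap-allocated frame. */
static toy_next_frame_t *
toy_next_frames_alloc (int n)
{
  toy_next_frame_t *nfs = calloc (n, sizeof (nfs[0]));
  assert (nfs != NULL);
  for (int i = 0; i < n; i++)
    {
      nfs[i].frame = calloc (1, sizeof (toy_frame_t));
      assert (nfs[i].frame != NULL);
      live_frames++;
    }
  return nfs;
}

/* Leaky variant: frees only the container. The frames stay allocated and
   nothing points at them any more. */
static void
toy_next_frames_free_leaky (toy_next_frame_t *nfs)
{
  free (nfs);
}

/* Non-leaky variant: frees every owned frame, then the container. */
static void
toy_next_frames_free (toy_next_frame_t *nfs, int n)
{
  for (int i = 0; i < n; i++)
    {
      free (nfs[i].frame);
      nfs[i].frame = NULL;
      live_frames--;
    }
  free (nfs);
}
```

In this model, calling `toy_next_frames_free_leaky` on every "refork" leaves `live_frames` growing by the number of records each time, which is the same shape as the growth we see in the main heap.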
It would be simple and neat to free these frames, but it won't work:
vlib_frame_free requires a node runtime to be passed, but
nf->node_runtime_index is not the right one, and since
b32bd70c1e83fff90d060ea1bbb41eb55e3f62b1 this causes vlib_frame_free to
crash.
Prior to b32bd70 the following code worked fine.
```
vec_foreach (nf, nm_clone->next_frames) {
  vlib_node_runtime_t *r =
    vlib_node_get_runtime (vm_clone, nf->node_runtime_index);
  if ((nf->flags & VLIB_FRAME_IS_ALLOCATED) && nf->frame != NULL)
    {
      vlib_frame_t *f = nf->frame;
      nf->frame = NULL;
      vlib_frame_free (vm_clone, r, f);
    }
}
```
Reading through vlib_frame_alloc_to_node and the existing calls of
vlib_frame_free in dispatch_pending_node, I've figured out that another way
to do so may be to:
1) Iterate through all the nodes
2) Iterate through all the nexts for the node
3) Free allocated frames with the node's runtime
The code looks something like:
```
for (i = 0; i < VLIB_N_NODE_TYPE; i++)
  {
    vlib_node_runtime_t *r;
    vec_foreach (r, nm_clone->nodes_by_type[i])
      {
        /* `node` was not declared in the original snippet; presumably it is
           the vlib_node_t that owns this runtime. */
        vlib_node_t *node = vlib_get_node (vm_clone, r->node_index);
        u32 next_index;
        for (next_index = 0; next_index < r->n_next_nodes; next_index++)
          {
            u32 next_node_index = node->next_nodes[next_index];
            vlib_next_frame_t *nf =
              vlib_node_runtime_get_next_frame (vm_clone, r, next_index);
            if ((nf->flags & VLIB_FRAME_IS_ALLOCATED)
                && nf->frame != NULL
                && (nf->frame->frame_flags & VLIB_FRAME_IS_ALLOCATED))
              {
                vlib_frame_t *f = nf->frame;
                nf->frame = NULL;
                /* Free with the runtime of the next (destination) node. */
                vlib_node_runtime_t *rt =
                  vlib_node_get_runtime (vm_clone, next_node_index);
                vlib_frame_free (vm_clone, rt, f);
              }
          }
      }
  }
```
Well, it does not work either, because next_nodes has already been rewritten
by main and cannot be used: it has probably been freed, or refers to nodes
that do not yet exist in nm_clone.
What is the proper way to fix this leak? Any ideas are welcome. Maybe the
nf/frame structs need additional refs?
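To show what I mean by "additional refs", here is a toy sketch of the idea in isolation. Nothing in it is existing VPP code: `toy_owned_frame_t`, its `owner_index` field, the per-owner counters and both functions are hypothetical. The assumption is that a frame could record, at allocation time, whatever is needed to free it later, so that the free path would not have to consult next_nodes at all.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model; none of these names exist in VPP. */
#define TOY_N_OWNERS 4

typedef struct
{
  int owner_index; /* recorded once, when the frame is allocated */
} toy_owned_frame_t;

/* Live frame count per owner, standing in for per-node bookkeeping. */
static int toy_live_per_owner[TOY_N_OWNERS];

/* Allocate a frame and store the reference needed to free it later. */
static toy_owned_frame_t *
toy_owned_frame_alloc (int owner_index)
{
  assert (owner_index >= 0 && owner_index < TOY_N_OWNERS);
  toy_owned_frame_t *f = calloc (1, sizeof (*f));
  assert (f != NULL);
  f->owner_index = owner_index;
  toy_live_per_owner[owner_index]++;
  return f;
}

/* Free a frame using only the reference stored inside it, so the result
   does not depend on any external table that may have been rewritten. */
static void
toy_owned_frame_free (toy_owned_frame_t *f)
{
  toy_live_per_owner[f->owner_index]--;
  free (f);
}
```

Whether something like this fits the real vlib_frame_t layout, and what exactly would need to be stored, is precisely what I am unsure about.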
--
Best regards,
Dmitry