Hi all!

We've noticed that each refork causes VPP to leak every frame allocated at
each worker. It happens at src/vlib/threads.c:916, where the vector of next
frames is freed without first freeing the frames it still references.
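
To make the pattern concrete, here is a toy model in plain C. It is not VPP
code: every name in it (leak_frame_t, leak_next_frame_t, leak_live_frames and
the leak_* functions) is made up for illustration, and plain calloc/free stand
in for the VPP heap. It only shows why freeing the vector alone leaves the
frames behind.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for VLIB_FRAME_IS_ALLOCATED. */
#define LEAK_FRAME_IS_ALLOCATED (1u << 0)

typedef struct
{
  unsigned n_vectors;
} leak_frame_t;

/* Toy "next frame": a slot in the vector that points at a frame. */
typedef struct
{
  leak_frame_t *frame;
  unsigned flags;
} leak_next_frame_t;

/* Number of frames allocated and not yet freed. */
int leak_live_frames;

/* Allocate a vector of n next-frame slots, each with its own frame. */
leak_next_frame_t *
leak_next_frames_alloc (size_t n)
{
  leak_next_frame_t *nfs = calloc (n, sizeof (*nfs));
  assert (nfs != NULL);
  for (size_t i = 0; i < n; i++)
    {
      nfs[i].frame = calloc (1, sizeof (leak_frame_t));
      assert (nfs[i].frame != NULL);
      nfs[i].flags = LEAK_FRAME_IS_ALLOCATED;
      leak_live_frames++;
    }
  return nfs;
}

/* Leaky variant: only the vector is released, the frames stay allocated. */
void
leak_next_frames_free_leaky (leak_next_frame_t *nfs)
{
  free (nfs);
}

/* Clean variant: release every allocated frame, then the vector. */
void
leak_next_frames_free_clean (leak_next_frame_t *nfs, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if ((nfs[i].flags & LEAK_FRAME_IS_ALLOCATED) && nfs[i].frame != NULL)
      {
        free (nfs[i].frame);
        nfs[i].frame = NULL;
        leak_live_frames--;
      }
  free (nfs);
}
```

After leak_next_frames_alloc (4) followed by leak_next_frames_free_leaky (),
leak_live_frames is still 4; with leak_next_frames_free_clean () it drops back
to 0. The real problem is harder than this model because vlib_frame_free also
needs the right node runtime, which is what the rest of this mail is about.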

In turn that causes a fairly significant memory leak in the main heap (up to
200MiB per day in our production setup) that makes running VPP under real
traffic load with frequent reforks unsustainable.

It would be simple and neat to free the frames along with these vectors, but
it won't work: vlib_frame_free requires a node runtime to be passed, and the
runtime found via nf->node_runtime_index is not the right one. Since commit
b32bd70c1e83fff90d060ea1bbb41eb55e3f62b1, passing it causes vlib_frame_free
to crash.

Prior to b32bd70 the following code worked fine.
```
vec_foreach(nf, nm_clone->next_frames) {
  vlib_node_runtime_t *r =
      vlib_node_get_runtime (vm_clone, nf->node_runtime_index);
  if ((nf->flags & VLIB_FRAME_IS_ALLOCATED) && nf->frame != NULL)
    {
      vlib_frame_t *f = nf->frame;
      nf->frame = NULL;
      vlib_frame_free(vm_clone, r, f);
    }
}
```

Reading through vlib_frame_alloc_to_node and the existing calls to
vlib_frame_free in dispatch_pending_node, I figured out that another way to
do this may be to:
1) Iterate through all the nodes
2) Iterate through all the nexts for the node
3) Free each allocated frame with the runtime of the next node it was
   allocated for

The code looks something like:
```
u32 i;

for (i = 0; i < VLIB_N_NODE_TYPE; i++)
  {
    vlib_node_runtime_t *r;
    vec_foreach (r, nm_clone->nodes_by_type[i])
      {
        /* Node owning this runtime; next_nodes[] maps a next index to
           the index of the next node. */
        vlib_node_t *node = vlib_get_node (vm_clone, r->node_index);
        u32 next_index;
        for (next_index = 0; next_index < r->n_next_nodes;
             next_index++)
          {
            u32 next_node_index = node->next_nodes[next_index];
            vlib_next_frame_t *nf =
              vlib_node_runtime_get_next_frame (vm_clone, r, next_index);
            if ((nf->flags & VLIB_FRAME_IS_ALLOCATED)
                 && nf->frame != NULL
                 && (nf->frame->frame_flags & VLIB_FRAME_IS_ALLOCATED))
              {
                vlib_frame_t *f = nf->frame;
                nf->frame = NULL;
                /* Free with the runtime of the next node. */
                vlib_node_runtime_t *rt =
                  vlib_node_get_runtime (vm_clone, next_node_index);
                vlib_frame_free (vm_clone, rt, f);
              }
          }
      }
  }
```

Well, it does not work either, because next_nodes has already been rewritten
by main and cannot be used: it has probably been freed, or it refers to nodes
that do not yet exist in nm_clone.

Could you suggest the proper way to fix this leak? Maybe the nf/frame structs
need additional refs?
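
To illustrate what I mean by "additional refs", here is a toy sketch in plain
C, again not VPP code. The assumption is that a frame could record, at
allocation time, which owner it was allocated for, so that freeing it needs
neither a caller-supplied runtime nor a next_nodes lookup. All names
(ref_frame_t, owner_index, ref_free_list, ref_frame_alloc, ref_frame_free)
are hypothetical, and the per-owner free lists are only a stand-in for
whatever the runtime argument selects in vlib_frame_free.

```c
#include <assert.h>
#include <stdlib.h>

#define REF_N_OWNERS 4

typedef struct ref_frame
{
  unsigned owner_index;        /* hypothetical back-reference, set at alloc */
  struct ref_frame *next_free; /* link used while on a free list */
} ref_frame_t;

/* One free list per owner, plus a length counter for each list. */
ref_frame_t *ref_free_list[REF_N_OWNERS];
unsigned ref_free_count[REF_N_OWNERS];

/* Take a frame from the owner's free list, or allocate a new one. */
ref_frame_t *
ref_frame_alloc (unsigned owner_index)
{
  assert (owner_index < REF_N_OWNERS);
  ref_frame_t *f = ref_free_list[owner_index];
  if (f != NULL)
    {
      ref_free_list[owner_index] = f->next_free;
      ref_free_count[owner_index]--;
    }
  else
    {
      f = calloc (1, sizeof (*f));
      assert (f != NULL);
    }
  f->owner_index = owner_index;
  f->next_free = NULL;
  return f;
}

/* Return a frame to its owner's free list using only the frame itself. */
void
ref_frame_free (ref_frame_t *f)
{
  unsigned o = f->owner_index;
  f->next_free = ref_free_list[o];
  ref_free_list[o] = f;
  ref_free_count[o]++;
}
```

In this model a frame allocated with ref_frame_alloc (2) goes back to list 2
on ref_frame_free (f), whatever the state of the surrounding graph. Whether
something like this fits the real vlib_frame_t layout is exactly what I am
unsure about.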

--
Best regards,
Dmitry
