Hello VPP experts,

We have been using VPP for NAT44 for a while and it has been working
fine, but a few days ago, when we tried turning on NAT IPFIX logging,
VPP crashed. It turned out that the problem went away if we used only a
single thread, so it seemed related to how threading was handled in the
ipfix logging code. The crash happened in different ways on different
runs but often seemed related to the snat_ipfix_send() function in
plugins/nat/nat_ipfix_logging.c.

Having looked at the code in nat_ipfix_logging.c, I have the following
theory about what goes wrong (I might have misunderstood something; if
so, please correct me):

In the snat_ipfix_send() function, a pointer to a vlib_main_t data
structure is fetched in the following way:

   vlib_main_t *vm = frm->vlib_main;

So the frm->vlib_main pointer comes from "frm", which has been set to
flow_report_main, a global data structure from
vnet/ipfix-export/flow_report.c that, as far as I can tell, exists only
once in memory (not once per thread). This means that different threads
calling the snat_ipfix_send() function all use the same vlib_main_t
data structure. I think that is not how it should be; instead, each
thread should use its own thread-specific vlib_main_t data structure.

A suggestion for how to fix this is to replace the line

   vlib_main_t *vm = frm->vlib_main;

with the following line

   vlib_main_t *vm = vlib_mains[thread_index];

in all places where worker threads are using such a vlib_main_t
pointer. Using vlib_mains[thread_index] means that we are picking the
thread-specific vlib_main_t data structure for the current thread,
instead of all threads using the same vlib_main_t. I pushed such a
change to gerrit, here: https://gerrit.fd.io/r/c/vpp/+/26359
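
To illustrate the difference between the two lookups, here is a minimal
standalone sketch. It is not VPP code: all demo_* names are made up for
this example, and the assumption that the global structure holds a
pointer to the instance at index 0 is part of the sketch, not something
taken from the VPP sources.

```c
#include <assert.h>

#define DEMO_N_THREADS 4	/* hypothetical thread count */

/* Hypothetical stand-in for vlib_main_t: state owned by one thread. */
typedef struct
{
  unsigned thread_index;
  unsigned long frames_sent;
} demo_vlib_main_t;

/* Hypothetical stand-in for flow_report_main: a single global object
   that stores only one vlib_main pointer. */
typedef struct
{
  demo_vlib_main_t *vlib_main;
} demo_flow_report_main_t;

static demo_vlib_main_t demo_storage[DEMO_N_THREADS];
static demo_vlib_main_t *demo_vlib_mains[DEMO_N_THREADS];
static demo_flow_report_main_t demo_flow_report_main;

/* Create one instance per thread. The global keeps a pointer to the
   instance at index 0 only (an assumption of this sketch). */
void
demo_init (void)
{
  for (unsigned i = 0; i < DEMO_N_THREADS; i++)
    {
      demo_storage[i].thread_index = i;
      demo_storage[i].frames_sent = 0;
      demo_vlib_mains[i] = &demo_storage[i];
    }
  demo_flow_report_main.vlib_main = demo_vlib_mains[0];
}

/* Lookup through the global: every thread gets the same instance,
   whatever its thread_index is. */
demo_vlib_main_t *
demo_get_vm_shared (unsigned thread_index)
{
  demo_flow_report_main_t *frm = &demo_flow_report_main;
  (void) thread_index;
  return frm->vlib_main;
}

/* Lookup by thread index: each thread gets its own instance. */
demo_vlib_main_t *
demo_get_vm_per_thread (unsigned thread_index)
{
  assert (thread_index < DEMO_N_THREADS);
  return demo_vlib_mains[thread_index];
}
```

After demo_init(), demo_get_vm_shared() returns the same pointer for
thread indices 1 and 2, while demo_get_vm_per_thread() returns two
different pointers. Any unsynchronized writes through the shared
pointer from several threads would then touch the same memory, which
would fit the crashes I saw.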

That fix seems to solve the issue in my tests: VPP no longer crashes
after the change. Please have a look at it and let me know if this
seems reasonable or if I have misunderstood something.

Best regards,
Elias
