Hello,

There was a merge conflict for my previous fix for this. Now I made a
new one, it's essentially the same thing, just avoiding the merge
conflict: https://gerrit.fd.io/r/c/vpp/+/26659

Please have a look at that one and merge if it seems ok. Based on our
experience from the past few weeks it seems good, we have seen no more
ipfix loggning crashes after implementing this fix.

Best regards,
Elias


On Sun, 2020-04-05 at 12:08 +0000, Dave Barach via lists.fd.io wrote:
> If you have the thread index handy, that's OK. Otherwise, use
> vlib_get_main() which grabs the thread index from thread local
> storage. 
> 
> -----Original Message-----
> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Elias
> Rudberg
> Sent: Sunday, April 5, 2020 4:58 AM
> To: vpp-dev@lists.fd.io
> Subject: [vpp-dev] VPP nat ipfix logging problem, need to use thread-
> specific vlib_main_t?
> 
> Hello VPP experts,
> 
> We have been using VPP for NAT44 for a while and it has been working
> fine, but a few days ago when we tried turing on nat ipfix logging,
> vpp crashed. It turned out that the problem went away if we used only
> a single thread, so it seemed related to how threading was handled in
> the ipfix logging code. The crash happened in different ways on
> different runs but often seemed related to the snat_ipfix_send()
> function in plugins/nat/nat_ipfix_logging.c.
> 
> Having looked at the code in nat_ipfix_logging.c I have the following
> theory about what goes wrong (I might have misunderstood something,
> if so please correct me):
> 
> In the the snat_ipfix_send() function, a vlib_main_t data structure
> is used, a pointer to it is fetched in the following way:
> 
>    vlib_main_t *vm = frm->vlib_main;
> 
> So the frm->vlib_main pointer comes from "frm" which has been set to
> flow_report_main which is a global data structure from vnet/ipfix-
> export/flow_report.c that as far as I can tell only exists once in
> memory (not once per thread). This means that different threads
> calling the snat_ipfix_send() function are using the same vlib_main_t
> data structure. That is not how it should be, I think, instead each
> thread should be using its own thread-specific vlib_main_t data
> structure.
> 
> A suggestion for how to fix this is to replace the line
> 
>    vlib_main_t *vm = frm->vlib_main;
> 
> with the following line
> 
>    vlib_main_t *vm = vlib_mains[thread_index];
> 
> in all places where worker threads are using such a vlib_main_t
> pointer. Using vlib_mains[thread_index] means that we are picking the
> thread-specific vlib_main_t data structure for the current thread,
> instead of all threads using the same vlib_main_t. I pushed such a
> change to gerrit, here: https://gerrit.fd.io/r/c/vpp/+/26359
> 
> That fix seems to solve the issue in my tests, vpp does not crash
> anymore after the change. Please have a look at it and let me know if
> this seems reasonable or if I have misunderstood something.
> 
> Best regards,
> Elias
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16147): https://lists.fd.io/g/vpp-dev/message/16147
Mute This Topic: https://lists.fd.io/mt/72786912/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to