Hi Elias,

it’s spot on. I think all of it. Would you like to push an atomic-increment 
patch or should I?

Thanks for spotting this!!!
Klement

> On 1 Apr 2021, at 13:39, Elias Rudberg <elias.rudb...@bahnhof.net> wrote:
> 
> Hello VPP experts,
> 
> I think there is a thread safety issue in the NAT plugin regarding the
> counter for busy ports.
> 
> Looking at this for the master branch now, there has been some
> refactoring lately but the issue has anyway been there for a long time,
> at least several VPP versions back, although filenames and function
> names have changed.
> 
> Here I will take the endpoint-independent code in nat44-ei/nat44_ei.c
> code because that is the part I am using, but it looks like a similar
> issue is there for nat44-ed as well.
> 
> In the nat44_ei_alloc_default_cb() function in nat44_ei.c there is a
> part that looks like this:
> 
>  --a->busy_##n##_port_refcounts[portnum];          \
>  a->busy_##n##_ports_per_thread[thread_index]++;   \
>  a->busy_##n##_ports++;                            \
> 
> where the variable "a" is an address (nat44_ei_address_t) that belongs
> to the "addresses" in the global nat44_ei_main, so not thread-specific. 
> As I understand it, different threads may be using the same "a" at the
> same time.
> 
> At first sight it might seem like all those three lines are risky
> because different threads can execute this code at the same time for
> the same "a". However, the _port_refcounts[portnum] and
> _ports_per_thread[thread_index] parts are actually okay to access
> because the [portnum] and [thread_index] ensures that those lines only
> access parts of those arrays that belong to thecurrent thread, that is
> how the port number is selected.
> 
> So the first two lines there are fine, I think, but the third line,
> incrementing a->busy_##n##_ports, can give a race condition when
> different threads execute it at the same time. The same issue is also
> there in other places where the busy_##n##_ports values are updated.
> 
> I think this is not critical because the busy_##n##_ports information
> (that can be wrong because of this thread safety issue) is not used
> very much. However those values are used in nat44_ei_del_address()
> where it looks like this:
> 
>  /* Delete sessions using address */
>  if (a->busy_tcp_ports || a->busy_udp_ports || a->busy_icmp_ports)
>    {
> 
> and then inside that if-statement there is some code to delete those
> sessions. If the busy_##n##_ports values are wrong it could in
> principle happen that the session deletion is skipped when there were
> actually some sessions that needed deleting. Perhaps rare and perhaps
> resulting in nothing worse than a small memory leak, but anyway.
> 
> One effect of this is that there can be an inconsistency, if we were to
> sum up the busy_##n##_ports_per_thread values for all threads, that
> should be equal to busy_##n##_ports but due to this issue there could
> be a difference, because while the busy_##n##_ports_per_thread values
> are correct the busy_##n##_ports values may have been corupted due to
> the race condition mentioned above.
> 
> Not sure if the above is a problem in practice, my main motivation for
> reporting this is that it confuses me when I am trying to understand
> how te code works in order to do some modifications. Either the code is
> not thread safe there, or I have misunderstood things.
> 
> What do you think, is it an issue?
> If not, what have I missed?
> 
> (This is not an April fools' joke, I really am this pedantic)
> 
> Best regards,
> Elias
> 
> 
> 

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#19088): https://lists.fd.io/g/vpp-dev/message/19088
Mute This Topic: https://lists.fd.io/mt/81773552/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to