Hi Elias, it’s spot on. I think all of it. Would you like to push an atomic-increment patch or should I?
Thanks for spotting this!!! Klement > On 1 Apr 2021, at 13:39, Elias Rudberg <elias.rudb...@bahnhof.net> wrote: > > Hello VPP experts, > > I think there is a thread safety issue in the NAT plugin regarding the > counter for busy ports. > > Looking at this for the master branch now, there has been some > refactoring lately but the issue has anyway been there for a long time, > at least several VPP versions back, although filenames and function > names have changed. > > Here I will take the endpoint-independent code in nat44-ei/nat44_ei.c > code because that is the part I am using, but it looks like a similar > issue is there for nat44-ed as well. > > In the nat44_ei_alloc_default_cb() function in nat44_ei.c there is a > part that looks like this: > > --a->busy_##n##_port_refcounts[portnum]; \ > a->busy_##n##_ports_per_thread[thread_index]++; \ > a->busy_##n##_ports++; \ > > where the variable "a" is an address (nat44_ei_address_t) that belongs > to the "addresses" in the global nat44_ei_main, so not thread-specific. > As I understand it, different threads may be using the same "a" at the > same time. > > At first sight it might seem like all those three lines are risky > because different threads can execute this code at the same time for > the same "a". However, the _port_refcounts[portnum] and > _ports_per_thread[thread_index] parts are actually okay to access > because the [portnum] and [thread_index] ensures that those lines only > access parts of those arrays that belong to thecurrent thread, that is > how the port number is selected. > > So the first two lines there are fine, I think, but the third line, > incrementing a->busy_##n##_ports, can give a race condition when > different threads execute it at the same time. The same issue is also > there in other places where the busy_##n##_ports values are updated. > > I think this is not critical because the busy_##n##_ports information > (that can be wrong because of this thread safety issue) is not used > very much. However those values are used in nat44_ei_del_address() > where it looks like this: > > /* Delete sessions using address */ > if (a->busy_tcp_ports || a->busy_udp_ports || a->busy_icmp_ports) > { > > and then inside that if-statement there is some code to delete those > sessions. If the busy_##n##_ports values are wrong it could in > principle happen that the session deletion is skipped when there were > actually some sessions that needed deleting. Perhaps rare and perhaps > resulting in nothing worse than a small memory leak, but anyway. > > One effect of this is that there can be an inconsistency, if we were to > sum up the busy_##n##_ports_per_thread values for all threads, that > should be equal to busy_##n##_ports but due to this issue there could > be a difference, because while the busy_##n##_ports_per_thread values > are correct the busy_##n##_ports values may have been corupted due to > the race condition mentioned above. > > Not sure if the above is a problem in practice, my main motivation for > reporting this is that it confuses me when I am trying to understand > how te code works in order to do some modifications. Either the code is > not thread safe there, or I have misunderstood things. > > What do you think, is it an issue? > If not, what have I missed? > > (This is not an April fools' joke, I really am this pedantic) > > Best regards, > Elias > > >
-=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#19088): https://lists.fd.io/g/vpp-dev/message/19088 Mute This Topic: https://lists.fd.io/mt/81773552/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-