Hello VPP experts,

I think there is a thread safety issue in the NAT plugin regarding the
counter for busy ports.

Looking at this for the master branch now, there has been some
refactoring lately but the issue has anyway been there for a long time,
at least several VPP versions back, although filenames and function
names have changed.

Here I will take the endpoint-independent code in nat44-ei/nat44_ei.c
code because that is the part I am using, but it looks like a similar
issue is there for nat44-ed as well.

In the nat44_ei_alloc_default_cb() function in nat44_ei.c there is a
part that looks like this:

  --a->busy_##n##_port_refcounts[portnum];          \
  a->busy_##n##_ports_per_thread[thread_index]++;   \
  a->busy_##n##_ports++;                            \

where the variable "a" is an address (nat44_ei_address_t) that belongs
to the "addresses" in the global nat44_ei_main, so not thread-specific. 
As I understand it, different threads may be using the same "a" at the
same time.

At first sight it might seem like all those three lines are risky
because different threads can execute this code at the same time for
the same "a". However, the _port_refcounts[portnum] and
_ports_per_thread[thread_index] parts are actually okay to access
because the [portnum] and [thread_index] ensures that those lines only
access parts of those arrays that belong to thecurrent thread, that is
how the port number is selected.

So the first two lines there are fine, I think, but the third line,
incrementing a->busy_##n##_ports, can give a race condition when
different threads execute it at the same time. The same issue is also
there in other places where the busy_##n##_ports values are updated.

I think this is not critical because the busy_##n##_ports information
(that can be wrong because of this thread safety issue) is not used
very much. However those values are used in nat44_ei_del_address()
where it looks like this:

  /* Delete sessions using address */
  if (a->busy_tcp_ports || a->busy_udp_ports || a->busy_icmp_ports)
    {

and then inside that if-statement there is some code to delete those
sessions. If the busy_##n##_ports values are wrong it could in
principle happen that the session deletion is skipped when there were
actually some sessions that needed deleting. Perhaps rare and perhaps
resulting in nothing worse than a small memory leak, but anyway.

One effect of this is that there can be an inconsistency, if we were to
sum up the busy_##n##_ports_per_thread values for all threads, that
should be equal to busy_##n##_ports but due to this issue there could
be a difference, because while the busy_##n##_ports_per_thread values
are correct the busy_##n##_ports values may have been corupted due to
the race condition mentioned above.

Not sure if the above is a problem in practice, my main motivation for
reporting this is that it confuses me when I am trying to understand
how te code works in order to do some modifications. Either the code is
not thread safe there, or I have misunderstood things.

What do you think, is it an issue?
If not, what have I missed?

(This is not an April fools' joke, I really am this pedantic)

Best regards,
Elias
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#19087): https://lists.fd.io/g/vpp-dev/message/19087
Mute This Topic: https://lists.fd.io/mt/81773552/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to