On Fri, Jun 28, 2019 at 10:53 AM Neil Horman <nhor...@tuxdriver.com> wrote: > > The AF_PACKET protocol, when running as a memory mapped socket uses a > pending frame counter to track the number of skbs in flight during > transmission. It is incremented during the sendmsg call (via > tpacket_snd), and decremented (possibly asynchronously) in the skb > desructor during skb_free. > > The counter is currently implemented as a percpu variable for each open > socket, but for reads (via packet_read_pending), we iterate over every > cpu variable, accumulating the total pending count. > > Given that the socket transmit path is an exclusive path (locked via the > pg_vec_lock mutex), we do not have the ability to increment this counter > on multiple cpus in parallel. This implementation also seems to have > the potential to be broken, in that, should an skb be freed on a cpu > other than the one that it was initially transmitted on, we may > decrement a counter that was not initially incremented, leading to > underflow. > > As such, adjust the packet socket struct to convert the per-cpu counter > to an atomic_t variable (to enforce consistency between the send path > and the skb free path). This saves us some space in the packet_sock > structure, prevents the possibility of underflow, and should reduce the > run time of packet_read_pending, as we only need to read a single > variable, instead of having to loop over every available cpu variable > instance. > > Tested by myself by running a small program which sends frames via > AF_PACKET on multiple cpus in parallel, with good results. > > Signed-off-by: Neil Horman <nhor...@tuxdriver.com> > CC: "David S. Miller" <da...@davemloft.net> > CC: Willem de Bruijn <will...@google.com> > ---
This essentially is a revert of commit b013840810c2 ("packet: use percpu mmap tx frame pending refcount"). That has some benchmark numbers and also discusses the overflow issue. I think more interesting would be to eschew the counter when MSG_DONTWAIT, as it is only used to know when to exit the loop if need_wait. But, IMHO packet sockets are deprecated in favor of AF_XDP and should be limited to bug fixes at this point. > net/packet/af_packet.c | 40 +++++----------------------------------- > net/packet/internal.h | 2 +- > 2 files changed, 6 insertions(+), 36 deletions(-) > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > index 8d54f3047768..25ffb486fac9 100644 > --- a/net/packet/af_packet.c > +++ b/net/packet/af_packet.c > @@ -1154,43 +1154,17 @@ static void packet_increment_head(struct > packet_ring_buffer *buff) > > static void packet_inc_pending(struct packet_ring_buffer *rb) > { > - this_cpu_inc(*rb->pending_refcnt); > + atomic_inc(&rb->pending_refcnt); > } If making this change, can also remove these helper functions. The layer of indirection just hinders readability.