On Sun, Jun 23, 2019 at 10:39:12AM -0400, Willem de Bruijn wrote:
> On Sun, Jun 23, 2019 at 7:40 AM Neil Horman <nhor...@tuxdriver.com> wrote:
> >
> > On Sat, Jun 22, 2019 at 10:21:31PM -0400, Willem de Bruijn wrote:
> > > > > -static void __packet_set_status(struct packet_sock *po, void *frame, int status)
> > > > > +static void __packet_set_status(struct packet_sock *po, void *frame, int status,
> > > > > +				bool call_complete)
> > > > >  {
> > > > >  	union tpacket_uhdr h;
> > > > >
> > > > > @@ -381,6 +382,8 @@ static void __packet_set_status(struct packet_sock *po, void *frame, int status)
> > > > >  		BUG();
> > > > >  	}
> > > > >
> > > > > +	if (po->wait_on_complete && call_complete)
> > > > > +		complete(&po->skb_completion);
> > > >
> > > > This wake need not happen before the barrier. Only one caller of
> > > > __packet_set_status passes call_complete (tpacket_destruct_skb).
> > > > Moving this branch to the caller avoids a lot of code churn.
> > > >
> > > > Also, multiple packets may be released before the process is awoken.
> > > > The process will block until packet_read_pending drops to zero. Can
> > > > defer the wait_on_complete to that one instance.
> > >
> > > Eh no. The point of having this sleep in the send loop is that
> > > additional slots may be released for transmission (flipped to
> > > TP_STATUS_SEND_REQUEST) from another thread while this thread is
> > > waiting.
> > >
> > That's incorrect. The entirety of tpacket_snd is protected by a mutex. No
> > other thread can alter the state of the frames in the vector from the
> > kernel send path while this thread is waiting.
>
> I meant another user thread updating the memory mapped ring contents.
>
Yes, that's true, and if that happens we will loop through this path again
(the do..while section, picking up the next frame for transmit).
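To make that concrete, here is roughly the shape of the loop I'm describing
(a simplified paraphrase, not the literal patch; declarations, skb
construction and error handling are elided):

	do {
		ph = packet_current_frame(po, &po->tx_ring,
					  TP_STATUS_SEND_REQUEST);
		if (ph != NULL) {
			/* build an skb from this slot, bump the pending
			 * count and hand the skb down to the device ... */
		}

		/* The wait sits at the bottom of the loop, so any frame
		 * that was queued at entry is sent before we ever sleep.
		 * If no slot is ready and we are allowed to block, sleep
		 * until tpacket_destruct_skb() signals the completion,
		 * then go around again and re-check the ring: userspace
		 * may have flipped more slots to TP_STATUS_SEND_REQUEST
		 * while we slept. */
		if (ph == NULL && need_wait &&
		    packet_read_pending(&po->tx_ring)) {
			po->wait_on_complete = 1;
			timeo = wait_for_completion_interruptible_timeout(
					&po->skb_completion, timeo);
			po->wait_on_complete = 0;
		}
	} while (ph != NULL ||
		 (need_wait && packet_read_pending(&po->tx_ring)));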
> > > Else, it would have been much simpler to move the wait below the send
> > > loop: send as many packets as possible, then wait for all of them
> > > having been released. Much clearer control flow.
> > >
> > That's (almost) what happens now. The only difference is that with this
> > implementation, the waiting thread has the opportunity to see if userspace
> > has queued more frames for transmission during the wait period. We could
> > potentially change that, but that's outside the scope of this fix.
>
> Agreed. I think the current, more complex, behavior was intentional.
> We could still restructure to move it out of the loop and jump back.
> But, yes, definitely out of scope for a fix.
>
Yes, it was, though based on your comments I've moved the wait_for_completion
call to the bottom of the loop, so it is only reached after we are guaranteed
to have sent at least one frame. I think that makes the code a bit more
legible.

> > > Where to set and clear the wait_on_complete boolean remains. Integer
> > > assignment is fragile, as the compiler and processor may optimize or
> > > move simple seemingly independent operations. As complete() takes a
> > > spinlock, avoiding that in the DONTWAIT case is worthwhile. But probably
> > > still preferable to set when beginning waiting and clear when calling
> > > complete.
> >
> > We avoid any call to wait_for_completion or complete already, based on
> > the gating of the need_wait variable in tpacket_snd. If the transmitting
> > thread sets MSG_DONTWAIT in the flags of the msg structure, we will never
> > set wait_on_complete, and so we will never manipulate the completion
> > queue.
>
> But we don't know the state of this at tpacket_destruct_skb time without
> wait_for_completion?
>
Sure we do. wait_on_complete is stored in the packet_sock structure, which is
available and stable at the time tpacket_destruct_skb is called.

po->wait_on_complete is set in tpacket_snd iff:
1) The MSG_DONTWAIT flag is clear, and
2) We have detected that the next frame in the memory mapped buffer does not
   have its status set to TP_STATUS_SEND_REQUEST.

If those two conditions are true, we set po->wait_on_complete to 1, which
indicates that tpacket_destruct_skb should call complete() once all the
frames we've sent to the physical layer have been freed (i.e. when
packet_read_pending reaches zero).

If wait_on_complete is non-zero, we can also be confident that the calling
task is either:
a) already blocking in wait_for_completion_interruptible_timeout, or
b) about to block in it shortly.

In case (a) the blocking/transmitting task will be woken up and continue on
its way. In case (b) the transmitting task will call
wait_for_completion_interruptible_timeout, see that the completion has
already been signaled (based on the completion struct's done counter being
positive), and return immediately. A sketch of the destructor side follows
below.

I've made a slight update to the logic/comments in my next version to make
that a little more clear.

Neil
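P.S. For concreteness, the destructor-side signaling described above looks
roughly like this (again a simplified paraphrase of the patch, with the
status and timestamp bookkeeping elided):

	static void tpacket_destruct_skb(struct sk_buff *skb)
	{
		struct packet_sock *po = pkt_sk(skb->sk);

		if (likely(po->tx_ring.pg_vec)) {
			packet_dec_pending(&po->tx_ring);
			/* flip the slot back to TP_STATUS_AVAILABLE ... */

			/* Signal only when the sender asked us to and the
			 * last in-flight frame has been freed.  If the
			 * sender is not sleeping yet (case (b) above),
			 * complete() still increments the completion's
			 * done counter, so its later call to
			 * wait_for_completion_interruptible_timeout()
			 * returns immediately. */
			if (po->wait_on_complete &&
			    !packet_read_pending(&po->tx_ring))
				complete(&po->skb_completion);
		}
		sock_wfree(skb);
	}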