> > -static void __packet_set_status(struct packet_sock *po, void *frame, int status)
> > +static void __packet_set_status(struct packet_sock *po, void *frame, int status,
> > +				bool call_complete)
> >  {
> >  	union tpacket_uhdr h;
> >  
> > @@ -381,6 +382,8 @@ static void __packet_set_status(struct packet_sock *po, void *frame, int status)
> >  		BUG();
> >  	}
> >  
> > +	if (po->wait_on_complete && call_complete)
> > +		complete(&po->skb_completion);
>
> This wake need not happen before the barrier. Only one caller of
> __packet_set_status passes call_complete (tpacket_destruct_skb).
> Moving this branch to the caller avoids a lot of code churn.
>
> Also, multiple packets may be released before the process is awoken.
> The process will block until packet_read_pending drops to zero. Can
> defer the wait_on_complete to that one instance.
Eh, no. The point of having this sleep inside the send loop is that another thread may release additional slots for transmission (flip them to TP_STATUS_SEND_REQUEST) while this thread is waiting. Otherwise it would have been much simpler to move the wait below the send loop: send as many packets as possible, then wait for all of them to be released. That would give a much clearer control flow.

What remains is where to set and clear the wait_on_complete boolean. Plain integer assignment is fragile, as the compiler and processor may optimize away or reorder simple, seemingly independent operations. As complete() takes a spinlock, avoiding that call in the DONTWAIT case is worthwhile. But it is probably still preferable to set the flag when beginning to wait and clear it when calling complete().
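To make the argument concrete, here is a userspace model of the two points above (the sleep sits inside the send loop, and the flag is set when beginning to wait and cleared together with the wakeup). It is a sketch under stated assumptions, not the af_packet code. A pthread mutex plus condition variable stand in for struct completion, a `pending` counter stands in for packet_read_pending(), and every other name (demo_sock, device_thread, send_loop, the SLOT_* states) is hypothetical. For determinism, the fake device thread also plays the "other thread" by queueing one extra frame on its first release. Because the flag is only touched under the same mutex as the wait, this model sidesteps the compiler/CPU ordering concern rather than solving it locklessly.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NR_SLOTS 4

/* Hypothetical slot states, loosely modelled on the TP_STATUS_* values. */
enum slot_status { SLOT_AVAILABLE, SLOT_SEND_REQUEST, SLOT_SENDING };

struct demo_sock {
	pthread_mutex_t lock;
	pthread_cond_t completion;  /* stands in for po->skb_completion */
	pthread_cond_t work;        /* wakes the fake device thread */
	enum slot_status slot[NR_SLOTS];
	int pending;                /* frames in flight (SENDING slots) */
	bool wait_on_complete;      /* true only while the sender sleeps */
	bool stop;
	int sent, released, injected;
};

/* Fake device: releases one in-flight frame at a time, in the role of
 * tpacket_destruct_skb(). On its first release it re-queues the freed
 * slot, simulating another thread queueing a frame meanwhile. */
static void *device_thread(void *arg)
{
	struct demo_sock *s = arg;

	pthread_mutex_lock(&s->lock);
	for (;;) {
		while (s->pending == 0 && !s->stop)
			pthread_cond_wait(&s->work, &s->lock);
		if (s->pending == 0)
			break; /* stop was requested and nothing is in flight */

		int i = 0;
		while (s->slot[i] != SLOT_SENDING) /* exists since pending > 0 */
			i++;
		s->slot[i] = SLOT_AVAILABLE;
		s->pending--;
		s->released++;
		if (!s->injected) {
			s->injected = 1;
			s->slot[i] = SLOT_SEND_REQUEST;
		}
		/* Signal only if someone sleeps; clear the flag with the wakeup. */
		if (s->wait_on_complete) {
			s->wait_on_complete = false;
			pthread_cond_signal(&s->completion);
		}
	}
	pthread_mutex_unlock(&s->lock);
	return NULL;
}

/* Sender: sleeps inside the loop, so frames queued during the sleep are
 * picked up after the wakeup instead of being left behind. */
static void send_loop(struct demo_sock *s)
{
	pthread_mutex_lock(&s->lock);
	for (;;) {
		int i;
		for (i = 0; i < NR_SLOTS; i++)
			if (s->slot[i] == SLOT_SEND_REQUEST)
				break;
		if (i < NR_SLOTS) {
			s->slot[i] = SLOT_SENDING;
			s->pending++;
			s->sent++;
			pthread_cond_signal(&s->work);
			continue;
		}
		if (s->pending == 0)
			break; /* nothing queued and nothing in flight */
		s->wait_on_complete = true; /* set when beginning to wait */
		pthread_cond_wait(&s->completion, &s->lock);
	}
	s->stop = true;
	pthread_cond_signal(&s->work);
	pthread_mutex_unlock(&s->lock);
}

/* Queue `initial` frames, run the send loop against the fake device,
 * and return the number of frames sent. */
static int run_demo(int initial, int *released)
{
	struct demo_sock s = {0};
	pthread_t dev;

	assert(initial >= 0 && initial <= NR_SLOTS);
	pthread_mutex_init(&s.lock, NULL);
	pthread_cond_init(&s.completion, NULL);
	pthread_cond_init(&s.work, NULL);
	for (int i = 0; i < initial; i++)
		s.slot[i] = SLOT_SEND_REQUEST;

	pthread_create(&dev, NULL, device_thread, &s);
	send_loop(&s);
	pthread_join(dev, NULL);

	pthread_cond_destroy(&s.work);
	pthread_cond_destroy(&s.completion);
	pthread_mutex_destroy(&s.lock);
	*released = s.released;
	return s.sent;
}
```

With two frames queued up front, `run_demo(2, &released)` sends three frames: the third is queued while the sender sleeps and is only sent because the wait sits inside the loop. A variant that waits once below the loop would return after two. Build with `-pthread`.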