On Thu, 2017-09-21 at 11:14 +0200, Paolo Abeni wrote: > Hi, > > Thank you for looking at it! > > On Wed, 2017-09-20 at 10:41 -0700, Eric Dumazet wrote: > > On Wed, 2017-09-20 at 18:54 +0200, Paolo Abeni wrote: > > > Noref sk do not carry a socket refcount, are valid > > > only inside the current RCU section and must be > > > explicitly cleared before exiting such section. > > > > > > They will be used in a later patch to allow early demux > > > without sock refcounting. > > > > > > > > > > > +/* dummy destructor used by noref sockets */ > > > +void sock_dummyfree(struct sk_buff *skb) > > > +{ > > > > BUG(); > > > > > +} > > > +EXPORT_SYMBOL(sock_dummyfree); > > > + > > We can call sock_dummyfree() in legitimate paths, see below, but we can > add a: > > WARN_ON_ONCE(!rcu_read_lock_held());
This wont be enough see below. > > here and in skb_clear_noref_sk(). That should help much to catch > possible bugs. > > > I do not see how you ensure we do not leave RCU section with an skb > > destructor pointing to this sock_dummyfree() > > > > This patch series looks quite dangerous to me. > > The idea is to explicitly clear the sknoref references before leaving > the RCU section. Quite alike what we currently do for dst noref, but > here the only place where we get a noref socket is the socket early > demux, thus the scope of this change is more limited to what we have > with noref dst_entries. > > The relevant code is in the next 2 patches; after the demux we preserve > the sknoref only if the skb has a local destination. The UDP socket > will then set the noref on early demux lookup, and the skb will either: > > * land on the corresponding UDP socket, the receive function will steal > the sknoref > * be dropped by some nft/iptables target - the dummy destructor is > called > * forwarded by some nft/iptables target outside the input path; we > clear the skref explicitly in such targets. > > Currently there are an handful of places affected, and we can simplify > the code dropping the early demux result for locally terminated > multicast sockets on a host acting as a multicast router, please see > the comment on the next patch. > > > Do we really have real applications using connected UDP sockets and > > wanting very high pps throughput ? > > The ultimate goal is to improve the unconnected UDP sockets scenario, > we do actually have use cases for that - DNS servers and VoIP SBCs. Unconnected UDP traffic does not use refcounting on sk _already_. And SO_REUSEPORT already allows us to handle all the traffic we want _already_. Please take a look at 71563f3414e917c62acd8e0fb0edf8ed6af63e4b This might tell you why I am so nervous about your changes. Checking WARN_ON_ONCE(!rcu_read_lock_held()); is not enough. rcu_read_lock() skb->destructor = sock_dummyfree; queue the packet into an intermediate queue. rcu_read_unlock(); .... rcu_read_lock() ... if (skb->sk && skb->sk->state == ...) // crash Also you covered IPv4, but really we need to forget about IPv4 and focus on IPv6 only. And _then_ take care of IPv4 compat.