On Mon, Jun 30, 2014 at 02:59:45PM -0700, Ben Pfaff wrote:
> The upstream kernel net/netlink/af_netlink.c netlink_recvmsg() contains the
> following code to refill the Netlink socket buffer with more dump skbs
> while a dump is in progress:
>
>         if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
>                 ret = netlink_dump(sk);
>                 if (ret) {
>                         sk->sk_err = ret;
>                         sk->sk_error_report(sk);
>                 }
>         }
>
> The netlink_dump() function that this calls returns a negative number on
> error, the convention used throughout the kernel, and thus sk->sk_err
> receives a negative value on error.
>
> However, sk->sk_err is supposed to contain either 0 or a positive errno
> value, as one can see from a quick "grep" through net for 'sk_err =', e.g.:
>
>     ipv4/tcp.c:2067:            sk->sk_err = ECONNRESET;
>     ipv4/tcp.c:2069:            sk->sk_err = ECONNRESET;
>     ipv4/tcp_input.c:4106:      sk->sk_err = ECONNREFUSED;
>     ipv4/tcp_input.c:4109:      sk->sk_err = EPIPE;
>     ipv4/tcp_input.c:4114:      sk->sk_err = ECONNRESET;
>     netlink/af_netlink.c:741:   sk->sk_err = ENOBUFS;
>     netlink/af_netlink.c:1796:  sk->sk_err = ENOBUFS;
>     packet/af_packet.c:2476:    sk->sk_err = ENETDOWN;
>     unix/af_unix.c:341:         other->sk_err = ECONNRESET;
>     unix/af_unix.c:407:         skpair->sk_err = ECONNRESET;
>
> The result is that the next attempt to receive from the socket will return
> the error to userspace with the wrong sign.
>
> (The root of the error in this case is that multiple threads are attempting
> to read a single flow dump from a shared fd.  That should work, but the
> kernel has an internal race that can result in one or more of those threads
> hitting the EINVAL case at the start of netlink_dump().  The EINVAL is
> harmless in this case and userspace should be able to ignore it, but
> reporting the EINVAL as if it were a 22-byte message received in userspace
> throws a real wrench in the works.)
>
> This bug makes me think that there are probably not many programs doing
> multithreaded Netlink dumps.  Maybe it is good that we are considering
> other approaches.
>
> VMware-BZ: #1255704
> Reported-by: Mihir Gangar <gang...@vmware.com>
> Signed-off-by: Ben Pfaff <b...@nicira.com>

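To make the sign problem concrete, here is a small userspace model of it.
This is only a sketch, not the patch itself: struct fake_sock,
fake_netlink_dump(), fake_sock_error(), and the two recv_result_*() helpers
are hypothetical stand-ins, and the model assumes that the receive path
reports a pending error by negating sk_err and clearing it.

```c
#include <errno.h>

/* Hypothetical stand-in for the one field of struct sock that matters here. */
struct fake_sock {
    int sk_err;                 /* convention: 0 or a POSITIVE errno value */
};

/* Hypothetical stand-in for netlink_dump(): negative errno on failure. */
static int fake_netlink_dump(int fail)
{
    return fail ? -EINVAL : 0;
}

/* Assumed model of how a pending socket error is reported to the caller:
 * fetch sk_err, clear it, and return its negation. */
static int fake_sock_error(struct fake_sock *sk)
{
    int err = sk->sk_err;

    sk->sk_err = 0;
    return -err;
}

/* Mirrors the quoted code: the negative return value is stored as-is. */
static int recv_result_buggy(void)
{
    struct fake_sock sk = { 0 };
    int ret = fake_netlink_dump(1);

    if (ret)
        sk.sk_err = ret;        /* sk_err is now -EINVAL */
    return fake_sock_error(&sk);        /* yields +EINVAL */
}

/* One way to restore the convention: negate before storing. */
static int recv_result_fixed(void)
{
    struct fake_sock sk = { 0 };
    int ret = fake_netlink_dump(1);

    if (ret)
        sk.sk_err = -ret;       /* sk_err is now +EINVAL */
    return fake_sock_error(&sk);        /* yields -EINVAL */
}
```

In this model recv_result_buggy() comes back as a positive EINVAL, which a
caller cannot tell apart from a successful receive of that many bytes, while
recv_result_fixed() comes back as -EINVAL, which the caller can recognize as
an error and ignore.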
Alex acked this off-list, so I applied it to master, branch-2.3, and branch-2.2.