Hey net devs,

I would like some clarity on a problem I ran into last week. I was diagnosing a DNS issue and got very sidetracked by how netstat reported stats to me. My issue was that UDP packets were being dropped by all UDP sockets on the host, so when I ran `netstat -naus` and it informed me that UdpInErrors (https://elixir.bootlin.com/linux/v5.4-rc2/source/include/uapi/linux/snmp.h#L156) was my main problem, I spent a day trying to figure out what application/mechanism was dropping UDP packets on the host. My suspicion, based on the statistic I was seeing, was that it was going to be something like BPF or a security module. To be fair to me, these two mechanisms do indeed report their drops within this statistic (https://elixir.bootlin.com/linux/v5.4-rc2/source/net/ipv4/udp.c#L2051).

Imagine my surprise when I discovered that the error that was actually happening was that the global UDP socket min was being reached, and all the host UDP sockets were, indeed, experiencing buffer errors. The problem is that within the regular UDP socket datapath `UDP_MIB_RCVBUFERRORS` only seems to be incremented here (https://elixir.bootlin.com/linux/v5.4-rc2/source/net/ipv4/udp.c#L1945) when the error is "ENOMEM". However, when `__sk_mem_raise_allocated` fails (https://elixir.bootlin.com/linux/v5.4-rc2/source/net/ipv4/udp.c#L1455) it reports "ENOBUFS", so those drops are counted only under UdpInErrors.

The issue ended up being an application that was not processing its backlog, because it wasn't closing old UDP sockets.

IMO, I would have gotten to this diagnosis quicker if, when I ran `netstat -naus`, I had gotten UdpRcvBuffErrors (`UDP_MIB_RCVBUFERRORS`) instead of UdpInErrors. I realize that it is too late to change this error reporting now, because it would break user space, but I think a new error could be added to the kernel for UDP, such as UdpRcvBuffGlobalErrors, or something like that, which could be double reported.
I think this would be a real time saver for folks, because I really think UdpInErrors is counter-intuitively incorrect.
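For anyone who wants to check for the same symptom, here is a rough Python sketch of how I would compare the counters. It parses the `Udp:` lines in the format of `/proc/net/snmp` and subtracts the drops that have their own counter (RcvbufErrors, InCsumErrors) from InErrors. The function names and the sample numbers are made up for illustration, and the assumption that every receive-buffer and checksum drop is also counted in InErrors is based on my reading of the v5.4 code linked above, so treat the result as a hint rather than an exact figure.

```python
def parse_udp_counters(snmp_text):
    """Return the 'Udp:' counters from /proc/net/snmp-style text as a dict.

    The file has two 'Udp:' lines: a header line with the counter names,
    followed by a line with the values in the same order.
    """
    header = None
    for line in snmp_text.splitlines():
        if not line.startswith("Udp:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields          # first 'Udp:' line holds the names
        else:
            return dict(zip(header, (int(v) for v in fields)))
    return {}


def unexplained_in_errors(counters):
    """InErrors not accounted for by RcvbufErrors or InCsumErrors.

    Assumption: those two counters are always double reported in InErrors.
    What is left over covers everything else, e.g. filter drops or the
    global memory accounting failure described above.
    """
    return (counters.get("InErrors", 0)
            - counters.get("RcvbufErrors", 0)
            - counters.get("InCsumErrors", 0))


# Made-up sample in the /proc/net/snmp layout, for illustration only.
SAMPLE = (
    "Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors "
    "SndbufErrors InCsumErrors IgnoredMulti\n"
    "Udp: 1000 5 42 900 2 0 1 0\n"
)

sample_counters = parse_udp_counters(SAMPLE)
sample_unexplained = unexplained_in_errors(sample_counters)
```

On a real host the same functions can be fed the contents of `/proc/net/snmp`, e.g. `parse_udp_counters(open("/proc/net/snmp").read())`. A large leftover value with a small RcvbufErrors is the pattern I was seeing.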
Thanks, Nate Sweet