When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
the netlink socket. If the wait timeout fully expires (timeo == 0),
netlink mistakenly interprets the zeroed timeout as a non-blocking
request. It then triggers netlink_overrun(), which drops the event,
completely bypassing the audit subsystem's internal retry queue, and
falsely reports ENOBUFS to user space, resulting in the following error:

  auditd[]: Error receiving audit netlink packet (No buffer space available)

Fix this by detecting when a blocking sender's timeout has expired
(timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead of
retrying with timeo=0 (which would incorrectly trigger netlink_overrun
on the next iteration), safely free the skb and return -EAGAIN, allowing
the audit subsystem to gracefully enqueue the pending event into its
internal backlog.

Suggested-by: Steve Grubb <[email protected]>
Suggested-by: Paul Moore <[email protected]>
Signed-off-by: Ricardo Robaina <[email protected]>
---
Changes in v2:
 - Use the simple check (timeo == 0 && !nonblock) to detect expired
   timeout, avoiding adding a new NETLINK flag.

 net/netlink/af_netlink.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 2aeb0680807d..fdc3db74b178 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1351,8 +1351,18 @@ int netlink_unicast(struct sock *ssk, struct sk_buff *skb,
 	}
 
 	err = netlink_attachskb(sk, skb, &timeo, ssk);
-	if (err == 1)
+	if (err == 1) {
+		/* timeo may have been zeroed by schedule_timeout inside
+		 * netlink_attachskb. If the caller is a timed-blocking sender
+		 * (not genuinely nonblocking), don't re-enter with timeo=0 as
+		 * that would misfire netlink_overrun on the next iteration.
+		 */
+		if (timeo == 0 && !nonblock) {
+			kfree_skb(skb);
+			return -EAGAIN;
+		}
 		goto retry;
+	}
 	if (err)
 		return err;
 
-- 
2.53.0

