On Friday, September 04, 2015 05:14:54 AM Richard Guy Briggs wrote: > There are several reports of the kernel losing contact with auditd ...
Even if this doesn't completely solve the problem, I like the extra reporting and robustness of this change. Some comments inline ... > diff --git a/kernel/audit.c b/kernel/audit.c > index 1c13e42..4ee114a 100644 > --- a/kernel/audit.c > +++ b/kernel/audit.c > @@ -404,19 +404,54 @@ static void audit_printk_skb(struct sk_buff *skb) > audit_hold_skb(skb); > } > > +static char *audit_strerror(int err) > +{ > + switch (err) { > + case -ECONNREFUSED: > + return "ECONNREFUSED"; > + case -EPERM: > + return "EPERM"; > + case -ENOMEM: > + return "ENOMEM"; > + case -EAGAIN: > + return "EAGAIN"; > + case -ERESTARTSYS: > + return "ERESTARTSYS"; > + case -EINTR: > + return "EINTR"; > + default: > + return "(other)"; > + } > +} See comments below. > static void kauditd_send_skb(struct sk_buff *skb) > { > int err; > + int attempts = 0; > +#define AUDITD_RETRIES 5 > + > +restart: > /* take a reference in case we can't send it and we want to hold it */ > skb_get(skb); Should the restart label go after the skb_get() call? It seems like if we ever jump to restart we could end up needlessly bumping the skb's refcnt. > err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0); > if (err < 0) { > BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */ > + pr_err("netlink_unicast sending to audit_pid=%d returned error: > %d, %s\n" > + , audit_pid, err, audit_strerror(err)); This is a style nit, but please put the comma on the preceding line. I know why you did it, but it bothers me. I'm also debating if audit_strerror() is worth it. I agree with you that it is a good idea to indicate the specific error code, I'm just not sure we need to bother translating that into a proper errno name. Can you think of some reason why we would need the errno name as opposed to the integer? > if (audit_pid) { > - pr_err("*NO* daemon at audit_pid=%d\n", audit_pid); > - audit_log_lost("auditd disappeared"); > - audit_pid = 0; > - audit_sock = NULL; > + if (err == -ECONNREFUSED || err == -EPERM > + || ++attempts >= AUDITD_RETRIES) { > + audit_log_lost("audit_pid=%d reset"); > + audit_pid = 0; > + audit_sock = NULL; > + } else { > + pr_warn("re-scheduling(#%d) write to > audit_pid=%d\n" > + , attempts, audit_pid); Same thing with the comma. > + set_current_state(TASK_INTERRUPTIBLE); > + schedule(); > + __set_current_state(TASK_RUNNING); > + goto restart; See my comment above about the skb's reference count. > + } > } > /* we might get lucky and get this in the next auditd */ > audit_hold_skb(skb); -- paul moore www.paul-moore.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/