On Friday, September 04, 2015 05:14:54 AM Richard Guy Briggs wrote:
> There are several reports of the kernel losing contact with auditd ...

Even if this doesn't completely solve the problem, I like the extra reporting 
and robustness of this change.  Some comments inline ...

> diff --git a/kernel/audit.c b/kernel/audit.c
> index 1c13e42..4ee114a 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -404,19 +404,54 @@ static void audit_printk_skb(struct sk_buff *skb)
>       audit_hold_skb(skb);
>  }
> 
> +static char *audit_strerror(int err)
> +{
> +     switch (err) {
> +     case -ECONNREFUSED:
> +             return "ECONNREFUSED";
> +     case -EPERM:
> +             return "EPERM";
> +     case -ENOMEM:
> +             return "ENOMEM";
> +     case -EAGAIN:
> +             return "EAGAIN";
> +     case -ERESTARTSYS:
> +             return "ERESTARTSYS";
> +     case -EINTR:
> +             return "EINTR";
> +     default:
> +             return "(other)";
> +     }
> +}

See comments below.

>  static void kauditd_send_skb(struct sk_buff *skb)
>  {
>       int err;
> +     int attempts = 0;
> +#define AUDITD_RETRIES 5
> +
> +restart:
>       /* take a reference in case we can't send it and we want to hold it */
>       skb_get(skb);

Should the restart label go after the skb_get() call?  It seems like if we 
ever jump to restart we could end up needlessly bumping the skb's refcnt.

>       err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
>       if (err < 0) {
>               BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
> +             pr_err("netlink_unicast sending to audit_pid=%d returned error: 
> %d, 
%s\n"
> +                    , audit_pid, err, audit_strerror(err));

This is a style nit, but please put the comma on the preceding line.  I know 
why you did it, but it bothers me.

I'm also debating if audit_strerror() is worth it.  I agree with you that it 
is a good idea to indicate the specific error code, I'm just not sure we need 
to bother translating that into a proper errno name.  Can you think of some 
reason why we would need the errno name as opposed to the integer?

>               if (audit_pid) {
> -                     pr_err("*NO* daemon at audit_pid=%d\n", audit_pid);
> -                     audit_log_lost("auditd disappeared");
> -                     audit_pid = 0;
> -                     audit_sock = NULL;
> +                     if (err == -ECONNREFUSED || err == -EPERM
> +                         || ++attempts >= AUDITD_RETRIES) {
> +                             audit_log_lost("audit_pid=%d reset");
> +                             audit_pid = 0;
> +                             audit_sock = NULL;
> +                     } else {
> +                             pr_warn("re-scheduling(#%d) write to 
> audit_pid=%d\n"
> +                                     , attempts, audit_pid);

Same thing with the comma.

> +                             set_current_state(TASK_INTERRUPTIBLE);
> +                             schedule();
> +                             __set_current_state(TASK_RUNNING);
> +                             goto restart;

See my comment above about the skb's reference count.

> +                     }
>               }
>               /* we might get lucky and get this in the next auditd */
>               audit_hold_skb(skb);

-- 
paul moore
www.paul-moore.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to