On Monday, September 07, 2015 05:10:13 AM Richard Guy Briggs wrote:
> There are several reports of the kernel losing contact with auditd when
> it is, in fact, still running.  When this happens, kernel syslogs show:
>       "audit: *NO* daemon at audit_pid=<pid>"
> although auditd is still running, and is apparently happy, listening on
> the netlink socket. The pid in the "*NO* daemon" message matches the pid
> of the running auditd process.  Restarting auditd solves this.
> 
> The problem appears to happen randomly, and doesn't seem to be strongly
> correlated to the rate of audit events being logged.  The problem
> happens fairly regularly (every few days), but has not yet been
> reproduced on demand.
> 
> On production kernels, BUG_ON() is a no-op, so any error from
> netlink_unicast(), not only -ECONNREFUSED, will trigger the "*NO* daemon"
> path.
> 
> Commit 34eab0a7cd45 ("audit: prevent an older auditd shutdown from
> orphaning a newer auditd startup") eliminates one possible cause.  This
> isn't the case here, since the PID in the error message and the PID of
> the running auditd match.
> 
> The primary expected cause of error here is -ECONNREFUSED when the audit
> daemon goes away, when netlink_getsockbyportid() can't find the auditd
> portid entry in the netlink audit table (or there is no receive
> function).  If -EPERM is returned, that situation isn't likely to be
> resolved in a timely fashion without administrator intervention.  In
> both cases, reset the audit_pid.  This does not rule out a race
> condition.  SELinux is expected to return zero since this isn't an INET
> or INET6 socket.  Other LSMs may have other return codes.  Log the error
> code for better diagnosis in the future.
> 
> In the case of -ENOMEM, the situation could be temporary, based on local
> or general availability of buffers.  -EAGAIN should never happen since
> the netlink audit (kernel) socket is set to MAX_SCHEDULE_TIMEOUT.
> -ERESTARTSYS and -EINTR are not expected since this kernel thread is not
> expected to receive signals.  In these cases (or any other unexpected
> ones for now), report the error and re-schedule the thread, retrying up
> to 5 times.
> 
> v2:
>       Removed BUG_ON().
>       Moved comma in pr_*() statements.
>       Removed audit_strerror() text.
> 
> Reported-by: Vipin Rathor <v.rat...@gmail.com>
> Reported-by: <ctc...@hotmail.com>
> Signed-off-by: Richard Guy Briggs <r...@redhat.com>
> ---
>  kernel/audit.c |   24 +++++++++++++++++++-----
>  1 files changed, 19 insertions(+), 5 deletions(-)

Queued up for linux-audit#next as soon as 4.3-rc1 is released.

> diff --git a/kernel/audit.c b/kernel/audit.c
> index 1c13e42..18cdfe2 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -407,16 +407,30 @@ static void audit_printk_skb(struct sk_buff *skb)
>  static void kauditd_send_skb(struct sk_buff *skb)
>  {
>       int err;
> +     int attempts = 0;
> +#define AUDITD_RETRIES 5
> +
> +restart:
>       /* take a reference in case we can't send it and we want to hold it */
>       skb_get(skb);
>       err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
>       if (err < 0) {
> -             BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
> +             pr_err("netlink_unicast sending to audit_pid=%d returned error: %d\n",
> +                    audit_pid, err);
>               if (audit_pid) {
> -                     pr_err("*NO* daemon at audit_pid=%d\n", audit_pid);
> -                     audit_log_lost("auditd disappeared");
> -                     audit_pid = 0;
> -                     audit_sock = NULL;
> +                     if (err == -ECONNREFUSED || err == -EPERM
> +                         || ++attempts >= AUDITD_RETRIES) {
> +                             audit_log_lost("audit_pid=%d reset");
> +                             audit_pid = 0;
> +                             audit_sock = NULL;
> +                     } else {
> +                             pr_warn("re-scheduling(#%d) write to audit_pid=%d\n",
> +                                     attempts, audit_pid);
> +                             set_current_state(TASK_INTERRUPTIBLE);
> +                             schedule();
> +                             __set_current_state(TASK_RUNNING);
> +                             goto restart;
> +                     }
>               }
>               /* we might get lucky and get this in the next auditd */
>               audit_hold_skb(skb);
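
The policy described in the commit message (reset on -ECONNREFUSED or -EPERM, otherwise retry until AUDITD_RETRIES failures have accumulated) can be sketched in userspace roughly as follows. This is only an illustration, not the kernel code: classify_send_error(), send_with_retry(), struct fake_link and fake_send() are hypothetical names, and fake_send() merely stands in for netlink_unicast().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define AUDITD_RETRIES 5	/* same limit as in the patch */

enum send_action { SEND_OK, SEND_RESET, SEND_RETRY };

/*
 * Decide what to do after one send attempt.
 * err:      >= 0 on success, negative errno on failure
 * attempts: number of failed sends so far, including this one
 */
static enum send_action classify_send_error(int err, int attempts)
{
	if (err >= 0)
		return SEND_OK;
	/* daemon gone or permission denied: retrying is not expected to help */
	if (err == -ECONNREFUSED || err == -EPERM)
		return SEND_RESET;
	/* possibly transient (e.g. -ENOMEM): retry until the limit is hit */
	if (attempts >= AUDITD_RETRIES)
		return SEND_RESET;
	return SEND_RETRY;
}

/* Hypothetical sender type standing in for netlink_unicast(). */
typedef int (*send_fn)(void *ctx);

/* Call send() until it succeeds or the policy says to give up. */
static enum send_action send_with_retry(send_fn send, void *ctx,
					int *failed_attempts)
{
	int attempts = 0;

	assert(send != NULL && failed_attempts != NULL);
	for (;;) {
		int err = send(ctx);
		enum send_action act;

		if (err < 0)
			attempts++;
		act = classify_send_error(err, attempts);
		if (act != SEND_RETRY) {
			*failed_attempts = attempts;
			return act;
		}
		/* the kernel patch calls schedule() here before retrying */
	}
}

/* Hypothetical stub: fails 'failures' times with 'err', then succeeds. */
struct fake_link {
	int failures;
	int err;
};

static int fake_send(void *ctx)
{
	struct fake_link *link = ctx;

	if (link->failures > 0) {
		link->failures--;
		return link->err;
	}
	return 0;
}
```

With this sketch, a link that fails twice with -ENOMEM and then recovers ends in SEND_OK, while one that keeps returning -ENOMEM is reset on the fifth failure, and -ECONNREFUSED or -EPERM cause a reset on the first failure.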

-- 
paul moore
security @ redhat
