On Tue, Feb 11, 2014 at 10:29:36AM +0100, Hans Dedecker wrote:
> Disable netlink auto ack when doing a delete in the get callback
> handler to avoid race conditions resulting in stalled messages
> on the netlink socket.
> 
> Solves issue reported in https://dev.openwrt.org/ticket/14590
> 
> Signed-off-by: Karl Vogel <karl.vo...@gmail.com>
> Acked-by: Hans Dedecker <dedec...@gmail.com>
> ---
>  system-linux.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/system-linux.c b/system-linux.c
> index db78240..e1b9924 100644
> --- a/system-linux.c
> +++ b/system-linux.c
> @@ -456,8 +456,9 @@ static int cb_clear_event(struct nl_msg *msg, void *arg)
>       hdr->nlmsg_type = type;
>       hdr->nlmsg_flags = NLM_F_REQUEST;
>  
> -     if (!nl_send_auto_complete(sock_rtnl, clr->msg))
> -             nl_wait_for_ack(sock_rtnl);
> +     nl_socket_disable_auto_ack(sock_rtnl);
> +     nl_send_auto_complete(sock_rtnl, clr->msg);
> +     nl_socket_enable_auto_ack(sock_rtnl);
>  
>       return NL_SKIP;
>  }
> -- 
> 1.7.1
> 

Just some more background information regarding this.

The current code had two issues. The first was an incorrect return
code check: nl_send_auto_complete() returns the number of bytes
sent, or a negative value on error. A successful send therefore
returns a positive value, so !nl_send_auto_complete() was false and
nl_wait_for_ack() was never called in the original code.
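The return-code pitfall can be reproduced without a real netlink socket. The sketch below uses a hypothetical stub, stub_send_auto_complete(), that follows the same convention as nl_send_auto_complete() (a positive byte count on success, a negative error code on failure) and compares the original `!ret` test with a `ret >= 0` test. All function names here are illustrative, not part of libnl or netifd.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for nl_send_auto_complete(): like the real
 * function it returns the number of bytes sent on success and a
 * negative error code on failure. */
int stub_send_auto_complete(bool fail)
{
    return fail ? -1 : 36; /* 36 is an arbitrary example message size */
}

/* The check from the original code: it only passes when exactly
 * 0 bytes were sent, so a successful send never reaches the wait. */
bool original_check_waits(int ret)
{
    return !ret;
}

/* A check matching the documented convention: any non-negative
 * return value means the message was sent. */
bool corrected_check_waits(int ret)
{
    return ret >= 0;
}

/* Walks through both checks; the assertions encode the behaviour
 * described in the mail. */
void demonstrate_return_check(void)
{
    int ok = stub_send_auto_complete(false);
    int err = stub_send_auto_complete(true);

    assert(!original_check_waits(ok));   /* wait skipped on success */
    assert(corrected_check_waits(ok));   /* wait reached on success */
    assert(!corrected_check_waits(err)); /* no wait after an error */
}
```

Note that fixing the check alone would not have been enough, because of the multipart problem described in the next paragraph.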

The second issue is that the incoming netlink message is a
multipart message, so if nl_wait_for_ack() had been called, it
would have read the next part of the multipart message instead of
the actual ACK.
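To illustrate why a blocking wait for an ACK is unsafe in the middle of a dump, the sketch below models the socket's receive queue as an array of message types. The types and helper are hypothetical simulation code; only the numeric values of SIM_NLMSG_ERROR and SIM_NLMSG_DONE mirror NLMSG_ERROR (2) and NLMSG_DONE (3) from <linux/netlink.h>, where an ACK is an NLMSG_ERROR message with an error code of 0. A naive wait that takes whatever message is next expects the ACK but instead consumes a part of the dump still in progress.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simulation types. SIM_NLMSG_ERROR and SIM_NLMSG_DONE use
 * the values of NLMSG_ERROR and NLMSG_DONE; SIM_DUMP_PART is an
 * arbitrary tag standing for one part of a multipart dump. */
enum sim_msg_type {
    SIM_NLMSG_ERROR = 2, /* ACK (error == 0) or an error report */
    SIM_NLMSG_DONE = 3,  /* end of a multipart dump */
    SIM_DUMP_PART = 100  /* hypothetical: one part of the dump */
};

/* A receive queue: messages are read in order starting at pos. */
struct sim_queue {
    const enum sim_msg_type *msgs;
    size_t len;
    size_t pos;
};

/* Naive "wait for ack": reads whatever message is next on the queue
 * and reports whether it was the expected ACK. The message is consumed
 * either way, just as a real read from the socket would consume it. */
int sim_wait_for_ack(struct sim_queue *q)
{
    assert(q != NULL && q->pos < q->len);
    return q->msgs[q->pos++] == SIM_NLMSG_ERROR;
}
```

With the queue `{ SIM_DUMP_PART, SIM_DUMP_PART, SIM_NLMSG_DONE, SIM_NLMSG_ERROR }`, the first call returns 0 and swallows a dump part, so the callback loop that was processing the dump never sees it.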

Since the original code never called nl_wait_for_ack(), it didn't
run into this issue, but it did receive a stray ACK later on,
which caused other problems (such as the netlink socket not being
fully drained, which is why the Rmem column of /proc/net/netlink
showed buffered messages still outstanding on the socket).
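One way to spot this symptom is to look for netlink sockets whose Rmem value in /proc/net/netlink stays non-zero. The sketch below parses that file line by line; it assumes the column order of the kernel's header line (sk, Eth, Pid, Groups, Rmem, ...), so Rmem is the fifth whitespace-separated field. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: extracts the Rmem column (bytes queued for
 * reading) from one line of /proc/net/netlink. Assumes Rmem is the
 * fifth whitespace-separated field. Returns 1 on success and 0 if the
 * line cannot be parsed, e.g. the header line, whose fifth field is
 * the word "Rmem". */
int parse_netlink_rmem(const char *line, unsigned long *rmem)
{
    assert(line != NULL && rmem != NULL);
    return sscanf(line, "%*s %*s %*s %*s %lu", rmem) == 1;
}

/* Hypothetical helper: counts the sockets in a /proc/net/netlink-style
 * stream that still have unread data buffered (Rmem > 0). */
int count_pending_netlink_sockets(FILE *f)
{
    char line[256];
    unsigned long rmem;
    int pending = 0;

    assert(f != NULL);
    while (fgets(line, sizeof(line), f) != NULL)
        if (parse_netlink_rmem(line, &rmem) && rmem > 0)
            pending++;
    return pending;
}
```

On a live system the stream would come from `fopen("/proc/net/netlink", "r")`; the Pid column of a matching line identifies the owning process, which for this bug would be netifd.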

This patch disables automatic ACK generation for the delete
request, since the ACK isn't really useful (what would we do if
the delete failed anyway?). The only other workaround would be to
use a separate netlink socket in the callback, but that requires
a lot more resources.
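At the wire level, the auto-ack setting controls whether nl_send_auto_complete() ORs NLM_F_ACK into the request's nlmsg_flags before sending; with auto-ack disabled via nl_socket_disable_auto_ack(), the flag is not set and the kernel does not send an ACK for a request that succeeds. The sketch below shows only that flag arithmetic. complete_request_flags() and demonstrate_auto_ack_flags() are hypothetical helpers, and the flag values are those from <linux/netlink.h>, repeated as fallbacks so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from <linux/netlink.h>, repeated so the sketch is standalone. */
#ifndef NLM_F_REQUEST
#define NLM_F_REQUEST 0x01
#endif
#ifndef NLM_F_ACK
#define NLM_F_ACK 0x04
#endif

/* Hypothetical helper mirroring what happens when a request is
 * completed before sending: NLM_F_REQUEST is always set, and NLM_F_ACK
 * is added only while auto-ack is enabled on the socket. */
uint16_t complete_request_flags(uint16_t flags, bool auto_ack)
{
    flags |= NLM_F_REQUEST;
    if (auto_ack)
        flags |= NLM_F_ACK;
    return flags;
}

/* cb_clear_event() starts from hdr->nlmsg_flags = NLM_F_REQUEST; the
 * assertions show the flags that go out with and without auto-ack. */
void demonstrate_auto_ack_flags(void)
{
    assert(complete_request_flags(NLM_F_REQUEST, true) == (NLM_F_REQUEST | NLM_F_ACK));
    assert(complete_request_flags(NLM_F_REQUEST, false) == NLM_F_REQUEST);
}
```

Re-enabling auto-ack right after the send, as the patch does with nl_socket_enable_auto_ack(), keeps the default behaviour for every other request sent on sock_rtnl.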
_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel