On Thu, Sep 24, 2015 at 05:29:34PM -0700, Steven Schlansker wrote:
> Hello linux-kernel,
> 
> I write to you on behalf of many developers at my company, who
> are having trouble with their applications endlessly locking up
> inside of libc code, with no hope of recovery.
> 
> Currently it mostly affects our Mono and Node processes, and the
> symptoms are the same:  user code invokes getaddrinfo, and libc
> attempts to determine whether IPv4 or IPv6 is appropriate by using
> the RTM_GETADDR netlink message.  The write into the netlink socket
> succeeds, and it immediately reads back the results ... and waits
> forever.  The read never returns.  The stack looks like this:
> 
> #0  0x00007fd7d8d214ad in recvmsg () at ../sysdeps/unix/syscall-template.S:81
> #1  0x00007fd7d8d3e44d in make_request (fd=fd@entry=13, pid=1) at 
> ../sysdeps/unix/sysv/linux/check_pf.c:177
> #2  0x00007fd7d8d3e9a4 in __check_pf 
> (seen_ipv4=seen_ipv4@entry=0x7fd7d37fdd00, 
> seen_ipv6=seen_ipv6@entry=0x7fd7d37fdd10, 
>     in6ai=in6ai@entry=0x7fd7d37fdd40, in6ailen=in6ailen@entry=0x7fd7d37fdd50) 
> at ../sysdeps/unix/sysv/linux/check_pf.c:341
> #3  0x00007fd7d8cf64e1 in __GI_getaddrinfo (name=0x31216e0 
> "mesos-slave4-prod-uswest2.otsql.opentable.com", service=0x0, 
>     hints=0x31216b0, pai=0x31f09e8) at ../sysdeps/posix/getaddrinfo.c:2355
> #4  0x0000000000e101c8 in uv__getaddrinfo_work (w=0x31f09a0) at 
> ../deps/uv/src/unix/getaddrinfo.c:102
> #5  0x0000000000e09179 in worker (arg=<optimized out>) at 
> ../deps/uv/src/threadpool.c:91
> #6  0x0000000000e16eb1 in uv__thread_start (arg=<optimized out>) at 
> ../deps/uv/src/unix/thread.c:49
> #7  0x00007fd7d8ff3182 in start_thread (arg=0x7fd7d37fe700) at 
> pthread_create.c:312
> #8  0x00007fd7d8d2047d in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> 
> (libuv is part of Node and makes DNS lookups "asynchronous" by running
> them on a background thread pool)
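> 
> To make the exchange concrete, here is a rough standalone sketch of the
> same sequence check_pf goes through (illustration only, not the glibc code
> itself): open a NETLINK_ROUTE socket, send one RTM_GETADDR dump request,
> then read replies until NLMSG_DONE.  The recvmsg() in it is the call our
> hung threads never return from:
> 
> #include <stdio.h>
> #include <string.h>
> #include <time.h>
> #include <unistd.h>
> #include <sys/socket.h>
> #include <sys/uio.h>
> #include <linux/netlink.h>
> #include <linux/rtnetlink.h>
> 
> int main (void)
> {
>   struct sockaddr_nl nladdr = { AF_NETLINK };
>   struct { struct nlmsghdr nlh; struct rtgenmsg g; } req;
>   char buf[4096] __attribute__ ((aligned (__alignof__ (struct nlmsghdr))));
>   int fd;
> 
>   fd = socket (AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
>   if (fd < 0)
>     { perror ("socket"); return 1; }
> 
>   /* One RTM_GETADDR dump request, same shape as check_pf's.  */
>   memset (&req, 0, sizeof (req));
>   req.nlh.nlmsg_len = sizeof (req);
>   req.nlh.nlmsg_type = RTM_GETADDR;
>   req.nlh.nlmsg_flags = NLM_F_ROOT | NLM_F_MATCH | NLM_F_REQUEST;
>   req.nlh.nlmsg_seq = time (NULL);
>   req.g.rtgen_family = AF_UNSPEC;
> 
>   if (sendto (fd, &req, sizeof (req), 0,
>               (struct sockaddr *) &nladdr, sizeof (nladdr)) < 0)
>     { perror ("sendto"); return 1; }
> 
>   for (;;)
>     {
>       struct iovec iov = { buf, sizeof (buf) };
>       struct msghdr msg = { &nladdr, sizeof (nladdr), &iov, 1, NULL, 0, 0 };
>       struct nlmsghdr *nlh;
>       ssize_t len;
> 
>       /* This is where the hung threads sit; the dump reply never arrives. */
>       len = recvmsg (fd, &msg, 0);
>       if (len <= 0)
>         { perror ("recvmsg"); return 1; }
> 
>       for (nlh = (struct nlmsghdr *) buf; NLMSG_OK (nlh, len);
>            nlh = NLMSG_NEXT (nlh, len))
>         if (nlh->nlmsg_type == NLMSG_DONE)
>           {
>             /* Dump finished; check_pf would have parsed the RTM_NEWADDR
>                payloads along the way to set seen_ipv4/seen_ipv6.  */
>             puts ("dump complete");
>             close (fd);
>             return 0;
>           }
>     }
> }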
> 
> The applications will run for hours or days successfully, until eventually
> hanging with no apparent pattern or cause.  And once this hang happens it
> hangs badly, because check_pf is holding a lock during the problematic
> recvmsg call.
> 
> I raised this issue on the libc-help mailing list, but I'm hoping that lkml
> will have more people familiar with netlink who can offer better advice.
> The original thread is here:
> https://sourceware.org/ml/libc-help/2015-09/msg00014.html
> 
> Looking at the getaddrinfo / check_pf source code:
> https://fossies.org/dox/glibc-2.22/sysdeps_2unix_2sysv_2linux_2check__pf_8c_source.html
> 
> 146  if (TEMP_FAILURE_RETRY (__sendto (fd, (void *) &req, sizeof (req), 0,
> 147      (struct sockaddr *) &nladdr,
> 148      sizeof (nladdr))) < 0)
> 149    goto out_fail;
> 150 
> 151  bool done = false;
> 152 
> 153  bool seen_ipv4 = false;
> 154  bool seen_ipv6 = false;
> 155 
> 156  do
> 157  {
> 158    struct msghdr msg =
> 159    {
> 160      (void *) &nladdr, sizeof (nladdr),
> 161      &iov, 1,
> 162      NULL, 0,
> 163      0
> 164    };
> 165 
> 166    ssize_t read_len = TEMP_FAILURE_RETRY (__recvmsg (fd, &msg, 0));
> 167    if (read_len <= 0)
> 168      goto out_fail;
> 169 
> 170    if (msg.msg_flags & MSG_TRUNC)
> 171      goto out_fail;
> 172 
> 
> I notice that there is a possibility that if messages are dropped on either
> the send or receive side, this code may hang forever.  The netlink(7) man
> page makes me slightly worried:
> 
> > Netlink is not a reliable protocol.  It tries its best to deliver a message 
> > to its destination(s), but may drop messages when an out-of-memory  
> > condition  or  other error occurs.
> > However,  reliable  transmissions from kernel to user are impossible in any 
> > case.  The kernel can't send a netlink message if the socket buffer is 
> > full: the message will be dropped and the kernel and the user-space process 
> > will no longer have the same view of kernel state.  It is up to the 
> > application to detect when  this  happens (via the ENOBUFS error returned 
> > by recvmsg(2)) and resynchronize.
> 
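> As far as I understand it, the "detect ... and resynchronize" the man page
> asks for would have to look something like the following for this dump
> (illustrative fragment only, not a proposed glibc patch; restart_dump() is a
> hypothetical helper that re-sends the RTM_GETADDR request, and errno needs
> <errno.h>):
> 
>   for (;;)
>     {
>       ssize_t read_len = recvmsg (fd, &msg, 0);
> 
>       if (read_len < 0 && errno == EINTR)
>         continue;                 /* interrupted system call: just retry */
> 
>       if (read_len < 0 && errno == ENOBUFS)
>         {
>           /* The kernel dropped messages, so our view of its state is
>              stale.  Resynchronize by starting the whole dump over
>              (hypothetical helper).  */
>           restart_dump (fd);
>           continue;
>         }
> 
>       if (read_len <= 0 || (msg.msg_flags & MSG_TRUNC))
>         goto out_fail;            /* real error: bail out as glibc does */
> 
>       /* ... walk the nlmsghdrs, stop when NLMSG_DONE is seen ... */
>     }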
> 
> I have taken the glibc code and created a simple(r) program to attempt to
> reproduce this issue.  I inserted some simple polling between the sendto and
> recvmsg calls to make the failure case more evident:
> 
>       struct pollfd pfd;
>       pfd.fd = fd;
>       pfd.events = POLLIN;
>       pfd.revents = 0;
> 
>       int pollresult = poll(&pfd, 1, 1000);
>       if (pollresult < 0) {
>         perror("glibc: check_pf: poll");
>         abort();
>       } else if (pollresult == 0 || (pfd.revents & POLLIN) == 0) {
>         fprintf(stderr, "[%ld] glibc: check_pf: netlink socket read timeout\n",
>                 gettid());
>         abort();
>       }
> 
> I have placed the full source code and strace output here:
> https://gist.github.com/stevenschlansker/6ad46c5ccb22bc4f3473
> 
> The process quickly sends off hundreds of threads which sit in a
> loop attempting this RTM_GETADDR message exchange.
> 
> The code may be compiled as "gcc -o pf_dump -pthread pf_dump.c"
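> 
> The reproducer is roughly the following shape; note that in the real
> pf_dump.c each worker performs the raw sendto/poll/recvmsg exchange itself
> rather than calling getaddrinfo, so treat this as an outline (with
> getaddrinfo as a stand-in) rather than the gist's contents:
> 
> #include <netdb.h>
> #include <pthread.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/socket.h>
> 
> #define NTHREADS 200
> 
> static void *worker (void *arg)
> {
>   (void) arg;
>   for (;;)
>     {
>       struct addrinfo hints, *res = NULL;
>       memset (&hints, 0, sizeof (hints));
>       hints.ai_family = AF_UNSPEC;
>       /* AI_ADDRCONFIG makes getaddrinfo probe the configured address
>          families, i.e. the __check_pf path in the backtrace above.  */
>       hints.ai_flags = AI_ADDRCONFIG;
>       if (getaddrinfo ("localhost", NULL, &hints, &res) == 0)
>         freeaddrinfo (res);
>     }
>   return NULL;
> }
> 
> int main (void)
> {
>   pthread_t tids[NTHREADS];
>   int i;
> 
>   for (i = 0; i < NTHREADS; i++)
>     if (pthread_create (&tids[i], NULL, worker, NULL) != 0)
>       { perror ("pthread_create"); exit (1); }
>   for (i = 0; i < NTHREADS; i++)
>     pthread_join (tids[i], NULL);
>   return 0;
> }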
> 
> An example invocation that quickly fails:
> 
> root@24bf2e440b5e:/# strace -ff -o pfd ./pf_dump 
> [3700] exit success
> glibc: check_pf: netlink socket read timeout
> Aborted (core dumped)
> 
> Interestingly, this seems to be very easy to reproduce using pthreads, but
> much less common with fork() or clone()d threads.  I'm not sure if this is
> just an artifact of how I am testing or an actual clue, but I figured I'd
> mention it.
> 
> I have tested this program on vanilla kernels 4.0.4 and 4.2.1 -- the 4.0.4
> version reliably crashes, but I am having trouble reproducing on 4.2.1.
> 
> So usually I would upgrade to 4.2.1 and be happy, except we ran into serious
> problems with 4.1.2 and are now a little shy about upgrading:
> 
> https://bugzilla.xamarin.com/show_bug.cgi?id=29212
> 
> So my questions from here are:
> 
> * Is this glibc code correct?
> * What are the situations where a recvmsg from a netlink socket can hang as
>   it does here?
> * Is the potential "fix" in 4.2.1 due to any particular commit?  I checked
>   the changelogs and nothing caught my eye.
> 
> We'll be testing out 4.2.1 more thoroughly over the coming days but I am
> hoping someone here can shed some light on our problem.
> 
http://comments.gmane.org/gmane.linux.network/363085

might explain your problem.

I thought this was resolved in 4.1, but it looks like the problem still persists
there. At least I have reports from my workplace that 4.1.6 and 4.1.7 are still
affected. I don't know if there have been any relevant changes in 4.2.

Copying Herbert and Eric for additional input.

Guenter