On Tue, 2015-06-09 at 14:34 +0100, David Woodhouse wrote:
> On Wed, 2015-04-08 at 15:08 +0200, Johannes Berg wrote:
> > On Wed, 2015-04-08 at 13:03 +0100, David Woodhouse wrote:
> >
> > > I'm not sure if this is entirely fixed. In Fedora 22 (4.0.0-rc5-git4)
> > > I'm occasionally seeing glibc deadlock in __check_pf() on a netlink
> > > recvmsg(), here:
> > > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/check_pf.c;h=162606d7;hb=glibc-2.21#l166
> > >
> > > As I understand it, this shouldn't happen. Even if messages are
> > > dropped (which surely shouldn't happen as often as I'm seeing this),
> > > glibc should get ENOBUFS from the recvmsg() call.
> > >
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1209433
> > >
> > > I haven't bisected and proved that it *was* this commit which
> > > introduced the problem, as it only happens after a day or two of
> > > running Evolution and I haven't managed to trigger it more reliably.
> >
> > I don't see the connection to this change.
> >
> > The issue with my patch was that some code for NLM_F_DUMP would have
> > this pattern:
> >
> > int fill_function(...)
> > {
> >         ...
> >         return nlmsg_end(...);
> > }
> >
> > loop (...) {
> >         if (fill_function() <= 0)
> >                 break; /* continue in next dump */
> > }
> >
> > and that all had to be converted to be just "< 0" now.
> >
> > Additionally, the failure mode of this was the process running out of
> > memory due to receiving the same results over and over again - does that
> > happen for you? It seems it was stuck in recvmsg(), but that may just be
> > a side effect of happening to interrupt at that point?
> >
>
> I don't think the problem was introduced by your change. At
> https://github.com/nahi/httpclient/issues/232 it seems to have been
> observed even in November of last year.
>
> I've added some debugging, and it seems that when it deadlocks, glibc
> doesn't get *any* response to its RTM_GETADDR request. I know we'd get
> ENOBUFS if a *response* was dropped... but what about when the request
> itself is dropped? Does userspace get any hint of that? Is this purely
> a glibc bug, for assuming its request got delivered and unconditionally
> waiting for a response?
>
> I don't know why it suddenly started happening to me in the 4.0 kernel
> when I'd never seen it before, but it's still happening. I've put a
> poll() in the glibc code (referenced above), and made it fail after a
> 5-second timeout. That will at least prevent me from throwing my
> computer out the window for the time being...
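
For anyone following along, the poll() workaround described above might look roughly like the sketch below. This is not the actual glibc change: the function names send_getaddr_request() and recv_with_timeout() are made up for illustration, the sequence number is arbitrary, and returning ETIMEDOUT on expiry is my assumption about how a caller would want the failure reported. The request layout (nlmsghdr followed by rtgenmsg) follows what check_pf.c sends.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: send an RTM_GETADDR dump request to the kernel on
 * an already-open NETLINK_ROUTE socket. Returns 0 on success, -1 on error. */
int send_getaddr_request(int fd)
{
    struct {
        struct nlmsghdr nlh;
        struct rtgenmsg g;
    } req;
    struct sockaddr_nl kernel;

    memset(&req, 0, sizeof(req));
    req.nlh.nlmsg_len = sizeof(req);
    req.nlh.nlmsg_type = RTM_GETADDR;
    req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.nlh.nlmsg_seq = 1;              /* arbitrary, for illustration */
    req.g.rtgen_family = AF_UNSPEC;

    memset(&kernel, 0, sizeof(kernel));
    kernel.nl_family = AF_NETLINK;      /* nl_pid 0 addresses the kernel */

    if (sendto(fd, &req, sizeof(req), 0,
               (struct sockaddr *)&kernel, sizeof(kernel)) < 0)
        return -1;
    return 0;
}

/* Hypothetical helper: wait up to timeout_ms for the socket to become
 * readable, then read one message. Returns the byte count from recv(),
 * or -1 with errno set; errno is ETIMEDOUT if nothing arrived in time.
 * A dropped *response* would still show up as ENOBUFS from recv(). */
ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;

    assert(buf != NULL);

    /* Retry if a signal interrupts poll(); note that this restarts the
     * full timeout rather than tracking the time already spent. */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return -1;
    if (ready == 0) {
        errno = ETIMEDOUT;              /* no reply at all: give up */
        return -1;
    }
    return recv(fd, buf, len, 0);
}
```

A caller would send the request, then call recv_with_timeout(fd, buf, sizeof(buf), 5000) in place of the unconditional recvmsg(). That only bounds the wait; it does not explain why no reply arrives.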

Please check that this patch fixes your issue:

http://patchwork.ozlabs.org/patch/473041/