On Wed, Jan 25, 2023 at 12:04:14AM +0100, Olivier Houchard wrote: > > 0x00000af892c770b0 <conn_backend_get+1136>: mov %r12,%rdi > > 0x00000af892c770b3 <conn_backend_get+1139>: callq 0xaf892c24e40 > > <srv_lookup_conn_next> > > 0x00000af892c770b8 <conn_backend_get+1144>: mov %rax,%r12 > > 0x00000af892c770bb <conn_backend_get+1147>: test %rax,%rax > > 0x00000af892c770be <conn_backend_get+1150>: je 0xaf892c770e5 > > <conn_backend_get+1189> > > 0x00000af892c770c0 <conn_backend_get+1152>: mov 0x18(%r12),%rax > > =>0x00000af892c770c5 <conn_backend_get+1157>: mov 0xa0(%rax),%r11 > > 0x00000af892c770cc <conn_backend_get+1164>: test %r11,%r11 > > 0x00000af892c770cf <conn_backend_get+1167>: je 0xaf892c770b0 > > <conn_backend_get+1136> > > 0x00000af892c770d1 <conn_backend_get+1169>: mov %r12,%rdi > > 0x00000af892c770d4 <conn_backend_get+1172>: mov %r13d,%esi > > > > 1229 conn = > > srv_lookup_conn(&srv->per_thr[i].safe_conns, hash); > > 1230 while (conn) { > > 1231 ======>>> if (conn->mux->takeover && > > conn->mux->takeover(conn, i) == 0) { > > ^^^^^^^^^^^ > > > > It's here. %rax==0, which is conn->mux when we're trying to dereference > > it to retrieve ->takeover. I can't see how it's possible to have a NULL > > mux here in an idle connection since they're placed there by the muxes > > themselves. But maybe there's a tiny race somewhere that's impossible to > > reach except under such an extreme contention between the two sockets. > > I really have no idea regarding this one for now, maybe Olivier could > > spot a race here ? Maybe we're missing a barrier somewhere. > > One way I can see that happening, as unlikely as it may be, is if > h1_takeover() is called, but fails to allocate a task or a tasklet, and > will call h1_release(), which will set the mux to NULL and call > conn_free().
:-) thanks for the analysis! I would find it strange that the task allocation fails, it would indicate the machine is lacking RAM. Or maybe in order to avoid extreme lock contention in malloc() on this platform when running on non-uniform machines, they decided to use trylocks that can simply fail to report memory instead of waiting forever. In this case we should be able to reproduce the same issue by setting the fail-alloc fault injection rate to a non-null value. > It was previously ok to happen, because conn_free() would > unconditionally remove the connection from any list, but since the idle > connections now stay in a tree, and no longer in a list, that no longer > happens. I'm not sure why this has to be different, we could as well decide to unconditionally remove it from the tree. I'll have a look and try to figure why. > Another possibility of h1_release() been called while still in the > idle tree is if h1_wake() gets called, but the only way I see that > happening is from mmux_stopping_process(), whatever that is, so that's > unlikely to be the problem in this case. Indeed. It's still good to keep this in mind though. > Anyway, I suggest doing something along the line of the patch attached, > and add a BUG_ON() in conn_free() to catch any freeing of connection > still in the idle tree. I agree, better crash where the problem is caused than where it hurts. I'll take your patch, thanks! Willy