On Wed, Jan 25, 2023 at 12:04:14AM +0100, Olivier Houchard wrote:
> >   0x00000af892c770b0 <conn_backend_get+1136>:   mov    %r12,%rdi
> >   0x00000af892c770b3 <conn_backend_get+1139>:   callq  0xaf892c24e40 
> > <srv_lookup_conn_next>
> >   0x00000af892c770b8 <conn_backend_get+1144>:   mov    %rax,%r12
> >   0x00000af892c770bb <conn_backend_get+1147>:   test   %rax,%rax
> >   0x00000af892c770be <conn_backend_get+1150>:   je     0xaf892c770e5 
> > <conn_backend_get+1189>
> >   0x00000af892c770c0 <conn_backend_get+1152>:   mov    0x18(%r12),%rax
> > =>0x00000af892c770c5 <conn_backend_get+1157>:   mov    0xa0(%rax),%r11
> >   0x00000af892c770cc <conn_backend_get+1164>:   test   %r11,%r11
> >   0x00000af892c770cf <conn_backend_get+1167>:   je     0xaf892c770b0 
> > <conn_backend_get+1136>
> >   0x00000af892c770d1 <conn_backend_get+1169>:   mov    %r12,%rdi
> >   0x00000af892c770d4 <conn_backend_get+1172>:   mov    %r13d,%esi
> > 
> >   1229                          conn = 
> > srv_lookup_conn(&srv->per_thr[i].safe_conns, hash);
> >   1230                          while (conn) {
> >   1231        ======>>>                 if (conn->mux->takeover && 
> > conn->mux->takeover(conn, i) == 0) {
> >                                             ^^^^^^^^^^^
> > 
> > It's here. %rax==0, which is conn->mux when we're trying to dereference
> > it to retrieve ->takeover. I can't see how it's possible to have a NULL
> > mux here in an idle connection since they're placed there by the muxes
> > themselves. But maybe there's a tiny race somewhere that's impossible to
> > reach except under such an extreme contention between the two sockets.
> > I really have no idea regarding this one for now, maybe Olivier could
> > spot a race here ? Maybe we're missing a barrier somewhere.
>  
> One way I can see that happening, as unlikely as it may be, is if
> h1_takeover() is called, but fails to allocate a task or a tasklet, and
> will call h1_release(), which will set the mux to NULL and call
> conn_free().

:-)  thanks for the analysis!

I would find it strange that the task allocation fails, it would indicate
the machine is lacking RAM. Or maybe in order to avoid extreme lock
contention in malloc() on this platform when running on non-uniform
machines, they decided to use trylocks that can simply fail to report
memory instead of waiting forever. In this case we should be able to
reproduce the same issue by setting the fail-alloc fault injection rate
to a non-null value.

> It was previously ok to happen, because conn_free() would
> unconditionally remove the connection from any list, but since the idle
> connections now stay in a tree, and no longer in a list, that no longer
> happens.

I'm not sure why this has to be different, we could as well decide to
unconditionally remove it from the tree. I'll have a look and try to
figure why.

> Another possibility of h1_release() been called while still in the
> idle tree is if h1_wake() gets called, but the only way I see that
> happening is from mmux_stopping_process(), whatever that is, so that's
> unlikely to be the problem in this case.

Indeed. It's still good to keep this in mind though.

> Anyway, I suggest doing something along the line of the patch attached,
> and add a BUG_ON() in conn_free() to catch any freeing of connection
> still in the idle tree.

I agree, better crash where the problem is caused than where it hurts.
I'll take your patch, thanks!

Willy

Reply via email to