On Thu, Feb 14, 2019 at 7:31 PM Dominique Martinet <asmad...@codewreck.org> wrote: > > Tom Herbert wrote on Thu, Feb 14, 2019: > > > This second patch[2] (the current thread) now does an extra clone if > > > there is an offset, but the problem really isn't in the clone but the > > > pull itself that can fail and return NULL when there is memory pressure. > > > For some reason I hadn't been able to reproduce that behaviour with > > > strparser doing the pull, but I assume it would also be possible to hit > > > in extreme situations, I'm not sure... > > > > This option looks the best to me, at least as a way to fix the issue > > without requiring a change to the API. If the pull fails, doesn't that > > just mean that the parser fails? Is there some behavior with this > > patch that is not being handled gracefully? > > Yes, the parser fails with -ENOMEM ; that is not handled gracefully at > all: from a user point of view, the connection just hangs (recvmsg never > returns), without so much as a message in dmesg either. > Dominique,
That's by design. Here is the description in kcm.txt: "When a TCP socket is attached to a KCM multiplexor data ready (POLLIN) and write space available (POLLOUT) events are handled by the multiplexor. If there is a state change (disconnection) or other error on a TCP socket, an error is posted on the TCP socket so that a POLLERR event happens and KCM discontinues using the socket. When the application gets the error notification for a TCP socket, it should unattach the socket from KCM and then handle the error condition (the typical response is to close the socket and create a new connection if necessary)." > It took me a while to figure out what failed exactly as I did indeed > expect strparser/kcm to handle that better, but ultimately as things > stand if the parser fails it calls strp_parser_err() with the error > which ends up in strp_abort_strp that should call > sk->sk_error_report(sk) but in kcm case sk is the csk and I guess > failing csk does not make a pending recv on the kcm sock to fail... > > I'm not sure whether propagating the error to the upper socket is the > right thing to do, kcm is meant to be able to work with multiple > underlying sockets so I feel we must be cautious about that, but Right, that's the motivation for the design. > strparser might be able to retry somehow. We could retry on -ENOMEM since it is potentially a transient error, but generally for errors (like connection is simply broken) it seems like it's working as intended. I suppose we could add a socket option to fail the KCM socket when all attached lower sockets have failed, but I not sure that would be a significant benefit for additional complexity. > This is what I said below: > > > [,,,] > > > - the current patch, that I could only get to fail with KASAN, but does > > > complexify kcm a bit; this also does not fix bpf sockmap at all. > > > Would still require to fix the hang, so make strparser retry a few times > > > if strp->cb.parse_msg failed maybe? Or at least get the error back to > > > userspace somehow. The error should be getting to userspace via the TCP socket. Tom > > Thanks, > -- > Dominique