Tom Herbert wrote on Thu, Feb 14, 2019: > On Thu, Feb 14, 2019 at 7:31 PM Dominique Martinet > <asmad...@codewreck.org> wrote: > > Yes, the parser fails with -ENOMEM ; that is not handled gracefully at > > all: from a user point of view, the connection just hangs (recvmsg never > > returns), without so much as a message in dmesg either. > > That's by design. Here is the description in kcm.txt: > > "When a TCP socket is attached to a KCM multiplexor data ready > (POLLIN) and write space available (POLLOUT) events are handled by the > multiplexor. If there is a state change (disconnection) or other error > on a TCP socket, an error is posted on the TCP socket so that a > POLLERR event happens and KCM discontinues using the socket. When the > application gets the error notification for a TCP socket, it should > unattach the socket from KCM and then handle the error condition (the > typical response is to close the socket and create a new connection if > necessary)."
Sigh, that's what I get for relying on pieces of code found on the internet. It does make "trivial" kcm sockets difficult to use though, the old ganesha code I adapted to kcm was the horrible (naive?) kind spawning one thread per connection and just waiting on read(), so making it just create a kcm socket from the tcp one and wait on recvmsg() until an error pops up does not seem an option. That being said I'm a bit confused, I thought part of the point of kcm was the multiplexing so a simple server could just keep attaching tcp sockets to a single kcm socket and only have a single trivial loop reading from the kcm socket ; but I guess there's no harm in also looking for POLLERR on the tcp sockets... It would still need to close them once clients disconnect I guess, this really only affects my naive server. > > strparser might be able to retry somehow. > > We could retry on -ENOMEM since it is potentially a transient error, Yes, I think we should aim to retry on -ENOMEM; I agree all errors are not meant to be retried on but there already are different cases based on what the parse cb returned; but that can be a different patch. > but generally for errors (like connection is simply broken) it seems > like it's working as intended. I suppose we could add a socket option > to fail the KCM socket when all attached lower sockets have failed, > but I not sure that would be a significant benefit for additional > complexity. Right, I agree it's probably not worth doing, now I'm aware of this I can probably motivate myself to change this old code to use epoll properly. I think it could make sense to have this option for simpler programs, but as things stand I guess we can require kcm users to do this much, thanks for the explanation, and sorry for having failed to notice it. With all that said I guess my patch should work correctly then, I'll try to find some time to check the error does come back up the tcp socket in my reproducer but I have no reason to believe it doesn't. I'd like to see some retry on ENOMEM before this is merged though, so while I'm there I'll resend this with a second patch doing that retry,.. I think just not setting strp->interrupted and not reporting the error up might be enough? Will have to try either way. Thanks for the feedback, -- Dominique