On Tue, Jun 2, 2015 at 2:17 PM, Hannes Frederic Sowa <han...@stressinduktion.org> wrote:
> On Tue, Jun 2, 2015, at 21:40, Andy Lutomirski wrote:
>> As far as I can tell, enabling IP_RECVERR causes the presence of a
>> queued error to cause recvmsg, etc to return an error (once).  It's
>> worse, though: a new error can be queued asynchronously at any time,
>> thus setting sk_err to a nonzero value.  How do I sensibly distinguish
>> recvmsg failures due to genuine errors receiving messages from recvmsg
>> failures because there's a queued error?
>>
>> The only way I can see to get reliable error handling is to literally
>> call recvmsg in a loop:
>>
>> while (true /* or while POLLIN is set */) {
>>   int ret = recvmsg(..., MSG_ERRQUEUE not set);
>>   if (ret < 0 && /* what goes here? */) {
>>     whoops!  this might be a harmless asynchronous error!
>>     take no action!
>>   }
>
> I see two possibilities:
>
> We export the icmp_err_convert tables along with the udp_lib_err error
> conversions to user space and spice them up with flags to mark if they
> are transient (icmp_err_convert already has a fatal flag).

This seems overcomplicated.  I'd rather have a flag I pass to tell the
kernel that I don't want to see transient errors (and that I'll clear
them myself using POLLERR and either MSG_ERRQUEUE or SO_ERROR).

>
> Otherwise you should be able to call recvmsg with MSG_ERRQUEUE set after
> you got a ret < 0 when calling without MSG_ERRQUEUE and inspect the
> sock_extended_err, no?

I do this already, which makes me think that there's a bug or another
race somewhere.  I've only seen a failure once in several years of
operation.

The failure happened on a ping socket.  I suspect that the race is:

ping_err: ip_icmp_error(...);

user: recvmsg(MSG_ERRQUEUE) and dequeues the error.

ping_err: sk_err = err;

user: recvmsg(MSG_ERRQUEUE not set), and recvmsg sees and clears the
error via sock_error.

user: recvmsg(MSG_ERRQUEUE), and recvmsg returns -EAGAIN.

Now the user code thinks that it was a real (non-transient) error and
aborts.

Shouldn't that sk->sk_err = err assignment at least use WRITE_ONCE?

Even if this race were fixed, this interface still sucks IMO.

--Andy
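
For illustration, the suspected interleaving can be replayed in a small
userspace toy model.  This is only a sketch, not kernel code: struct
toy_sock and all toy_* helpers are hypothetical names invented here, the
model ignores locking and everything else the real paths do, and
EHOSTUNREACH is an arbitrary stand-in for whatever error ping_err would
report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of the two pieces of socket state involved (hypothetical). */
struct toy_sock {
    int sk_err;        /* pending error, stands in for sk->sk_err */
    int errqueue_len;  /* number of extended errors on the error queue */
};

/* Kernel side, step 1: stands in for ip_icmp_error() queueing an error. */
static void toy_queue_error(struct toy_sock *sk)
{
    sk->errqueue_len++;
}

/* Kernel side, step 2: stands in for the later "sk->sk_err = err" store. */
static void toy_set_sk_err(struct toy_sock *sk, int err)
{
    sk->sk_err = err;
}

/* User side: stands in for recvmsg(MSG_ERRQUEUE).
 * Returns 0 if an error was dequeued, -EAGAIN if the queue is empty. */
static int toy_recv_errqueue(struct toy_sock *sk)
{
    if (sk->errqueue_len == 0)
        return -EAGAIN;
    sk->errqueue_len--;
    return 0;
}

/* User side: stands in for recvmsg() without MSG_ERRQUEUE.
 * Reports and clears sk_err (as sock_error does); this model has no
 * data to deliver, so otherwise it returns -EAGAIN. */
static int toy_recv_data(struct toy_sock *sk)
{
    int err = sk->sk_err;

    if (err) {
        sk->sk_err = 0;
        return -err;
    }
    return -EAGAIN;
}

/* Replays the suspected interleaving.  Returns true if user code ends up
 * with a recvmsg failure and an empty error queue, i.e. it would
 * misclassify a transient error as a real one. */
bool toy_replay_race(void)
{
    struct toy_sock sk = {0, 0};
    int drained, data, check;

    toy_queue_error(&sk);               /* ping_err: ip_icmp_error(...) */
    assert(sk.errqueue_len == 1);
    drained = toy_recv_errqueue(&sk);   /* user: dequeues the error early */
    toy_set_sk_err(&sk, EHOSTUNREACH);  /* ping_err: sk_err = err */
    data = toy_recv_data(&sk);          /* user: sees and clears sk_err */
    check = toy_recv_errqueue(&sk);     /* user: queue is already empty */

    return drained == 0 && data == -EHOSTUNREACH && check == -EAGAIN;
}
```

In this model toy_replay_race() returns true: the final error-queue read
comes back with -EAGAIN even though the preceding failure was caused by
the queued error, so "check the error queue after a failure" cannot tell
the two cases apart.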