On Tue, Jun 2, 2015, at 23:33, Andy Lutomirski wrote:
> On Tue, Jun 2, 2015 at 2:17 PM, Hannes Frederic Sowa
> <han...@stressinduktion.org> wrote:
> > On Tue, Jun 2, 2015, at 21:40, Andy Lutomirski wrote:
> >> As far as I can tell, enabling IP_RECVERR causes the presence of a
> >> queued error to cause recvmsg, etc. to return an error (once). It's
> >> worse, though: a new error can be queued asynchronously at any time,
> >> thus setting sk_err to a nonzero value. How do I sensibly distinguish
> >> recvmsg failures due to genuine errors receiving messages from recvmsg
> >> failures because there's a queued error?
> >>
> >> The only way I can see to get reliable error handling is to literally
> >> call recvmsg in a loop:
> >>
> >> while (true /* or while POLLIN is set */) {
> >>     int ret = recvmsg(..., MSG_ERRQUEUE not set);
> >>     if (ret < 0 && /* what goes here? */) {
> >>         whoops! this might be a harmless asynchronous error!
> >>         take no action!
> >>     }
> >
> > I see two possibilities:
> >
> > We export the icmp_err_convert tables along with the udp_lib_err error
> > conversions to user space and spice them up with flags to mark if they
> > are transient (icmp_err_convert already has a fatal flag).
>
> This seems overcomplicated. I'd rather have a flag I pass to tell the
> kernel that I don't want to see transient errors (and that I'll clear
> them myself using POLLERR and either MSG_ERRQUEUE or SO_ERROR).
>
> >
> > Otherwise you should be able to call recvmsg with MSG_ERRQUEUE set after
> > you got a ret < 0 when calling without MSG_ERRQUEUE and inspect the
> > sock_extended_err, no?
>
> I do this already, which makes me think that there's a bug or another
> race somewhere. I've only seen a failure once in several years of
> operation.
>
> The failure happened on a ping socket. I suspect that the race is:
>
> ping_err: ip_icmp_error(...);
>
> user: recvmsg(MSG_ERRQUEUE) and dequeues the error.
>
> ping_err: sk_err = err;
>
> user: recvmsg(MSG_ERRQUEUE not set), and recvmsg sees and clears the
> error via sock_error.
>
> user: recvmsg(MSG_ERRQUEUE), and recvmsg returns -EAGAIN.
>
> Now the user code thinks that it was a real (non-transient) error and
> aborts.
>
> Shouldn't that sk->sk_err = err assignment at least use WRITE_ONCE?
Hmm, I don't think this will help.

> Even if this race were fixed, this interface still sucks IMO.

Yes. :/

My proposal would be to make the error conversion lazy:

Keeping duplicate data is not a good idea in general, so we shouldn't use
sk->sk_err at all if IP_RECVERR is set. Instead, sock_error should just
use the sk_error_queue and extract the error code from there.

Only if IP_RECVERR is not set should we use the sk->sk_err logic.

What do you think?

Bye,
Hannes