On Tue, Jun 2, 2015, at 23:42, Hannes Frederic Sowa wrote: > On Tue, Jun 2, 2015, at 23:33, Andy Lutomirski wrote: > > On Tue, Jun 2, 2015 at 2:17 PM, Hannes Frederic Sowa > > <han...@stressinduktion.org> wrote: > > > On Tue, Jun 2, 2015, at 21:40, Andy Lutomirski wrote: > > > > [...] > > > > I do this already, which makes me think that there's a bug or another > > race somewhere. I've only seen a failure once in several years of > > operation. > > > > The failure happened on a ping socket. I suspect that the race is: > > > > ping_err: ip_icmp_error(...); > > > > user: recvmsg(MSG_ERRQUEUE) and dequeues the error. > > > > ping_err: sk_err = err; > > > > user: recvmsg(MSG_ERRQUEUE not set), and recvmsg sees and clears the > > error via sock_error. > > > > user: recvmsg(MSG_ERRQUEUE), and recvmsg returns -EAGAIN. > > > > Now the user code thinks that it was a real (non-transient) error and > > aborts. > > > > Shouldn't that sk->sk_err = err assignment at least use WRITE_ONCE? > > Hmm, I don't think this will help. > > > Even if this race were fixed, this interface still sucks IMO. > > Yes. :/ > > My proposal would be to make the error conversion lazy: > > Keeping duplicate data is not a good idea in general: So we shouldn't > use sk->sk_err if IP_RECVERR is set at all but let sock_error just use > the sk_error_queue and extract the error code from there. > > Only if IP_RECVERR was not set, we use sk->sk_err logic. > > What do you think?
I just noticed that this will probably break existing user space applications which require that icmp errors are transient even with IP_RECVERR. We can mark that with a bit in the sk_error_queue pointer and xchg the pointer, hmmm.... Bye, Hannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html