Hi,

On 22/11/14 18:09, Robert N. M. Watson wrote:
> On 21 Nov 2014, at 17:40, Adrian Chadd <adr...@freebsd.org> wrote:
>>>> Skimming through a bunch of hosts with moderately loaded hosts
>>>> with reasonably high uptime I couldn't find one where
>>>> net.inet.tcp.timer_race was not zero. Any suggestions how to
>>>> best reproduce the race(s) in tcp_timer.c?
>>>
>>> They would likely occur only on very highly loaded hosts, as they
>>> require race conditions to arise between TCP timers and TCP
>>> close. I think I did manage to reproduce it at one stage, and
>>> left the counter in to see if we could spot it in production, and
>>> I have had (multiple) reports of it in deployed systems. I'm not
>>> sure it's worth trying to reproduce them, given that knowledge --
>>> we should simply fix them.
>>
>> Wasn't this just fixed by Julien @ Verisign?
>
> I don't believe so, although it's the kind of thing Julien is very
> good at fixing!
>
> The issue here is that we can't call callout_drain() from contexts
> where we finalise TCP connection close and attempt to free the inpcb.
> The 'easy' fix is to create a taskqueue thread to do the
> callout_drain() in the event that we discover that callout_stop()
> isn't able to guarantee that pending callouts are neither in
> execution nor scheduled. We'd then defer the very tail of TCP
> teardown to that asynchronous context rather than trying to do it to
> completion in the current (and rather more sensitive) one. This would
> happen only very infrequently so have little overhead in practice,
> although one would want to carefully look at the sync behaviour to
> make sure it wasn't frequently enough that a backlog might build up.
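For readers who have not followed the thread, the deferral Robert describes can be modelled in userland roughly as below. This is only a toy sketch, not kernel code and not the patch: every toy_* name is hypothetical, the "running" flag stands in for a timer handler executing on another CPU, and toy_timer_stop()/toy_timer_drain() only loosely mimic the non-blocking and blocking behaviour of callout_stop() and callout_drain().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a callout; not the kernel's struct callout. */
struct toy_timer {
	bool pending;	/* scheduled, handler not started yet */
	bool running;	/* handler currently executing elsewhere */
};

/* Hypothetical stand-in for a connection control block. */
struct toy_conn {
	struct toy_timer timer;
	struct toy_conn *next_deferred;
};

static struct toy_conn *deferred_head;	/* work list for the async context */
static int freed_count;			/* number of connections freed so far */

/*
 * Non-blocking stop: cancels a pending timer. Returns true only if the
 * timer is afterwards neither scheduled nor executing.
 */
static bool
toy_timer_stop(struct toy_timer *t)
{
	t->pending = false;
	return (!t->running);
}

/* Blocking drain: the wait for the handler is simulated by clearing the flag. */
static void
toy_timer_drain(struct toy_timer *t)
{
	t->pending = false;
	t->running = false;
}

/* Free a connection; only legal once its timer is fully quiescent. */
static void
toy_conn_free(struct toy_conn *c)
{
	assert(!c->timer.pending && !c->timer.running);
	free(c);
	freed_count++;
}

/*
 * Close path, which must not block. Frees inline in the common case and
 * defers the tail of the teardown otherwise. Returns true if freed inline.
 */
static bool
toy_conn_close(struct toy_conn *c)
{
	if (toy_timer_stop(&c->timer)) {
		toy_conn_free(c);
		return (true);
	}
	c->next_deferred = deferred_head;
	deferred_head = c;
	return (false);
}

/*
 * Asynchronous context, which may block: drain each deferred timer, then
 * free its connection. Returns how many connections it processed.
 */
static int
toy_deferred_run(void)
{
	int n = 0;

	while (deferred_head != NULL) {
		struct toy_conn *c = deferred_head;

		deferred_head = c->next_deferred;
		toy_timer_drain(&c->timer);
		toy_conn_free(c);
		n++;
	}
	return (n);
}

/* Allocate a connection with the given initial timer state. */
static struct toy_conn *
toy_conn_new(bool pending, bool running)
{
	struct toy_conn *c = calloc(1, sizeof(*c));

	assert(c != NULL);
	c->timer.pending = pending;
	c->timer.running = running;
	return (c);
}
```

Closing a connection whose timer is merely pending frees it inline; closing one whose handler is mid-flight only queues it, and the memory is released later by toy_deferred_run(). A real implementation would also need locking around the deferred list, which this sketch leaves out.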
Ironically, I was not aware of this discussion before we:

 - (re)discovered this TCP timers callout race condition
 - followed Robert's XXXRW comments and advice in the source code
 - proposed a patch to fix it:

   Fix TCP timers use-after-free old race conditions
   https://reviews.freebsd.org/D2079

Although the proposed patch follows most of Robert's advice, instead of
using a taskqueue thread calling callout_drain(), we directly re-use the
callout API to schedule the tcpcb's uma_zfree() at the appropriate time.

Subscribe to the review if you are interested in this proposed fix.

But still, nice irony to find this thread _after_ proposing a fix... :)

--
Julien