On Thu, 2014-02-27 at 10:40 +0100, Peter Zijlstra wrote:
> On Mon, Feb 24, 2014 at 09:06:51AM +0100, Mike Galbraith wrote:
> > Hi Peter,
> >
> > I wonder if the below makes sense for mainline.
> >
> > Background: I received some rather surprising news recently, a user of
> > old 2.6.32 kernels regularly receive log spam stemming from old 208 day
> > era warnings/protections inserted to prevent explosions from what was at
> > the time unknown bad juju happening (but don't report logs that look
> > like graffiti artist with an unlimited supply of spray paint gone mad).
> >
> > The kernel that emitted the below does NOT contain..
> >   9993bc63 sched/x86: Fix overflow in cyc2ns_offset
> > ..though these folks use kexec fwtw. They're one of those "You update
> > your kernel IFF world stops spinning" users, so will likely not be
> > terribly interested in me making their boxen say BUG(), and may even be
> > doing something naughty that induces it for all I know.
> >
> > In any case, NOT using nutty output from the intentionally racy function
> > seems like a good plan no matter who or what makes weird unreproducible
> > (elsewhere) sh*t happen. Wedging a bent 64 bit peg into 32 bit hole
> > could make boom, on top of doing funny things to balancing.
> >
> > sched: don't use nutty scale_rt_power() output
> >
> > Boxen instructed to gripe if they see nutty cpu_power catch us
> > trashing it while seriously dazed and confused for an unknown reason.
> >
> > Dec 18 05:50:56 kernel: [40091179.401405] update_group_power: cpu_power = 3148183471
> > Dec 18 05:51:01 /usr/sbin/cron[2279]: (root) CMD (/opt/blah/fix_cdr_bin.job >> /opt/blah/fix_cdr_bin.out 2>&1)
> > Dec 18 05:51:06 kernel: [40091189.455713] update_cpu_power: cpu_power = 19495027282; scale_rt = 19495027282
> > Dec 18 05:51:16 kernel: [22076800.665578] update_cpu_power: cpu_power = 2671067611; scale_rt = 18428729677871137243
> > Dec 18 05:51:16 kernel: [40091199.188773] update_cpu_power: cpu_power = 2675064501; scale_rt = 18428729677875134133
> >
> > Don't do that, make a scary warning instead.
> >
>
> Yeah, I'm in two minds about that. Crappy clocks can make a whole lot of
> misery. Then again, we usually guard against them going backwards.
>
> How about something like so? Most other sites don't complain about
> clocks going backwards either, they just deal with it.

Yeah, better to warp protect scale_rt_power() directly.

This small set of identical weird ass boxen should be reliable tsc. They
jump back and forth in time by _exactly 208 days_, and do that straight
from boot, and randomly thereafter. Wish I could get my hands on one of
the things, but that ain't gonna happen.

Those boxen have long uptimes, which proves you can survive with a sched
clock that's going completely bonkers, which is kinda surprising to me.
On a busy box, I'd expect some poor victim to eat the mother of all
latency hits.

> ---
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5564,6 +5564,7 @@ static unsigned long scale_rt_power(int
>  {
>  	struct rq *rq = cpu_rq(cpu);
>  	u64 total, available, age_stamp, avg;
> +	s64 delta;
>  
>  	/*
>  	 * Since we're reading these variables without serialization make sure
> @@ -5572,7 +5573,11 @@ static unsigned long scale_rt_power(int
>  	age_stamp = ACCESS_ONCE(rq->age_stamp);
>  	avg = ACCESS_ONCE(rq->rt_avg);
>  
> -	total = sched_avg_period() + (rq_clock(rq) - age_stamp);
> +	delta = rq_clock(rq) - age_stamp;
> +	if (unlikely(delta < 0))
> +		delta = 0;
> +
> +	total = sched_avg_period() + delta;
>  
>  	if (unlikely(total < avg)) {
>  		/* Ensures that power won't end up being negative */
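
For anyone who wants to poke at the arithmetic outside the kernel, here is a
minimal userspace sketch of what the clamp in the patch above changes. It is
not the kernel code: AVG_PERIOD_NS is a hypothetical stand-in for
sched_avg_period(), the two helper names are made up for illustration, and
the clock and age_stamp values are passed in by hand instead of being read
from a runqueue.

```c
#include <stdint.h>

/* Hypothetical stand-in for sched_avg_period(); value is illustrative only. */
#define AVG_PERIOD_NS UINT64_C(500000000)

/*
 * Pre-patch shape: plain unsigned subtraction. If the clock reads behind
 * age_stamp, the difference wraps around to a huge u64 value.
 */
static uint64_t total_unclamped(uint64_t clock, uint64_t age_stamp)
{
	return AVG_PERIOD_NS + (clock - age_stamp);
}

/*
 * Patched shape: look at the difference as a signed value and clamp a
 * backwards step to zero. The u64 -> s64 conversion is
 * implementation-defined in ISO C; this assumes the usual two's
 * complement behaviour.
 */
static uint64_t total_clamped(uint64_t clock, uint64_t age_stamp)
{
	int64_t delta = (int64_t)(clock - age_stamp);

	if (delta < 0)
		delta = 0;

	return AVG_PERIOD_NS + (uint64_t)delta;
}
```

With a clock that has stepped back by 2^54 ns relative to age_stamp,
total_unclamped() returns a value above 2^63 while total_clamped() returns
just AVG_PERIOD_NS. As an aside, the logged numbers are at least consistent
with a step of that size: 18428729677871137243 equals 2671067611 - 2^54
modulo 2^64, and 2^54 ns is roughly 208.5 days.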