2013/10/1 Frederic Weisbecker <fweis...@gmail.com>:
> On Wed, Aug 21, 2013 at 06:41:46PM +0200, Oleg Nesterov wrote:
>> On 08/21, Peter Zijlstra wrote:
>> >
>> > The other consideration is that this adds two branches to the normal
>> > schedule path. I really don't know what the regular ratio between
>> > schedule() and io_schedule() is -- and I suspect it can very much depend
>> > on workload -- but it might be a net loss due to that, even if it makes
>> > io_schedule() 'lots' cheaper.
>>
>> Yes, agreed. Please ignore it for now, I didn't try to actually suggest
>> this change. And even if this is fine performance-wise, it needs some
>> benchmarking.
>>
>> Well, actually I have a vague feeling that _perhaps_ this change can
>> help to solve other problems we are discussing, but I am not sure and
>> right now I can't even explain the idea to myself.
>>
>> In short: please ignore ;)
>>
>> Oleg.
>>
>
> Ok, the discussion is hard to summarize, but I think I now have enough
> clues to send a better next iteration.
>
> So we have the choice between keeping the atomics or using rq->lock. It
> seems that using rq->lock is going to be worrisome for several reasons,
> so let's stick to the atomics (but tell me if you prefer the other way).
>
> So the idea for the next try is to do something along these lines:
>
> struct cpu_idletime {
>        atomic_t  nr_iowait;
>        seqlock_t seqlock;
>        u64       idle_start;
>        u64       idle_time;
>        u64       iowait_time;
> } __cacheline_aligned_in_smp;
>
> DEFINE_PER_CPU(struct cpu_idletime, cpu_idletime);
>
> io_schedule()
> {
>        struct cpu_idletime *prev_cpu_idletime;
>
>        preempt_disable();
>        prev_cpu_idletime = __this_cpu_ptr(&cpu_idletime);
>        atomic_inc(&prev_cpu_idletime->nr_iowait);
>        WARN_ON_ONCE(is_idle_task(current));
>        preempt_enable_no_resched();
>
>        schedule();
>
>        write_seqlock(&prev_cpu_idletime->seqlock);
>        if (!atomic_dec_return(&prev_cpu_idletime->nr_iowait))
>                flush_cpu_idle_time(prev_cpu_idletime, 1);
>        write_sequnlock(&prev_cpu_idletime->seqlock);
> }
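To make the dec side concrete, here is roughly what it could look like in
compilable form. This is an untested sketch, not a patch: local_clock()
stands in for the pseudocode's NOW(), io_schedule_finish() is a made-up name
for the tail of io_schedule() above, and it already restarts ->idle_start
after the flush, which, as I note at the end of this mail, the quoted
pseudocode forgot:

        #include <linux/atomic.h>
        #include <linux/percpu.h>
        #include <linux/sched.h>
        #include <linux/seqlock.h>

        struct cpu_idletime {
                atomic_t        nr_iowait;
                seqlock_t       seqlock;     /* seqlock_init() at boot, elided */
                u64             idle_start;  /* 0 while the CPU is not idle */
                u64             idle_time;
                u64             iowait_time;
        } __cacheline_aligned_in_smp;

        static DEFINE_PER_CPU(struct cpu_idletime, cpu_idletime);

        /* Account the idle period elapsed so far as iowait or pure idle.
         * Caller holds ->seqlock. */
        static void flush_cpu_idle_time(struct cpu_idletime *it, int iowait)
        {
                u64 now;

                if (!it->idle_start)
                        return;

                now = local_clock();
                if (iowait)
                        it->iowait_time += now - it->idle_start;
                else
                        it->idle_time += now - it->idle_start;

                /* Restart the period so the rest of this idle stretch
                 * gets accounted from here on. */
                it->idle_start = now;
        }

        /* Tail of io_schedule(): drop the iowait reference we took on the
         * CPU we slept from; if we were the last iowaiter there, flush the
         * elapsed period as iowait. */
        static void io_schedule_finish(struct cpu_idletime *it)
        {
                write_seqlock(&it->seqlock);
                if (!atomic_dec_return(&it->nr_iowait))
                        flush_cpu_idle_time(it, 1);
                write_sequnlock(&it->seqlock);
        }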
> flush_cpu_idle_time(cpu_idletime, iowait)
> {
>        if (!cpu_idletime->idle_start)
>                return;
>
>        if (iowait)
>                cpu_idletime->iowait_time += NOW() - cpu_idletime->idle_start;
>        else
>                cpu_idletime->idle_time += NOW() - cpu_idletime->idle_start;
> }
>
> idle_entry()
> {
>        write_seqlock(&this_cpu_idletime->seqlock);
>        this_cpu_idletime->idle_start = NOW();
>        write_sequnlock(&this_cpu_idletime->seqlock);
> }
>
> idle_exit()
> {
>        write_seqlock(&this_cpu_idletime->seqlock);
>        flush_cpu_idle_time(this_cpu_idletime,
>                            atomic_read(&this_cpu_idletime->nr_iowait));
>        this_cpu_idletime->idle_start = 0;
>        write_sequnlock(&this_cpu_idletime->seqlock);
> }
>
> Now this all relies on the fact that atomic_inc(&cpu_idletime->nr_iowait)
> can't happen on a CPU that is already idle, so it can't happen between
> idle_entry() and idle_exit(). Hence the WARN_ON_ONCE(is_idle_task(current))
> above, right after the inc.
>
> Hmm?

I forgot: cpu_idletime->idle_start also needs updating after the flush in
io_schedule(). Also, now I wonder if we actually should lock the inc part.
Otherwise it may be hard to get the readers right...

Thanks.
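On the "hard to get the readers right" part: a reader would presumably do the
usual seqcount retry dance and add the live, not-yet-flushed chunk of the
current idle period by hand, along the lines of what get_cpu_iowait_time_us()
does in kernel/time/tick-sched.c. A sketch, assuming the declarations from the
sketch above (get_cpu_iowait_time() is a made-up name):

        /* Snapshot the iowait time of @cpu, including the currently
         * running idle period if it is an iowait one. */
        static u64 get_cpu_iowait_time(int cpu)
        {
                struct cpu_idletime *it = &per_cpu(cpu_idletime, cpu);
                unsigned int seq;
                u64 time;

                do {
                        seq = read_seqbegin(&it->seqlock);
                        time = it->iowait_time;
                        /* Add the live, not yet flushed part. */
                        if (it->idle_start && atomic_read(&it->nr_iowait))
                                time += local_clock() - it->idle_start;
                } while (read_seqretry(&it->seqlock, seq));

                return time;
        }

Note that the atomic_read() of ->nr_iowait is not serialized by the seqcount:
whether it can be tied to an ->idle_start value from the same state of the
period without also locking the inc part is exactly the open question above.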