On Mon, Sep 24, 2012 at 09:30:05AM -0700, Linus Torvalds wrote:
> On Mon, Sep 24, 2012 at 9:12 AM, Peter Zijlstra <a.p.zijls...@chello.nl>
> wrote:
> >
> > So we're looking for an idle cpu around @target. We prefer a cpu of an
> > idle core, since SMT-siblings share L[12] cache. The way we do this is
> > by iterating the topology tree downwards starting at the LLC (L3) cache
> > level. Its groups are either the SMT-siblings or singleton groups.
>
> So if it's basically guaranteed to be SMT-siblings or singleton groups, then
> the whole "for_each_cpu()" is a total disaster. That's a truly
> expensive way to look up adjacent CPU's. Is there no saner way to look
> up that thing? Like a simple circular list of SMT siblings (I realize
> that on x86 that list is either one or two, but other SMT
> implementations are groups of four or more).
>
> So I suspect your patch largely makes things faster (avoid those
> insane cpumask operations), but the for_each_cpu() one is still an
> absolutely horrible way to find a couple of basically statically known
> (modulo hotplug, which is disabled here anyway) CPU's. So even if the
> algorithm makes sense at some higher level, it doesn't really seem to
> make sense from an implementation standpoint.
>
> Also, do we really want to spread things out that aggressively?
> How/why do we know that we don't want to share L2 caches, for example?
> It sounds like a bad idea from a power standpoint, and possibly
> performance too.

Right, maybe the quicker lookup would be the other way around, down the
cache hierarchy: check the CPUs sharing L1, then L2, and if there's no
idle cpu, fall back to the L3-sharing ones and then simply grab one. I
don't know whether that could work though; we'd need to run it heavily.

-- 
Regards/Gruss,
Boris.
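
A minimal userspace sketch of the lookup order described above, widening
outwards from the target CPU's L1 sharers to L2 and then L3. Everything
here is an assumption made for illustration: the share_mask table, the
8-CPU layout, find_idle_cpu() and first_cpu() are hypothetical names, not
kernel code, and falling back to the target itself when nothing is idle is
just one reading of "simply grab one".

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8

/* Hypothetical cache levels, narrowest sharing domain first. */
enum cache_level { CACHE_L1, CACHE_L2, CACHE_L3, NR_CACHE_LEVELS };

/*
 * Made-up topology: share_mask[level][cpu] is a bitmask of the CPUs that
 * share that cache level with @cpu (including @cpu itself).
 * Layout assumed here: 2 CPUs per L1, 4 CPUs per L2, one L3 for all 8.
 */
static const uint64_t share_mask[NR_CACHE_LEVELS][NR_CPUS] = {
	[CACHE_L1] = { 0x03, 0x03, 0x0c, 0x0c, 0x30, 0x30, 0xc0, 0xc0 },
	[CACHE_L2] = { 0x0f, 0x0f, 0x0f, 0x0f, 0xf0, 0xf0, 0xf0, 0xf0 },
	[CACHE_L3] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
};

/* Return the lowest-numbered CPU set in @mask, or -1 if @mask is empty. */
static int first_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/*
 * Look for an idle CPU near @target, starting with the CPUs sharing its
 * L1, then L2, then L3. @idle_mask has one bit set per idle CPU.
 * If nothing is idle at any level, give up and return @target.
 */
int find_idle_cpu(int target, uint64_t idle_mask)
{
	assert(target >= 0 && target < NR_CPUS);

	/* The target itself is the cheapest choice if it is idle. */
	if (idle_mask & (UINT64_C(1) << target))
		return target;

	for (int level = CACHE_L1; level < NR_CACHE_LEVELS; level++) {
		uint64_t candidates = share_mask[level][target] & idle_mask;

		if (candidates)
			return first_cpu(candidates);
	}

	return target;
}
```

For example, with only CPUs 3 and 7 idle, find_idle_cpu(0, 0x88) finds
nothing among CPU 0's L1 sharers and picks CPU 3 from its L2 group. Whether
this ordering actually beats the top-down walk would still need measuring.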