On 05/05/2014 12:50 AM, Preeti U Murthy wrote:

> Yeah now I see it. But I still feel wake_affine() and
> select_idle_sibling() are not at fault primarily because when they were
> introduced, I don't think it was foreseen that the cpu topology would
> grow to the extent it is now.

It's not about "fault", it is about the fact that on current large NUMA
systems they are broken, and could stand some improvement :)

> select_idle_sibling() for instance scans the cpus within the purview of
> the last level cache of a cpu and this was a small set. Hence there was
> no overhead. Now with many cpus sharing the L3 cache, we see an
> overhead. wake_affine() probably did not expect the NUMA nodes to come
> under its governance as well and hence it sees no harm in waking up
> tasks close to the waker because it still believes that it will be
> within a node.

If two tasks truly are related to each other, I think we will want to
have the wake_affine logic pull them towards each other, all the way
across a giant NUMA system if needs be.

The problem is that the current wake_affine logic starts in the ON
position, and only switches off in a few very specific scenarios.

I suspect we would be better off with the reverse, starting with
wake_affine in the off position, and switching it on when we detect
it makes sense to do so.

--
All rights reversed
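
To make the "start in the off position" idea concrete, here is a toy
sketch in Python. It is not kernel code and not a proposed patch. The
message does not say how to detect that an affine wakeup makes sense,
so the heuristic here (the same waker waking the same wakee several
times in a row) is purely an assumption, as are the names WakeTracker,
record_wakeup, want_affine and AFFINE_THRESHOLD.

```python
from collections import defaultdict

# Hypothetical threshold: consecutive wakeups by the same waker
# before the pair is treated as related.
AFFINE_THRESHOLD = 3


class WakeTracker:
    """Toy model of a default-off affine decision; not kernel code."""

    def __init__(self, threshold=AFFINE_THRESHOLD):
        self.threshold = threshold
        self.last_waker = {}            # wakee -> waker seen on the previous wakeup
        self.streak = defaultdict(int)  # wakee -> consecutive wakeups by that waker

    def record_wakeup(self, waker, wakee):
        # A different waker breaks the streak, so the decision falls back to "off".
        if self.last_waker.get(wakee) == waker:
            self.streak[wakee] += 1
        else:
            self.last_waker[wakee] = waker
            self.streak[wakee] = 1

    def want_affine(self, waker, wakee):
        # Off by default; on only after repeated evidence of a relationship.
        return (self.last_waker.get(wakee) == waker
                and self.streak[wakee] >= self.threshold)
```

A caller would invoke record_wakeup() on every wakeup and consult
want_affine() before pulling the wakee towards the waker. With the
default threshold, the first two wakeups of a pair stay non-affine and
the third flips the decision on. A single wakeup from a different
waker flips it back off.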