On Thu, 2013-01-24 at 15:15 +0800, Michael Wang wrote:
> On 01/24/2013 02:51 PM, Mike Galbraith wrote:
> > On Thu, 2013-01-24 at 14:01 +0800, Michael Wang wrote:
> >
> >> I've enabled WAKE flag on my box like you did, but still can't see
> >> regression, and I've just tested on a power server with 64 cpu, also
> >> failed to reproduce the issue (not compared with virgin yet, but can't
> >> see collapse).
> >
> > I'm not surprised. I'm seeing enough inconsistent crap to come to the
> > conclusion that stock scheduler knobs flat can't be used on a largish
> > box, they're just too preempt-happy, leading to weird crap.
> >
> > My 2 missing nodes came back, and the very same kernel that highly
> > repeatably collapsed with 2 nodes does not with 4 nodes, and 2 nodes
> > does not collapse with only preemption knob tweaking, and that's
> > bullshit. Virgin shows instability in the mid-range, make a tiny tweak
> > that should have little if any effect there, and that instability
> > vanishes entirely. Test runs are not consistent enough boot to boot etc
> > etc. Either stock knobs suck on NUMA boxen, or this box is possessed.
>
> Mike, I wonder the reason why change back to the old way make collapse
> away may not because there are logical error in new balance path, it's
> just changed the cost of select_task_rq(), whatever it's more or less,
> it's accidentally achieve the same effect as you tweak the knob, so
> that's the reason why it looks like old is better than new.

That's what I'm saying, it's a useless crap side-effect of a preempt
happy kernel. Results with these knobs are just not stable. Results go
wildly unstable with 2 nodes vs 4 in this box, but can be stabilized in
all with preemption knob adjustment.. or phase of moon might make them
appear stable.. or not.

-Mike