On Mon, 2013-01-21 at 13:07 +0800, Michael Wang wrote:

> That seems like the default one, could you please show me the numbers in
> your datapoint file?
Yup, I do not touch the workfile.  Datapoints is what you see in the
tabulated result...

1
1
1
5
5
5
10
10
10
...

so it does three consecutive runs at each load level.  I quiesce the
box, set the governor to performance,
echo 250 32000 32 4096 > /proc/sys/kernel/sem, then ./multitask -nl -f,
and point it at ./datapoints.

> I'm not familiar with this benchmark, but I'd like to have a try on my
> server, to make sure whether it is a generic issue.

One thing I didn't like about your changes is that you don't ask
wake_affine() if it's ok to pull cross node or not, which I thought
might induce imbalance, but twiddling that didn't fix up the collapse,
pretty much leaving only the balance path.

> >> And I'm confused about how those new parameter values were figured
> >> out, and how they could help solve the possible issue?
> >
> > Oh, that's easy.  I set sched_min_granularity_ns such that last_buddy
> > kicks in when a third task arrives on a runqueue, and set
> > sched_wakeup_granularity_ns near the minimum that still allows wakeup
> > preemption to occur.  Combined effect is reduced over-scheduling.
>
> That sounds very hard, to catch the timing; whatever, it could be an
> important clue for analysis.

(Play with the knobs with a bunch of different loads, I think you'll
find that those settings work well)

> >> Do you have any idea about which part in this patch set may cause
> >> the issue?
> >
> > Nope, I'm as puzzled by that as you are.  When the box had 40 cores,
> > both virgin and patched showed over-scheduling effects, but not like
> > this.  With 20 cores, symptoms changed in a most puzzling way, and I
> > don't see how you'd be directly responsible.
>
> Hmm...
>
> >
> >> One change by design is that, for the old logic, if it's a wakeup and
> >> we found an affine sd, the select func will never go into the balance
> >> path, but the new logic will in some cases; do you think this could
> >> be a problem?
> >
> > Since it's the high load end, where looking for an idle core is most
> > likely to be a waste of time, it makes sense that entering the balance
> > path would hurt _some_, it isn't free.. except for twiddling preemption
> > knobs making the collapse just go away.  We're still going to enter that
> > path if all cores are busy, no matter how I twiddle those knobs.
>
> Maybe we could try changing this back to the old way later, after the
> aim7 test on my server.

Yeah, something funny is going on.  I'd like select_idle_sibling() to
just go away, and that task be integrated into one and only one short
and sweet balance path.

I don't see why find_idlest* needs to continue traversal after seeing a
zero.  It should be just fine to say gee, we're done.

Hohum, so much for pure test and report, twiddle twiddle tweak, bend
spindle mutilate ;-)

-Mike
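For reference, a toy userspace sketch of the arithmetic behind that knob
tuning, assuming the 3.x kernel/sched/fair.c relationship where
sched_nr_latency is DIV_ROUND_UP(sched_latency_ns, sched_min_granularity_ns)
and LAST_BUDDY only engages once a runqueue holds that many tasks.  The knob
values below are hypothetical examples, not the settings actually used.

#include <stdio.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

int main(void)
{
	/* Example knob values only, for illustration. */
	unsigned long latency_ns  = 24000000UL;	/* sched_latency_ns */
	unsigned long min_gran_ns =  8000000UL;	/* sched_min_granularity_ns */
	unsigned long nr_latency  = DIV_ROUND_UP(latency_ns, min_gran_ns);

	printf("sched_nr_latency = %lu, so last_buddy engages once %lu tasks are runnable\n",
	       nr_latency, nr_latency);
	return 0;
}

With a 24ms latency and 8ms minimum granularity, nr_latency works out to 3,
i.e. last_buddy kicks in when the third task arrives.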
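And a standalone toy demo of the "stop at a zero" idea for the find_idlest*
traversal; this is not the actual find_idlest_group()/find_idlest_cpu()
code, and the load values are invented, it only shows the early exit.

#include <stdio.h>

/* Scan for the least loaded entry; a zero load can't be beaten, so stop. */
static int find_idlest(const unsigned long *load, int nr)
{
	unsigned long min_load = ~0UL;
	int i, idlest = -1;

	for (i = 0; i < nr; i++) {
		if (load[i] < min_load) {
			min_load = load[i];
			idlest = i;
		}
		if (!min_load)
			break;		/* gee, we're done */
	}
	return idlest;
}

int main(void)
{
	unsigned long load[] = { 1024, 512, 0, 2048, 256 };

	printf("idlest: %d\n", find_idlest(load, 5));
	return 0;
}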