On Wed, 26 Sep 2012, Borislav Petkov wrote:
>> It always selected target_cpu, but the fact is, that doesn't really
>> sound very sane. The target cpu is either the previous cpu or the
>> current cpu, depending on whether they should be balanced or not. But
>> that still doesn't make any *sense*.
>>
>> In fact, the whole select_idle_sibling() logic makes no sense
>> whatsoever to me. It seems to be total garbage.
>>
>> For example, it starts with the maximum target scheduling domain, and
>> works its way in over the scheduling groups within that domain. What
>> the f*ck is the logic of that kind of crazy thing? It never makes
>> sense to look at the biggest domain first. If you want to be close to
>> something, you want to look at the *smallest* domain first. But
>> because it looks at things in the wrong order, it then needs to have
>> that inner loop saying "does this group actually cover the cpu I am
>> interested in?"
>>
>> Please tell me I am misreading this?
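To make the ordering argument concrete, here is a minimal C sketch of an inside-out search for an idle CPU. It is a hypothetical model, not the kernel's select_idle_sibling(): the per-level CPU bitmasks, the find_idle_near() name and the limit of one `unsigned long` worth of CPUs are all assumptions made for illustration. Because the levels are ordered from the smallest span to the largest, the first idle CPU found is also the nearest one, and the "does this group cover the cpu" check from the quoted complaint is not needed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical limit: one bit per CPU in an unsigned long. */
#define NR_MASK_CPUS ((int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Hypothetical model, not kernel code.
 * levels[i] is a bitmask of the CPUs in the i-th scheduling level around
 * `target`, ordered from the smallest span (e.g. SMT siblings sharing L2)
 * to the largest (e.g. the whole package). Bit n of idle_mask is set when
 * CPU n is idle.
 */
int find_idle_near(int target, const unsigned long *levels, size_t nr_levels,
                   unsigned long idle_mask)
{
    assert(target >= 0 && target < NR_MASK_CPUS);

    /* An idle target is the closest possible choice. */
    if (idle_mask & (1UL << target))
        return target;

    /* Smallest level first: the first idle CPU found is the nearest one. */
    for (size_t i = 0; i < nr_levels; i++) {
        unsigned long candidates = levels[i] & idle_mask;

        for (int cpu = 0; cpu < NR_MASK_CPUS; cpu++)
            if (candidates & (1UL << cpu))
                return cpu; /* lowest-numbered idle CPU at this level */
    }

    /* Nothing idle anywhere nearby: stay on the target. */
    return target;
}
```

With levels {0x3, 0xF, 0xFF} and CPU 0 busy, an idle CPU 1 is chosen over idle CPUs 3 and 6 because it shares the innermost level with CPU 0.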
>
> First of all, I'm so *not* a scheduler guy, so take this with a great
> pinch of salt.
>
> The way I understand it, you either want to share L2 with a process
> because, for example, both working sets fit in the L2 and/or there's
> some sharing which saves you moving everything over the L3. This is
> where selecting a core on the same L2 is actually a good thing.
>
> Or, they're too big to fit into the L2 and they start kicking each other
> out. Then you want to spread them out to different L2s, i.e., different
> HT groups in Intel-speak.
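The two cases above can be written down as a placement rule. This is a hypothetical sketch: choose_placement() and its parameters are invented for illustration, and it assumes the working-set sizes and the amount of shared data are known, which a real scheduler cannot measure directly. Shared bytes are counted once, so sharing makes co-location on one L2 more attractive.

```c
#include <assert.h>
#include <stddef.h>

enum placement { SHARE_L2, SPREAD_L2 };

/*
 * Hypothetical rule, not kernel code.
 * ws_a, ws_b: working-set sizes of the two tasks in bytes.
 * shared:     bytes both tasks touch (counted once when they share an L2).
 * l2_bytes:   capacity of one L2 cache.
 */
enum placement choose_placement(size_t ws_a, size_t ws_b, size_t shared,
                                size_t l2_bytes)
{
    /* Shared bytes belong to both working sets, so cannot exceed either. */
    assert(shared <= ws_a && shared <= ws_b);

    size_t combined = ws_a + ws_b - shared;

    /* Both fit: co-locate and keep the shared data out of L3. */
    if (combined <= l2_bytes)
        return SHARE_L2;

    /* Too big: the tasks would evict each other, so use different L2s. */
    return SPREAD_L2;
}
```

For example, two 768 KiB working sets only fit a 1 MiB L2 together if they overlap by at least 512 KiB.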
An observation from an outsider here.

If you do overload an L2 cache, then the core will be busy all the time and
you will end up migrating a task away from that core.

It seems to me that trying to figure out if you are going to overload the
L2 is an impossible task, so just assume that it will all fit; the worst
case is that you have one balancing cycle where you can't do as much work,
and then the normal balancing will kick in and move something anyway.

Over the long term, the work lost due to not moving optimally right away
is probably much less than the work lost due to trying to figure out the
perfect thing to do.

And since the perfect thing to do is going to be both workload- and
chip-specific, trying to model that in your decision making is a lost cause.
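The claim that an optimistic placement costs at most one balancing cycle can be illustrated with a toy model. This is a hypothetical sketch, not the kernel's load balancer: it only counts runnable tasks per CPU and, on each pass, moves one task from the busiest CPU to the least busy one when they differ by two or more. The balance_once() name and the task-count metric are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, not kernel code.
 * nr_running[i] is the number of runnable tasks on CPU i.
 * Returns true if a task was migrated.
 */
bool balance_once(int *nr_running, int nr_cpus)
{
    assert(nr_cpus > 0);

    int busiest = 0, idlest = 0;
    for (int i = 1; i < nr_cpus; i++) {
        if (nr_running[i] > nr_running[busiest])
            busiest = i;
        if (nr_running[i] < nr_running[idlest])
            idlest = i;
    }

    /* A difference of one cannot be improved by moving a single task. */
    if (nr_running[busiest] - nr_running[idlest] < 2)
        return false;

    nr_running[busiest]--;
    nr_running[idlest]++;
    return true;
}
```

Starting from {2, 0} (both tasks optimistically placed on CPU 0), one pass yields {1, 1} and further passes are no-ops, so in this model the cost of a wrong guess is bounded by the interval between passes.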
David Lang