On Mon, Jul 18, 2016 at 6:24 PM, Dario Faggioli <dario.faggi...@citrix.com> wrote:
> On Mon, 2016-07-18 at 17:48 +0100, George Dunlap wrote:
>> On 15/07/16 15:50, Dario Faggioli wrote:
>> >
>> > +/*
>> > + * If all the siblings of cpu (including cpu itself) are in idlers,
>> > + * set all their bits in mask.
>> > + *
>> > + * In order to properly take into account tickling, idlers needs to be
>> > + * set qeual to something like:
>>
>> *equal (I can fix this on check-in)
>
> Oops!
>
>> > + *
>> > + * rqd->idle & (~rqd->tickled)
>> > + *
>> > + * This is because cpus that have been tickled will very likely pick up some
>> > + * work as soon as the manage to schedule, and hence we should really consider
>> > + * them as busy.
>>
>> OK, this is something that slightly confused me when I was reviewing the
>> patch the first time: that rqd->idle is *all* pcpus which are currently
>> idle (and thus we need to & (~tickled) when using it), but rqd->smt_idle
>> is meant to be maintained as *non-tickled* idle pcpus.
>
> Short answer is, "yes, this recap of yours is correct".
>
> In fact, the difference between idle and smt_idle is that the former is
> valid instantaneously, while the latter is tracking a state.
>
> IOW, if, at any given time, I want to know what pcpus are idle, I check
> rqd->idle. If I want to know which ones are idle and also are not (or
> are unlikely to be) just about to pick up work, I can check
> rqd->idle & (~rqd->tickled).
>
> Let's now consider smt_idle and assume that, at time t, sibling pcpus 2
> and 3 are idle (as in, their bit is 1 in rqd->idle). If I were basing
> smt_idle just on that, I could at this point set the bit of the core in
> smt_idle. This in turn means that work will likely be sent to either 2
> or 3 (depending on all the other factors that influence this). Let's
> assume we select 2. But if either of them --although being idle-- has
> actually been tickled already, we may have taken a suboptimal decision.
> In fact, if 3 was tickled, both 2 and 3 will pick up work, and if there
> is another core (say, made up of sibling pcpus 6 and 7) which is truly
> fully idle, we would have done better to choose a pcpu from there. If 2
> was the one that was tickled, that's even worse, because I most likely
> have 2 work items, and am tickling only 1 pcpu!
>
> So, again, yes, basically this means that I need smt_idle to be
> representative of the set of non-tickled idle pcpus.
>
>> Are you planning at some point to have a follow-up patch which changes
>> rqd->idle to be non-tickled idle pcpus as well? Unless I missed
>> something, it looks like at the moment the only times rqd->idle is
>> acted upon are after &~-ing out rqd->tickled anyway.
>
> I am indeed, but I was planning to do that after this round of changes
> (this series, plus soft-affinity, plus caps, which I have in my queue).
>
> It's, after all, an optimization, and hence I think it is fine to leave
> it until things are proven to be working. :-)
>
> If you're saying that this discrepancy between rqd->idle's and
> rqd->smt_idle's semantics is, at minimum, unideal, I do agree... but I
> think, for now at least, it's worth living with it.
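Right, and just so we are definitely talking about the same invariant, here
it is spelled out in code. This is only a sketch of the semantics described
above: the struct fields are the ones from the series, but the helper name
and the on-stack mask are made up for illustration, and it is not what the
series literally does.

static void smt_idle_recalc_sketch(struct csched2_runqueue_data *rqd,
                                   unsigned int cpu)
{
    const cpumask_t *siblings = per_cpu(cpu_sibling_mask, cpu);
    cpumask_t idle_untickled;  /* on-stack only to keep the sketch short */

    /* Idle pcpus that have not been poked yet, i.e. the ones that are
     * genuinely free, as opposed to everything in rqd->idle. */
    cpumask_andnot(&idle_untickled, &rqd->idle, &rqd->tickled);

    if ( cpumask_subset(siblings, &idle_untickled) )
        /* All the threads of cpu's core are idle and untickled: mark the core. */
        cpumask_or(&rqd->smt_idle, &rqd->smt_idle, siblings);
    else
        /* At least one thread is busy or already tickled: clear the whole core. */
        cpumask_andnot(&rqd->smt_idle, &rqd->smt_idle, siblings);
}

The series keeps smt_idle up to date with helpers like smt_idle_mask_set()
(see the attached patch below) from the points where rqd->idle or
rqd->tickled change, rather than recomputing it like this; the sketch is
only meant to pin down the end state.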
I hadn't actually said anything, but you know me well enough to guess
what I'm thinking. :-)

I am somewhat torn between my dislike of the inconsistency and, as you
say, the fact that this is a distinct improvement; it would seem a bit
petty to insist that you either wait or produce a patch to change idle
at the same time. But I do think that the difference needs to be called
out a bit better. What about folding in something like the attached
patch?

 -George
commit fd8fe6d8526cc9d6abe510aae7a654d1b72d4305
Author: George Dunlap <george.dun...@citrix.com>
Commit: George Dunlap <george.dun...@citrix.com>

    George's mods

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 6ccc6f0..3e1720c 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -353,8 +353,8 @@ struct csched2_runqueue_data {
     struct list_head svc;  /* List of all vcpus assigned to this runqueue */
     unsigned int max_weight;
 
-    cpumask_t idle,        /* Currently idle */
-        smt_idle,          /* Fully idle cores (as in all the siblings are idle) */
+    cpumask_t idle,        /* Currently idle pcpus */
+        smt_idle,          /* Fully idle-and-untickled cores (see below) */
         tickled;           /* Have been asked to go through schedule */
     int load;              /* Instantaneous load: Length of queue + num non-idle threads */
     s_time_t load_last_update;  /* Last time average was updated */
@@ -454,17 +454,20 @@ struct csched2_dom {
  */
 
 /*
- * If all the siblings of cpu (including cpu itself) are in idlers,
- * set all their bits in mask.
- *
- * In order to properly take into account tickling, idlers needs to be
- * set qeual to something like:
- *
- *   rqd->idle & (~rqd->tickled)
- *
- * This is because cpus that have been tickled will very likely pick up some
- * work as soon as the manage to schedule, and hence we should really consider
- * them as busy.
+ * If all the siblings of cpu (including cpu itself) are both idle and
+ * untickled, set all their bits in mask.
+ *
+ * NB that rqd->smt_idle is different than rqd->idle.  rqd->idle
+ * records pcpus that are merely idle (i.e., at the moment do not
+ * have a vcpu running on them).  But you have to manually filter out
+ * which pcpus have been tickled in order to find cores that are not
+ * going to be busy soon.  Filtering out tickled cpus pairwise is a
+ * lot of extra pain; so for rqd->smt_idle, we explicitly make it so
+ * that the bits of a pcpu are set only if all the threads on its core
+ * are both idle *and* untickled.
+ *
+ * This means changing the mask when either rqd->idle or rqd->tickled
+ * changes.
  */
 static inline void smt_idle_mask_set(unsigned int cpu, const cpumask_t *idlers,
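For completeness, the payoff on the lookup side is that finding a completely
free core stays a single mask operation. Again, this is just an illustrative
sketch and not code from the series; the function name, the affinity
parameter and the -1 fallback are all invented:

static int pick_fully_idle_cpu_sketch(const struct csched2_runqueue_data *rqd,
                                      const cpumask_t *affinity)
{
    cpumask_t mask;  /* on-stack only to keep the sketch short */

    /* Threads belonging to fully idle-and-untickled cores that we may use. */
    cpumask_and(&mask, &rqd->smt_idle, affinity);

    if ( !cpumask_empty(&mask) )
        return cpumask_first(&mask);

    /* No completely free core: let the normal placement logic decide. */
    return -1;
}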