On 29.04.2022 12:52, Dario Faggioli wrote:
> On Wed, 2022-04-13 at 12:00 +0200, Jan Beulich wrote:
>> I also have a more general question here: sched.h says "Bitmask of
>> CPUs on which this VCPU may run" for hard affinity and "Bitmask of
>> CPUs on which this VCPU prefers to run" for soft affinity.
>> Additionally there's soft_aff_effective. Does it make sense in the
>> first place for one to be a proper subset of the other in _both_
>> directions?
>>
> I'm not sure I'm 100% getting what you're asking. In particular, I'm
> not sure what you mean by "for one to be a proper subset of the other
> in both directions"?
> 
> Anyway, soft and hard affinity are under the complete control of the
> user (I guess we can say that they're policy), so we tend to accept
> pretty much everything that comes from the user.
> 
> That is, the user can set a hard affinity of 1-6 and a soft affinity
> of (a) 2-3, (b) 0-2, (c) 7-12, etc.
> 
> Case (a), i.e., soft is a strict subset of hard, is the one that makes
> the most sense, of course. With this configuration, the vCPU(s) can run
> on CPUs 1, 2, 3, 4, 5 and 6, but the scheduler will prefer to run it
> (them) on 2 and/or 3.
> 
> Case (b), i.e., not a strict subset, but with some overlap, also means
> that soft-affinity is going to be considered and have an effect. In
> fact, the vCPU(s) will prefer to run on CPUs 1 and/or 2, but of course
> they will never run on CPU 0. The user can, at a later point in time,
> change the hard affinity so that it includes CPU 0, and we'll be back
> to the strict-subset case. That's why we want to keep 0 in the mask,
> even if it causes soft to not be a strict subset of hard.
> 
> In case (c), soft affinity is totally useless. However, again, the
> user can later change hard to include some or all of CPUs 7-12, so we
> keep it. We do, however, print a warning. And we also use the
> soft_aff_effective flag to avoid going through the soft-affinity
> balancing step in the scheduler code. This is, in fact, why we also
> check that hard is not a strict subset of soft: if it is, there's no
> need to do anything about soft, since honoring hard will automatically
> take care of it as well.
> 
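FWIW, if I read this right, the effect of the flag can be modeled with
plain bitmasks like this (just a toy sketch to illustrate the three
cases, using unsigned long instead of Xen's cpumask_t, and not the
actual code in sched_set_affinity()):

/* Toy model of soft_aff_effective; bit N set means CPU N is in the mask. */
#include <stdbool.h>
#include <stdio.h>

typedef unsigned long mask_t;

/* Soft affinity has an effect only if it overlaps hard affinity and
 * hard is not already a subset of soft (cases (a) and (b) above). */
static bool soft_aff_effective(mask_t hard, mask_t soft)
{
    bool overlap = (hard & soft) != 0;
    bool hard_subset_of_soft = (hard & ~soft) == 0;

    return overlap && !hard_subset_of_soft;
}

int main(void)
{
    mask_t hard = 0x7eUL;                                   /* CPUs 1-6 */

    printf("(a) soft=2-3 : %d\n", soft_aff_effective(hard, 0x000cUL));
    printf("(b) soft=0-2 : %d\n", soft_aff_effective(hard, 0x0007UL));
    printf("(c) soft=7-12: %d\n", soft_aff_effective(hard, 0x1f80UL));
    printf("    soft=all : %d\n", soft_aff_effective(hard, ~0UL));
    return 0;
}

which prints 1, 1, 0, 0: only (a) and (b) leave soft-affinity with any
effect.
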
>> Is that mainly
>> to have a way to record preferences even when all preferred CPUs are
>> offline, to be able to go back to the preferences once CPUs come back
>> online?
>>
> That's another example/use case, yes. We want to record the user's
> preference, whatever the status of the system (and of other aspects of
> the configuration) is.
> 
> But I'm not really sure I've answered... Have I?

You did. My question really only was whether there are useful scenarios
for proper-subset cases in both possible directions.

>> Then a follow-on question is: Why do you use cpumask_all for soft
>> affinity in the first of the two calls above? Is this to cover for
>> the case where all CPUs in dom0_cpus would go offline?
>>
> Mmm... what else should I be using?

I was thinking of dom0_cpus.

> If dom0_nodes is in "strict" mode,
> we want to control hard affinity only. So we set soft to the default,
> which is "all". During operation, since hard is a subset of "all",
> soft-affinity will just be ignored.

Right - until such point that all (original) Dom0 CPUs have gone
offline. Hence my 2nd question.
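
To make sure we're talking about the same thing, the logic I understand
the patch to be aiming for is roughly the following (a sketch only; the
name of the "relaxed"/"strict" flag is approximate and not a literal
quote of the patch):

    /* Sketch: dom0_nodes "strict" constrains hard affinity and leaves
     * soft at its default of "all"; "relaxed" does the opposite.
     * dom0_affinity_relaxed stands in for whatever flag the patch
     * actually uses. */
    if ( !dom0_affinity_relaxed )
        sched_set_affinity(unit, &dom0_cpus, &cpumask_all);
    else
        sched_set_affinity(unit, &cpumask_all, &dom0_cpus);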

> So I'm using "all" because soft-affinity is just "all", unless someone
> sets it differently.

How would "someone set it differently"? Aiui you can't control both
affinities at the same time.

> But I am again not sure that I fully understood and properly addressed
> your question. :-(
> 
> 
>>> +    }
>>>      else
>>>          sched_set_affinity(unit, &cpumask_all, &cpumask_all);
>>
>> Hmm, you leave this alone. Wouldn't it be better to further
>> generalize things, in case domain affinity was set already? I was
>> referring to the mask calculated by sched_select_initial_cpu() also
>> in this regard. And when I did suggest to re-use the result, I did
>> mean this literally.
>>
> Technically, I think we can do that. Although it's probably cumbersome
> to do without adding at least one cpumask on the stack, or without
> reshuffling the locking between sched_select_initial_cpu() and
> sched_init_vcpu(), in a way that I (personally) don't find
> particularly pretty.

Locking? sched_select_initial_cpu() calculates into a per-CPU variable,
which I sincerely hope cannot be corrupted by another CPU.
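
Roughly what I have in mind, purely as a sketch (assuming the per-CPU
variable in question is the scratch cpumask, and that it still holds
sched_select_initial_cpu()'s result at this point):

    /* Sketch, not a literal suggestion for the patch: reuse the mask
     * that sched_select_initial_cpu() computed into the per-CPU
     * scratch variable, rather than recomputing it here. */
    cpumask_t *cpus = cpumask_scratch_cpu(smp_processor_id());

    sched_set_affinity(unit, cpus, &cpumask_all);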

> Also, I don't think we gain much from doing that, as we probably still
> need to have some special casing of dom0, for handling dom0_vcpus_pin.

dom0_vcpus_pin is likely always going to require special casing, until
such time as we drop support for it.

> And again, soft and hard affinity should be set to what the user wants
> and asks for. And if, for instance, he/she passes
> dom0_nodes="1,strict", soft-affinity should just be all. If, e.g., we
> set both hard and soft affinity to the CPUs of node 1, and if later
> hard affinity is manually changed to "all", soft affinity will remain
> set to node 1, even though the user never asked for it to be that way,
> and they will need to change it explicitly as well. (Of course, it's
> not particularly clever to boot with dom0_nodes="1,strict" and then
> change dom0's vCPUs' hard affinity to node 0... but the user is free
> to do that.)

I can certainly accept this as justification for using "all" further up.

Jan

