On Thu, 16 May 2013, Liviu Dudau wrote:

> From previous discussions between Achin, Charles and Nico I am aware 
> that Nico has decided for the moment that target residency should be 
> useful enough to be used by MCPM. That is because Nico is a big 
> proponent of doing everything in the kernel and keeping the firmware 
> dumb and (mostly) out of the way. However, the view that we have here 
> at ARM (but I will only speak in my name here) is that in order to 
> have alignment with AArch64 kernel and the way it is using PSCI 
> interface, we should be moving the kernel on AArch32 and armv7a to run 
> in non-secure mode. At that time, the kernel will make PSCI calls to 
> do CPU_ON, CPU_SUSPEND, etc. and the aim is to provide to the firmware 
> the deepest C-state that the core can support going to without being 
> woken up to do any additional state management. It is then up to 
> the firmware to put the core in that state or to tally the 
> sum of all requests in a cluster and decide to put the cores and the 
> cluster in the lowest common C-state.

That's all good.

My worry is about the definition of all the different C-states on all the 
different platforms.  I think it is simpler to have the kernel tell the 
firmware what it anticipates in terms of load/quiescence periods (e.g. 
the next interrupt is likely to happen in x millisecs), and let the 
firmware and/or low-level machine specific backend translate that into 
the appropriate C-state on its own.  After all, the firmware is supposed 
to know what is the best C-state to apply given a target latency and the 
current state of the surrounding CPUs, which may also differ depending 
on the cluster type, etc.
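To make the residency-based scheme concrete, here is a minimal sketch of the firmware-side translation I have in mind: the kernel passes an anticipated idle period, and the firmware picks the deepest C-state whose break-even residency still fits. The table, state names, and numbers below are all invented for illustration; a real platform would populate them from its own characterization data.

```c
#include <assert.h>
#include <stddef.h>

/* One entry per platform C-state, ordered shallow to deep.
 * min_res_us is the hypothetical break-even residency. */
struct cstate {
	int id;
	unsigned int min_res_us;
};

static const struct cstate cstates[] = {
	{ 0,     0 },	/* WFI */
	{ 1,   500 },	/* CPU retention */
	{ 2,  2000 },	/* CPU power-down */
	{ 3, 10000 },	/* cluster shutdown */
};

/* Return the deepest C-state whose break-even residency fits the
 * kernel's idle-time hint. */
static int pick_cstate(unsigned int expected_idle_us)
{
	int best = 0;
	size_t i;

	for (i = 0; i < sizeof(cstates) / sizeof(cstates[0]); i++)
		if (cstates[i].min_res_us <= expected_idle_us)
			best = cstates[i].id;
	return best;
}
```

The point is that this table lives entirely on the firmware side; the kernel never has to know how many states exist or what their latencies are.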

> Regarding the migration of the guest kernels, it should be transparent 
> (to a certain extent) whether on resume it is running on the same core 
> or it has been migrated. The host OS should have a better 
> understanding of what can be achieved and what invariants it can still 
> hold, but it should not be limited to do that in a specific amount of 
> time. Let's take an example: one core in the cluster says that it can 
> go as deep as cluster shutdown but it does so in your use of the API 
> by saying that it would like to sleep for at least amount X of time. 
> The host however has to tally all the cores in the cluster in order to 
> decide if the cluster can be shutdown, has to do a lot of cache 
> maintenance and state saving, turning off clocks and devices, etc., and 
> in doing so is going to consume some compute cycles; it will then 
> subtract the time spent making a decision and doing the cleanup and 
> then figure out if there is still time left for each of the cores to 
> go to sleep for the specified amount of time. All this implies that 
> the guest has to have an understanding of the time the host is 
> spending in doing maintenance operations before asking the hypervisor 
> for a target residency and the host still has to do the math again to 
> validate that the guest request is still valid.

I don't follow your reasoning.  Why would the guest have to care 
about what the host can do at all and in what amount of time?  What the 
guest should tell the host is this: "I don't anticipate any need for the 
CPU during the next 500 ms so take this as a hint to perform the most 
efficient power saving given the constraints you alone have the 
knowledge of."  The host should know how long it takes to flush its 
cache, whether or not that cache is in use by other guests, etc.  But 
the guest should not care.
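In other words, the tally Liviu describes is entirely a host-side affair. A hedged sketch of it, with invented overhead and break-even numbers (nothing here is from a real PSCI implementation): the host subtracts its own cleanup cost from each core's residency hint and shuts the cluster down only if every core still has enough idle time left to make it worthwhile.

```c
#define NR_CORES 4

/* Decide cluster shutdown from per-core idle-time hints (microseconds).
 * overhead_us is the host's own cache-flush/state-save cost;
 * breakeven_us is the minimum remaining idle time that makes the
 * shutdown pay off.  Both are known only to the host. */
static int cluster_can_shutdown(const unsigned int hint_us[NR_CORES],
				unsigned int overhead_us,
				unsigned int breakeven_us)
{
	int i;

	for (i = 0; i < NR_CORES; i++) {
		if (hint_us[i] <= overhead_us)
			return 0;	/* hint too short to cover cleanup */
		if (hint_us[i] - overhead_us < breakeven_us)
			return 0;	/* remaining idle time not worth it */
	}
	return 1;			/* every core is quiet long enough */
}
```

The guest contributes only the hints; it never needs to know overhead_us or breakeven_us.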

And in this case the math performed by the guest and the math performed 
by the host are completely different.

> If we choose to use the target C-state, the request validation is 
> simplified to a comparison between each core's target C-state and the 
> lowest common C-state per cluster, all done in the host.
> 
> Of course, by describing C-states in terms of target residency times 
> both schemes can be considered equivalent. But that target residency 
> time is not constant for all code paths and for all conditions and 
> that makes the decision process more complicated.

For who?

If the guest is responsible for choosing a C-state itself and passing it 
on to the host, it has to iterate through the set of available C-states 
and select the proper one according to the target residency time it must 
compute anyway, since that is all the scheduler can tell it.  And since 
those C-states are likely to have different latency profiles on 
different clusters, the guest will have to query the type of host it is 
running on or the available C-states each time it wants to select one, 
etc.  So I don't think passing the target residency directly to the host 
is more complicated when you look at the big picture.


Nicolas

_______________________________________________
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev