On Tue, May 21, 2013 at 10:08:29PM +0100, Sebastian Capella wrote:
> Thanks Liviu!
> 
> Some comments below..
> 
> Quoting Liviu Dudau (2013-05-21 10:15:42)
> > ... Which side of the interface are you actually thinking of?
> 
> Both, I'm really just trying to understand the problem.
> 
> > I don't think there is any C-state other than simple idle (which
> > translates into a WFI for the core) that *doesn't* take into account
> > power domain latencies and code path lengths to reach that state.
> 
> I'm speaking more about additional C-states beyond the lowest independent
> compute-domain C-state, where we may add states which reduce the power
> further at a higher latency cost.  These may change power states for the
> rest of the SoC or for external power chips/supplies.  Those states would
> effectively enter the lowest PSCI C-state, but then have additional steps
> in the CPUidle hw-specific driver.

Quoting from the PSCI spec:

"ARM systems generally include a power controller which provides the necessary
mechanisms to control processor power. It normally provides interfaces to allow 
a number
of power management functions. These often include support for transitioning 
processors,
clusters or a superset, into low power states, where the processors are either 
fully switched
off, or in quiescent states where they are not executing code. ARM strongly 
recommends
that control of these states, via this power controller, is vested in the 
secure world.
Otherwise, the OSPM could enter a low power mode without informing the Trusted 
OS.
Even if such an arrangement could be made robust, it is unlikely to perform as 
well. In
particular, for states where the core is fully power gated, a longer boot 
sequence would
take place upon wake up as full initialization would be required by the secure 
world. This
would be required as the secure components would effectively be booting from 
scratch
every time. On a system where this power control is vested in the Secure world, 
these
components would have an opportunity to save their state before powering off, 
allowing a
faster resumption on power up. In addition, the secure world might need to 
manage
peripherals as part of a power transition."

If you don't have such a power controller in your system then yes, you
will have to drive the hardware from the CPUidle hw driver. But I don't
see the need for a separate C-state for that.

I would say that the set of C-states I have listed further down should
cover most of the cases, maybe with the addition of a SYSTEM_SUSPEND
state if I understood your concerns correctly.

Going on a tangent a bit:

To me, the C-states are like layers in an onion: each deeper C-state
includes the C-states that came earlier in the list. Therefore, you
describe each C-state in terms of the minimum total time to spend in that
state, including the worst-case transition times (the cost of entering
that state and of coming out of it). Completely made-up example:

CPU_ON          < 2ms
CPU_IDLE        > 2ms
CPU_OFF         > 10ms
CLUSTER_OFF     > 500ms
SYSTEM_SUSPEND  > 5min
SYSTEM_OFF      > 1h

If you do that, then the CPUidle decision becomes as simple as finding
the deepest state that would not lead to a missed event, and you don't
really have to understand the costs of the host OS (if there is one).
It should match the expectations of a real-time system as well, provided
the table is correctly fine-tuned (and one understands that a real-time
system is about constant-time response, not immediate response).
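
To put that in code form, here is a deliberately simplified selection
loop (illustrative C only, using invented names and the made-up
thresholds from the table above; this is not the actual CPUidle governor
code): walk the table from shallowest to deepest and keep the deepest
state whose minimum residency still fits before the next expected event.

#define ARRAY_SIZE(x)	(sizeof(x) / sizeof((x)[0]))

struct onion_state {
	const char *name;
	unsigned long long min_residency_us; /* worst-case entry + exit + useful time */
};

/* Ordered shallowest to deepest, mirroring the table above. */
static const struct onion_state states[] = {
	{ "CPU_ON",		0 },
	{ "CPU_IDLE",		2000 },			/* > 2ms   */
	{ "CPU_OFF",		10000 },		/* > 10ms  */
	{ "CLUSTER_OFF",	500000 },		/* > 500ms */
	{ "SYSTEM_SUSPEND",	300000000ULL },		/* > 5min  */
	{ "SYSTEM_OFF",		3600000000ULL },	/* > 1h    */
};

/* Pick the deepest state whose minimum residency still fits before the
 * next expected event; fall back to the shallowest one otherwise. */
static const struct onion_state *pick_state(unsigned long long next_event_us)
{
	const struct onion_state *best = &states[0];
	unsigned int i;

	for (i = 0; i < ARRAY_SIZE(states); i++)
		if (states[i].min_residency_us <= next_event_us)
			best = &states[i];

	return best;
}

The same loop works unchanged whether the numbers come from raw hardware
measurements or already have host/monitor overheads folded into them,
which is the point: the OS only needs the totals.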

> 
> > I don't know how to draw the line between the host OS costs and the
> > guest OS costs when using target latencies. On one hand I think that
> > the host OS should add its own costs into what gets passed to the
> > guest and the guest will see a slower than baremetal system in terms
> > of state transitions;
> 
> I was thinking maybe this also.. Is there a way to query the state
> transition cost information through PSCI?  Would there be a way to
> have the layers of hosts/monitors/etc contribute the cost of their
> paths into the query results?

Possibly. The PSCI spec doesn't specify any API for querying the C-state
costs because the way to do so is still up in the air. We know that the
server world would like to carry on using ACPI for describing those
states; device tree-based systems will probably either invent a different
way or learn how to integrate with ACPI.


> 
> > ... on the other hand I would like to see the
> > guest OS shielded from this type of information as there are too many
> > variables behind it (is the host OS also under some monitor code? are
> > all transitions to the same state happening in constant time or are
> > they dependent on the number of cores involved, their state, etc, etc)
> 
> I agree, but don't see how.  In our systems, we do very much care about
> the costs, and have ~real time constraints to manage.  I think
> we need a good understanding of costs for the hw states.

And are those costs constant? Does the time it takes to do a cluster
shutdown depend on how many CPUs you have online? Does having the DMA
engine on add to the quiescence time? While I don't doubt that you
understand the minimum time constraints that the hardware imposes, it's
the combination of all the elements in the system that are under software
control that gives the final answer, and in most cases that answer is
"it depends".


> 
> > If one uses a simple set of C-states (CPU_ON, CPU_IDLE, CPU_OFF, 
> > CLUSTER_OFF, SYSTEM_OFF) then the guest could make requests independent
> > of the host OS latencies _after the relevant translations between
> > time-to-next-event and intended target C-state have been performed_.
> 
> I think that if we don't know the real cost of entering a state,
> we will basically end up choosing the wrong states on many occasions.

True. But that "real" cost is usually an estimate of the worst case, or
an average time, right?

> 
> CPUidle is already binning the allowable costs into a specific state.
> If we decide that CPUidle does not know the real cost of the states, then
> the binning will sometimes be wrong and CPUidle would not be selecting
> the correct states.  I think this could have bad side effects for real-time
> systems.

CPUidle does know the costs. The "reality" of those costs depends on the
system you are running (virtualised or not, trusted OS trapping your calls
or not). If the costs do not reflect the actual transition time then yes,
CPUidle will make the wrong decision and the system won't work as intended.
I'm not advocating doing that.

Also, I don't understand your remark regarding real-time systems. If the
CPUidle costs are wrong, the decision will be wrong regardless of the type
of system you use. Or are you concerned that being too conservative, and
lying to the OS about the actual cost for the system to transition to the
new state at that moment, will introduce unnecessary delays and forgo the
real-time behaviour?


> 
> For my purposes and as things are today, I'd likely factor the
> (probably pre-known & measured) host OS/monitor costs into the CPUidle
> DT entries and have CPUidle run the show.  At the lower layers, it won't
> matter what is passed through as long as the correct state is chosen.

Understood. I'm advocating the same thing, with the only added caveat that
the state you choose is not in all cases a physical system state, but a
state that makes sense for the OS running at that level. As such, the
numbers used by CPUidle will be ballpark figures rather than absolute
ones.
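
That's basically what the state table of a platform CPUidle driver ends
up looking like. As a rough sketch (the driver name, callback and latency
numbers below are invented for illustration; only the struct
cpuidle_driver / cpuidle_state fields are the real Linux API), the
measured host OS/monitor overheads are simply folded into exit_latency
and target_residency:

#include <linux/cpuidle.h>
#include <linux/module.h>

/* Hypothetical enter callback: a real driver would issue the PSCI (or
 * platform-specific) call selected by 'index' here. */
static int myplat_enter(struct cpuidle_device *dev,
			struct cpuidle_driver *drv, int index)
{
	/* hardware entry elided in this sketch */
	return index;
}

static struct cpuidle_driver myplat_idle_driver = {
	.name	= "myplat_idle",
	.owner	= THIS_MODULE,
	.states	= {
		{	/* plain WFI */
			.name			= "CPU_IDLE",
			.desc			= "ARM WFI",
			.exit_latency		= 1,		/* us */
			.target_residency	= 2000,
			.enter			= myplat_enter,
		},
		{	/* core power gated; exit_latency already includes
			 * the measured secure/monitor restore cost */
			.name			= "CPU_OFF",
			.desc			= "core powered off",
			.exit_latency		= 300,		/* ballpark */
			.target_residency	= 10000,
			.enter			= myplat_enter,
		},
		{	/* cluster down; again totals as seen by this OS,
			 * not raw hardware numbers */
			.name			= "CLUSTER_OFF",
			.desc			= "cluster powered off",
			.exit_latency		= 1500,
			.target_residency	= 500000,
			.enter			= myplat_enter,
		},
	},
	.state_count = 3,
};

The governor then does the same kind of table walk as in the earlier
sketch, so as long as those totals are right the correct state gets
chosen, exactly as you say.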

Any running OS should only be concerned with getting the time to the next
event right (be it real-time constrained or not) and finding out which
C-state will guarantee availability at that time. If one doesn't know
when the next event will come, then being conservative should be good
enough. There is no way you will have a ~real-time system if you
transition to CLUSTER_OFF and the real cost of coming out of it is
measured in milliseconds, regardless of how you came to that decision.

Best regards,
Liviu


> 
> Thanks,
> 
> Sebastian
> 
> 

-- 
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯

