Hi Stefano,
On 24/06/2022 22:31, Stefano Stabellini wrote:
On Fri, 24 Jun 2022, Julien Grall wrote:
On 23/06/2022 23:07, Stefano Stabellini wrote:
On Thu, 23 Jun 2022, dmitry.semen...@gmail.com wrote:
From: Dmytro Semenets <dmytro_semen...@epam.com>
So wouldn't it be better to remove the panic from the implementation of
call_psci_cpu_off?
I have asked to keep the panic() in call_psci_cpu_off(). If you remove the
panic() then we will hide the fact that the CPU was not properly turned off
and will consume more energy than expected.
The WFI loop is fine when shutting down or rebooting because we know this will
only happen for a short period of time.
Yeah, I don't think we should hide that CPU_OFF failed. I think we
should print a warning. However, given that we know CPU_OFF can
reasonably fail in normal conditions returning DENIED when a Trusted OS
is present, then I think we should not panic.
We know how to detect this condition (see section 5.9 in DEN0022D.b). So
I would argue we should fix it properly rather than removing the panic().
If there was a way to distinguish a failure because a Trusted OS is
present (the "normal" failure) from other failures, I would suggest to:
- print a warning if failed due to a Trusted OS being present
- panic in other cases
Unfortunately it looks like in all cases the return code is DENIED :-(
I am confused. Per the spec, the only reason CPU_OFF can return DENIED
is because the Trusted OS is resident. So what other cases are you
talking about?
Given that, I would not panic and only print a warning in all cases. Or
we could ASSERT which at least goes away in !DEBUG builds.
ASSERT() is definitely not way to deal with external input. I could
possibly accept a WARN(), but see above.
The reason I am saying this is that stop_cpu is called in a number of
places beyond halt_this_cpu and as far as I can tell any of them could
trigger the panic. (I admit they are unlikely places but still.)
This is one of the example where the CPU will not be stopped for a short
period of time. We should deal with them differently (i.e. migrating the
trusted OS or else) so we give all the chance for the CPU to be fully powered.
IMHO, this is a different issue and hence why I didn't ask Dmitry to solve it.
I see your point now. I was seeing the two things as one.
I think it is true that the WFI loop is likely to work. Also it is true
that from a power perspective it makes no different on power down or
reboot. From that point of view this patch is OK.
But even on shut-down/reboot, why not do that as a fallback in case
CPU_OFF didn't work? It is going to work most of the times anyway, why
change the default for the few cases that doesn't work?
Because we would not be consistent how we would turn off a CPU on a
system supporting PSCI. I would prefer to use the same method everywhere
so it is easier to reason about.
I am also not sure how you define "most of the time". Yes it is possible
that the boards we aware of will not have this issue, but how about the
one we don't know?
Given that this patch would work, I don't want to insist on this and let
you decide.
But even if we don't want to remove the panic as part of this patch, I
think we should remove the panic in a separate patch anyway, at least
until someone investigates and thinks of a strategy how to migrate the
TrustedOS as you suggested.
If we accept this patch, then we remove the immediate pain. The other
uses of stop_cpu() are in:
1) idle_loop(), this is reachable when turning off a CPU after boot
(not supported on Arm)
2) start_secondary(), this is only used during boot (CPU
hot-unplug is not supported)
Even if it would be possible to trigger the panic() in 2), I am not
aware of an immediate issue there. So I think it would be the wrong
approach to remove the panic() first and then investigate.
The advantage of the panic() is it will remind us that some needs to be
fixed. With a warning (or WARN()) people will tend to ignore it.
Cheers,
--
Julien Grall