On Wed, 20 Mar 2024, Julien Grall wrote:
> Hi John,
> 
> On 20/03/2024 16:24, John Ernberg wrote:
> > Hi Bertrand,
> > 
> > On 3/13/24 11:07, Bertrand Marquis wrote:
> > > Hi,
> > > 
> > > > On 8 Mar 2024, at 15:04, Julien Grall <jul...@xen.org> wrote:
> > > > 
> > > > Hi John,
> > > > 
> > > > Thank you for the reply.
> > > > 
> > > > On 08/03/2024 13:40, John Ernberg wrote:
> > > > > On 3/7/24 00:07, Julien Grall wrote:
> > > > > >    > Ping on the watchdog discussion bits.
> > > > > > 
> > > > > > Sorry for the late reply.
> > > > > > 
> > > > > > On 06/03/2024 13:13, John Ernberg wrote:
> > > > > > > On 2/9/24 14:14, John Ernberg wrote:
> > > > > > > > 
> > > > > > > > >       * IMX_SIP_TIMER_*:  This seems to be related to the
> > > > > > > > > watchdog.
> > > > > > > > > Shouldn't dom0 rely on the watchdog provided by Xen instead?
> > > > > > > > > So those
> > > > > > > > > call will be used by Xen.
> > > > > > > > 
> > > > > > > > That is indeed a watchdog SIP, and also for setting the SoC
> > > > > > > > internal RTC
> > > > > > > > if it is being used.
> > > > > > > > 
> > > > > > > > I looked around if there was previous discussion and only really
> > > > > > > > found [3].
> > > > > > > > Is the xen/xen/include/watchdog.h header meant to be for this
> > > > > > > > kind of
> > > > > > > > watchdog support or is that more for the VM watchdog? Looking at
> > > > > > > > the x86
> > > > > > > > ACPI NMI watchdog it seems like the former, but I have never
> > > > > > > > worked with
> > > > > > > > x86 nor ACPI.
> > > > > > 
> > > > > > include/watchdog.h contains helper to configure the watchdog for
> > > > > > Xen. We
> > > > > > also have per-VM watchdog and this is configured by the hypercall
> > > > > > SCHEDOP_watchdog.
> > > > > > 
> > > > > > > > 
> > > > > > > > Currently forwarding it to Dom0 has not caused any watchdog
> > > > > > > > resets with
> > > > > > > > our watchdog timeout settings, our specific Dom0 setup and VM
> > > > > > > > count.
> > > > > > 
> > > > > > IIUC, the SMC API for the watchdog would be similar to the ACPI NMI
> > > > > > watchdog. So I think it would make more sense if this is not exposed
> > > > > > to
> > > > > > dom0 (even if Xen is doing nothing with it).
> > > > > > 
> > > > > > Can you try to hide the SMCs and check if dom0 still behave
> > > > > > properly?
> > > > > > 
> > > > > > Cheers,
> > > > > > 
> > > > > This SMC manages a hardware watchdog, if it's not pinged within a
> > > > > specific interval the entire board resets.
> > > > 
> > > > Do you know what's the default interval? Is it large enough so Xen +
> > > > dom0 can boot (at least up to when the watchdog driver is initialized)?
> > > > 
> > > > > If I block the SMCs the watchdog driver in Dom0 will fail to ping the
> > > > > watchdog, triggering a board reset because the system looks to have
> > > > > become unresponsive. The reason this patch set started is because we
> > > > > couldn't ping the watchdog when running with Xen.
> > > > > In our specific system the bootloader enables the watchdog as early as
> > > > > possible so that we can get watchdog protection for as much of the
> > > > > boot
> > > > > as we possibly can.
> > > > > So, if we are to block the SMC from Dom0, then Xen needs to take over
> > > > > the pinging. It could be implemented similarly to the NMI watchdog,
> > > > > except that the system will reset if the ping is missed rather than
> > > > > backtrace.
> > > > > It would also mean that Xen will get a whole watchdog driver-category
> > > > > due to the watchdog being vendor and sometimes even SoC specific when
> > > > > it
> > > > > comes to Arm.
> > > > > My understanding of the domain watchdog code is that today the domain
> > > > > needs to call SCHEDOP_watchdog at least once to start the watchdog
> > > > > timer. Since watchdog protection through the whole boot process is
> > > > > desirable we'd need some core changes, such as an option to start the
> > > > > domain watchdog on init. >
> > > > > It's quite a big change to make
> > > > 
> > > > For clarification, above you seem to mention two changes:
> > > > 
> > > > 1) Allow Xen to use the HW watchdog
> > > > 2) Allow the domain to use the watchdog early
> > > > 
> > > > I am assuming by big change, you are referring to 2?
> > > > 
> > > > , while I am not against doing it if it
> > > > > makes sense, I now wonder if Xen should manage hardware watchdogs.
> > > > > Looking at the domain watchdog code it looks like if a domain does not
> > > > > get enough execution time, the watchdog will not be pinged enough and
> > > > > the guest will be reset. So either watchdog approach requires Dom0 to
> > > > > get execution time. Dom0 also needs to service all the PV backends
> > > > > it's
> > > > > responsible for. I'm not sure it's valuable to add another layer of
> > > > > watchdog for this scenario as the end result (checking that the entire
> > > > > system works) is achieved without it as well.
> > > > > So, before I try to find the time to make a proposal for moving the
> > > > > hardware watchdog bit to Xen, do we really want it?
> > > > 
> > > > Thanks for the details. Given that the watchdog is enabled by the
> > > > bootloader, I think we want Xen to drive the watchdog for two reasons:
> > > > 1) In true dom0less environment, dom0 would not exist
> > > > 2) You are relying on Xen + Dom0 to boot (or at least enough to get the
> > > > watchdog working) within the watchdog interval.
> > > 
> > > Definitely we need to consider the case where there is no Dom0.
> > > 
> > > I think there are in fact 3 different use cases here:
> > > - watchdog fully driven in a domain (dom0 or another): would work if it is
> > > expected
> > >     that Xen + Domain boot time is under the watchdog initial refresh
> > > rate. I think this
> > >     could make sense on some applications where your system depends on a
> > > specific
> > >     domain to be properly booted to work. This would require an initial
> > > refresh time
> > >     configurable in the boot loader probably.
> > 
> > This is our use-case. ^
> > 
> > Our dom0 is monitoring and managing the other domains in our system.
> > Without dom0 working the system isn't really working as a whole.
> > 
> > @Julien: Would you be ok with the patch set continuing in the direction
> > of the
> > original proposal, letting another party (or me at a later time) implement
> > the fully driven by Xen option?
> I am concerned about this particular point from Bertrand:
> 
> "would work if it is expected that Xen + Domain boot time is under the
> watchdog initial refresh rate."
> 
> How will a user be able to figure out how to initially configure the watchdog?
> Is this something you can easily configure in the bootloader at runtime?
> 
> 
> Overall, I am not for this approach at least in the current status. I would be
> more inclined if there are some documentations explaining how this is supposed
> to be configured on NXP, so others can use the code.
>
> Anyway, this is why we have multiple Arm maintainers for this kind of
> situation. If they other agrees with you, then they can ack the patch and this
> can be merged.


The approach here would not be my choice either. However, I think it
would be nice to have better support for NXP imx8 boards in upstream
Xen. To that end, I would ack these patches but I would ask to add a
document under xen.git/docs/ explaining the approach, limitations, and
requirements, so that someone else can use the code effectively.

Reply via email to