On Tue, Nov 01, 2022 at 01:03:17PM +0000, Daniel P. Berrangé wrote: > On Tue, Nov 01, 2022 at 01:57:24PM +0100, Igor Mammedov wrote: > > On Mon, 31 Oct 2022 11:48:58 -0400 > > "Michael S. Tsirkin" <m...@redhat.com> wrote: > > > > > On Mon, Oct 31, 2022 at 01:50:24PM +0000, Daniel P. Berrangé wrote: > > > > On Mon, Oct 31, 2022 at 01:19:30PM +0000, Daniel P. Berrangé wrote: > > > > > The TCO watchdog is unconditionally integrated into the Q35 machine > > > > > type by default, but at the same time is unconditionally disabled > > > > > from firing by a host config option that overrides guest OS attempts > > > > > to enable it. People have to know to set a magic -global to make > > > > > it non-broken > > > > > > > > Incidentally I found that originally the TCO watchdog was not > > > > unconditionally enabled. Its exposure to the guest could be > > > > turned on/off using > > > > > > > > -global ICH9-LPC.enable_tco=bool > > > > > > > > This was implemented for machine type compat, but it also gave > > > > apps a way to disable the watchdog functionality. Unfortunately > > > > that ability was discarded in this series: > > > > > > > > > > > > https://lore.kernel.org/all/1453564933-29638-1-git-send-email-ehabk...@redhat.com/ > > > > > > > > but the 'enable_tco' property still exists in QOM, but silently > > > > ignored. > > > > > > > > Seems we should either fix the impl of 'enable_tco', or remove the > > > > QOM property entirely, so we don't pretend it can be toggled anymore. > > > > > > > > With regards, > > > > Daniel > > > > > > i am inclined to say you are right and the fix is to fix the impl. > > > > Is there need for users to disable whatchdog at all? > > It was always present since then and no one complained, > > so perhaps we should ditch property instead fixing it > > to keep it simple. > > Thinking about it more, I think we should NOT fix the 'enable_tco' property, > because there will be no way for a mgmt appp to tell if they're using a > fixed or broken QEMU.
This is always the case for any bug. We don't as a rule add properties for this, it is distro's responsibility to pick bugfixes for features users care about. What makes this bug different? > So if they use 'enable_tco' on a broken QEMU and then > live migrate, they'll get an guest ABI change. If we did want to support > disabling it, then we should have a brand new property that apps can probe > for. > > In the absence of a request to disable watchdog, I'd say we just delete > 'enable_tco' right now. If someone wants it in future, we can add it with > a new name. > > With regards, > Daniel I am really stressed about watchdog firing and resetting VMs that previously were working fine. Adding this without a safety mechanism to quickly disable in case of problems in the field is not wise imho. > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|