On 2025/08/07 0:29, Laurent Vivier wrote:
A race condition between guest driver actions and QEMU timers can lead
to an assertion failure when the guest switches the e1000e from legacy
interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
RDTR) is active, but the guest enables MSI-X before the timer fires,
the pending interrupt cause can trigger an assert in
`e1000e_intmgr_collect_delayed_causes()`.
The function's assertion (`assert(core->delayed_causes == 0)`)
incorrectly assumes that it's impossible for a legacy delayed interrupt
to be pending once the device is in MSI-X mode.
This behavior is incorrect. On a physical device, a driver-initiated
mode switch would mask interrupts, reconfigure the hardware, and clear
any stale interrupt states. The legacy delay timers (TIDV/RDTR) are not
used for moderation in MSI-X mode; the Interrupt Throttle Rate (ITR)
mechanism is used instead. Therefore, any pending interrupt from the
old mode should be ignored.
It is true that triggering assertion is incorrect as per:
docs/devel/secure-coding-practices.rst
However, I don't see statements in the datasheet that says mode switch
will clear stale interrupts.
The expression "TIDV/RDTR are not used for moderation in MSI-X mode" is
also unclear. Behaving drivers may indeed use ITR for that purpose, but
the question for us is: what will e1000e do when the guest tries to use
TIDV/RDTR in MSI-X mode anyway? That defines the behavior we need to
implement.
If the datasheet describes the expected behavior with delayed interrupts
in MSI-X, a reference to the datasheet should be made at least in the
patch message. Otherwise, perhaps this "if (msix_enabled(core->owner))"
is just extraneous and should be removed.
Regards,
Akihiko Odaki
Replace the overly strict assertion with a statement that clears any
stale `delayed_causes`. This correctly models the hardware's behavior
of discarding obsolete interrupt events during a mode change and
prevents the QEMU process from terminating.
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1863
Signed-off-by: Laurent Vivier <lviv...@redhat.com>
---
hw/net/e1000e_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 24138587905b..d0ec892488d4 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -342,7 +342,7 @@ e1000e_intmgr_collect_delayed_causes(E1000ECore *core)
uint32_t res;
if (msix_enabled(core->owner)) {
- assert(core->delayed_causes == 0);
+ core->delayed_causes = 0;
return 0;
}