On 06/08/2025 19:44, Akihiko Odaki wrote:
On 2025/08/07 0:29, Laurent Vivier wrote:
A race condition between guest driver actions and QEMU timers can lead
to an assertion failure when the guest switches the e1000e from legacy
interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
RDTR) is active, but the guest enables MSI-X before the timer fires,
the pending interrupt cause can trigger an assert in
`e1000e_intmgr_collect_delayed_causes()`.
The function's assertion (`assert(core->delayed_causes == 0)`)
incorrectly assumes that it's impossible for a legacy delayed interrupt
to be pending once the device is in MSI-X mode.
This behavior is incorrect. On a physical device, a driver-initiated
mode switch would mask interrupts, reconfigure the hardware, and clear
any stale interrupt states. The legacy delay timers (TIDV/RDTR) are not
used for moderation in MSI-X mode; the Interrupt Throttle Rate (ITR)
mechanism is used instead. Therefore, any pending interrupt from the
old mode should be ignored.
It is true that triggering assertion is incorrect as per: docs/devel/secure-coding-
practices.rst
However, I don't see statements in the datasheet that says mode switch will clear stale
interrupts.
The datasheet doesn't say "stale interrupts are cleared." but it describes two
fundamentally separate and mutually exclusive hardware paths for generating interrupts:
IntelĀ® 82574 GbE Controller Family Datasheet
https://docs.rs-online.com/96e8/0900766b81384733.pdf
See 7.4.1 Legacy and MSI Interrupt Modes
Legacy/MSI Path: An event (like a packet transmission completing) sets a cause bit in the
ICR (Interrupt Cause Register). The legacy delay timers (TIDV/RDTR) can delay the
propagation of this cause. When the timer expires, if the corresponding bit in the IMS
(Interrupt Mask Set) is enabled, the hardware asserts the INTx pin to signal the CPU.
See 7.4.2 MSI-X Mode
MSI-X Path: An event on a specific queue (e.g., Rx Queue 1) is mapped via the IVAR
(Interrupt Vector Allocation Register) to a specific MSI-X vector. The ITR (Interrupt
Throttle Rate) register for that vector then determines if an interrupt should be
generated. If allowed, the hardware performs a memory write to the address specified in
the PCI MSI-X table, delivering the message to a specific CPU core.
Only non-queue causes are reflected in ICR (so not TIDV/RDTR).
The key here is that a TIDV timer expiring is only connected to the legacy ICR->IMS->INTx
path. There is no described hardware path for a TIDV timer expiration to trigger an MSI-X
memory write.
Therefore, if a guest enables MSI-X, it reconfigures its interrupt controller (APIC) to
listen for MSI-X messages, not legacy INTx pin assertions. So, even if the stale TIDV
timer fires on the hardware, the interrupt it generates has no configured path to be
received by the guest OS. From the guest's perspective, the interrupt is lost. Our
emulation should model this by ignoring/clearing the stale event, which is precisely what
the patch does.
The expression "TIDV/RDTR are not used for moderation in MSI-X mode" is also unclear.
Behaving drivers may indeed use ITR for that purpose, but the question for us is: what
will e1000e do when the guest tries to use TIDV/RDTR in MSI-X mode anyway? That defines
the behavior we need to implement.
The TIDV and RDTR registers are part of the device's memory-mapped I/O space. The hardware
will almost certainly allow a write to these registers at any time.
However, based on the separate hardware paths described above, the write would be
ineffective. A driver could set the TIDV timer, and the timer would likely count down, but
its expiration event is only wired to the legacy interrupt generation logic. Since the
device is configured for MSI-X interrupt delivery, that path is dormant. The write is
accepted, but the action is inert.
If the datasheet describes the expected behavior with delayed interrupts in MSI-X, a
reference to the datasheet should be made at least in the patch message. Otherwise,
perhaps this "if (msix_enabled(core->owner))" is just extraneous and should be removed.
The "if (msix_enabled(core->owner))" check is not extraneous and must be kept. It
correctly separates the emulation logic for these two mutually exclusive hardware modes.
Removing it would incorrectly allow a legacy delayed interrupt to be processed as if it
were valid in MSI-X mode, which is not how the hardware works.
Moreover it can introduce unexpected behavior in the guest as this case could
not be managed.
Thanks,
Laurent