Hi Ard, Laszlo,

Greetings, and thanks for taking the time to help investigate the issue. We finally found that it was caused by KVM, and it is fixed by this patch:

https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git/commit/?id=2113c5f62b7423e4a72b890bd479704aa85c81ba


KVM: arm/arm64: Only skip MMIO insn once

If after an MMIO exit to userspace a VCPU is immediately run with an
immediate_exit request, such as when a signal is delivered or an MMIO
emulation completion is needed, then the VCPU completes the MMIO
emulation and immediately returns to userspace. As the exit_reason
does not get changed from KVM_EXIT_MMIO in these cases we have to
be careful not to complete the MMIO emulation again, when the VCPU is
eventually run again, because the emulation does an instruction skip
(and doing too many skips would be a waste of guest code :-) We need
to use additional VCPU state to track if the emulation is complete.
As luck would have it, we already have 'mmio_needed', which even
appears to be used in this way by other architectures already.

Fixes: 0d640732dbeb ("arm64: KVM: Skip MMIO insn after emulation")
Acked-by: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Andrew Jones <drjo...@redhat.com>
Signed-off-by: Marc Zyngier <m...@kernel.org>

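Before this patch, an MMIO instruction could be skipped more than once when the VCPU was requested to exit immediately. The 32-bit MMIO assembly routine is located not far from the 16-bit one, so an extra skip could carry execution from the 16-bit write into the 32-bit access code...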
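To make the failure mode concrete, here is a minimal C model of the state described in the commit message. It is a sketch under stated assumptions, not the KVM code: struct vcpu, mmio_exit(), run_vcpu_buggy(), run_vcpu_fixed() and replay() are hypothetical names, and the only effect modeled is that completing MMIO emulation skips one 4-byte instruction. The buggy variant decides whether to complete emulation from the stale exit reason; the fixed variant uses an mmio_needed-style flag that is cleared once emulation is complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INSN_SIZE 4u /* AArch64 instructions are 4 bytes wide */

/* Hypothetical exit reasons; only MMIO matters for this model. */
enum exit_reason { EXIT_NONE, EXIT_MMIO };

/* Hypothetical, heavily simplified VCPU state. */
struct vcpu {
    uint64_t pc;                  /* guest program counter */
    enum exit_reason exit_reason; /* last exit reason seen by userspace */
    bool mmio_needed;             /* an MMIO access still awaits completion */
};

typedef void (*run_fn)(struct vcpu *, bool immediate_exit);

/* The guest traps on an MMIO access that userspace must emulate. */
static void mmio_exit(struct vcpu *v)
{
    v->exit_reason = EXIT_MMIO;
    v->mmio_needed = true;
}

/* Completing the emulation skips the trapping instruction. */
static void complete_mmio(struct vcpu *v)
{
    v->pc += INSN_SIZE;
}

/* Buggy model: completion is keyed off exit_reason, which an immediate
 * exit leaves at EXIT_MMIO, so the next run completes (and skips) again. */
static void run_vcpu_buggy(struct vcpu *v, bool immediate_exit)
{
    if (v->exit_reason == EXIT_MMIO)
        complete_mmio(v);
    if (immediate_exit)
        return;                 /* exit_reason is not updated */
    v->exit_reason = EXIT_NONE; /* the guest ran and exited for another reason */
}

/* Fixed model: completion is keyed off mmio_needed, cleared once handled. */
static void run_vcpu_fixed(struct vcpu *v, bool immediate_exit)
{
    if (v->mmio_needed) {
        v->mmio_needed = false;
        complete_mmio(v);
    }
    if (immediate_exit)
        return;
    v->exit_reason = EXIT_NONE;
}

/* Replays the scenario from the commit message and returns the final PC. */
static uint64_t replay(run_fn run)
{
    struct vcpu v = { .pc = 0x1000, .exit_reason = EXIT_NONE, .mmio_needed = false };

    mmio_exit(&v);
    run(&v, true);          /* e.g. a pending signal: immediate_exit */
    assert(v.pc == 0x1004); /* the first completion skips exactly once */
    run(&v, false);         /* the VCPU is eventually run again */
    return v.pc;
}
```

In this model, replay(run_vcpu_buggy) ends at PC 0x1008, two skips for a single MMIO access, while replay(run_vcpu_fixed) ends at 0x1004.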

Thanks,

Heyi

On 2019/8/23 2:56, Laszlo Ersek wrote:
On 08/22/19 11:24, Ard Biesheuvel wrote:
On Thu, 22 Aug 2019 at 10:40, Zhanghailiang
<zhang.zhanghaili...@huawei.com> wrote:
Hi All,



We hit a ‘Synchronous Exception’ error while booting a VM with UEFI firmware
in the avocado-vt tests.

The edk2 version we used is edk2-stable201905, the QEMU version is 4.0.0,
and the kernel version is 4.19.0.

Part of the serial log is below; the full log is in the
attachment.

We can easily reproduce this issue by running the avocado-vt tests. We also
tried the latest upstream edk2, and the issue still reproduces.



Reproduce command:

# avocado run type_specific.io-github-autotest-qemu.qmp_event_notification \
    --vt-type qemu --vt-guest-os Guest.Linux.Fedora.29



The QEMU command is:

..
The log reports an alignment fault. We analyzed the call stack
from the log:

VirtioScsiPassThru -> VirtioFlush -> Virtio10SetQueueNotify ->
Virtio10Transfer -> PciIoMemWrite -> CpuMemoryServiceWrite ->
MmioWrite32 <- here, the address is not aligned.

The faulting address ends in 0x16, so the access is to the QueueSelect
field in VIRTIO_PCI_COMMON_CFG. This is a UINT16 field, so the access
should be 16 bits wide, not 32 bits.
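The offset arithmetic can be checked with a C sketch of the virtio 1.0 common configuration layout. The struct below is an assumption that mirrors the field names of edk2's VIRTIO_PCI_COMMON_CFG (OvmfPkg/Include/IndustryStandard/Virtio10.h is the authoritative copy); every field is naturally aligned, so no packing is needed. Assuming the region itself starts at an aligned address, QueueSelect at offset 0x16 is valid for a 2-byte access but not for a 4-byte one, and an unaligned access to Device memory (as MMIO is typically mapped on AArch64) raises an alignment fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mirror of VIRTIO_PCI_COMMON_CFG; offsets shown in comments. */
typedef struct {
    uint32_t DeviceFeatureSelect; /* 0x00 */
    uint32_t DeviceFeature;       /* 0x04 */
    uint32_t DriverFeatureSelect; /* 0x08 */
    uint32_t DriverFeature;       /* 0x0C */
    uint16_t MsixConfig;          /* 0x10 */
    uint16_t NumQueues;           /* 0x12 */
    uint8_t  DeviceStatus;        /* 0x14 */
    uint8_t  ConfigGeneration;    /* 0x15 */
    uint16_t QueueSelect;         /* 0x16 */
    uint16_t QueueSize;           /* 0x18 */
    uint16_t QueueMsixVector;     /* 0x1A */
    uint16_t QueueEnable;         /* 0x1C */
    uint16_t QueueNotifyOff;      /* 0x1E */
    uint64_t QueueDesc;           /* 0x20 */
    uint64_t QueueAvail;          /* 0x28 */
    uint64_t QueueUsed;           /* 0x30 */
} CommonCfgSketch;

/* Compile-time checks of the facts discussed in the thread. */
static_assert(offsetof(CommonCfgSketch, QueueSelect) == 0x16, "QueueSelect offset");
static_assert(sizeof(((CommonCfgSketch *)0)->QueueSelect) == 2, "QueueSelect is UINT16");
static_assert(sizeof(CommonCfgSketch) == 0x38, "no padding expected");

/* True when an access of 'width' bytes at 'offset' is naturally aligned. */
static int naturally_aligned(size_t offset, size_t width)
{
    return (offset % width) == 0;
}
```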

Could you dump the instructions leading up to the first
Virtio10Transfer() call in Virtio10SetQueueNotify()? (from
Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/OvmfPkg/Virtio10Dxe/Virtio10/DEBUG/Virtio10.dll)

     2280:       aa0103e5        mov     x5, x1
     2284:       d2800044        mov     x4, #0x2                        // #2
     2288:       d28002c3        mov     x3, #0x16                       // #22
     228c:       52800002        mov     w2, #0x0                        // #0
     2290:       aa0003e1        mov     x1, x0
     2294:       aa0603e0        mov     x0, x6
     2298:       97fffcf3        bl      1664 <Virtio10Transfer>

If the size is passed correctly here, we'll have to track down how the
call gets routed to MmioWrite32() instead of MmioWrite16(). Do you have
any patches on top of edk2-stable201905?

Right -- checking the "QueueSelect" (whole word) references in
Virtio10SetQueueNotify(), the "FieldSize" arguments passed to
Virtio10Transfer() are:

- sizeof SavedQueueSelect
- sizeof Index
- sizeof SavedQueueSelect

and both "SavedQueueSelect" and "Index" are of type UINT16.

Virtio10Transfer() maps (FieldSize==2) to "EfiPciIoWidthUint16".
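The FieldSize-to-width step can be sketched as below. This is a hedged reconstruction, not a copy of Virtio10Transfer(): PciIoWidth and field_size_to_width() are hypothetical names, the enum values follow the UEFI EFI_PCI_IO_PROTOCOL_WIDTH numbering (Uint8 = 0, Uint16 = 1, Uint32 = 2, Uint64 = 3), and splitting an 8-byte field into two 32-bit accesses is an assumption about the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the EFI_PCI_IO_PROTOCOL_WIDTH values used here. */
typedef enum {
    PciIoWidthInvalid = -1,
    PciIoWidthUint8   = 0,
    PciIoWidthUint16  = 1,
    PciIoWidthUint32  = 2,
    PciIoWidthUint64  = 3
} PciIoWidth;

/* Maps a config field size in bytes to an access width and repeat count.
 * Assumption: 8-byte fields are transferred as two 32-bit accesses. */
static PciIoWidth field_size_to_width(size_t field_size, size_t *count)
{
    assert(count != NULL);
    *count = 1;
    switch (field_size) {
    case 1:
        return PciIoWidthUint8;
    case 2:
        return PciIoWidthUint16; /* e.g. sizeof SavedQueueSelect (UINT16) */
    case 8:
        *count = 2;
        /* fall through */
    case 4:
        return PciIoWidthUint32;
    default:
        return PciIoWidthInvalid;
    }
}
```

With FieldSize == 2, as for SavedQueueSelect and Index, the sketch yields the 16-bit width (value 1) with a count of 1.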

PciIoMemWrite() can only decrease "Width" (provided
"PcdUnalignedPciIoEnable" is set to TRUE -- which is not the case in
ArmVirtPkg). So "Width" is passed to RootBridgeIoMemWrite() unchanged,
in "MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c".

The latter passes "Width" unchanged to CpuMemoryServiceWrite(), in
"ArmPkg/Drivers/ArmPciCpuIo2Dxe/ArmPciCpuIo2Dxe.c".

That function seems to set "OperationWidth" to "EfiCpuIoWidthUint16"
(value 1, unchanged), which should result in a call to MmioWrite16()...
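The argument that the width survives the whole chain unchanged can be modeled in a few lines of C. Everything here is hypothetical and simplified: pci_io_mem_width() stands in for PciIoMemWrite() (the width may only shrink, and only when a flag mirroring PcdUnalignedPciIoEnable is TRUE), the root bridge and CpuMemoryServiceWrite() layers are treated as pass-through, and mmio_write_bits() stands in for the final MmioWriteN() dispatch. Falling back to byte accesses on misalignment is an illustrative assumption, not the edk2 logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical width codes; 1 is the 16-bit width, as in the thread. */
enum { W8 = 0, W16 = 1, W32 = 2, W64 = 3 };

/* Stand-in for PciIoMemWrite(): the width may only shrink, and only if the
 * unaligned_enable flag (mirroring PcdUnalignedPciIoEnable) is set. */
static int pci_io_mem_width(int width, uint64_t offset, bool unaligned_enable)
{
    if (unaligned_enable && (offset % (1u << width)) != 0)
        return W8; /* assumed fallback to byte accesses */
    return width;
}

/* Stand-in for the final dispatch: the width selects the MmioWriteN()
 * routine. Returns the access size in bits. */
static unsigned mmio_write_bits(int width)
{
    switch (width) {
    case W8:  return 8;  /* MmioWrite8()  */
    case W16: return 16; /* MmioWrite16() */
    case W32: return 32; /* MmioWrite32() */
    case W64: return 64; /* MmioWrite64() */
    default:
        assert(!"unsupported width");
        return 0;
    }
}

/* Whole write path: the root bridge and CPU I/O layers are modeled as
 * passing the width through unchanged. */
static unsigned routed_write_bits(int width, uint64_t offset, bool unaligned_enable)
{
    return mmio_write_bits(pci_io_mem_width(width, offset, unaligned_enable));
}
```

With the flag FALSE, as in ArmVirtPkg, a 16-bit QueueSelect write at offset 0x16 comes out of the model as a 16-bit access, consistent with the expectation that the call should reach MmioWrite16().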


I have a different question. We recently saw a bunch of Synchronous
Exceptions, but those were not deterministic. Whenever they fired (which
was not always), they popped up in different spots. It turned out to be
a KVM regression, apparently a problem with the vtimer. I believe it was
fixed by a backport of upstream commit 6bc210003dff ("KVM: arm/arm64:
Don't emulate virtual timers on userspace ioctls", 2019-04-25). I could
be totally off-target, of course.

(The RHBZ is <https://bugzilla.redhat.com/show_bug.cgi?id=1720125>, but
*of course* it has to be a private bug; it was reported for the kernel
after all! /s)

Thanks
Laszlo




View/Reply Online (#46528): https://edk2.groups.io/g/devel/message/46528