On 28.10.21 16:36, Roger Pau Monné wrote:
> On Thu, Oct 28, 2021 at 12:09:23PM +0000, Oleksandr Andrushchenko wrote:
>> Hi, Julien!
>>
>> On 27.10.21 20:35, Julien Grall wrote:
>>> Hi Oleksandr,
>>>
>>> On 27/10/2021 09:25, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>
>>>>
>>>> While in vPCI MMIO trap handlers for the guest PCI host bridge it is not
>>>> enough for SBDF translation to simply call VPCI_ECAM_BDF(info->gpa) as
>>>> the base address may not be aligned in the way that the translation
>>>> always work. If not adjusted with respect to the base address it may not be
>>>> able to properly convert SBDF and crashes:
>>>>
>>>> (XEN) vpci_mmio_read 0000:65:1a.0 reg 8bc gpa e65d08bc
>>> I can't find a printk() that may output this message. Where does this comes 
>>> from?
>> That was a debug print. I shouldn't have used that in the patch description, 
>> but
>> probably after "---" to better explain what's happening
>>> Anyway, IIUC the guest physical address is 0xe65d08bc which, if I am not 
>>> mistaken, doesn't belong to the range advertised for GUEST_VPCI_ECAM.
>> This is from dom0 I am working on now.
>>> IMHO, the stack trace should come from usptream Xen or need some 
>>> information to explain how this was reproduced.
>>>
>>>> (XEN) Data Abort Trap. Syndrome=0x6
>>>> (XEN) Walking Hypervisor VA 0x467a28bc on CPU0 via TTBR 0x00000000481d5000
>>> I can understnad that if we don't substract GUEST_VPCI_ECAM, we would (in 
>>> theory) not get the correct BDF. But... I don't understand how this would 
>>> result to a data abort in the hypervisor.
>>>
>>> In fact, I think the vPCI code should be resilient enough to not crash if 
>>> we pass the wrong BDF.
>> Well, there is no (?) easy way to validate SBDF. And this could be a problem 
>> if we have a misbehaving
>> guest which may force Xen to access the memory beyond that of PCI host bridge
> How could that be? The ECAM region exposed to the guest you should be
> the same as the physical one for dom0?
Ok, I have a Designware PCI hist which has 2 ECAM regions (I am starting to
implement the driver for it, so I can be wrong here):
- Root Complex ECAM area ("dbi"), it is something like 0x3000 bytes long
- "Client" ECAM area ("config")
So from Dom0 POV we have 2 ECAM regions and for the guest
we always emulate a single big region:
/*
  * 256 MB is reserved for VPCI configuration space based on calculation
  * 256 buses x 32 devices x 8 functions x 4 KB = 256 MB
  */
#define GUEST_VPCI_ECAM_BASE    xen_mk_ullong(0x10000000)
#define GUEST_VPCI_ECAM_SIZE    xen_mk_ullong(0x10000000)

So, we have the base address and size of the emulated ECAM space
not connected to the real host bridge
>
> And for domUs you really need to fix vpci_{read,write} to not
> passthrough accesses not explicitly handled.
Do you mean that we need to validate SBDFs there?
This can be tricky if we have a use-case when a PCI device being
passed through if not put at 0000:00:0.0, but requested to be, for
example, 0000:0d:0.0. So, we need to go over the list of virtual
devices and see if SBDF the guest is trying to access is a valid SBDF.
Is this what you mean?
>
>>> When there is a data abort in Xen, you should get a stack trace from where 
>>> this comes from. Can you paste it here?
>> (XEN) Data Abort Trap. Syndrome=0x6
>> (XEN) Walking Hypervisor VA 0x467a28bc on CPU0 via TTBR 0x00000000481d5000
>> (XEN) 0TH[0x0] = 0x00000000481d4f7f
>> (XEN) 1ST[0x1] = 0x00000000481d2f7f
>> (XEN) 2ND[0x33] = 0x0000000000000000
>> (XEN) CPU0: Unexpected Trap: Data Abort
>> (XEN) ----[ Xen-4.16-unstable  arm64  debug=y  Not tainted ]----
>> (XEN) CPU:    0
>> (XEN) PC:     000000000026d3d4 pci_generic_config_read+0x88/0x9c
>> (XEN) LR:     000000000026d36c
>> (XEN) SP:     000080007ff97c00
>> (XEN) CPSR:   0000000060400249 MODE:64-bit EL2h (Hypervisor, handler)
>> (XEN)      X0: 00000000467a28bc  X1: 00000000065d08bc  X2: 00000000000008bc
>> (XEN)      X3: 000000000000000c  X4: 000080007ffc6fd0  X5: 0000000000000000
>> (XEN)      X6: 0000000000000014  X7: ffff800011a58000  X8: ffff0000225a0380
>> (XEN)      X9: 0000000000000000 X10: 0101010101010101 X11: 0000000000000028
>> (XEN)     X12: 0101010101010101 X13: 0000000000000020 X14: ffffffffffffffff
>> (XEN)     X15: 0000000000000001 X16: ffff800010da6708 X17: 0000000000000020
>> (XEN)     X18: 0000000000000002 X19: 0000000000000004 X20: 000080007ff97c5c
>> (XEN)     X21: 00000000000008bc X22: 00000000000008bc X23: 0000000000000004
>> (XEN)     X24: 0000000000000000 X25: 00000000000008bc X26: 00000000000065d0
>> (XEN)     X27: 000080007ffb9010 X28: 0000000000000000  FP: 000080007ff97c00
>> (XEN)
>> (XEN)   VTCR_EL2: 00000000800a3558
>> (XEN)  VTTBR_EL2: 00010000bffba000
>> (XEN)
>> (XEN)  SCTLR_EL2: 0000000030cd183d
>> (XEN)    HCR_EL2: 00000000807c663f
>> (XEN)  TTBR0_EL2: 00000000481d5000
>> (XEN)
>> (XEN)    ESR_EL2: 0000000096000006
>> (XEN)  HPFAR_EL2: 0000000000e65d00
>> (XEN)    FAR_EL2: 00000000467a28bc
>> (XEN)
>> [snip]
>> (XEN) Xen call trace:
>> (XEN)    [<000000000026d3d4>] pci_generic_config_read+0x88/0x9c (PC)
>> (XEN)    [<000000000026d36c>] pci_generic_config_read+0x20/0x9c (LR)
>> (XEN)    [<000000000026d2c8>] pci-access.c#pci_config_read+0x60/0x84
>> (XEN)    [<000000000026d4a8>] pci_conf_read32+0x10/0x18
>> (XEN)    [<000000000024dcf8>] vpci.c#vpci_read_hw+0x48/0xb8
>> (XEN)    [<000000000024e3e4>] vpci_read+0xac/0x24c
>> (XEN)    [<000000000024e934>] vpci_ecam_read+0x78/0xa8
>> (XEN)    [<000000000026e368>] vpci.c#vpci_mmio_read+0x44/0x7c
>> (XEN)    [<0000000000275054>] try_handle_mmio+0x1ec/0x264
>> (XEN)    [<000000000027ea50>] traps.c#do_trap_stage2_abort_guest+0x18c/0x2d8
>> (XEN)    [<000000000027f088>] do_trap_guest_sync+0xf0/0x618
>> (XEN)    [<0000000000269c58>] entry.o#guest_sync_slowpath+0xa4/0xd4
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) CPU0: Unexpected Trap: Data Abort
>> (XEN) ****************************************
> Are you exposing an ECAM region to the guest bigger than the
> underlying one, and that's why you get crashes? (because you get out of
> the hardware range)
Please see above
> I would assume physical accesses to the ECAM area reported by the
> hardware don't trigger traps?
No
>
> Roger.
Thank you,
Oleksandr

Reply via email to