On Tue, Apr 25, 2023 at 08:42:17PM -0400, Peter Xu wrote: > Hi, Michael, Jonathan, > > On Tue, Mar 07, 2023 at 08:13:53PM -0500, Michael S. Tsirkin wrote: > > From: Jonathan Cameron <jonathan.came...@huawei.com> > > > > This register in AER should be both writeable and should > > have a default value with a couple of the errors masked > > including the Uncorrectable Internal Error used by CXL for > > it's error reporting. > > > > Signed-off-by: Jonathan Cameron <jonathan.came...@huawei.com> > > Reviewed-by: Dave Jiang <dave.ji...@intel.com> > > Message-Id: <20230302133709.30373-2-jonathan.came...@huawei.com> > > Reviewed-by: Michael S. Tsirkin <m...@redhat.com> > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com> > > Reviewed-by: Fan Ni <fan...@samsung.com> > > --- > > include/hw/pci/pcie_regs.h | 3 +++ > > hw/pci/pcie_aer.c | 4 ++++ > > 2 files changed, 7 insertions(+) > > > > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h > > index 1fe0bdd25b..4972106c42 100644 > > --- a/include/hw/pci/pcie_regs.h > > +++ b/include/hw/pci/pcie_regs.h > > @@ -141,6 +141,9 @@ typedef enum PCIExpLinkWidth { > > PCI_ERR_UNC_ATOP_EBLOCKED | \ > > PCI_ERR_UNC_TLP_PRF_BLOCKED) > > > > +#define PCI_ERR_UNC_MASK_DEFAULT (PCI_ERR_UNC_INTN | \ > > + PCI_ERR_UNC_TLP_PRF_BLOCKED) > > + > > #define PCI_ERR_UNC_SEVERITY_DEFAULT (PCI_ERR_UNC_DLP | \ > > PCI_ERR_UNC_SDN | \ > > PCI_ERR_UNC_FCP | \ > > diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c > > index 9a19be44ae..909e027d99 100644 > > --- a/hw/pci/pcie_aer.c > > +++ b/hw/pci/pcie_aer.c > > @@ -112,6 +112,10 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, > > uint16_t offset, > > > > pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS, > > PCI_ERR_UNC_SUPPORTED); > > + pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK, > > + PCI_ERR_UNC_MASK_DEFAULT); > > + pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK, > > + PCI_ERR_UNC_SUPPORTED); > > This breaks the simplest migration from QEMU 8.0->7.2 binaries on all > machine types I think as long as the cap is present, e.g. the default > e1000e provided by the default q35 machine can already hit it with all > default cmdline: > > ./qemu-system-x86_64 -M pc-q35-7.2 [-incoming XXX] > > 7.2 binary will have empty wmask for PCI_ERR_UNCOR_MASK, meanwhile I think > it can also see a non-zero value, then the migration will fail at: > > vmstate_load 0000:00:02.0/e1000e, e1000e > > qemu-7.2: get_pci_config_device: Bad config data: i=0x10a read: 40 device: 0 > cmask: ff wmask: 0 w1cmask:0 > qemu-7.2: Failed to load PCIDevice:config > qemu-7.2: Failed to load e1000e:parent_obj > > qemu-7.2: error while loading state for instance 0x0 of device > '0000:00:02.0/e1000e' > qemu-7.2: load of migration failed: Invalid argument > > We probably at least want to have the default value to be still zero, and > we'd need to make sure it'll not be modified by the guest, iiuc. > > Below oneliner works for me and makes the migration work again: > > ===8<=== > diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c > index 103667c368..563a37b79c 100644 > --- a/hw/pci/pcie_aer.c > +++ b/hw/pci/pcie_aer.c > @@ -113,7 +113,7 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, > uint16_t offset, > pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS, > PCI_ERR_UNC_SUPPORTED); > pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK, > - PCI_ERR_UNC_MASK_DEFAULT); > + 0/*PCI_ERR_UNC_MASK_DEFAULT*/); > pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK, > PCI_ERR_UNC_SUPPORTED); > ===8<=== > > Anyone could have a look on a solid solution from PCI side? > > Copy Juan and Leonardo. > > Thanks,
My bad, I forgot about this 🤦. So we need a property and tweak it with compat machinery depending on machine type. Jonathan, can you work on this pls? Or I can revert for now to relieve the time pressure, redo the patch at your leasure. > -- > Peter Xu