On Mon, Jul 28, 2025 at 12:09:34PM +0200, Philippe Mathieu-Daudé wrote: > Hi Akihiko, Michael, > > On 27/7/25 08:50, Akihiko Odaki wrote: > > Fix issues in PCIe SR-IOV configuration register handling that caused > > inconsistent internal state due to improper write mask handling and > > incorrect migration behavior. > > > > Two main problems were identified: > > > > 1. VF Enable bit write mask handling: > > pcie_sriov_config_write() incorrectly assumed that its val parameter > > was already masked, causing it to ignore the actual write mask. > > This led to the VF Enable bit being processed even when masked, > > resulting in incorrect VF registration/unregistration. It is > > identified as CVE-2025-54567. > > > > 2. Migration state inconsistency: > > pcie_sriov_pf_post_load() unconditionally called register_vfs() > > regardless of the VF Enable bit state, creating inconsistent > > internal state when VFs should not be enabled. Additionally, > > it failed to properly update the NumVFs write mask based on > > the current configuration. It is identified as CVE-2025-54566. > > > > Root cause analysis revealed that both functions relied on incorrect > > special-case assumptions instead of properly reading and consuming > > the actual configuration values. This change introduces a unified > > consume_config() function that reads actual configuration values and > > synchronize the internal state without special-case assumptions. > > > > The solution only adds register read overhead in non-hot-path code > > while ensuring correct SR-IOV state management across configuration > > writes and migration scenarios. > > > > Fixes: 5e7dd17e4348 ("pcie_sriov: Remove num_vfs from PCIESriovPF") > > Fixes: f9efcd47110d ("pcie_sriov: Register VFs after migration") > > Fixes: CVE-2025-54566 > > Fixes: CVE-2025-54567 > > Cc: qemu-sta...@nongnu.org > > Reported-by: Corentin BAYET <corentin.ba...@reversetactics.com> > > Signed-off-by: Akihiko Odaki <od...@rsg.ci.i.u-tokyo.ac.jp> > > --- > > Changes in v2: > > - Changed to perform the VFEnable write mask update only when the bit is > > cleared. It clarifies the intention is to prevent setting the bit > > (i.e., the bit is currently cleared) when the NumVF holds an invalid > > value. The code execution when the bit is set will be also a bit > > shorter. > > - Added references to the relevant CVEs. > > - Link to v1: > > https://lore.kernel.org/qemu-devel/20250713-wmask-v1-1-4c744cdb3...@rsg.ci.i.u-tokyo.ac.jp > > --- > > hw/pci/pcie_sriov.c | 42 +++++++++++++++++++++++------------------- > > 1 file changed, 23 insertions(+), 19 deletions(-) > > > > diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c > > index > > 3ad18744f4a8ed2b35144fafcdc8e7e00fec3672..8a4bf0d6f7c0c6e9ec30df2e9bc55967e48cf6c3 > > 100644 > > --- a/hw/pci/pcie_sriov.c > > +++ b/hw/pci/pcie_sriov.c > > @@ -64,6 +64,27 @@ static void unregister_vfs(PCIDevice *dev) > > pci_set_word(dev->wmask + dev->exp.sriov_cap + PCI_SRIOV_NUM_VF, > > 0xffff); > > } > > +static void consume_config(PCIDevice *dev) > > +{ > > + uint8_t *cfg = dev->config + dev->exp.sriov_cap; > > + > > + if (pci_get_word(cfg + PCI_SRIOV_CTRL) & PCI_SRIOV_CTRL_VFE) { > > + register_vfs(dev); > > + } else { > > + uint8_t *wmask = dev->wmask + dev->exp.sriov_cap; > > + uint16_t num_vfs = pci_get_word(cfg + PCI_SRIOV_NUM_VF); > > + uint16_t wmask_val = PCI_SRIOV_CTRL_MSE | PCI_SRIOV_CTRL_ARI; > > + > > + unregister_vfs(dev); > > + > > + if (num_vfs <= pci_get_word(cfg + PCI_SRIOV_TOTAL_VF)) { > > + wmask_val |= PCI_SRIOV_CTRL_VFE; > > + } > > + > > + pci_set_word(wmask + PCI_SRIOV_CTRL, wmask_val); > > + } > > +} > > + > > static bool pcie_sriov_pf_init_common(PCIDevice *dev, uint16_t offset, > > uint16_t vf_dev_id, uint16_t > > init_vfs, > > uint16_t total_vfs, uint16_t > > vf_offset, > > @@ -416,30 +437,13 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t > > address, > > trace_sriov_config_write(dev->name, PCI_SLOT(dev->devfn), > > PCI_FUNC(dev->devfn), off, val, len); > > - if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) { > > - if (val & PCI_SRIOV_CTRL_VFE) { > > - register_vfs(dev); > > - } else { > > - unregister_vfs(dev); > > - } > > - } else if (range_covers_byte(off, len, PCI_SRIOV_NUM_VF)) { > > - uint8_t *cfg = dev->config + sriov_cap; > > - uint8_t *wmask = dev->wmask + sriov_cap; > > - uint16_t num_vfs = pci_get_word(cfg + PCI_SRIOV_NUM_VF); > > - uint16_t wmask_val = PCI_SRIOV_CTRL_MSE | PCI_SRIOV_CTRL_ARI; > > - > > - if (num_vfs <= pci_get_word(cfg + PCI_SRIOV_TOTAL_VF)) { > > - wmask_val |= PCI_SRIOV_CTRL_VFE; > > - } > > - > > - pci_set_word(wmask + PCI_SRIOV_CTRL, wmask_val); > > - } > > + consume_config(dev); > > As usual, this would be simpler to review extracting consume_config() in > a preliminary patch, then the real fix. > > > } > > void pcie_sriov_pf_post_load(PCIDevice *dev) > > { > > if (dev->exp.sriov_cap) { > > - register_vfs(dev); > > + consume_config(dev); > > } > > } > > Michael, do we want this for the 10.1 release? > > Regards, > > Phil.
Think so, I'm testing a pull with this included.