On Mon, Jul 28, 2025 at 12:09:34PM +0200, Philippe Mathieu-Daudé wrote:
> Hi Akihiko, Michael,
> 
> On 27/7/25 08:50, Akihiko Odaki wrote:
> > Fix issues in PCIe SR-IOV configuration register handling that caused
> > inconsistent internal state due to improper write mask handling and
> > incorrect migration behavior.
> > 
> > Two main problems were identified:
> > 
> > 1. VF Enable bit write mask handling:
> >     pcie_sriov_config_write() incorrectly assumed that its val parameter
> >     was already masked, causing it to ignore the actual write mask.
> >     This led to the VF Enable bit being processed even when masked,
> >     resulting in incorrect VF registration/unregistration. It is
> >     identified as CVE-2025-54567.
> > 
> > 2. Migration state inconsistency:
> >     pcie_sriov_pf_post_load() unconditionally called register_vfs()
> >     regardless of the VF Enable bit state, creating inconsistent
> >     internal state when VFs should not be enabled. Additionally,
> >     it failed to properly update the NumVFs write mask based on
> >     the current configuration. It is identified as CVE-2025-54566.
> > 
> > Root cause analysis revealed that both functions relied on incorrect
> > special-case assumptions instead of properly reading and consuming
> > the actual configuration values. This change introduces a unified
> > consume_config() function that reads actual configuration values and
> > synchronize the internal state without special-case assumptions.
> > 
> > The solution only adds register read overhead in non-hot-path code
> > while ensuring correct SR-IOV state management across configuration
> > writes and migration scenarios.
> > 
> > Fixes: 5e7dd17e4348 ("pcie_sriov: Remove num_vfs from PCIESriovPF")
> > Fixes: f9efcd47110d ("pcie_sriov: Register VFs after migration")
> > Fixes: CVE-2025-54566
> > Fixes: CVE-2025-54567
> > Cc: qemu-sta...@nongnu.org
> > Reported-by: Corentin BAYET <corentin.ba...@reversetactics.com>
> > Signed-off-by: Akihiko Odaki <od...@rsg.ci.i.u-tokyo.ac.jp>
> > ---
> > Changes in v2:
> > - Changed to perform the VFEnable write mask update only when the bit is
> >    cleared. It clarifies the intention is to prevent setting the bit
> >    (i.e., the bit is currently cleared) when the NumVF holds an invalid
> >    value. The code execution when the bit is set will be also a bit
> >    shorter.
> > - Added references to the relevant CVEs.
> > - Link to v1: 
> > https://lore.kernel.org/qemu-devel/20250713-wmask-v1-1-4c744cdb3...@rsg.ci.i.u-tokyo.ac.jp
> > ---
> >   hw/pci/pcie_sriov.c | 42 +++++++++++++++++++++++-------------------
> >   1 file changed, 23 insertions(+), 19 deletions(-)
> > 
> > diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
> > index 
> > 3ad18744f4a8ed2b35144fafcdc8e7e00fec3672..8a4bf0d6f7c0c6e9ec30df2e9bc55967e48cf6c3
> >  100644
> > --- a/hw/pci/pcie_sriov.c
> > +++ b/hw/pci/pcie_sriov.c
> > @@ -64,6 +64,27 @@ static void unregister_vfs(PCIDevice *dev)
> >       pci_set_word(dev->wmask + dev->exp.sriov_cap + PCI_SRIOV_NUM_VF, 
> > 0xffff);
> >   }
> > +static void consume_config(PCIDevice *dev)
> > +{
> > +    uint8_t *cfg = dev->config + dev->exp.sriov_cap;
> > +
> > +    if (pci_get_word(cfg + PCI_SRIOV_CTRL) & PCI_SRIOV_CTRL_VFE) {
> > +        register_vfs(dev);
> > +    } else {
> > +        uint8_t *wmask = dev->wmask + dev->exp.sriov_cap;
> > +        uint16_t num_vfs = pci_get_word(cfg + PCI_SRIOV_NUM_VF);
> > +        uint16_t wmask_val = PCI_SRIOV_CTRL_MSE | PCI_SRIOV_CTRL_ARI;
> > +
> > +        unregister_vfs(dev);
> > +
> > +        if (num_vfs <= pci_get_word(cfg + PCI_SRIOV_TOTAL_VF)) {
> > +            wmask_val |= PCI_SRIOV_CTRL_VFE;
> > +        }
> > +
> > +        pci_set_word(wmask + PCI_SRIOV_CTRL, wmask_val);
> > +    }
> > +}
> > +
> >   static bool pcie_sriov_pf_init_common(PCIDevice *dev, uint16_t offset,
> >                                         uint16_t vf_dev_id, uint16_t 
> > init_vfs,
> >                                         uint16_t total_vfs, uint16_t 
> > vf_offset,
> > @@ -416,30 +437,13 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t 
> > address,
> >       trace_sriov_config_write(dev->name, PCI_SLOT(dev->devfn),
> >                                PCI_FUNC(dev->devfn), off, val, len);
> > -    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> > -        if (val & PCI_SRIOV_CTRL_VFE) {
> > -            register_vfs(dev);
> > -        } else {
> > -            unregister_vfs(dev);
> > -        }
> > -    } else if (range_covers_byte(off, len, PCI_SRIOV_NUM_VF)) {
> > -        uint8_t *cfg = dev->config + sriov_cap;
> > -        uint8_t *wmask = dev->wmask + sriov_cap;
> > -        uint16_t num_vfs = pci_get_word(cfg + PCI_SRIOV_NUM_VF);
> > -        uint16_t wmask_val = PCI_SRIOV_CTRL_MSE | PCI_SRIOV_CTRL_ARI;
> > -
> > -        if (num_vfs <= pci_get_word(cfg + PCI_SRIOV_TOTAL_VF)) {
> > -            wmask_val |= PCI_SRIOV_CTRL_VFE;
> > -        }
> > -
> > -        pci_set_word(wmask + PCI_SRIOV_CTRL, wmask_val);
> > -    }
> > +    consume_config(dev);
> 
> As usual, this would be simpler to review extracting consume_config() in
> a preliminary patch, then the real fix.
> 
> >   }
> >   void pcie_sriov_pf_post_load(PCIDevice *dev)
> >   {
> >       if (dev->exp.sriov_cap) {
> > -        register_vfs(dev);
> > +        consume_config(dev);
> >       }
> >   }
> 
> Michael, do we want this for the 10.1 release?
> 
> Regards,
> 
> Phil.

Think so, I'm testing a pull with this included.


Reply via email to