Am 20.02.2024 um 13:24 hat Akihiko Odaki geschrieben: > nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV > configurations to know the number of VFs being disabled due to SR-IOV > configuration writes, but the logic was flawed and resulted in > out-of-bound memory access. > > It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled > VFs, but it actually doesn't in the following cases: > - PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been. > - PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set. > - VFs were only partially enabled because of realization failure. > > It is a responsibility of pcie_sriov to interpret SR-IOV configurations > and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it > provides, to get the number of enabled VFs before and after SR-IOV > configuration writes. > > Cc: qemu-sta...@nongnu.org > Fixes: CVE-2024-26328 > Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management > command") > Suggested-by: Michael S. Tsirkin <m...@redhat.com> > Signed-off-by: Akihiko Odaki <akihiko.od...@daynix.com> > --- > hw/nvme/ctrl.c | 26 ++++++++------------------ > 1 file changed, 8 insertions(+), 18 deletions(-) > > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c > index f026245d1e9e..7a56e7b79b4d 100644 > --- a/hw/nvme/ctrl.c > +++ b/hw/nvme/ctrl.c > @@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev) > nvme_ctrl_reset(n, NVME_RESET_FUNCTION); > } > > -static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address, > - uint32_t val, int len) > +static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t > old_num_vfs) > { > NvmeCtrl *n = NVME(dev); > NvmeSecCtrlEntry *sctrl; > - uint16_t sriov_cap = dev->exp.sriov_cap; > - uint32_t off = address - sriov_cap; > - int i, num_vfs; > + int i; > > - if (!sriov_cap) { > - return; > - } > - > - if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) { > - if (!(val & PCI_SRIOV_CTRL_VFE)) { > - num_vfs = pci_get_word(dev->config + sriov_cap + > PCI_SRIOV_NUM_VF); > - for (i = 0; i < num_vfs; i++) { > - sctrl = &n->sec_ctrl_list.sec[i]; > - nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false); > - } > - } > + for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) { > + sctrl = &n->sec_ctrl_list.sec[i]; > + nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false); > } > }
Maybe I'm missing something, but if the concern is that 'i' could run beyond the end of the array, I don't see anything that limits pcie_sriov_num_vfs() to the static size of 127 that n->sec_ctrl_list.sec has. register_vfs() seems to just take whatever 16 bit value the guest wrote without imposing additional restrictions. If there is some mechanism that makes register_vf() fail if we exceed the limit, maybe an assertion with a comment would be in order because it doesn't seem obvious. I couldn't find any code that enforces it, sriov_max_vfs only ever seems to be used as a hint for the guest. If not, do we need another check that fails gracefully in the error case? Kevin