On 211217 0627, Qiuhao Li wrote: > Thanks Alex. It seems this patch sets and checks if the destination device is > busy. But how about the data transfers not triggered directly by PMIO/MMIO > handlers? For example: > > 1. Device A Timer's callback -> Device A MMIO handler > 2. Device A BH's callback -> Device A MMIO handler > > In these situations, when A launches a DMA to itself, the > dev->engaged_in_direct_io is not set, so the operation is allowed. Maybe we > should log the source and check the destination when we launch data > transfers. Is there a way to do that?
Ahh, I forgot that this can happen in BHs as well... I think the only place to do that, without major API changes, is at pci_dma_rw and dma_buffer_rw (where we can catch a glimpse of the PCIDevice/DeviceState). I'll send a V2. -Alex > > Below is a reproducer in NVMe which triggers DMA in a timer's callback > (nvme_process_sq). I can still trigger use-after-free exception with this > patch on qemu-6.1.0: > > cat << EOF | ./qemu-system-x86_64 -display none -machine accel=qtest \ > -machine q35 -nodefaults -drive file=null-co://,if=none,format=raw,id=disk0 \ > -device nvme,drive=disk0,serial=1 -qtest stdio \ > > outl 0xcf8 0x80000810 /* MLBAR (BAR0) – Memory Register Base > Address, lower 32-bits */ > outl 0xcfc 0xe0000000 /* MMIO Base Address = 0xe0000000 */ > outl 0xcf8 0x80000804 /* CMD - Command */ > outw 0xcfc 0x06 /* Bus Master Enable, Memory Space Enable > */ > write 0xe0000024 0x4 0x02000200 /* [3] 3.1.8, Admin Queue Attributes */ > write 0xe0000028 0x4 0x00100000 /* asq = 0x1000 */ > write 0xe0000030 0x4 0x00200000 /* acq = 0x2000 */ > write 0xe0000014 0x4 0x01004600 /* [3] 3.1.5, Controller Configuration, > start ctrl */ > write 0xe0001000 0x1 0x01 /* [3] 3.1.24, SQyTDBL – Submission Queue > y Tail Doorbell */ > write 0x1000 0x1 0x02 /* cmd->opcode, > NVME_ADM_CMD_GET_LOG_PAGE, nvme_get_log() */ > write 0x1018 0x4 0x140000e0 /* prp1 = 0xe0000014, NVME_REG_CC, > nvme_ctrl_reset() */ > write 0x1028 0x4 0x03000004 /* cmd->cdw10, lid = 3 > NVME_LOG_FW_SLOT_INFO, nvme_fw_log_info, buf_len = 4 */ > write 0x1030 0x4 0xfc010000 /* cmd->cdw12 = 0x1fc, Log Page Offset, > trans_len = sizeof(fw_log) - 0x1fc = 4 */ > clock_step > EOF > > CC: Mauro Matteo Cascella and Philippe Mathieu-Daudé. Should we put the > reproducer above to https://gitlab.com/qemu-project/qemu/-/issues/556? > > ________________________________ > From: Alexander Bulekov <alx...@bu.edu> > Sent: Friday, December 17, 2021 11:08 > To: qemu-devel@nongnu.org <qemu-devel@nongnu.org> > Cc: Alexander Bulekov <alx...@bu.edu>; Philippe Mathieu-Daudé > <phi...@redhat.com>; Mauro Matteo Cascella <mcasc...@redhat.com>; Qiuhao Li > <qiuhao...@outlook.com>; Peter Xu <pet...@redhat.com>; Jason Wang > <jasow...@redhat.com>; David Hildenbrand <da...@redhat.com>; Gerd Hoffmann > <kra...@redhat.com>; Peter Maydell <peter.mayd...@linaro.org>; Li Qiang > <liq...@gmail.com>; Thomas Huth <th...@redhat.com>; Laurent Vivier > <lviv...@redhat.com>; Bandan Das <b...@redhat.com>; Edgar E . Iglesias > <edgar.igles...@gmail.com>; Darren Kenny <darren.ke...@oracle.com>; Bin Meng > <bin.m...@windriver.com>; Paolo Bonzini <pbonz...@redhat.com>; Stefan > Hajnoczi <stefa...@redhat.com>; Daniel P. Berrangé <berra...@redhat.com>; > Eduardo Habkost <edua...@habkost.net> > Subject: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level > > Here's my shot at fixing dma-reentracy issues. This patch adds a flag to > the DeviceState, which is set/checked when we call an accessor > associated with the device's IO MRs. > > The problem, in short, as I understand it: For the vast majority of > cases, we want to prevent a device from accessing it's own PIO/MMIO > regions over DMA. > > This patch/solution is based on some assumptions: > 1. DMA accesses that hit mmio regions are only dangerous if they end up > interacting with memory-regions belonging to the device initiating the > DMA. > Not dangerous: sdhci_pio->dma_write->e1000_mmio > Dangerous: sdhci_pio->dma_write->sdhci_mmio > > 2. Most devices do not interact with their own PIO/MMIO memory-regions > using DMA. > > 3. There is no way for there to be multiple simultaneous accesses to a > device's PIO/MMIO memory-regions. > > 4. All devices are QOMified :-) > > With this patch, I wasn't able to reproduce the issues being tracked > here, with QTest reproducers: > https://gitlab.com/qemu-project/qemu/-/issues/556 > > This passes the i386 qos/qtests for me and I was able to boot some > linux/windows > VMs with basic devices configured, without any apparent problems. > > Cc: Philippe Mathieu-Daudé <phi...@redhat.com> > Cc: Mauro Matteo Cascella <mcasc...@redhat.com> > Cc: Qiuhao Li <qiuhao...@outlook.com> > Cc: Peter Xu <pet...@redhat.com> > Cc: Jason Wang <jasow...@redhat.com> > Cc: David Hildenbrand <da...@redhat.com> > Cc: Gerd Hoffmann <kra...@redhat.com> > Cc: Peter Maydell <peter.mayd...@linaro.org> > Cc: Li Qiang <liq...@gmail.com> > Cc: Thomas Huth <th...@redhat.com> > Cc: Laurent Vivier <lviv...@redhat.com> > Cc: Bandan Das <b...@redhat.com> > Cc: Edgar E. Iglesias <edgar.igles...@gmail.com> > Cc: Darren Kenny <darren.ke...@oracle.com> > Cc: Bin Meng <bin.m...@windriver.com> > Cc: Paolo Bonzini <pbonz...@redhat.com> > Cc: Stefan Hajnoczi <stefa...@redhat.com> > > Signed-off-by: Alexander Bulekov <alx...@bu.edu> > --- > include/hw/qdev-core.h | 1 + > softmmu/memory.c | 15 +++++++++++++++ > softmmu/trace-events | 1 + > 3 files changed, 17 insertions(+) > > diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h > index 20d3066595..32f7c779ab 100644 > --- a/include/hw/qdev-core.h > +++ b/include/hw/qdev-core.h > @@ -193,6 +193,7 @@ struct DeviceState { > int instance_id_alias; > int alias_required_for_version; > ResettableState reset; > + int engaged_in_direct_io; > }; > > struct DeviceListener { > diff --git a/softmmu/memory.c b/softmmu/memory.c > index 7340e19ff5..255c3c602f 100644 > --- a/softmmu/memory.c > +++ b/softmmu/memory.c > @@ -532,6 +532,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr, > uint64_t access_mask; > unsigned access_size; > unsigned i; > + DeviceState *dev = NULL; > MemTxResult r = MEMTX_OK; > > if (!access_size_min) { > @@ -541,6 +542,17 @@ static MemTxResult access_with_adjusted_size(hwaddr addr, > access_size_max = 4; > } > > + /* Do not allow more than one simultanous access to a device's IO > Regions */ > + if (mr->owner && > + !mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) { > + dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE); > + if (dev->engaged_in_direct_io) { > + trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, > size); > + return MEMTX_ERROR; > + } > + dev->engaged_in_direct_io = true; > + } > + > /* FIXME: support unaligned access? */ > access_size = MAX(MIN(size, access_size_max), access_size_min); > access_mask = MAKE_64BIT_MASK(0, access_size * 8); > @@ -555,6 +567,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr, > access_mask, attrs); > } > } > + if (dev) { > + dev->engaged_in_direct_io = false; > + } > return r; > } > > diff --git a/softmmu/trace-events b/softmmu/trace-events > index 9c88887b3c..d7228316db 100644 > --- a/softmmu/trace-events > +++ b/softmmu/trace-events > @@ -13,6 +13,7 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t > addr, uint64_t value, u > memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t > value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value > 0x%"PRIx64" size %u name '%s'" > memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, > uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value > 0x%"PRIx64" size %u" > memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, > uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value > 0x%"PRIx64" size %u" > +memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, > unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u" > memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, > uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value > 0x%"PRIx64" size %u" > memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, > uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value > 0x%"PRIx64" size %u" > memory_region_sync_dirty(const char *mr, const char *listener, int global) > "mr '%s' listener '%s' synced (global=%d)" > -- > 2.33.0 >