Hi *,

this came to my mind when browsing the sources in the patch's vicinity.
It is just a collection of thoughts, so please don't feel offended by how I phrased certain statements.

Questions:

- Is mr->opaque always unused? I.e. should we assert NULL before assignment?
- mr->ops vs. mr->iommu_ops: can we set mr->opaque if mr->iommu_ops is not NULL? Or should we even assert that mr->iommu_ops is NULL, because a skip_dump mr is not supposed to be addr-translated again?
- There is a _shared_ 'io_mem_unassigned' mr. Are we in danger of modifying it? Would that hurt?
- Are we generally switching mrops "back and forth", or is this a first?
- Can we afford not to implement size 8, or should we rather force 8 -> 2*4 by setting specific mrop flags, if possible? Or just hard-code case 8: handle longword[1]; fall through to case 4? (See the sketch below.)
- When/where is memory_region_set_skip_dump() (supposed to be) called?

Recommendations:

- Add a comment in skip_dump_mem_read/write NOT to support 64-bit accesses, because an error will not be recognised unless specific HW is present (maybe even give examples of specific HW combinations).
- Add comments at more code locations that are subpage-break/mmap-sensitive. For example, should the default vfio slow-path mrops also not support 64-bit?
- Add a trace message for each mrop.

Additional patch suggestion(s):

During former investigations I found it not easy to identify the runtime active/current mrops per mr, so:

- Add .name to mr->ops/iommu_ops to be able to mon-list them together with mr names, OR
- (this calls the flag reuse/overlay into question) the skip_dump flag should rather get a sibling, so that (unnamed) ops can still be identified for listing?

But is this the only mr<->mrop ambiguity?
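A minimal sketch of the 8 -> 2*4 idea referenced above, assuming the memory core's adjusted-size handling applies here (my reading of access_with_adjusted_size() is that accesses wider than .impl.max_access_size get split into several implementation-sized accesses); the .valid/.impl values below are illustrative only and not part of the posted patch:

const MemoryRegionOps skip_dump_mem_ops = {
    .read = skip_dump_mem_read,
    .write = skip_dump_mem_write,
    .endianness = DEVICE_NATIVE_ENDIAN,
    /* Sizes the guest may issue against this region. */
    .valid = {
        .min_access_size = 1,
        .max_access_size = 8,
    },
    /* Sizes the callbacks actually implement; if the core's splitting
     * applies, an 8-byte guest access is carried out as two 4-byte
     * accesses instead of silently returning ~0 or being dropped. */
    .impl = {
        .min_access_size = 1,
        .max_access_size = 4,
    },
};

That would keep the read/write switch statements at 1/2/4 while still giving 8-byte accesses defined behaviour.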
Regards,
Thorsten

On 21.10.2016 at 19:11, Alex Williamson wrote:

With a vfio assigned device we lay down a base MemoryRegion registered as an IO region, giving us read & write accessors. If the region supports mmap, we lay down a higher priority sub-region MemoryRegion on top of the base layer initialized as a RAM pointer to the mmap. Finally, if we have any quirks for the device (i.e. address ranges that need additional virtualization support), we put another IO sub-region on top of the mmap MemoryRegion.

When this is flattened, we now potentially have sub-page mmap MemoryRegions exposed which cannot be directly mapped through KVM. This is as expected, but a subtle detail of this is that we end up with two different access mechanisms through QEMU. If we disable the mmap MemoryRegion, we make use of the IO MemoryRegion and service accesses using pread and pwrite to the vfio device file descriptor. If the mmap MemoryRegion is enabled and we end up in one of these sub-page gaps, QEMU handles the access as RAM, using memcpy to the mmap. Using the mmap through QEMU is a subtle difference, but it's fine; the problem is the memcpy. My assumption is that memcpy makes no guarantees about access width and potentially uses all sorts of optimized memory transfers that are not intended for talking to device MMIO. It turns out that this has been a problem for Realtek NIC assignment, which has such a quirk that creates a sub-page mmap MemoryRegion access.

My proposal to fix this is to leverage the skip_dump flag that we already use for special handling of these device-backed MMIO ranges. When skip_dump is set for a MemoryRegion, we mark memory access as non-direct and automatically insert MemoryRegionOps with basic semantics to handle accesses. Note that we only enable dword accesses because some devices don't particularly like qword accesses (Realtek NICs are such a device). This actually fixes memory inspection via the xp command in the QEMU monitor as well.

Please comment. Is this the best way to solve this problem?
Thanks

Reported-by: Thorsten Kohfeldt <thorsten.kohfe...@gmx.de>
Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---
 include/exec/memory.h |  6 ++++--
 memory.c              | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 10d7eac..a4c3acf 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1464,9 +1464,11 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t addr);
 static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
 {
     if (is_write) {
-        return memory_region_is_ram(mr) && !mr->readonly;
+        return memory_region_is_ram(mr) &&
+               !mr->readonly && !memory_region_is_skip_dump(mr);
     } else {
-        return memory_region_is_ram(mr) || memory_region_is_romd(mr);
+        return (memory_region_is_ram(mr) && !memory_region_is_skip_dump(mr)) ||
+               memory_region_is_romd(mr);
     }
 }
 
diff --git a/memory.c b/memory.c
index 58f9269..7ed7ca9 100644
--- a/memory.c
+++ b/memory.c
@@ -1136,6 +1136,46 @@ const MemoryRegionOps unassigned_mem_ops = {
     .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
+static uint64_t skip_dump_mem_read(void *opaque, hwaddr addr, unsigned size)
+{
+    uint64_t val = (uint64_t)~0;
+
+    switch (size) {
+    case 1:
+        val = *(uint8_t *)(opaque + addr);
+        break;
+    case 2:
+        val = *(uint16_t *)(opaque + addr);
+        break;
+    case 4:
+        val = *(uint32_t *)(opaque + addr);
+        break;
+    }
+
+    return val;
+}
+
+static void skip_dump_mem_write(void *opaque, hwaddr addr, uint64_t data, unsigned size)
+{
+    switch (size) {
+    case 1:
+        *(uint8_t *)(opaque + addr) = (uint8_t)data;
+        break;
+    case 2:
+        *(uint16_t *)(opaque + addr) = (uint16_t)data;
+        break;
+    case 4:
+        *(uint32_t *)(opaque + addr) = (uint32_t)data;
+        break;
+    }
+}
+
+const MemoryRegionOps skip_dump_mem_ops = {
+    .read = skip_dump_mem_read,
+    .write = skip_dump_mem_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
 bool memory_region_access_valid(MemoryRegion *mr,
                                 hwaddr addr,
                                 unsigned size,
@@ -1366,6 +1406,10 @@ void memory_region_init_ram_ptr(MemoryRegion *mr,
 void memory_region_set_skip_dump(MemoryRegion *mr)
 {
     mr->skip_dump = true;
+    if (mr->ram && mr->ops == &unassigned_mem_ops) {
+        mr->ops = &skip_dump_mem_ops;
+        mr->opaque = mr->ram_block->host;
+    }
 }
 
 void memory_region_init_alias(MemoryRegion *mr,
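Regarding the question above of when/where memory_region_set_skip_dump() is (supposed to be) called: with this patch it is the mmap'ed vfio sub-regions that matter. Below is a rough, self-contained sketch of the layering the cover letter describes and of where the flag would be set; apart from the memory API calls, every name, size and offset is made up for illustration (the real setup lives under hw/vfio/):

#include "qemu/osdep.h"
#include "exec/memory.h"

/* Illustration only: the three layers described in the cover letter. */
static void setup_assigned_bar_sketch(Object *owner, MemoryRegion *bar,
                                      const MemoryRegionOps *slow_ops,
                                      const MemoryRegionOps *quirk_ops,
                                      void *dev, void *mmap_ptr,
                                      uint64_t bar_size)
{
    MemoryRegion *mmap_mr = g_new0(MemoryRegion, 1);
    MemoryRegion *quirk_mr = g_new0(MemoryRegion, 1);

    /* Base layer: slow-path I/O region, serviced by pread/pwrite on the
     * vfio device fd. */
    memory_region_init_io(bar, owner, slow_ops, dev, "bar-slow", bar_size);

    /* Higher-priority RAM-ptr sub-region backed by the mmap of the BAR.
     * This is where skip_dump is set; with the patch applied the flag also
     * installs skip_dump_mem_ops and makes accesses non-direct, so sub-page
     * fragments are no longer handled via memcpy. */
    memory_region_init_ram_ptr(mmap_mr, owner, "bar-mmap", bar_size, mmap_ptr);
    memory_region_set_skip_dump(mmap_mr);
    memory_region_add_subregion_overlap(bar, 0, mmap_mr, 1);

    /* Quirk: a small I/O window on top of the mmap region; this is what
     * leaves sub-page mmap fragments behind after flattening. */
    memory_region_init_io(quirk_mr, owner, quirk_ops, dev, "bar-quirk", 0x100);
    memory_region_add_subregion_overlap(bar, 0x40, quirk_mr, 2);
}

The point relevant to the question is the ordering: the flag is set right after memory_region_init_ram_ptr(), before the sub-region is added to the BAR container.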