On 2024/08/23 6:01, Peter Xu wrote:
On Thu, Aug 22, 2024 at 06:10:43PM +0100, Peter Maydell wrote:
On Thu, 27 Jun 2024 at 14:40, Akihiko Odaki <akihiko.od...@daynix.com> wrote:

A memory region does not use their own reference counters, but instead
piggybacks on another QOM object, "owner" (unless the owner is not the
memory region itself). When creating a subregion, a new reference to the
owner of the container must be created. However, if the subregion is
owned by the same QOM object, this result in a self-reference, and make
the owner immortal. Avoid such a self-reference.

Signed-off-by: Akihiko Odaki <akihiko.od...@daynix.com>
---
  system/memory.c | 11 +++++++++--
  1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/system/memory.c b/system/memory.c
index 74cd73ebc78b..949f5016a68d 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2638,7 +2638,10 @@ static void 
memory_region_update_container_subregions(MemoryRegion *subregion)

      memory_region_transaction_begin();

-    memory_region_ref(subregion);
+    if (mr->owner != subregion->owner) {
+        memory_region_ref(subregion);
+    }
+
      QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
          if (subregion->priority >= other->priority) {
              QTAILQ_INSERT_BEFORE(other, subregion, subregions_link);
@@ -2696,7 +2699,11 @@ void memory_region_del_subregion(MemoryRegion *mr,
          assert(alias->mapped_via_alias >= 0);
      }
      QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
-    memory_region_unref(subregion);
+
+    if (mr->owner != subregion->owner) {
+        memory_region_unref(subregion);
+    }
+
      memory_region_update_pending |= mr->enabled && subregion->enabled;
      memory_region_transaction_commit();
  }

I was having another look at leaks this week, and I tried
this patch to see how many of the leaks I was seeing it
fixed. I found however that for arm it results in an
assertion when the device-introspection-test exercises
the "imx7.analog" device. By-hand repro:

$ ./build/asan/qemu-system-aarch64 -display none -machine none -accel
qtest -monitor stdio
==712838==WARNING: ASan doesn't fully support makecontext/swapcontext
functions and may produce false positives in some cases!
QEMU 9.0.92 monitor - type 'help' for more information
(qemu) device_add imx7.analog,help
qemu-system-aarch64: ../../system/memory.c:1777: void
memory_region_finalize(Object *): Assertion `!mr->container' failed.
Aborted (core dumped)

It may be well be that this is a preexisting bug that's only
exposed by this refcount change causing us to actually try
to dispose of the memory regions.

I think that what's happening here is that the device
object has multiple MemoryRegions, each of which is a child
QOM property. One of these MRs is a "container MR", and the
other three are actual-content MRs which the device put into
the container when it created them. When we deref the device,
we go through all the child QOM properties unparenting and
unreffing them. However, there's no particular ordering
here, and it happens that we try to unref one of the
actual-content MRs first. That MR is still inside the
container MR, so we hit the assert. If we had happened to
unref the container MR first then memory_region_finalize()
would have removed all the subregions from it, avoiding
the problem.

I'm not sure what the best fix would be here -- that assert
is there as a guard that the region isn't visible in
any address space, so maybe it needs to be made a bit
cleverer about the condition it checks? e.g. in this
example although mr->container is not NULL,
mr->container->container is NULL.

If we keep looking at ->container we'll always see NULL, IIUC, because
either it's removed from its parent MR so it's NULL already, or at some
point it can start to point to a root mr of an address space, where should
also be NULL, afaiu.

Or we could check whether the mr->container->owner is the same as the
mr->owner and allow a non-NULL mr->container in that case.  I don't know
this subsystem well enough so I'm just making random stabs here, though.

If with the assumption of this patch applied, then looks like it's pretty
legal a container MR and the child MRs be finalized in any order when the
owner device is being destroyed.

IIUC the MR should be destined to be invisible until this point, with or
without the fact that mr->container is NULL.  It's because anyone who
references the MR should do memory_region_ref() first, which takes the
owner's refcount.  Here if MR finalize() is reached I think it means the
owner refcount must be zero.  So it looks to me the only possible case when
mr->container is non-NULL is it's used internally like this.  Then it's
invisible and also safe to be detached even if container != NULL.

It is still nice if we can protect internal uses; it is theoretically possible for the owner to remove the subregion at runtime, and we want to ensure the subregion will be kept alive while its container is even in such a case. We can do so by creating a reference to the subregion itself instead of its owner. I sent v4 for this change.

Regards,
Akihiko Odaki

Reply via email to