Re: [PATCH v5 01/15] ui & main loop: Redesign of system-specific main thread event handling

2024-10-31 Thread Akihiko Odaki

On 2024/10/30 5:58, Phil Dennis-Jordan wrote:

macOS's Cocoa event handling must be done on the initial (main) thread
of the process. Furthermore, if library or application code uses
libdispatch, the main dispatch queue must be handling events on the main
thread as well.

So far, this has affected Qemu in both the Cocoa and SDL UIs, although
in different ways: the Cocoa UI replaces the default qemu_main function
with one that spins Qemu's internal main event loop off onto a
background thread. SDL (which uses Cocoa internally) on the other hand
uses a polling approach within Qemu's main event loop. Events are
polled during the SDL UI's dpy_refresh callback, which happens to run
on the main thread by default.

As UIs are mutually exclusive, this works OK as long as nothing else
needs platform-native event handling. In the next patch, a new device is
introduced based on the ParavirtualizedGraphics.framework in macOS.
This uses libdispatch internally, and only works when events are being
handled on the main runloop. With the current system, it works when
using either the Cocoa or the SDL UI. However, it does not when running
headless. Moreover, any attempt to install a similar scheme to the
Cocoa UI's main thread replacement fails when combined with the SDL
UI.

This change tidies up main thread management to be more flexible.

  * The qemu_main global function pointer is a custom function for the
main thread, and it may now be NULL. When it is, the main thread
runs the main Qemu loop. This represents the traditional setup.
  * When non-null, spawning the main Qemu event loop on a separate
thread is now done centrally rather than inside the Cocoa UI code.
  * For most platforms, qemu_main is indeed NULL by default, but on
Darwin, it defaults to a function that runs the CFRunLoop.
  * The Cocoa UI sets qemu_main to a function which runs the
NSApplication event handling runloop, as is usual for a Cocoa app.
  * The SDL UI overrides the qemu_main function to NULL, thus
specifying that Qemu's main loop must run on the main
thread.
  * For other UIs, or in the absence of UIs, the platform's default
behaviour is followed.

This means that on macOS, the platform's runloop events are always
handled, regardless of chosen UI. The new PV graphics device will
thus work in all configurations. There is no functional change on other
operating systems.

Signed-off-by: Phil Dennis-Jordan 
---

v5:

  * Simplified the way of setting/clearing the main loop by going back
to setting qemu_main directly, but narrowing the scope of what it
needs to do, and it can now be NULL.

  include/qemu-main.h |  3 +--
  include/qemu/typedefs.h |  1 +
  system/main.c   | 56 +++
  ui/cocoa.m  | 58 +++--
  ui/sdl2.c   |  4 +++
  5 files changed, 72 insertions(+), 50 deletions(-)

diff --git a/include/qemu-main.h b/include/qemu-main.h
index 940960a7dbc..4bd0d667edc 100644
--- a/include/qemu-main.h
+++ b/include/qemu-main.h
@@ -5,7 +5,6 @@
  #ifndef QEMU_MAIN_H
  #define QEMU_MAIN_H
  
-int qemu_default_main(void);

-extern int (*qemu_main)(void);
+extern qemu_main_fn qemu_main;
  
  #endif /* QEMU_MAIN_H */

diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 3d84efcac47..b02cfe1f328 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -131,5 +131,6 @@ typedef struct IRQState *qemu_irq;
   * Function types
   */
  typedef void (*qemu_irq_handler)(void *opaque, int n, int level);
+typedef int (*qemu_main_fn)(void);
  
  #endif /* QEMU_TYPEDEFS_H */

diff --git a/system/main.c b/system/main.c
index 9b91d21ea8c..8c90b8d2ddf 100644
--- a/system/main.c
+++ b/system/main.c
@@ -24,13 +24,14 @@
  
  #include "qemu/osdep.h"

  #include "qemu-main.h"
+#include "qemu/main-loop.h"
  #include "sysemu/sysemu.h"
  
-#ifdef CONFIG_SDL

-#include 
+#ifdef CONFIG_DARWIN
+#include 
  #endif
  
-int qemu_default_main(void)

+static int qemu_default_main(void)
  {
  int status;
  
@@ -40,10 +41,55 @@ int qemu_default_main(void)

  return status;
  }
  
-int (*qemu_main)(void) = qemu_default_main;

+/*
+ * Various macOS system libraries, including the Cocoa UI and anything using
+ * libdispatch, such as ParavirtualizedGraphics.framework, requires that the
+ * main runloop, on the main (initial) thread be running or at least regularly
+ * polled for events. A special mode is therefore supported, where the QEMU
+ * main loop runs on a separate thread and the main thread handles the
+ * CF/Cocoa runloop.
+ */
+
+static void *call_qemu_default_main(void *opaque)
+{
+int status;
+
+bql_lock();
+status = qemu_default_main();
+bql_unlock();
+
+exit(status);
+}
+
+static void qemu_run_default_main_on_new_thread(void)
+{
+QemuThread thread;
+
+qemu_thread_create(&thread, "qemu_main", call_qemu_default_main,
+   NULL, QEMU_THREAD_DETACHED);


Th

Re: [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support

2024-10-31 Thread Akihiko Odaki

On 2024/10/30 6:16, Phil Dennis-Jordan wrote:



On Tue, 29 Oct 2024 at 08:42, Akihiko Odaki > wrote:


On 2024/10/29 6:06, Phil Dennis-Jordan wrote:
 >
 >
 > On Mon, 28 Oct 2024 at 17:06, Akihiko Odaki
mailto:akihiko.od...@daynix.com>
 > >> wrote:
 >
 >     On 2024/10/28 23:13, Phil Dennis-Jordan wrote:
 >      >
 >      >
 >      > On Mon, 28 Oct 2024 at 15:02, Akihiko Odaki
 >     mailto:akihiko.od...@daynix.com>
>
 >      > 
 >           >
 >      >     On 2024/10/28 22:31, Phil Dennis-Jordan wrote:
 >      >      >
 >      >      >
 >      >      > On Mon, 28 Oct 2024 at 10:00, Phil Dennis-Jordan
 >      >     mailto:p...@philjordan.eu>
>
 >     
>>
 >      >      >  >
 >     
 wrote:
 >      >      >
 >      >      >
 >      >      >          >      >
 >      >      >          >      > Hmm. I think if we were to use
that, we
 >     would
 >      >     need to
 >      >      >         create a new
 >      >      >          >      > QemuEvent for every job and
destroy it
 >     afterward,
 >      >      >         which seems
 >      >      >          >     expensive.
 >      >      >          >      > We can't rule out multiple concurrent
 >     jobs being
 >      >      >         submitted, and the
 >      >      >          >      > QemuEvent system only supports a
single
 >     producer as
 >      >      >         far as I can
 >      >      >          >     tell.
 >      >      >          >      >
 >      >      >          >      > You can probably sort of hack
around it with
 >      >     just one
 >      >      >         QemuEvent by
 >      >      >          >      > putting the qemu_event_wait into
a loop
 >     and turning
 >      >      >         the job.done
 >      >      >          >     flag
 >      >      >          >      > into an atomic (because it would now
 >     need to be
 >      >      >         checked outside the
 >      >      >          >      > lock) but this all seems
unnecessarily
 >     complicated
 >      >      >         considering the
 >      >      >          >      > QemuEvent uses the same mechanism
QemuCond/
 >      >     QemuMutex
 >      >      >         internally
 >      >      >          >     on macOS
 >      >      >          >      > (the only platform relevant
here), except we
 >      >     can use it as
 >      >      >          >     intended with
 >      >      >          >      > QemuCond/QemuMutex rather than
having to
 >     work
 >      >     against the
 >      >      >          >     abstraction.
 >      >      >          >
 >      >      >          >     I don't think it's going to be used
 >     concurrently. It
 >      >      >         would be difficult
 >      >      >          >     to reason even for the framework if it
 >     performs memory
 >      >      >          >     unmapping/mapping/reading operations
 >     concurrently.
 >      >      >          >
 >      >      >          >
 >      >      >          > I've just performed a very quick test by
 >     wrapping the job
 >      >      >         submission/
 >      >      >          > wait in the 2 mapMemory callbacks and the 1
 >     readMemory
 >      >      >         callback with
 >      >      >          > atomic counters and logging whenever a
counter went
 >      >     above 1.
 >      >      >          >
 >      >      >          >   * Overall, concurrent callbacks across all
 >     types were
 >      >      >         common (many per
 >      >      >          > second when the VM is busy). It's not
exactly a
 >      >     "thundering
 >      >      >         herd" (I
 >      >      >          > never saw >2) but it's probably not a
bad idea
 >     to use
 >      >     a separate
 >      >      >          > condition variable for each job type.
(task map,
 >      >     surface map,
 >      >      >         memory read)
 >     

Re: [PATCH v5 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support

2024-10-31 Thread Akihiko Odaki

On 2024/10/30 5:58, Phil Dennis-Jordan wrote:

MacOS provides a framework (library) that allows any vmm to implement a
paravirtualized 3d graphics passthrough to the host metal stack called
ParavirtualizedGraphics.Framework (PVG). The library abstracts away
almost every aspect of the paravirtualized device model and only provides
and receives callbacks on MMIO access as well as to share memory address
space between the VM and PVG.

This patch implements a QEMU device that drives PVG for the VMApple
variant of it.

Signed-off-by: Alexander Graf 
Co-authored-by: Alexander Graf 

Subsequent changes:

  * Cherry-pick/rebase conflict fixes, API use updates.
  * Moved from hw/vmapple/ (useful outside that machine type)
  * Overhaul of threading model, many thread safety improvements.
  * Asynchronous rendering.
  * Memory and object lifetime fixes.
  * Refactoring to split generic and (vmapple) MMIO variant specific
code.

Signed-off-by: Phil Dennis-Jordan 
---
v2:

  * Cherry-pick/rebase conflict fixes
  * BQL function renaming
  * Moved from hw/vmapple/ (useful outside that machine type)
  * Code review comments: Switched to DEFINE_TYPES macro & little endian
MMIO.
  * Removed some dead/superfluous code
  * Mad set_mode thread & memory safe
  * Added migration blocker due to lack of (de-)serialisation.
  * Fixes to ObjC refcounting and autorelease pool usage.
  * Fixed ObjC new/init misuse
  * Switched to ObjC category extension for private property.
  * Simplified task memory mapping and made it thread safe.
  * Refactoring to split generic and vmapple MMIO variant specific
code.
  * Switched to asynchronous MMIO writes on x86-64
  * Rendering and graphics update are now done asynchronously
  * Fixed cursor handling
  * Coding convention fixes
  * Removed software cursor compositing

v3:

  * Rebased on latest upstream, fixed breakages including switching to 
Resettable methods.
  * Squashed patches dealing with dGPUs, MMIO area size, and GPU picking.
  * Allow re-entrant MMIO; this simplifies the code and solves the divergence
between x86-64 and arm64 variants.

v4:

  * Renamed '-vmapple' device variant to '-mmio'
  * MMIO device type now requires aarch64 host and guest
  * Complete overhaul of the glue code for making Qemu's and
ParavirtualizedGraphics.framework's threading and synchronisation models
work together. Calls into PVG are from dispatch queues while the
BQL-holding initiating thread processes AIO context events; callbacks from
PVG are scheduled as BHs on the BQL/main AIO context, awaiting completion
where necessary.
  * Guest frame rendering state is covered by the BQL, with only the PVG calls
outside the lock, and serialised on the named render_queue.
  * Simplified logic for dropping frames in-flight during mode changes, fixed
bug in pending frames logic.
  * Addressed smaller code review notes such as: function naming, object type
declarations, type names/declarations/casts, code formatting, #include
order, over-cautious ObjC retain/release, what goes in init vs realize,
etc.

v5:

  * Smaller non-functional fixes in response to review comments, such as using
NULL for the AIO_WAIT_WHILE context argument, type name formatting,
deleting leftover debug code, logging improvements, state struct field
order and documentation improvements, etc.
  * Instead of a single condition variable for all synchronous BH job types,
there is now one for each callback block. This reduces the number
of threads being awoken unnecessarily to near zero.
  * MMIO device variant: Unified the BH job for raising interrupts.
  * Use DMA APIs for PVG framework's guest memory read requests.
  * Thread safety improvements: ensure mutable AppleGFXState fields are not
accessed outside the appropriate lock. Added dedicated mutex for the task
list.
  * Retain references to MemoryRegions for which there exist mappings in each
PGTask, and for IOSurface mappings.

  hw/display/Kconfig  |   9 +
  hw/display/apple-gfx-mmio.m | 387 ++
  hw/display/apple-gfx.h  |  79 
  hw/display/apple-gfx.m  | 773 
  hw/display/meson.build  |   4 +
  hw/display/trace-events |  28 ++
  meson.build |   4 +
  7 files changed, 1284 insertions(+)
  create mode 100644 hw/display/apple-gfx-mmio.m
  create mode 100644 hw/display/apple-gfx.h
  create mode 100644 hw/display/apple-gfx.m

diff --git a/hw/display/Kconfig b/hw/display/Kconfig
index 2250c740078..6a9b7b19ada 100644
--- a/hw/display/Kconfig
+++ b/hw/display/Kconfig
@@ -140,3 +140,12 @@ config XLNX_DISPLAYPORT
  
  config DM163

  bool
+
+config MAC_PVG
+bool
+default y
+
+config MAC_PVG_MMIO
+bool
+depends on MAC_PVG && AARCH64
+
diff --git a/hw/display/apple-gfx-mmio.m b/hw/display/apple-gfx-mmio.m
new file mode 100644
index 000..b0c5669a344
--- /dev/null
+++ b/hw/display/apple-gfx-mmio.m
@