On Sat, 26 Oct 2024 at 06:40, Akihiko Odaki <akihiko.od...@daynix.com> wrote:
> On 2024/10/26 4:43, Phil Dennis-Jordan wrote:
> > On Fri, 25 Oct 2024 at 08:03, Akihiko Odaki <akihiko.od...@daynix.com
> > <mailto:akihiko.od...@daynix.com>> wrote:
> >
> >     On 2024/10/24 19:28, Phil Dennis-Jordan wrote:
> >     > +    /* For running PVG memory-mapping requests in the AIO context */
> >     > +    QemuCond job_cond;
> >     > +    QemuMutex job_mutex;
> >
> >     Use: QemuEvent
> >
> > Hmm. I think if we were to use that, we would need to create a new
> > QemuEvent for every job and destroy it afterward, which seems expensive.
> > We can't rule out multiple concurrent jobs being submitted, and the
> > QemuEvent system only supports a single producer as far as I can tell.
> >
> > You can probably sort of hack around it with just one QemuEvent by
> > putting the qemu_event_wait into a loop and turning the job.done flag
> > into an atomic (because it would now need to be checked outside the
> > lock), but this all seems unnecessarily complicated considering that
> > QemuEvent uses the same mechanism as QemuCond/QemuMutex internally on
> > macOS (the only platform relevant here), except we can use it as
> > intended with QemuCond/QemuMutex rather than having to work against
> > the abstraction.
>
> I don't think it's going to be used concurrently. It would be difficult
> to reason even for the framework if it performs memory
> unmapping/mapping/reading operations concurrently.

I've just performed a very quick test by wrapping the job submission/wait
in the 2 mapMemory callbacks and the 1 readMemory callback with atomic
counters and logging whenever a counter went above 1.

 * Overall, concurrent callbacks across all types were common (many per
   second when the VM is busy). It's not exactly a "thundering herd" (I
   never saw >2), but it's probably not a bad idea to use a separate
   condition variable for each job type (task map, surface map, memory
   read).

 * While I did not observe any concurrent memory mapping operations
   *within* a type of memory map (2 task mappings or 2 surface mappings),
   I did see very occasional concurrent memory *read* callbacks. These
   would, as far as I can tell, not be safe with QemuEvents, unless we
   placed the event inside the job struct and init/destroyed it on every
   callback (which seems like excessive overhead).

My recommendation would be to split it up into 3 pairs of mutex/cond, as
sketched below; this will almost entirely remove any contention, but
continue to be safe for when it does occur. I don't think QemuEvent is a
realistic option (too tricky to get right) for the observed-concurrent
readMemory callback. I'm nervous about assuming the mapMemory callbacks
will NEVER be called concurrently, but at a push I'll acquiesce to
switching those to QemuEvent in the absence of evidence of concurrency.

> PGDevice.h also notes
> raiseInterrupt needs to be thread-safe while it doesn't make such notes
> for memory operations.

This actually makes sense.

> If it's ever going to be used concurrently, it's better to have
> QemuEvent for each job to avoid the thundering herd problem.
>
> >     > +
> >     > +    dispatch_queue_t render_queue;
> >     > +    /* The following fields should only be accessed from the BQL: */
> >
> >     Perhaps it may be better to document fields that can be accessed
> >     *without* the BQL; most things in QEMU implicitly require the BQL.
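Returning to the synchronisation question above, the 3-pair split I have
in mind is roughly the following (untested sketch; the AppleGFXJobSync
name and the complete-side helper are made up for illustration, and the
qemu_mutex_init()/qemu_cond_init() calls in realize are omitted):

    typedef struct AppleGFXJobSync {
        QemuMutex mutex;
        QemuCond cond;
    } AppleGFXJobSync;

    /* One pair per job type (task map, surface map, memory read), so
     * jobs of different types never contend on the same lock, while the
     * occasional concurrent jobs of the same type (observed for
     * readMemory) remain safe, if slightly slower. */

    static void apple_gfx_await_bh_job(AppleGFXJobSync *sync,
                                       bool *job_done_flag)
    {
        qemu_mutex_lock(&sync->mutex);
        while (!*job_done_flag) {
            qemu_cond_wait(&sync->cond, &sync->mutex);
        }
        qemu_mutex_unlock(&sync->mutex);
    }

    static void apple_gfx_complete_bh_job(AppleGFXJobSync *sync,
                                          bool *job_done_flag)
    {
        qemu_mutex_lock(&sync->mutex);
        *job_done_flag = true;
        qemu_cond_broadcast(&sync->cond);
        qemu_mutex_unlock(&sync->mutex);
    }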
> >     > +    bool gfx_update_requested;
> >     > +    bool new_frame_ready;
> >     > +    bool using_managed_texture_storage;
> >     > +} AppleGFXState;
> >     > +
> >     > +void apple_gfx_common_init(Object *obj, AppleGFXState *s, const char* obj_name);
> >     > +void apple_gfx_common_realize(AppleGFXState *s, PGDeviceDescriptor *desc,
> >     > +                              Error **errp);
> >     > +uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t guest_physical,
> >     > +                                               uint64_t length, bool read_only);
> >     > +void apple_gfx_await_bh_job(AppleGFXState *s, bool *job_done_flag);
> >     > +
> >     > +#endif
> >     > +
> >     > diff --git a/hw/display/apple-gfx.m b/hw/display/apple-gfx.m
> >     > new file mode 100644
> >     > index 00000000000..46be9957f69
> >     > --- /dev/null
> >     > +++ b/hw/display/apple-gfx.m
> >     > @@ -0,0 +1,713 @@
> >     > +/*
> >     > + * QEMU Apple ParavirtualizedGraphics.framework device
> >     > + *
> >     > + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> >     > + *
> >     > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> >     > + * See the COPYING file in the top-level directory.
> >     > + *
> >     > + * ParavirtualizedGraphics.framework is a set of libraries that macOS provides
> >     > + * which implements 3d graphics passthrough to the host as well as a
> >     > + * proprietary guest communication channel to drive it. This device model
> >     > + * implements support to drive that library from within QEMU.
> >     > + */
> >     > +
> >     > +#include "qemu/osdep.h"
> >     > +#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
> >     > +#include <mach/mach_vm.h>
> >     > +#include "apple-gfx.h"
> >     > +#include "trace.h"
> >     > +#include "qemu-main.h"
> >     > +#include "exec/address-spaces.h"
> >     > +#include "migration/blocker.h"
> >     > +#include "monitor/monitor.h"
> >     > +#include "qemu/main-loop.h"
> >     > +#include "qemu/cutils.h"
> >     > +#include "qemu/log.h"
> >     > +#include "qapi/visitor.h"
> >     > +#include "qapi/error.h"
> >     > +#include "ui/console.h"
> >     > +
> >     > +static const PGDisplayCoord_t apple_gfx_modes[] = {
> >     > +    { .x = 1440, .y = 1080 },
> >     > +    { .x = 1280, .y = 1024 },
> >     > +};
> >     > +
> >     > +/* This implements a type defined in <ParavirtualizedGraphics/PGDevice.h>
> >     > + * which is opaque from the framework's point of view. Typedef PGTask_t already
> >     > + * exists in the framework headers. */
> >     > +struct PGTask_s {
> >     > +    QTAILQ_ENTRY(PGTask_s) node;
> >     > +    mach_vm_address_t address;
> >     > +    uint64_t len;
> >     > +};
> >     > +
> >     > +static Error *apple_gfx_mig_blocker;
> >
> >     This does not have to be a static variable.
> >
> > Hmm, the first 5 or so examples of migration blockers in other devices
> > etc. I could find were all declared in this way. What are you suggesting
> > as the alternative? And why not use the same pattern as in most of the
> > rest of the code base?
>
> I was wrong. This is better to be a static variable to ensure we won't
> add the same blocker in case we have two device instances.
>
> >     > +
> >     > +static void apple_gfx_render_frame_completed(AppleGFXState *s,
> >     > +                                             uint32_t width, uint32_t height);
> >     > +
> >     > +static inline dispatch_queue_t get_background_queue(void)
> >
> >     Don't add inline. The only effect for modern compilers of inline is to
> >     suppress the unused function warnings.
> >     > +{
> >     > +    return dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
> >     > +}
> >     > +
> >     > +static PGTask_t *apple_gfx_new_task(AppleGFXState *s, uint64_t len)
> >     > +{
> >     > +    mach_vm_address_t task_mem;
> >     > +    PGTask_t *task;
> >     > +    kern_return_t r;
> >     > +
> >     > +    r = mach_vm_allocate(mach_task_self(), &task_mem, len, VM_FLAGS_ANYWHERE);
> >     > +    if (r != KERN_SUCCESS || task_mem == 0) {
> >
> >     Let's remove the check for task_mem == 0. We have no reason to reject it
> >     if the platform insists it allocated a memory at address 0 though such a
> >     situation should never happen in practice.
> >
> >     > +        return NULL;
> >     > +    }
> >     > +
> >     > +    task = g_new0(PGTask_t, 1);
> >     > +
> >     > +    task->address = task_mem;
> >     > +    task->len = len;
> >     > +    QTAILQ_INSERT_TAIL(&s->tasks, task, node);
> >     > +
> >     > +    return task;
> >     > +}
> >     > +
> >     > +typedef struct AppleGFXIOJob {
> >     > +    AppleGFXState *state;
> >     > +    uint64_t offset;
> >     > +    uint64_t value;
> >     > +    bool completed;
> >     > +} AppleGFXIOJob;
> >     > +
> >     > +static void apple_gfx_do_read(void *opaque)
> >     > +{
> >     > +    AppleGFXIOJob *job = opaque;
> >     > +    job->value = [job->state->pgdev mmioReadAtOffset:job->offset];
> >     > +    qatomic_set(&job->completed, true);
> >     > +    aio_wait_kick();
> >     > +}
> >     > +
> >     > +static uint64_t apple_gfx_read(void *opaque, hwaddr offset, unsigned size)
> >     > +{
> >     > +    AppleGFXIOJob job = {
> >     > +        .state = opaque,
> >     > +        .offset = offset,
> >     > +        .completed = false,
> >     > +    };
> >     > +    AioContext *context = qemu_get_aio_context();
> >     > +    dispatch_queue_t queue = get_background_queue();
> >     > +
> >     > +    dispatch_async_f(queue, &job, apple_gfx_do_read);
> >     > +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
> >     > +
> >     > +    trace_apple_gfx_read(offset, job.value);
> >     > +    return job.value;
> >     > +}
> >     > +
> >     > +static void apple_gfx_do_write(void *opaque)
> >     > +{
> >     > +    AppleGFXIOJob *job = opaque;
> >     > +    [job->state->pgdev mmioWriteAtOffset:job->offset value:job->value];
> >     > +    qatomic_set(&job->completed, true);
> >     > +    aio_wait_kick();
> >     > +}
> >     > +
> >     > +static void apple_gfx_write(void *opaque, hwaddr offset, uint64_t val,
> >     > +                            unsigned size)
> >     > +{
> >     > +    /* The methods mmioReadAtOffset: and especially mmioWriteAtOffset: can
> >     > +     * trigger and block on operations on other dispatch queues, which in turn
> >     > +     * may call back out on one or more of the callback blocks. For this reason,
> >     > +     * and as we are holding the BQL, we invoke the I/O methods on a pool
> >     > +     * thread and handle AIO tasks while we wait. Any work in the callbacks
> >     > +     * requiring the BQL will in turn schedule BHs which this thread will
> >     > +     * process while waiting.
> >     > +     */
> >     > +    AppleGFXIOJob job = {
> >     > +        .state = opaque,
> >     > +        .offset = offset,
> >     > +        .value = val,
> >     > +        .completed = false,
> >     > +    };
> >     > +    AioContext *context = qemu_get_current_aio_context();
> >     > +    dispatch_queue_t queue = get_background_queue();
> >     > +
> >     > +    dispatch_async_f(queue, &job, apple_gfx_do_write);
> >     > +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
> >     > +
> >     > +    trace_apple_gfx_write(offset, val);
> >     > +}
> >     > +
> >     > +static const MemoryRegionOps apple_gfx_ops = {
> >     > +    .read = apple_gfx_read,
> >     > +    .write = apple_gfx_write,
> >     > +    .endianness = DEVICE_LITTLE_ENDIAN,
> >     > +    .valid = {
> >     > +        .min_access_size = 4,
> >     > +        .max_access_size = 8,
> >     > +    },
> >     > +    .impl = {
> >     > +        .min_access_size = 4,
> >     > +        .max_access_size = 4,
> >     > +    },
> >     > +};
> >     > +
> >     > +static void apple_gfx_render_new_frame_bql_unlock(AppleGFXState *s)
> >     > +{
> >     > +    BOOL r;
> >     > +    uint32_t width = surface_width(s->surface);
> >     > +    uint32_t height = surface_height(s->surface);
> >     > +    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
> >     > +    id<MTLCommandBuffer> command_buffer = [s->mtl_queue commandBuffer];
> >     > +    id<MTLTexture> texture = s->texture;
> >     > +
> >     > +    assert(bql_locked());
> >     > +    [texture retain];
> >     > +
> >     > +    bql_unlock();
> >     > +
> >     > +    /* This is not safe to call from the BQL due to PVG-internal locks causing
> >     > +     * deadlocks. */
> >     > +    r = [s->pgdisp encodeCurrentFrameToCommandBuffer:command_buffer
> >     > +                                              texture:texture
> >     > +                                               region:region];
> >     > +    if (!r) {
> >     > +        [texture release];
> >     > +        bql_lock();
> >     > +        --s->pending_frames;
> >     > +        bql_unlock();
> >     > +        qemu_log_mask(LOG_GUEST_ERROR, "apple_gfx_render_new_frame_bql_unlock: "
> >
> >     Use: __func__
> >
> >     > +                      "encodeCurrentFrameToCommandBuffer:texture:region: failed\n");
> >     > +        return;
> >     > +    }
> >     > +
> >     > +    if (s->using_managed_texture_storage) {
> >     > +        /* "Managed" textures exist in both VRAM and RAM and must be synced. */
> >     > +        id<MTLBlitCommandEncoder> blit = [command_buffer blitCommandEncoder];
> >     > +        [blit synchronizeResource:texture];
> >     > +        [blit endEncoding];
> >     > +    }
> >     > +    [texture release];
> >     > +    [command_buffer addCompletedHandler:
> >     > +        ^(id<MTLCommandBuffer> cb)
> >     > +        {
> >     > +            dispatch_async(s->render_queue, ^{
> >     > +                apple_gfx_render_frame_completed(s, width, height);
> >     > +            });
> >     > +        }];
> >     > +    [command_buffer commit];
> >     > +}
> >     > +
> >     > +static void copy_mtl_texture_to_surface_mem(id<MTLTexture> texture, void *vram)
> >     > +{
> >     > +    /* TODO: Skip this entirely on a pure Metal or headless/guest-only
> >     > +     * rendering path, else use a blit command encoder? Needs careful
> >     > +     * (double?) buffering design. */
> >     > +    size_t width = texture.width, height = texture.height;
> >     > +    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
> >     > +    [texture getBytes:vram
> >     > +          bytesPerRow:(width * 4)
> >     > +        bytesPerImage:(width * height * 4)
> >     > +           fromRegion:region
> >     > +          mipmapLevel:0
> >     > +                slice:0];
> >     > +}
> >     > +
> >     > +static void apple_gfx_render_frame_completed(AppleGFXState *s,
> >     > +                                             uint32_t width, uint32_t height)
> >     > +{
> >     > +    bql_lock();
> >     > +    --s->pending_frames;
> >     > +    assert(s->pending_frames >= 0);
> >     > +
> >     > +    /* Only update display if mode hasn't changed since we started rendering.
> >     > +     */
> >     > +    if (width == surface_width(s->surface) &&
> >     > +        height == surface_height(s->surface)) {
> >     > +        copy_mtl_texture_to_surface_mem(s->texture, s->vram);
> >     > +        if (s->gfx_update_requested) {
> >     > +            s->gfx_update_requested = false;
> >     > +            dpy_gfx_update_full(s->con);
> >     > +            graphic_hw_update_done(s->con);
> >     > +            s->new_frame_ready = false;
> >     > +        } else {
> >     > +            s->new_frame_ready = true;
> >     > +        }
> >     > +    }
> >     > +    if (s->pending_frames > 0) {
> >     > +        apple_gfx_render_new_frame_bql_unlock(s);
> >     > +    } else {
> >     > +        bql_unlock();
> >     > +    }
> >     > +}
> >     > +
> >     > +static void apple_gfx_fb_update_display(void *opaque)
> >     > +{
> >     > +    AppleGFXState *s = opaque;
> >     > +
> >     > +    assert(bql_locked());
> >     > +    if (s->new_frame_ready) {
> >     > +        dpy_gfx_update_full(s->con);
> >     > +        s->new_frame_ready = false;
> >     > +        graphic_hw_update_done(s->con);
> >     > +    } else if (s->pending_frames > 0) {
> >     > +        s->gfx_update_requested = true;
> >     > +    } else {
> >     > +        graphic_hw_update_done(s->con);
> >     > +    }
> >     > +}
> >     > +
> >     > +static const GraphicHwOps apple_gfx_fb_ops = {
> >     > +    .gfx_update = apple_gfx_fb_update_display,
> >     > +    .gfx_update_async = true,
> >     > +};
> >     > +
> >     > +static void update_cursor(AppleGFXState *s)
> >     > +{
> >     > +    assert(bql_locked());
> >     > +    dpy_mouse_set(s->con, s->pgdisp.cursorPosition.x,
> >     > +                  s->pgdisp.cursorPosition.y, s->cursor_show);
> >     > +}
> >     > +
> >     > +static void set_mode(AppleGFXState *s, uint32_t width, uint32_t height)
> >     > +{
> >     > +    MTLTextureDescriptor *textureDescriptor;
> >     > +
> >     > +    if (s->surface &&
> >     > +        width == surface_width(s->surface) &&
> >     > +        height == surface_height(s->surface)) {
> >     > +        return;
> >     > +    }
> >     > +
> >     > +    g_free(s->vram);
> >     > +    [s->texture release];
> >     > +
> >     > +    s->vram = g_malloc0_n(width * height, 4);
> >     > +    s->surface = qemu_create_displaysurface_from(width, height, PIXMAN_LE_a8r8g8b8,
> >     > +                                                 width * 4, s->vram);
> >     > +
> >     > +    @autoreleasepool {
> >     > +        textureDescriptor =
> >     > +            [MTLTextureDescriptor
> >     > +                texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
> >     > +                                              width:width
> >     > +                                             height:height
> >     > +                                          mipmapped:NO];
> >     > +        textureDescriptor.usage = s->pgdisp.minimumTextureUsage;
> >     > +        s->texture = [s->mtl newTextureWithDescriptor:textureDescriptor];
> >
> >     What about creating pixman_image_t from s->texture.buffer.contents? This
> >     should save memory usage by removing the duplication of texture.
> >
> > We need explicit control over when the GPU vs when the CPU may access
> > the texture - only one of them may access it at a time. As far as I
> > can tell, we can't control when the rest of QEMU might access the
> > pixman_image used in the console surface?
>
> You are right; we need to have duplicate buffers. We can still avoid
> copying by using two MTLTextures for double-buffering instead of having
> a MTLTexture and a pixman_image and copying between them for
> MTLStorageModeManaged.

Do I understand correctly that you intend to swap the surface->image on
every frame, or even the surface->image->data? If so, it's my
understanding from reading the source of a bunch of UI implementations a
few weeks ago that this is neither supported nor safe, as some
implementations take long-lived references to these internal data
structures until a dpy_gfx_switch callback. And the implementations for
those callbacks are in turn very expensive in some cases.
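For illustration, the pattern I'm worried about (sketch only; which
resources get rebuilt varies by UI backend, and back_buffer is a
hypothetical second buffer):

    /* Publishing a different DisplaySurface each frame would mean: */
    s->surface = qemu_create_displaysurface_from(width, height,
                                                 PIXMAN_LE_a8r8g8b8,
                                                 width * 4, back_buffer);
    dpy_gfx_replace_surface(s->con, s->surface);
    /* ...which invokes the dpy_gfx_switch callback on every registered
     * DisplayChangeListener. UI backends may cache pointers into the old
     * surface until that callback arrives, and some tear down and
     * recreate host-side resources (textures, pixmaps) when it fires,
     * which is far too heavyweight to do at frame rate. */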
This is why my conclusion in the v4 thread was that double-buffering was
infeasible with the current architecture.

> >     > +    }
> >     > +
> >     > +    s->using_managed_texture_storage =
> >     > +        (s->texture.storageMode == MTLStorageModeManaged);
> >     > +    dpy_gfx_replace_surface(s->con, s->surface);
> >     > +}
> >     > +
> >     > +static void create_fb(AppleGFXState *s)
> >     > +{
> >     > +    s->con = graphic_console_init(NULL, 0, &apple_gfx_fb_ops, s);
> >     > +    set_mode(s, 1440, 1080);
> >     > +
> >     > +    s->cursor_show = true;
> >     > +}
> >     > +
> >     > +static size_t apple_gfx_get_default_mmio_range_size(void)
> >     > +{
> >     > +    size_t mmio_range_size;
> >     > +    @autoreleasepool {
> >     > +        PGDeviceDescriptor *desc = [PGDeviceDescriptor new];
> >     > +        mmio_range_size = desc.mmioLength;
> >     > +        [desc release];
> >     > +    }
> >     > +    return mmio_range_size;
> >     > +}
> >     > +
> >     > +void apple_gfx_common_init(Object *obj, AppleGFXState *s, const char* obj_name)
> >     > +{
> >     > +    size_t mmio_range_size = apple_gfx_get_default_mmio_range_size();
> >     > +
> >     > +    trace_apple_gfx_common_init(obj_name, mmio_range_size);
> >     > +    memory_region_init_io(&s->iomem_gfx, obj, &apple_gfx_ops, s, obj_name,
> >     > +                          mmio_range_size);
> >     > +
> >     > +    /* TODO: PVG framework supports serialising device state: integrate it! */
> >     > +}
> >     > +
> >     > +typedef struct AppleGFXMapMemoryJob {
> >     > +    AppleGFXState *state;
> >     > +    PGTask_t *task;
> >     > +    uint64_t virtual_offset;
> >     > +    PGPhysicalMemoryRange_t *ranges;
> >     > +    uint32_t range_count;
> >     > +    bool read_only;
> >     > +    bool success;
> >     > +    bool done;
> >     > +} AppleGFXMapMemoryJob;
> >     > +
> >     > +uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t guest_physical,
> >     > +                                               uint64_t length, bool read_only)
> >     > +{
> >     > +    MemoryRegion *ram_region;
> >     > +    uintptr_t host_address;
> >     > +    hwaddr ram_region_offset = 0;
> >     > +    hwaddr ram_region_length = length;
> >     > +
> >     > +    ram_region = address_space_translate(&address_space_memory,
> >     > +                                         guest_physical,
> >     > +                                         &ram_region_offset,
> >     > +                                         &ram_region_length, !read_only,
> >     > +                                         MEMTXATTRS_UNSPECIFIED);
> >
> >     Call memory_region_ref() so that it won't go away.
> >
> >     > +
> >     > +    if (!ram_region || ram_region_length < length ||
> >     > +        !memory_access_is_direct(ram_region, !read_only)) {
> >     > +        return 0;
> >     > +    }
> >     > +
> >     > +    host_address = (mach_vm_address_t)memory_region_get_ram_ptr(ram_region);
> >
> >     host_address is typed as uintptr_t, not mach_vm_address_t.
> >     > +    if (host_address == 0) {
> >     > +        return 0;
> >     > +    }
> >     > +    host_address += ram_region_offset;
> >     > +
> >     > +    return host_address;
> >     > +}
> >     > +
> >     > +static void apple_gfx_map_memory(void *opaque)
> >     > +{
> >     > +    AppleGFXMapMemoryJob *job = opaque;
> >     > +    AppleGFXState *s = job->state;
> >     > +    PGTask_t *task = job->task;
> >     > +    uint32_t range_count = job->range_count;
> >     > +    uint64_t virtual_offset = job->virtual_offset;
> >     > +    PGPhysicalMemoryRange_t *ranges = job->ranges;
> >     > +    bool read_only = job->read_only;
> >     > +    kern_return_t r;
> >     > +    mach_vm_address_t target, source;
> >     > +    vm_prot_t cur_protection, max_protection;
> >     > +    bool success = true;
> >     > +
> >     > +    g_assert(bql_locked());
> >     > +
> >     > +    trace_apple_gfx_map_memory(task, range_count, virtual_offset, read_only);
> >     > +    for (int i = 0; i < range_count; i++) {
> >     > +        PGPhysicalMemoryRange_t *range = &ranges[i];
> >     > +
> >     > +        target = task->address + virtual_offset;
> >     > +        virtual_offset += range->physicalLength;
> >     > +
> >     > +        trace_apple_gfx_map_memory_range(i, range->physicalAddress,
> >     > +                                         range->physicalLength);
> >     > +
> >     > +        source = apple_gfx_host_address_for_gpa_range(range->physicalAddress,
> >     > +                                                      range->physicalLength,
> >     > +                                                      read_only);
> >     > +        if (source == 0) {
> >     > +            success = false;
> >     > +            continue;
> >     > +        }
> >     > +
> >     > +        MemoryRegion* alt_mr = NULL;
> >     > +        mach_vm_address_t alt_source = (mach_vm_address_t)gpa2hva(&alt_mr, range->physicalAddress, range->physicalLength, NULL);
> >     > +        g_assert(alt_source == source);
> >
> >     Remove this; I guess this is for debugging.
> >
> >     > +
> >     > +        cur_protection = 0;
> >     > +        max_protection = 0;
> >     > +        // Map guest RAM at range->physicalAddress into PG task memory range
> >     > +        r = mach_vm_remap(mach_task_self(),
> >     > +                          &target, range->physicalLength, vm_page_size - 1,
> >     > +                          VM_FLAGS_FIXED | VM_FLAGS_OVERWRITE,
> >     > +                          mach_task_self(),
> >     > +                          source, false /* shared mapping, no copy */,
> >     > +                          &cur_protection, &max_protection,
> >     > +                          VM_INHERIT_COPY);
> >     > +        trace_apple_gfx_remap(r, source, target);
> >     > +        g_assert(r == KERN_SUCCESS);
> >     > +    }
> >     > +
> >     > +    qemu_mutex_lock(&s->job_mutex);
> >     > +    job->success = success;
> >     > +    job->done = true;
> >     > +    qemu_cond_broadcast(&s->job_cond);
> >     > +    qemu_mutex_unlock(&s->job_mutex);
> >     > +}
> >     > +
> >     > +void apple_gfx_await_bh_job(AppleGFXState *s, bool *job_done_flag)
> >     > +{
> >     > +    qemu_mutex_lock(&s->job_mutex);
> >     > +    while (!*job_done_flag) {
> >     > +        qemu_cond_wait(&s->job_cond, &s->job_mutex);
> >     > +    }
> >     > +    qemu_mutex_unlock(&s->job_mutex);
> >     > +}
> >     > +
> >     > +typedef struct AppleGFXReadMemoryJob {
> >     > +    AppleGFXState *s;
> >     > +    hwaddr physical_address;
> >     > +    uint64_t length;
> >     > +    void *dst;
> >     > +    bool done;
> >     > +} AppleGFXReadMemoryJob;
> >     > +
> >     > +static void apple_gfx_do_read_memory(void *opaque)
> >     > +{
> >     > +    AppleGFXReadMemoryJob *job = opaque;
> >     > +    AppleGFXState *s = job->s;
> >     > +
> >     > +    cpu_physical_memory_read(job->physical_address, job->dst, job->length);
> >
> >     Use: dma_memory_read()
> >
> >     > +
> >     > +    qemu_mutex_lock(&s->job_mutex);
> >     > +    job->done = true;
> >     > +    qemu_cond_broadcast(&s->job_cond);
> >     > +    qemu_mutex_unlock(&s->job_mutex);
> >     > +}
> >     > +
> >     > +static void apple_gfx_read_memory(AppleGFXState *s, hwaddr physical_address,
> >     > +                                  uint64_t length, void *dst)
> >     > +{
> >     > +    AppleGFXReadMemoryJob job = {
> >     > +        s, physical_address, length, dst
> >     > +    };
> >     > +
> >     > +    trace_apple_gfx_read_memory(physical_address, length, dst);
> >     > +
> >     > +    /* Traversing the memory map requires RCU/BQL, so do it in a BH. */
> >     > +    aio_bh_schedule_oneshot(qemu_get_aio_context(), apple_gfx_do_read_memory,
> >     > +                            &job);
> >     > +    apple_gfx_await_bh_job(s, &job.done);
> >     > +}
> >     > +
> >     > +static void apple_gfx_register_task_mapping_handlers(AppleGFXState *s,
> >     > +                                                     PGDeviceDescriptor *desc)
> >     > +{
> >     > +    desc.createTask = ^(uint64_t vmSize, void * _Nullable * _Nonnull baseAddress) {
> >     > +        PGTask_t *task = apple_gfx_new_task(s, vmSize);
> >     > +        *baseAddress = (void *)task->address;
> >     > +        trace_apple_gfx_create_task(vmSize, *baseAddress);
> >     > +        return task;
> >     > +    };
> >     > +
> >     > +    desc.destroyTask = ^(PGTask_t * _Nonnull task) {
> >     > +        trace_apple_gfx_destroy_task(task);
> >     > +        QTAILQ_REMOVE(&s->tasks, task, node);
> >     > +        mach_vm_deallocate(mach_task_self(), task->address, task->len);
> >     > +        g_free(task);
> >     > +    };
> >     > +
> >     > +    desc.mapMemory = ^bool(PGTask_t * _Nonnull task, uint32_t range_count,
> >     > +                           uint64_t virtual_offset, bool read_only,
> >     > +                           PGPhysicalMemoryRange_t * _Nonnull ranges) {
> >     > +        AppleGFXMapMemoryJob job = {
> >     > +            .state = s,
> >     > +            .task = task, .ranges = ranges, .range_count = range_count,
> >     > +            .read_only = read_only, .virtual_offset = virtual_offset,
> >     > +            .done = false, .success = true,
> >     > +        };
> >     > +        if (range_count > 0) {
> >     > +            aio_bh_schedule_oneshot(qemu_get_aio_context(),
> >     > +                                    apple_gfx_map_memory, &job);
> >     > +            apple_gfx_await_bh_job(s, &job.done);
> >     > +        }
> >     > +        return job.success;
> >     > +    };
> >     > +
> >     > +    desc.unmapMemory = ^bool(PGTask_t * _Nonnull task, uint64_t virtualOffset,
> >     > +                             uint64_t length) {
> >     > +        kern_return_t r;
> >     > +        mach_vm_address_t range_address;
> >     > +
> >     > +        trace_apple_gfx_unmap_memory(task, virtualOffset, length);
> >     > +
> >     > +        /* Replace task memory range with fresh pages, undoing the mapping
> >     > +         * from guest RAM. */
> >     > +        range_address = task->address + virtualOffset;
> >     > +        r = mach_vm_allocate(mach_task_self(), &range_address, length,
> >     > +                             VM_FLAGS_FIXED | VM_FLAGS_OVERWRITE);
> >     > +        g_assert(r == KERN_SUCCESS);error_setg
> >
> >     An extra error_setg
> >
> >     > +
> >     > +        return true;
> >     > +    };
> >     > +
> >     > +    desc.readMemory = ^bool(uint64_t physical_address, uint64_t length,
> >     > +                            void * _Nonnull dst) {
> >     > +        apple_gfx_read_memory(s, physical_address, length, dst);
> >     > +        return true;
> >     > +    };
> >     > +}
> >     > +
> >     > +static PGDisplayDescriptor *apple_gfx_prepare_display_descriptor(AppleGFXState *s)
> >     > +{
> >     > +    PGDisplayDescriptor *disp_desc = [PGDisplayDescriptor new];
> >     > +
> >     > +    disp_desc.name = @"QEMU display";
> >     > +    disp_desc.sizeInMillimeters = NSMakeSize(400., 300.); /* A 20" display */
> >     > +    disp_desc.queue = dispatch_get_main_queue();
> >     > +    disp_desc.newFrameEventHandler = ^(void) {
> >     > +        trace_apple_gfx_new_frame();
> >     > +        dispatch_async(s->render_queue, ^{
> >     > +            /* Drop frames if we get too far ahead.
> >     > +             */
> >     > +            bql_lock();
> >     > +            if (s->pending_frames >= 2) {
> >     > +                bql_unlock();
> >     > +                return;
> >     > +            }
> >     > +            ++s->pending_frames;
> >     > +            if (s->pending_frames > 1) {
> >     > +                bql_unlock();
> >     > +                return;
> >     > +            }
> >     > +            @autoreleasepool {
> >     > +                apple_gfx_render_new_frame_bql_unlock(s);
> >     > +            }
> >     > +        });
> >     > +    };
> >     > +    disp_desc.modeChangeHandler = ^(PGDisplayCoord_t sizeInPixels,
> >     > +                                    OSType pixelFormat) {
> >     > +        trace_apple_gfx_mode_change(sizeInPixels.x, sizeInPixels.y);
> >     > +
> >     > +        BQL_LOCK_GUARD();
> >     > +        set_mode(s, sizeInPixels.x, sizeInPixels.y);
> >     > +    };
> >     > +    disp_desc.cursorGlyphHandler = ^(NSBitmapImageRep *glyph,
> >     > +                                     PGDisplayCoord_t hotSpot) {
> >     > +        [glyph retain];
> >     > +        dispatch_async(get_background_queue(), ^{
> >     > +            BQL_LOCK_GUARD();
> >     > +            uint32_t bpp = glyph.bitsPerPixel;
> >     > +            size_t width = glyph.pixelsWide;
> >     > +            size_t height = glyph.pixelsHigh;
> >     > +            size_t padding_bytes_per_row = glyph.bytesPerRow - width * 4;
> >     > +            const uint8_t* px_data = glyph.bitmapData;
> >     > +
> >     > +            trace_apple_gfx_cursor_set(bpp, width, height);
> >     > +
> >     > +            if (s->cursor) {
> >     > +                cursor_unref(s->cursor);
> >     > +                s->cursor = NULL;
> >     > +            }
> >     > +
> >     > +            if (bpp == 32) { /* Shouldn't be anything else, but just to be safe...*/
> >     > +                s->cursor = cursor_alloc(width, height);
> >     > +                s->cursor->hot_x = hotSpot.x;
> >     > +                s->cursor->hot_y = hotSpot.y;
> >     > +
> >     > +                uint32_t *dest_px = s->cursor->data;
> >     > +
> >     > +                for (size_t y = 0; y < height; ++y) {
> >     > +                    for (size_t x = 0; x < width; ++x) {
> >     > +                        /* NSBitmapImageRep's red & blue channels are swapped
> >     > +                         * compared to QEMUCursor's. */
> >     > +                        *dest_px =
> >     > +                            (px_data[0] << 16u) |
> >     > +                            (px_data[1] << 8u) |
> >     > +                            (px_data[2] << 0u) |
> >     > +                            (px_data[3] << 24u);
> >     > +                        ++dest_px;
> >     > +                        px_data += 4;
> >     > +                    }
> >     > +                    px_data += padding_bytes_per_row;
> >     > +                }
> >     > +                dpy_cursor_define(s->con, s->cursor);
> >     > +                update_cursor(s);
> >     > +            }
> >     > +            [glyph release];
> >     > +        });
> >     > +    };
> >     > +    disp_desc.cursorShowHandler = ^(BOOL show) {
> >     > +        dispatch_async(get_background_queue(), ^{
> >     > +            BQL_LOCK_GUARD();
> >     > +            trace_apple_gfx_cursor_show(show);
> >     > +            s->cursor_show = show;
> >     > +            update_cursor(s);
> >     > +        });
> >     > +    };
> >     > +    disp_desc.cursorMoveHandler = ^(void) {
> >     > +        dispatch_async(get_background_queue(), ^{
> >     > +            BQL_LOCK_GUARD();
> >     > +            trace_apple_gfx_cursor_move();
> >     > +            update_cursor(s);
> >     > +        });
> >     > +    };
> >     > +
> >     > +    return disp_desc;
> >     > +}
> >     > +
> >     > +static NSArray<PGDisplayMode*>* apple_gfx_prepare_display_mode_array(void)
> >     > +{
> >     > +    PGDisplayMode *modes[ARRAY_SIZE(apple_gfx_modes)];
> >     > +    NSArray<PGDisplayMode*>* mode_array = nil;
> >     > +    int i;
> >     > +
> >     > +    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
> >     > +        modes[i] =
> >     > +            [[PGDisplayMode alloc] initWithSizeInPixels:apple_gfx_modes[i] refreshRateInHz:60.];
> >     > +    }
> >     > +
> >     > +    mode_array = [NSArray arrayWithObjects:modes count:ARRAY_SIZE(apple_gfx_modes)];
> >     > +
> >     > +    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
> >     > +        [modes[i] release];
> >     > +        modes[i] = nil;
> >     > +    }
> >     > +
> >     > +    return mode_array;
> >     > +}
> >     > +
> >     > +static id<MTLDevice> copy_suitable_metal_device(void)
> >     > +{
> >     > +    id<MTLDevice> dev = nil;
> >     > +    NSArray<id<MTLDevice>> *devs = MTLCopyAllDevices();
> >     > +
> >     > +    /* Prefer a unified memory GPU. Failing that, pick a non-removable GPU.
> >     > +     */
> >     > +    for (size_t i = 0; i < devs.count; ++i) {
> >     > +        if (devs[i].hasUnifiedMemory) {
> >     > +            dev = devs[i];
> >     > +            break;
> >     > +        }
> >     > +        if (!devs[i].removable) {
> >     > +            dev = devs[i];
> >     > +        }
> >     > +    }
> >     > +
> >     > +    if (dev != nil) {
> >     > +        [dev retain];
> >     > +    } else {
> >     > +        dev = MTLCreateSystemDefaultDevice();
> >     > +    }
> >     > +    [devs release];
> >     > +
> >     > +    return dev;
> >     > +}
> >     > +
> >     > +void apple_gfx_common_realize(AppleGFXState *s, PGDeviceDescriptor *desc,
> >     > +                              Error **errp)
> >     > +{
> >     > +    PGDisplayDescriptor *disp_desc = nil;
> >     > +
> >     > +    if (apple_gfx_mig_blocker == NULL) {
> >     > +        error_setg(&apple_gfx_mig_blocker,
> >     > +                   "Migration state blocked by apple-gfx display device");
> >     > +        if (migrate_add_blocker(&apple_gfx_mig_blocker, errp) < 0) {
> >     > +            return;
> >     > +        }
> >     > +    }
> >     > +
> >     > +    QTAILQ_INIT(&s->tasks);
> >     > +    s->render_queue = dispatch_queue_create("apple-gfx.render",
> >     > +                                            DISPATCH_QUEUE_SERIAL);
> >     > +    s->mtl = copy_suitable_metal_device();
> >     > +    s->mtl_queue = [s->mtl newCommandQueue];
> >     > +
> >     > +    desc.device = s->mtl;
> >     > +
> >     > +    apple_gfx_register_task_mapping_handlers(s, desc);
> >     > +
> >     > +    s->pgdev = PGNewDeviceWithDescriptor(desc);
> >     > +
> >     > +    disp_desc = apple_gfx_prepare_display_descriptor(s);
> >     > +    s->pgdisp = [s->pgdev newDisplayWithDescriptor:disp_desc
> >     > +                                              port:0 serialNum:1234];
> >     > +    [disp_desc release];
> >     > +    s->pgdisp.modeList = apple_gfx_prepare_display_mode_array();
> >     > +
> >     > +    create_fb(s);
> >     > +
> >     > +    qemu_mutex_init(&s->job_mutex);
> >     > +    qemu_cond_init(&s->job_cond);
> >     > +}
> >     > diff --git a/hw/display/meson.build b/hw/display/meson.build
> >     > index 20a94973fa2..619e642905a 100644
> >     > --- a/hw/display/meson.build
> >     > +++ b/hw/display/meson.build
> >     > @@ -61,6 +61,10 @@ system_ss.add(when: 'CONFIG_ARTIST', if_true: files('artist.c'))
> >     >
> >     >  system_ss.add(when: 'CONFIG_ATI_VGA', if_true: [files('ati.c', 'ati_2d.c', 'ati_dbg.c'), pixman])
> >     >
> >     > +system_ss.add(when: 'CONFIG_MAC_PVG', if_true: [files('apple-gfx.m'), pvg, metal])
> >     > +if cpu == 'aarch64'
> >     > +  system_ss.add(when: 'CONFIG_MAC_PVG_MMIO', if_true: [files('apple-gfx-mmio.m'), pvg, metal])
> >     > +endif
> >     >
> >     >  if config_all_devices.has_key('CONFIG_VIRTIO_GPU')
> >     >    virtio_gpu_ss = ss.source_set()
> >     > diff --git a/hw/display/trace-events b/hw/display/trace-events
> >     > index 781f8a33203..214998312b9 100644
> >     > --- a/hw/display/trace-events
> >     > +++ b/hw/display/trace-events
> >     > @@ -191,3 +191,29 @@ dm163_bits_ppi(unsigned dest_width) "dest_width : %u"
> >     >  dm163_leds(int led, uint32_t value) "led %d: 0x%x"
> >     >  dm163_channels(int channel, uint8_t value) "channel %d: 0x%x"
> >     >  dm163_refresh_rate(uint32_t rr) "refresh rate %d"
> >     > +
> >     > +# apple-gfx.m
> >     > +apple_gfx_read(uint64_t offset, uint64_t res) "offset=0x%"PRIx64" res=0x%"PRIx64
> >     > +apple_gfx_write(uint64_t offset, uint64_t val) "offset=0x%"PRIx64" val=0x%"PRIx64
> >     > +apple_gfx_create_task(uint32_t vm_size, void *va) "vm_size=0x%x base_addr=%p"
> >     > +apple_gfx_destroy_task(void *task) "task=%p"
> >     > +apple_gfx_map_memory(void *task, uint32_t range_count, uint64_t virtual_offset, uint32_t read_only) "task=%p range_count=0x%x virtual_offset=0x%"PRIx64" read_only=%d"
> >     > +apple_gfx_map_memory_range(uint32_t i, uint64_t phys_addr, uint64_t phys_len) "[%d] phys_addr=0x%"PRIx64" phys_len=0x%"PRIx64
> >     > +apple_gfx_remap(uint64_t retval, uint64_t source, uint64_t target) "retval=%"PRId64" source=0x%"PRIx64" target=0x%"PRIx64
> >     > +apple_gfx_unmap_memory(void *task, uint64_t virtual_offset, uint64_t length) "task=%p virtual_offset=0x%"PRIx64" length=0x%"PRIx64
> >     > +apple_gfx_read_memory(uint64_t phys_address, uint64_t length, void *dst) "phys_addr=0x%"PRIx64" length=0x%"PRIx64" dest=%p"
> >     > +apple_gfx_raise_irq(uint32_t vector) "vector=0x%x"
> >     > +apple_gfx_new_frame(void) ""
> >     > +apple_gfx_mode_change(uint64_t x, uint64_t y) "x=%"PRId64" y=%"PRId64
> >     > +apple_gfx_cursor_set(uint32_t bpp, uint64_t width, uint64_t height) "bpp=%d width=%"PRId64" height=0x%"PRId64
> >     > +apple_gfx_cursor_show(uint32_t show) "show=%d"
> >     > +apple_gfx_cursor_move(void) ""
> >     > +apple_gfx_common_init(const char *device_name, size_t mmio_size) "device: %s; MMIO size: %zu bytes"
> >     > +
> >     > +# apple-gfx-mmio.m
> >     > +apple_gfx_mmio_iosfc_read(uint64_t offset, uint64_t res) "offset=0x%"PRIx64" res=0x%"PRIx64
> >     > +apple_gfx_mmio_iosfc_write(uint64_t offset, uint64_t val) "offset=0x%"PRIx64" val=0x%"PRIx64
> >     > +apple_gfx_iosfc_map_memory(uint64_t phys, uint64_t len, uint32_t ro, void *va, void *e, void *f, void* va_result, int success) "phys=0x%"PRIx64" len=0x%"PRIx64" ro=%d va=%p e=%p f=%p -> *va=%p, success = %d"
> >     > +apple_gfx_iosfc_unmap_memory(void *a, void *b, void *c, void *d, void *e, void *f) "a=%p b=%p c=%p d=%p e=%p f=%p"
> >     > +apple_gfx_iosfc_raise_irq(uint32_t vector) "vector=0x%x"
> >     > +
> >     > diff --git a/meson.build b/meson.build
> >     > index d26690ce204..0e124eff13f 100644
> >     > --- a/meson.build
> >     > +++ b/meson.build
> >     > @@ -761,6 +761,8 @@ socket = []
> >     >  version_res = []
> >     >  coref = []
> >     >  iokit = []
> >     > +pvg = []
> >     > +metal = []
> >     >  emulator_link_args = []
> >     >  midl = not_found
> >     >  widl = not_found
> >     > @@ -782,6 +784,8 @@ elif host_os == 'darwin'
> >     >    coref = dependency('appleframeworks', modules: 'CoreFoundation')
> >     >    iokit = dependency('appleframeworks', modules: 'IOKit', required: false)
> >     >    host_dsosuf = '.dylib'
> >     > +  pvg = dependency('appleframeworks', modules: 'ParavirtualizedGraphics')
> >     > +  metal = dependency('appleframeworks', modules: 'Metal')
> >     >  elif host_os == 'sunos'
> >     >    socket = [cc.find_library('socket'),
> >     >              cc.find_library('nsl'),