Dmitry Osipenko <dmitry.osipe...@collabora.com> writes:

> On 1/23/25 14:58, Alex Bennée wrote:
>> Dmitry Osipenko <dmitry.osipe...@collabora.com> writes:
>>
>>> On 1/22/25 20:00, Alex Bennée wrote:
>>>> Dmitry Osipenko <dmitry.osipe...@collabora.com> writes:
>>>>
>>>>> This patchset adds DRM native context support to VirtIO-GPU on Qemu.
>>>>>
>>>>> Contrary to Virgl and Venus contexts that mediates high level GFX APIs,
>>>>> DRM native context [1] mediates lower level kernel driver UAPI, which
>>>>> reflects in a less CPU overhead and less/simpler code needed to support
>>>>> it.
>>>>> DRM context consists of a host and guest parts that have to be implemented
>>>>> for each GPU driver. On a guest side, DRM context presents a virtual GPU
>>>>> as a real/native host GPU device for GL/VK applications.
>>>>>
>>>>> [1] https://www.youtube.com/watch?v=9sFP_yddLLQ
>>>>>
>>>>> Today there are four known DRM native context drivers existing in a wild:
>>>>>
>>>>> - Freedreno (Qualcomm SoC GPUs), completely upstreamed
>>>>> - AMDGPU, mostly merged into upstreams
>>>>
>>>> I tried my AMD system today with:
>>>>
>>>> Host:
>>>> Aarch64 AVA system
>>>> Trixie
>>>> virglrenderer @ v1.1.0/99557f5aa130930d11f04ffeb07f3a9aa5963182
>>>> -display sdl,gl=on (gtk,gl=on also came up but handled window resizing
>>>> poorly)
>>>>
>>>> KVM Guest
>>>>
>>>> Aarch64
>>>> Trixie
>>>> mesa @ main/d27748a76f7dd9236bfcf9ef172dc13b8c0e170f
>>>> -Dvulkan-drivers=virtio,amd -Dgallium-drivers=virgl,radeonsi
>>>> -Damdgpu-virtio=true
>>>>
>>>> However when I ran vulkan-info --summary KVM faulted with:
>>>>
>>>> debian-trixie login: error: kvm run failed Bad address
>>>> PC=0000ffffb9aa1eb0 X00=0000ffffba0450a4 X01=0000aaaaf7f32400
>>>> X02=000000000000013c X03=0000ffffba045098 X04=0000aaaaf7f3253c
>>>> X05=0000ffffba0451d4 X06=00000000c0016900 X07=000000000000000e
>>>> X08=0000000000000014 X09=00000000000000ff X10=0000aaaaf7f32500
>>>> X11=0000aaaaf7e4d028 X12=0000aaaaf7edbcb0 X13=0000000000000001
>>>> X14=000000000000000c X15=0000000000007718 X16=0000ffffb93601f0
>>>> X17=0000ffffb9aa1dc0 X18=00000000000076f0 X19=0000aaaaf7f31330
>>>> X20=0000aaaaf7f323f0 X21=0000aaaaf7f235e0 X22=000000000000004c
>>>> X23=0000aaaaf7f2b5e0 X24=0000aaaaf7ee0cb0 X25=00000000000000ff
>>>> X26=0000000000000076 X27=0000ffffcd2b18a8 X28=0000aaaaf7ee0cb0
>>>> X29=0000ffffcd2b0bd0 X30=0000ffffb86c8b98 SP=0000ffffcd2b0bd0
>>>> PSTATE=20001000 --C- EL0t
>>>> QEMU 9.2.50 monitor - type 'help' for more information
>>>> (qemu) quit
>>>>
>>>> Which looks very much like the PFN locking failure. However booting up
>>>> with venus=on instead works. Could there be any differences in the way
>>>> device memory is mapped in the two cases?
>>>
>>> Memory mapping works exactly the same for nctx and venus. Are you on
>>> 6.13 host kernel?
>>
>> Yes - with the Altra PCI workaround patches on both host and guest
>> kernel.
>>
>> Is there anyway to trace the sharing of device memory on the host so I
>> can verify its an attempt at device access? The PC looks like its in
>> user-space but once this fails the guest is suspended so I can't poke
>> around in its environment.
>
> I'm adding printk's to kernel in a such cases. Likely there is no other
> better way to find why it fails.
>
> Does your ARM VM and host both use 4k page size?
>
> Well, if it's a page refcounting bug on ARM/KVM, then applying [1] to
> the host driver will make it work and we will know where the problem is.
> Please try.
>
> [1]
> https://patchwork.kernel.org/project/kvm/patch/20220815095423.11131-1-dmitry.osipe...@collabora.com/

That makes no difference. AFAICT the fault is triggered in userspace:

  error: kvm run failed Bad address
  PC=0000ffffb1911eb0 X00=0000ffffb1eb60a4 X01=0000aaaaeb1f5400
  X02=000000000000013c X03=0000ffffb1eb6098 X04=0000aaaaeb1f553c
  X05=0000ffffb1eb61d4 X06=00000000c0016900 X07=000000000000000e
  X08=0000000000000014 X09=00000000000000ff X10=0000aaaaeb1f5500
  X11=0000aaaaeb110028 X12=0000aaaaeb19ecb0 X13=0000000000000001
  X14=000000000000000c X15=0000000000007718 X16=0000ffffb11d01f0
  X17=0000ffffb1911dc0 X18=00000000000076f0 X19=0000aaaaeb1f4330
  X20=0000aaaaeb1f53f0 X21=0000aaaaeb1e65e0 X22=000000000000004c
  X23=0000aaaaeb1ee5e0 X24=0000aaaaeb1a3cb0 X25=00000000000000ff
  X26=0000000000000076 X27=0000ffffc7db4e58 X28=0000aaaaeb1a3cb0
  X29=0000ffffc7db4180 X30=0000ffffb0538b98 SP=0000ffffc7db4180
  PSTATE=20001000 --C- EL0t
  QEMU 9.2.50 monitor - type 'help' for more information
  (qemu) quit

  Thread 4 received signal SIGABRT, Aborted.
  [Switching to Thread 1.4]
  cpu_do_idle () at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:32
  32              arm_cpuidle_restore_irq_context(&context);
  (gdb) alex
  Undefined command: "alex".  Try "help".
  (gdb) bt
  #0  cpu_do_idle () at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:32
  #1  0xffff800081962180 in arch_cpu_idle ()
      at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:44
  #2  0xffff8000819622c4 in default_idle_call ()
      at /home/alex/lsrc/linux.git/kernel/sched/idle.c:117
  #3  0xffff80008013af8c in cpuidle_idle_call ()
      at /home/alex/lsrc/linux.git/kernel/sched/idle.c:185
  #4  do_idle () at /home/alex/lsrc/linux.git/kernel/sched/idle.c:325
  #5  0xffff80008013b208 in cpu_startup_entry (state=state@entry=CPUHP_AP_ONLINE_IDLE)
      at /home/alex/lsrc/linux.git/kernel/sched/idle.c:423
  #6  0xffff800080043668 in secondary_start_kernel ()
      at /home/alex/lsrc/linux.git/arch/arm64/kernel/smp.c:279
  #7  0xffff800080051f78 in __secondary_switched ()
      at /home/alex/lsrc/linux.git/arch/arm64/kernel/head.S:420
  Backtrace stopped: previous frame identical to this frame (corrupt stack?)
  (gdb) info threads
    Id   Target Id                    Frame
    1    Thread 1.1 (CPU#0 [running]) cpu_do_idle ()
      at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:32
    2    Thread 1.2 (CPU#1 [halted ]) 0x0000ffffb1911eb0 in ?? ()
    3    Thread 1.3 (CPU#2 [halted ]) cpu_do_idle ()
      at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:32
  * 4    Thread 1.4 (CPU#3 [halted ]) cpu_do_idle ()
      at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:32
  (gdb) thread 2
  [Switching to thread 2 (Thread 1.2)]
  #0  0x0000ffffb1911eb0 in ?? ()
  (gdb) bt
  #0  0x0000ffffb1911eb0 in ?? ()
  #1  0x0000aaaaeb1ea5e0 in ?? ()
  Backtrace stopped: previous frame inner to this frame (corrupt stack?)
  (gdb) frame 0
  #0  0x0000ffffb1911eb0 in ?? ()
  (gdb) x/5i $pc
  => 0xffffb1911eb0:      str     q3, [x0]
     0xffffb1911eb4:      ldp     q2, q3, [x1, #48]
     0xffffb1911eb8:      subs    x2, x2, #0x90
     0xffffb1911ebc:      b.ls    0xffffb1911ee0  // b.plast
     0xffffb1911ec0:      stp     q0, q1, [x3, #16]
  (gdb) p/x $x0
  $1 = 0xffffb1eb60a4

I suspect that is memcpy again but I'll try and track it down. The only
other note is:

  [  411.509647] kvm [7713]: Unsupported FSC: EC=0x24 xFSC=0x21 ESR_EL2=0x92000061

Which is:

  EC 0x24 - Data Abort from lower EL
  DFSC 0x21 - Alignment fault
  WnR 1 - Caused by write

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
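
For reference, the ESR_EL2 decode above can be reproduced by hand. The sketch below is only an illustration: it assumes the standard AArch64 ESR_ELx layout for data aborts (EC in bits [31:26], IL in bit 25, ISV in bit 24, WnR in bit 6, DFSC in bits [5:0]), and the helper name decode_data_abort is made up for this example. It also checks the alignment of the x0 value that the faulting "str q3, [x0]" stores to.

```python
# Minimal sketch: decode the ESR_EL2 value from the kvm log line and check
# the alignment of the faulting store address. Field positions assume the
# AArch64 ESR_ELx encoding for data aborts; decode_data_abort is a
# hypothetical helper, not part of any kernel or QEMU API.

ESR_EL2 = 0x92000061        # from "Unsupported FSC: ... ESR_EL2=0x92000061"
X0 = 0xFFFFB1EB60A4         # from "(gdb) p/x $x0"


def decode_data_abort(esr: int) -> dict:
    """Split an ESR_ELx value into the fields relevant to a data abort."""
    return {
        "EC": (esr >> 26) & 0x3F,   # exception class
        "IL": (esr >> 25) & 0x1,    # instruction length (1 = 32-bit)
        "ISV": (esr >> 24) & 0x1,   # syndrome valid for emulation
        "WnR": (esr >> 6) & 0x1,    # 1 = write, 0 = read
        "DFSC": esr & 0x3F,         # data fault status code
    }


fields = decode_data_abort(ESR_EL2)
for name, value in fields.items():
    print(f"{name:4} = {value:#x}")

# A 128-bit "str q3, [x0]" writes 16 bytes starting at x0.
print(f"x0 % 16 = {X0 % 16}")
print(f"x0 % 8  = {X0 % 8}")
```

The decoded fields match the summary above (EC 0x24, DFSC 0x21, WnR 1), and x0 is neither 16-byte nor 8-byte aligned. That is consistent with an unaligned SIMD store into memory the host has mapped with a Device attribute, where unaligned accesses fault, but it does not by itself prove that is what happened here.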