Daniel P. Berrangé <berra...@redhat.com> writes:

> On Mon, Feb 24, 2025 at 10:56:12AM +0000, Alex Bennée wrote:
>> Daniel P. Berrangé <berra...@redhat.com> writes:
>>
>> > On Fri, Feb 21, 2025 at 04:01:01PM +0000, Alex Bennée wrote:
>> >> While running the new GPU tests it was noted that the proprietary
>> >> nVidia driver barfed when run under the sanitiser:
>> >>
>> >> 2025-02-20 11:13:08,226: [11:13:07.782] Output 'headless' attempts
>> >> EOTF mode SDR and colorimetry mode default.
>> >> 2025-02-20 11:13:08,227: [11:13:07.784] Output 'headless' using color
>> >> profile: stock sRGB color profile
>> >>
>> >> and that's the last thing it outputs.
>> >>
>> >> The sanitizer reports that when the framework sends the SIGTERM
>> >> because of the timeout we get a write to a NULL pointer (but
>> >> interestingly not this time in an atexit callback):
>> >>
>> >> UndefinedBehaviorSanitizer:DEADLYSIGNAL
>> >> ==471863==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address
>> >> 0x000000000000 (pc 0x7a18ceaafe80 bp 0x000000000000 sp 0x7ffe8e3ff6d0
>> >> T471863)
>> >> ==471863==The signal is caused by a WRITE memory access.
>> >> ==471863==Hint: address points to the zero page.
>> >>     #0 0x7a18ceaafe80 (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x16afe80)
>> >>        (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
>> >>     #1 0x7a18ce9e72c0 (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x15e72c0)
>> >>        (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
>> >>     #2 0x7a18ce9f11bb (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x15f11bb)
>> >>        (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
>> >>     #3 0x7a18ce6dc9d1 (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x12dc9d1)
>> >>        (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
>> >>     #4 0x7a18e7d15326 in vrend_renderer_create_fence
>> >>        /usr/src/virglrenderer-1.0.0-1ubuntu2/obj-x86_64-linux-gnu/../src/vrend_renderer.c:10883:26
>> >>     #5 0x55bfb6621871 in virtio_gpu_virgl_process_cmd
>> >>
>> >> The #dri-devel channel confirmed:
>> >>
>> >>   <digetx> stsquad: nv driver is known to not work with venus, don't use
>> >>   it for testing
>> >>
>> >> So let's implement a blocklist to stop users starting a known bad
>> >> setup.
>> >
>> > I don't much like the conceptual idea of blocking usage of QEMU itself
>> > based on current point-in-time bugs in the host OS driver stack, because
>> > it is making an assertion that all future versions of the driver will
>> > also be broken and that's not generally valid.
>> >
>> > If the user chose to use a dodgy graphics driver, they can deal with
>> > the consequences of their choice.
>> >
>> > Skipping only the functional test, without any qemu-system code changes
>> > though, is more palatable as that's not a hard block on usage.
>>
>> Well how do you do one without the other? I don't want to always skip
>> the vulkan testing because some developer setups have broken drivers.
>> Unless you are suggesting something like:
>>
>>   -device virtio-vga-gl,hostmem=4G,blob=on,venus=on,ignore-nvidia=on
>>
>> or something like that?
>
> I was thinking that test_aarch64_virt_gpu.py would dynamically check
> the kernel driver and use that in its @skip annotation.
If we can make the vulkan-info tool a dependency we could certainly do
that - otherwise the host-gpu code would need to be built as a
command-line helper.

>
> With regards,
> Daniel

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
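
[Editor's sketch] The dynamic check Daniel suggests could be done without
depending on vulkan-info at all: the proprietary nvidia.ko module exposes
/proc/driver/nvidia/version when loaded, which nouveau does not. The class
and test names below are illustrative, not the actual QEMU functional test
API; this is a minimal sketch of the probe-and-skip idea, assuming a
unittest-style test class.

```python
import os
import unittest


def nvidia_proprietary_driver_present():
    """Return True if the proprietary NVIDIA kernel driver is loaded.

    /proc/driver/nvidia/version is created only by the out-of-tree
    nvidia.ko module; the in-tree nouveau driver does not provide it,
    so its presence is a cheap proxy for "venus is known broken here".
    """
    return os.path.exists("/proc/driver/nvidia/version")


# Hypothetical test class standing in for test_aarch64_virt_gpu.py
class Aarch64VirtGPU(unittest.TestCase):
    @unittest.skipIf(nvidia_proprietary_driver_present(),
                     "venus is known broken on the proprietary NVIDIA driver")
    def test_vulkan_under_venus(self):
        # ... launch qemu-system-aarch64 with
        # -device virtio-vga-gl,hostmem=4G,blob=on,venus=on ...
        pass
```

Because the probe runs on the host at collection time, the vulkan test
still runs everywhere else and no qemu-system code change is needed.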