On Mon, Feb 24, 2025 at 10:56:12AM +0000, Alex Bennée wrote:
> Daniel P. Berrangé <berra...@redhat.com> writes:
>
> > On Fri, Feb 21, 2025 at 04:01:01PM +0000, Alex Bennée wrote:
> >> While running the new GPU tests it was noted that the proprietary
> >> nVidia driver barfed when run under the sanitiser:
> >>
> >>   2025-02-20 11:13:08,226: [11:13:07.782] Output 'headless' attempts
> >>   EOTF mode SDR and colorimetry mode default.
> >>   2025-02-20 11:13:08,227: [11:13:07.784] Output 'headless' using color
> >>   profile: stock sRGB color profile
> >>
> >> and that's the last thing it outputs.
> >>
> >> The sanitizer reports that when the framework sends the SIGTERM
> >> because of the timeout we get a write to a NULL pointer (but
> >> interestingly, this time not in an atexit callback):
> >>
> >>   UndefinedBehaviorSanitizer:DEADLYSIGNAL
> >>   ==471863==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address
> >>   0x000000000000 (pc 0x7a18ceaafe80 bp 0x000000000000 sp 0x7ffe8e3ff6d0
> >>   T471863)
> >>   ==471863==The signal is caused by a WRITE memory access.
> >>   ==471863==Hint: address points to the zero page.
> >>     #0 0x7a18ceaafe80
> >>        (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x16afe80)
> >>        (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
> >>     #1 0x7a18ce9e72c0
> >>        (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x15e72c0)
> >>        (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
> >>     #2 0x7a18ce9f11bb
> >>        (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x15f11bb)
> >>        (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
> >>     #3 0x7a18ce6dc9d1
> >>        (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x12dc9d1)
> >>        (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
> >>     #4 0x7a18e7d15326 in vrend_renderer_create_fence
> >>        /usr/src/virglrenderer-1.0.0-1ubuntu2/obj-x86_64-linux-gnu/../src/vrend_renderer.c:10883:26
> >>     #5 0x55bfb6621871 in virtio_gpu_virgl_process_cmd
> >>
> >> The #dri-devel channel confirmed:
> >>
> >>   <digetx> stsquad: nv driver is known to not work with venus, don't use
> >>   it for testing
> >>
> >> So let's implement a blocklist to stop users starting a known bad
> >> setup.
> >
> > I don't much like the conceptual idea of blocking usage of QEMU itself
> > based on current point-in-time bugs in the host OS driver stack, because
> > it is making an assertion that all future versions of the driver will
> > also be broken, and that's not generally valid.
> >
> > If the user chose to use a dodgy graphics driver, they can deal with
> > the consequences of their choice.
> >
> > Skipping only the functional test, without any qemu-system code changes,
> > though, is more palatable, as that's not a hard block on usage.
>
> Well, how do you do one without the other? I don't want to always skip
> the vulkan testing because some developer setups have broken drivers.
> Unless you are suggesting something like:
>
>   -device virtio-vga-gl,hostmem=4G,blob=on,venus=on,ignore-nvidia=on
>
> or something like that?
I was thinking that test_aarch64_virt_gpu.py would dynamically check the
kernel driver and use that in its @skip annotation.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
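[For context, a rough sketch of the kind of dynamic skip Daniel describes. This is illustrative only: the sysfs probing below and the helper name `host_gpu_driver_is` are my assumptions, not an existing QEMU test API; the actual functional test would use whatever helpers the framework provides.]

```python
# Sketch: skip a venus test when the host's loaded DRM kernel driver is
# the proprietary nvidia one. Assumes a unittest-style functional test;
# the helper and the sysfs path walked here are illustrative assumptions.
import glob
import os
import unittest


def host_gpu_driver_is(name):
    """Best-effort check of the bound kernel driver for any DRM card.

    Each /sys/class/drm/cardN/device/driver is a symlink into the driver's
    sysfs directory; its basename is the driver name (e.g. "nvidia",
    "amdgpu", "i915"). Returns False if nothing matches or sysfs is absent.
    """
    for link in glob.glob("/sys/class/drm/card[0-9]*/device/driver"):
        try:
            if os.path.basename(os.readlink(link)) == name:
                return True
        except OSError:
            continue
    return False


class Aarch64VirtGpuTest(unittest.TestCase):
    @unittest.skipIf(host_gpu_driver_is("nvidia"),
                     "venus is known broken on the proprietary nvidia driver")
    def test_vulkan_under_venus(self):
        # ... launch the guest with virtio-vga-gl + venus and run the
        # vulkan workload here ...
        pass
```

The point of this shape is that the vulkan test still runs everywhere else; only hosts where the broken driver is actually loaded are skipped, with no qemu-system code change.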