On Mon, Feb 24, 2025 at 10:56:12AM +0000, Alex Bennée wrote:
> Daniel P. Berrangé <berra...@redhat.com> writes:
>
> > On Fri, Feb 21, 2025 at 04:01:01PM +0000, Alex Bennée wrote:
> >> While running the new GPU tests it was noted that the proprietary
> >> nVidia driver barfed when run under the sanitiser:
> >>
> >>   2025-02-20 11:13:08,226: [11:13:07.782] Output 'headless' attempts
> >>   EOTF mode SDR and colorimetry mode default.
> >>   2025-02-20 11:13:08,227: [11:13:07.784] Output 'headless' using color
> >>   profile: stock sRGB color profile
> >>
> >> and that's the last thing it outputs.
> >>
> >> The sanitizer reports that when the framework sends the SIGTERM
> >> because of the timeout we get a write to a NULL pointer (but
> >> interestingly, this time not in an atexit callback):
> >>
> >>   UndefinedBehaviorSanitizer:DEADLYSIGNAL
> >>   ==471863==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address
> >>   0x000000000000 (pc 0x7a18ceaafe80 bp 0x000000000000 sp 0x7ffe8e3ff6d0
> >>   T471863)
> >>   ==471863==The signal is caused by a WRITE memory access.
> >>   ==471863==Hint: address points to the zero page.
> >>     #0 0x7a18ceaafe80
> >>        (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x16afe80)
> >>        (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
> >>     #1 0x7a18ce9e72c0
> >>        (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x15e72c0)
> >>        (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
> >>     #2 0x7a18ce9f11bb
> >>        (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x15f11bb)
> >>        (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
> >>     #3 0x7a18ce6dc9d1
> >>        (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x12dc9d1)
> >>        (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
> >>     #4 0x7a18e7d15326 in vrend_renderer_create_fence
> >>        /usr/src/virglrenderer-1.0.0-1ubuntu2/obj-x86_64-linux-gnu/../src/vrend_renderer.c:10883:26
> >>     #5 0x55bfb6621871 in virtio_gpu_virgl_process_cmd
> >>
> >> The #dri-devel channel confirmed:
> >>
> >>   <digetx> stsquad: nv driver is known to not work with venus, don't use
> >>   it for testing
> >>
> >> So let's implement a blocklist to stop users starting a known bad
> >> setup.
> >
> > I don't much like the conceptual idea of blocking usage of QEMU itself
> > based on current point-in-time bugs in the host OS driver stack, because
> > it is making an assertion that all future versions of the driver will
> > also be broken, and that's not generally valid.
> >
> > If the user chose to use a dodgy graphics driver, they can deal with
> > the consequences of their choice.
> >
> > Skipping only the functional test, without any qemu-system code changes,
> > though, is more palatable, as that's not a hard block on usage.
>
> Well, how do you do one without the other? I don't want to always skip
> the vulkan testing because some developer setups have broken drivers.
> Unless you are suggesting something like:
>
>   -device virtio-vga-gl,hostmem=4G,blob=on,venus=on,ignore-nvidia=on
>
> or something like that?
I was thinking that test_aarch64_virt_gpu.py would dynamically check the
kernel driver and use that in its @skip annotation.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
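[For context, a rough sketch of the kind of dynamic skip Daniel describes. This is illustrative only: the sysfs probing below and the helper name `host_gpu_driver_is` are my assumptions, not an existing QEMU test API; the actual functional test would use whatever helpers the framework provides.]

```python
# Sketch: skip a venus test when the host's loaded DRM kernel driver is
# the proprietary nvidia one. Assumes a unittest-style functional test;
# the helper and the sysfs path walked here are illustrative assumptions.
import glob
import os
import unittest


def host_gpu_driver_is(name):
    """Best-effort check of the bound kernel driver for any DRM card.

    Each /sys/class/drm/cardN/device/driver is a symlink into the driver's
    sysfs directory; its basename is the driver name (e.g. "nvidia",
    "amdgpu", "i915"). Returns False if nothing matches or sysfs is absent.
    """
    for link in glob.glob("/sys/class/drm/card[0-9]*/device/driver"):
        try:
            if os.path.basename(os.readlink(link)) == name:
                return True
        except OSError:
            continue
    return False


class Aarch64VirtGpuTest(unittest.TestCase):
    @unittest.skipIf(host_gpu_driver_is("nvidia"),
                     "venus is known broken on the proprietary nvidia driver")
    def test_vulkan_under_venus(self):
        # ... launch the guest with virtio-vga-gl + venus and run the
        # vulkan workload here ...
        pass
```

The point of this shape is that the vulkan test still runs everywhere else; only hosts where the broken driver is actually loaded are skipped, with no qemu-system code change.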