Daniel P. Berrangé <berra...@redhat.com> writes:

> On Mon, Feb 24, 2025 at 10:56:12AM +0000, Alex Bennée wrote:
>> Daniel P. Berrangé <berra...@redhat.com> writes:
>> 
>> > On Fri, Feb 21, 2025 at 04:01:01PM +0000, Alex Bennée wrote:
>> >> While running the new GPU tests it was noted that the proprietary
>> >> NVIDIA driver barfed when run under the sanitiser:
>> >> 
>> >>   2025-02-20 11:13:08,226: [11:13:07.782] Output 'headless' attempts
>> >>   EOTF mode SDR and colorimetry mode default.
>> >>   2025-02-20 11:13:08,227: [11:13:07.784] Output 'headless' using color
>> >>   profile: stock sRGB color profile
>> >> 
>> >>   and that's the last thing it outputs.
>> >> 
>> >>   The sanitizer reports that when the framework sends SIGTERM
>> >>   because of the timeout we get a write to a NULL pointer (though
>> >>   interestingly, not this time in an atexit callback):
>> >> 
>> >>   UndefinedBehaviorSanitizer:DEADLYSIGNAL
>> >>   ==471863==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address
>> >>   0x000000000000 (pc 0x7a18ceaafe80 bp 0x000000000000 sp 0x7ffe8e3ff6d0
>> >>   T471863)
>> >>   ==471863==The signal is caused by a WRITE memory access.
>> >>   ==471863==Hint: address points to the zero page.
>> >>       #0 0x7a18ceaafe80
>> >>   (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x16afe80)
>> >>   (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
>> >>       #1 0x7a18ce9e72c0
>> >>   (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x15e72c0)
>> >>   (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
>> >>       #2 0x7a18ce9f11bb
>> >>   (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x15f11bb)
>> >>   (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
>> >>       #3 0x7a18ce6dc9d1
>> >>   (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x12dc9d1)
>> >>   (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
>> >>       #4 0x7a18e7d15326 in vrend_renderer_create_fence
>> >>   
>> >> /usr/src/virglrenderer-1.0.0-1ubuntu2/obj-x86_64-linux-gnu/../src/vrend_renderer.c:10883:26
>> >>       #5 0x55bfb6621871 in virtio_gpu_virgl_process_cmd
>> >> 
>> >> The #dri-devel channel confirmed:
>> >> 
>> >>   <digetx> stsquad: nv driver is known to not work with venus, don't use
>> >>       it for testing
>> >> 
>> >> So let's implement a blocklist to stop users from starting a
>> >> known-bad setup.
>> >
>> > I don't much like the conceptual idea of blocking usage of QEMU itself
>> > based on current point-in-time bugs in the host OS driver stack, because
>> > it asserts that all future versions of the driver will also be broken,
>> > which is not generally valid.
>> >
>> > If the user chose to use a dodgy graphics driver, they can deal with
>> > the consequences of their choice.
>> >
>> > Skipping only the functional test, without any qemu-system code changes,
>> > is more palatable though, as that's not a hard block on usage.
>> 
>> Well, how do you do one without the other? I don't want to always skip
>> the Vulkan testing just because some developer setups have broken
>> drivers. Unless you are suggesting something like:
>> 
>>   -device virtio-vga-gl,hostmem=4G,blob=on,venus=on,ignore-nvidia=on
>> 
>> or something like that?
>
> I was thinking that test_aarch64_virt_gpu.py would dynamically check
> the kernel driver and use that in its @skip annotation.

If we can make the vulkaninfo tool a dependency we could certainly do
that - otherwise the host-gpu code would need to be built as a
command-line helper.
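
As a rough sketch - assuming vulkaninfo from Vulkan-Tools is on the
PATH and supports --summary, and that matching "NVIDIA" in the
reported driverName is a good-enough heuristic (the helper and test
names below are just placeholders) - the test could gate itself with
something like:

  import shutil
  import subprocess
  import unittest

  def vulkan_driver_is_nvidia():
      """True if vulkaninfo reports the proprietary NVIDIA driver."""
      if shutil.which("vulkaninfo") is None:
          return False  # tool missing: can't tell, so don't skip
      try:
          # --summary needs a reasonably recent Vulkan-Tools
          res = subprocess.run(["vulkaninfo", "--summary"],
                               capture_output=True, text=True,
                               timeout=10)
      except (OSError, subprocess.SubprocessError):
          return False
      # the proprietary driver reports e.g. "driverName = NVIDIA"
      return any("driverName" in line and "NVIDIA" in line
                 for line in res.stdout.splitlines())

  @unittest.skipIf(vulkan_driver_is_nvidia(),
                   "venus is known broken with the NVIDIA binary driver")
  def test_aarch64_virt_gpu_vulkan(self):
      ...

That would keep the check entirely in the test harness rather than in
qemu-system itself, which I think is what you're after.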

>
> With regards,
> Daniel

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
