Hi
On 04.06.25 at 23:43, Michael Kelley wrote:
From: Simona Vetter <simona.vet...@ffwll.ch> Sent: Wednesday, June 4, 2025 7:46 AM
On Wed, Jun 04, 2025 at 10:12:45AM +0200, Thomas Zimmermann wrote:
Hi
On 03.06.25 at 19:50, Michael Kelley wrote:
From: Thomas Zimmermann <tzimmerm...@suse.de> Sent: Monday, June 2, 2025 11:25 PM
Hi
On 03.06.25 at 03:49, Michael Kelley wrote:
[...]
What is the motivation behind this work? The driver, or fbdev as a whole,
does not have much of a future anyway.
I'd like to suggest removing hyperv_fb entirely in favor of hyperv_drm?
Yes, I think that's the longer term direction. A couple months ago I had an
email conversation with Saurabh Sengar from the Microsoft Linux team where
he raised this idea. I think the Microsoft folks will need to drive the
deprecation
process, as they need to coordinate with the distro vendors who publish
images for running on local Hyper-V and in the Azure cloud. And my
understanding is that the Linux kernel process would want the driver to
be available but marked "deprecated" for a year or so before it actually
goes away.
We (DRM upstream) recently considered moving some fbdev drivers to
drivers/staging or marking them with !DRM if a DRM driver is available.
hyperv_fb would be a candidate.
At least at SUSE, we ship hyperv_drm instead of hyperv_fb. This works well
on the various generations of Hyper-V. Much of our userspace would
not be able to use hyperv_fb anyway.
Good to know. Red Hat has made the switch as well. The Ubuntu images
in Azure have both hyperv_fb and hyperv_drm. I don't know what other
distros have done.
Yeah, investing in fbdev drivers, especially when some mm surgery seems
needed, does not sound like a good idea to me overall.
I do have some concerns about the maturity of the hyperv_drm driver
"around the edges". For example, somebody just recently submitted a
patch to flush output on panic. I have less familiarity with hyperv_drm
than with hyperv_fb, so some of my concern is probably due to that. We
might need to do a review of hyperv_drm and see if there's anything else
to deal with before hyperv_fb goes away.
The panic output is a feature that we recently added to the kernel. It
allows a DRM driver to display a final error message in the case of a kernel
panic (think of blue screens on Windows). Drivers require only a minimal
amount of support to make it work. That's what the hyperv_drm patches
were about.
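For reference, the driver-side plumbing for the panic screen looks roughly
like the sketch below. The my_plane_* names are placeholders and the hook
details are from my reading of the drm_panic infrastructure, so treat this
as an illustration rather than the actual hyperv_drm patch.

/*
 * Sketch of the drm_panic hooks a driver provides. The my_plane_*
 * functions are hypothetical; the real hyperv_drm patches are the
 * authoritative version.
 */
#include <linux/errno.h>
#include <drm/drm_panic.h>
#include <drm/drm_plane.h>
#include <drm/drm_modeset_helper_vtables.h>

static int my_plane_get_scanout_buffer(struct drm_plane *plane,
					struct drm_scanout_buffer *sb)
{
	/*
	 * Describe the buffer that is currently scanned out (format,
	 * size, pitch, CPU mapping) so the panic handler can draw the
	 * message directly into it.
	 */
	return -ENODEV;	/* placeholder in this sketch */
}

static void my_plane_panic_flush(struct drm_plane *plane)
{
	/*
	 * Called after the message has been drawn; for a paravirtualized
	 * device this is where a final dirty-region update would be sent
	 * to the host.
	 */
}

static const struct drm_plane_helper_funcs my_plane_helper_funcs = {
	/* ... the usual atomic check/update helpers ... */
	.get_scanout_buffer	= my_plane_get_scanout_buffer,
	.panic_flush		= my_plane_panic_flush,
};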
I'm also happy to help with any other issues and shortfalls of DRM vs.
fbdev. There are some, but I thought they were mostly around some of the
low-bit color formats that really old devices want, and not anything that
Hyper-V would need.
You've set me up perfectly to raise an issue. :-) I'm still relatively new
to the hyperv_drm driver and DRM in general, compared with hyperv_fb.
One capability in fbdev is deferred I/O, which is what this entire patch
series is about. The hyperv_drm driver doesn't currently use anything
similar to deferred I/O the way hyperv_fb does. I don't know if that's because
hyperv_drm doesn't make use of what DRM has to offer, or if DRM doesn't
have a deferred I/O framework like fbdev. Do you know what the situation
is? Or could you point me to an example of doing deferred I/O with DRM
that hyperv_drm should be following?
Fbdev deferred I/O is a workaround for the fact that fbdev does not
require a flush operation on its I/O buffers. Writing to an mmap'ed
buffer is expected to go to hardware immediately. On devices where this
is not the case, deferred I/O tracks written pages and writes them back
to hardware at intervals.
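To make that concrete, the setup in an fbdev driver looks roughly like the
sketch below, loosely modeled on what hyperv_fb does. The my_fb_* names are
placeholders, and the deferred_io callback signature follows the current
pageref-list API, so check the details against the kernel version at hand.

/*
 * Minimal deferred-I/O setup for an fbdev driver (sketch). The my_fb_*
 * names are placeholders.
 */
#include <linux/fb.h>
#include <linux/list.h>

static void my_fb_deferred_io(struct fb_info *info,
			      struct list_head *pagereflist)
{
	struct fb_deferred_io_pageref *pageref;

	/*
	 * Walk the pages written since the last flush, compute the
	 * dirty region that covers them and push it to the device.
	 */
	list_for_each_entry(pageref, pagereflist, list) {
		/* accumulate pageref->offset into a dirty rectangle */
	}
}

static struct fb_deferred_io my_fb_defio = {
	.delay		= HZ / 20,	/* flush at most 20 times per second */
	.deferred_io	= my_fb_deferred_io,
};

/* Called from the probe path, before register_framebuffer(). */
static void my_fb_init_defio(struct fb_info *info)
{
	info->fbdefio = &my_fb_defio;
	fb_deferred_io_init(info);
}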
For DRM, there's the MODE_DIRTYFB ioctl [1] that all userspace has to
call after writing to mmap'ed buffers. So regular DRM doesn't need
deferred I/O as userspace triggers writeback explicitly.
[1]
https://elixir.bootlin.com/linux/v6.15/source/drivers/gpu/drm/drm_ioctl.c#L686
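For illustration, the userspace side via libdrm's drmModeDirtyFB() wrapper
looks roughly like this (error handling trimmed; fb_id and the clip
coordinates are assumed to come from the application's own state):

/*
 * Userspace side of the MODE_DIRTYFB ioctl via libdrm (sketch): after
 * writing to the mmap'ed buffer, tell the kernel which rectangle
 * changed so the driver can flush it to the device.
 */
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* fb_id identifies a framebuffer previously created with drmModeAddFB();
 * x1/y1..x2/y2 bound the region that was just written. */
static int flush_damage(int drm_fd, uint32_t fb_id,
			uint16_t x1, uint16_t y1, uint16_t x2, uint16_t y2)
{
	drmModeClip clip = {
		.x1 = x1, .y1 = y1,
		.x2 = x2, .y2 = y2,
	};

	/* Wraps DRM_IOCTL_MODE_DIRTYFB; drivers that have no dirty
	 * callback return -ENOSYS, which callers typically ignore. */
	return drmModeDirtyFB(drm_fd, fb_id, &clip, 1);
}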
I ran a quick performance test comparing hyperv_drm with hyperv_fb.
The test does "cat" of a big text file in the Hyper-V graphics console. The
file has 1024 * 1024 lines, each with 64 characters, so total file size is
64 MiB.
With hyperv_fb the test completes in 24 seconds elapsed time, with
24 seconds of system CPU time. With hyperv_drm, it takes 34 seconds
elapsed time, but with about the same 24 seconds of system CPU time.
Overall this difference isn't huge, and probably isn't that noticeable
when doing human-scale work (e.g., 'dmesg' outputting several
hundred lines in 0.19 seconds vs. my test doing 1M lines) on the Hyper-V
graphics console. To me, the console doesn't feel slow with hyperv_drm
compared to hyperv_fb, which is good.
A DRM console is technically an fbdev device that operates on a DRM
device. DRM and fbdev have some differences that can make this
problematic. I'm not surprised that there are issues.
Nonetheless, there's an underlying issue. A main cause of the difference
is the number of messages to Hyper-V to update dirty regions. With
hyperv_fb using deferred I/O, the messages are limited to 20/second, so
the total number of messages to Hyper-V is about 480. But hyperv_drm
appears to send 3 messages to Hyper-V for each line of output, or a total of
about 3,000,000 messages (~90K/second). That's a lot of additional load
on the Hyper-V host, and it accounts for the 10 seconds of additional
elapsed time seen in the guest. There's also this ugly output in dmesg
because the ring buffer for sending messages to the Hyper-V host gets
full -- Hyper-V doesn't always keep up, at least not on my local laptop
where I'm testing:
[12574.327615] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] *ERROR* Unable to send packet via vmbus; error -11
[12574.327684] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] *ERROR* Unable to send packet via vmbus; error -11
[12574.327760] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] *ERROR* Unable to send packet via vmbus; error -11
[12574.327841] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] *ERROR* Unable to send packet via vmbus; error -11
[12597.016128] hyperv_sendpacket: 6211 callbacks suppressed
[12597.016133] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] *ERROR* Unable to send packet via vmbus; error -11
[12597.016172] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] *ERROR* Unable to send packet via vmbus; error -11
[12597.016220] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] *ERROR* Unable to send packet via vmbus; error -11
[12597.016267] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] *ERROR* Unable to send packet via vmbus; error -11
hyperv_drm could be fixed to not output the ugly messages, but there's
still the underlying issue of overrunning the ring buffer, and excessively
hammering on the host. If we could get hyperv_drm doing deferred I/O, I
would feel much better about going full-on with deprecating hyperv_fb.
Thanks for debugging this. A number of things are playing into this.
- DRM performs display output in sync with vblank IRQs. For example, if the
display runs at 60 Hz, there should be no more than 60 display updates
per second. From what I can tell, there's no IRQ support in hyperv_drm
(or Hyper-V in general?). Without IRQ support, drivers output to hardware
ASAP, which can result in large numbers of buffer updates per second.
I've heard about this problem in another context [2] and you're likely
seeing a similar issue.
- DRM's console also needs better support for vblank interrupts. It
currently sends out updates ASAP as well.
Both points are not much of a problem on most desktop and server
systems, but can be an issue with virtualization. A possible driver-side
throttle is sketched below.
[2] https://bugzilla.suse.com/show_bug.cgi?id=1189174
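Until there is proper vblank support, one conceivable driver-side
mitigation (purely a sketch with made-up my_dev_* names, not something
that exists in hyperv_drm today) would be to coalesce damage and flush it
from a delayed work at a fixed rate, similar in spirit to fbdev's
deferred I/O:

/*
 * Sketch of a driver-internal damage throttle: merge dirty rectangles
 * and push them to the host from a delayed work at most ~60 times per
 * second. All my_dev_* names are hypothetical.
 */
#include <linux/workqueue.h>
#include <linux/spinlock.h>
#include <linux/jiffies.h>
#include <linux/minmax.h>
#include <drm/drm_rect.h>

struct my_dev {
	spinlock_t damage_lock;
	struct drm_rect damage;		/* accumulated dirty region */
	bool flush_pending;
	struct delayed_work flush_work;	/* INIT_DELAYED_WORK() in probe */
};

/* Hypothetical helper: send one dirty-region message to the host. */
static void my_dev_send_dirty_rect(struct my_dev *mdev,
				   const struct drm_rect *rect)
{
	/* the single VMBus message with the merged rectangle goes here */
}

/* Called from the plane's atomic_update instead of sending a message
 * for every single update. */
static void my_dev_mark_damage(struct my_dev *mdev, const struct drm_rect *r)
{
	unsigned long flags;

	spin_lock_irqsave(&mdev->damage_lock, flags);
	if (mdev->flush_pending) {
		/* merge with the already pending region */
		mdev->damage.x1 = min(mdev->damage.x1, r->x1);
		mdev->damage.y1 = min(mdev->damage.y1, r->y1);
		mdev->damage.x2 = max(mdev->damage.x2, r->x2);
		mdev->damage.y2 = max(mdev->damage.y2, r->y2);
	} else {
		/* first damage since the last flush: schedule one */
		mdev->damage = *r;
		mdev->flush_pending = true;
		schedule_delayed_work(&mdev->flush_work, HZ / 60);
	}
	spin_unlock_irqrestore(&mdev->damage_lock, flags);
}

static void my_dev_flush_work(struct work_struct *work)
{
	struct my_dev *mdev = container_of(work, struct my_dev,
					   flush_work.work);
	struct drm_rect rect;
	unsigned long flags;

	spin_lock_irqsave(&mdev->damage_lock, flags);
	rect = mdev->damage;
	mdev->flush_pending = false;
	spin_unlock_irqrestore(&mdev->damage_lock, flags);

	my_dev_send_dirty_rect(mdev, &rect);
}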
Best regards
Thomas
Michael
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)