On 03.06.25 03:49, Michael Kelley wrote:
From: David Hildenbrand <da...@redhat.com> Sent: Monday, June 2, 2025 2:48 AM

On 23.05.25 18:15, mhkelle...@gmail.com wrote:
From: Michael Kelley <mhkli...@outlook.com>

Current defio code works only for framebuffer memory that is allocated
with vmalloc(). The code assumes that the underlying page refcount can
be used by the mm subsystem to manage each framebuffer page's lifecycle,
including freeing the page if the refcount goes to 0. This approach is
consistent with vmalloc'ed memory, but not with contiguous kernel memory
allocated via alloc_pages() or similar. Pages in the latter kind of
allocation usually have a refcount of 0 when allocated, and would be
incorrectly freed page-by-page if used with defio. That freeing corrupts
the memory free lists, and Linux eventually panics. Simply bumping the
refcount after allocation doesn't work because when the framebuffer
memory is freed, __free_pages() complains about non-zero refcounts.

Commit 37b4837959cb ("video: deferred io with physically contiguous
memory") from the year 2008 purported to add support for contiguous
kernel memory framebuffers. The motivating device, sh_mobile_lcdcfb, uses
dma_alloc_coherent() to allocate framebuffer memory, which is likely to
use alloc_pages(). It's unclear to me how this commit actually worked at
the time, unless dma_alloc_coherent() was pulling from a CMA pool instead
of alloc_pages(), or alloc_pages() behaved differently back then on the
arm32 architecture on which sh_mobile_lcdcfb is used.

In any case, for x86 and arm64 today, commit 37b4837959cb is not
sufficient to support contiguous kernel memory framebuffers. The problem
can be seen with the hyperv_fb driver, which may allocate the framebuffer
memory using vmalloc() or alloc_pages(), depending on the configuration
of the Hyper-V guest VM (Gen 1 vs. Gen 2) and the size of the framebuffer.

Fix this limitation by adding defio support for contiguous kernel memory
framebuffers. A driver with a framebuffer allocated from contiguous
kernel memory must set the FBINFO_KMEMFB flag to indicate such.
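
As a rough illustration (a hypothetical sketch, not code from this patch
set; only FBINFO_KMEMFB and the fb_info fields are real), the driver-side
setup this implies might look like:

	/* Hypothetical sketch: describe a physically contiguous
	 * framebuffer to defio via fb_info. Error handling and the
	 * rest of driver setup are omitted.
	 */
	static int example_setup_kmemfb(struct fb_info *info, unsigned int order)
	{
		struct page *pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);

		if (!pages)
			return -ENOMEM;

		info->flags |= FBINFO_KMEMFB;               /* contiguous kernel memory */
		info->fix.smem_start = page_to_phys(pages); /* physical address */
		info->fix.smem_len = PAGE_SIZE << order;    /* size in bytes */
		info->screen_buffer = page_address(pages);  /* kernel virtual address */
		return 0;
	}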

Tested with the hyperv_fb driver in both configurations -- with a vmalloc()
framebuffer and with an alloc_pages() framebuffer on x86. Also verified a
vmalloc() framebuffer on arm64. Hardware is not available to me to verify
that the older arm32 devices still work correctly, but the path for
vmalloc() framebuffers is essentially unchanged.

Even with these changes, defio does not support framebuffers in MMIO
space, as defio code depends on framebuffer memory pages having
corresponding 'struct page's.

Fixes: 3a6fb6c4255c ("video: hyperv: hyperv_fb: Use physical memory for fb on HyperV Gen 1 VMs.")
Signed-off-by: Michael Kelley <mhkli...@outlook.com>
---
Changes in v3:
* Moved definition of FBINFO_KMEMFB flag to a separate patch
    preceding this one in the patch set [Helge Deller]
Changes in v2:
* Tweaked code comments regarding framebuffers allocated with
    dma_alloc_coherent() [Christoph Hellwig]

   drivers/video/fbdev/core/fb_defio.c | 128 +++++++++++++++++++++++-----
   1 file changed, 108 insertions(+), 20 deletions(-)

diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
index 4fc93f253e06..f8ae91a1c4df 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -8,11 +8,40 @@
    * for more details.
    */

+/*
+ * Deferred I/O ("defio") allows framebuffers that are mmap()'ed to user space
+ * to batch user space writes into periodic updates to the underlying
+ * framebuffer hardware or other implementation (such as with a virtualized
+ * framebuffer in a VM). At each batch interval, a callback is invoked in the
+ * framebuffer's kernel driver, and the callback is supplied with a list of
+ * pages that have been modified in the preceding interval. The callback can
+ * use this information to update the framebuffer hardware as necessary. The
+ * batching can improve performance and reduce the overhead of updating the
+ * hardware.
+ *
+ * Defio is supported on framebuffers allocated using vmalloc() and on those
+ * allocated as contiguous kernel memory using alloc_pages() or kmalloc(). These
+ * memory allocations all have corresponding "struct page"s. Framebuffers
+ * allocated using dma_alloc_coherent() should not be used with defio.
+ * Such allocations should be treated as a black box owned by the DMA
+ * layer, and should not be deconstructed into individual pages as defio
+ * does. Framebuffers in MMIO space are *not* supported because MMIO space
+ * does not have corresponding "struct page"s.
+ *
+ * For framebuffers allocated using vmalloc(), struct fb_info must have
+ * "screen_buffer" set to the vmalloc address of the framebuffer. For
+ * framebuffers allocated from contiguous kernel memory, FBINFO_KMEMFB must
+ * be set, and "fix.smem_start" must be set to the physical address of the
+ * framebuffer. In both cases, "fix.smem_len" must be set to the framebuffer
+ * size in bytes.
+ */
+
   #include <linux/module.h>
   #include <linux/kernel.h>
   #include <linux/errno.h>
   #include <linux/string.h>
   #include <linux/mm.h>
+#include <linux/pfn_t.h>
   #include <linux/vmalloc.h>
   #include <linux/delay.h>
   #include <linux/interrupt.h>
@@ -37,7 +66,7 @@ static struct page *fb_deferred_io_get_page(struct fb_info *info, unsigned long
        else if (info->fix.smem_start)
                page = pfn_to_page((info->fix.smem_start + offs) >> PAGE_SHIFT);

-       if (page)
+       if (page && !(info->flags & FBINFO_KMEMFB))
                get_page(page);

        return page;
@@ -137,6 +166,15 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)

        BUG_ON(!info->fbdefio->mapping);

+       if (info->flags & FBINFO_KMEMFB)
+               /*
+                * In this path, the VMA is marked VM_PFNMAP, so mm assumes
+                * there is no struct page associated with the page. The
+                * PFN must be directly inserted and the created PTE will be
+                * marked "special".
+                */
+               return vmf_insert_pfn(vmf->vma, vmf->address, page_to_pfn(page));
+
        vmf->page = page;
        return 0;
   }
@@ -163,13 +201,14 @@ EXPORT_SYMBOL_GPL(fb_deferred_io_fsync);

   /*
    * Adds a page to the dirty list. Call this from struct
- * vm_operations_struct.page_mkwrite.
+ * vm_operations_struct.page_mkwrite or .pfn_mkwrite.
    */
-static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long offset,
+static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, struct vm_fault *vmf,
                                            struct page *page)
   {
        struct fb_deferred_io *fbdefio = info->fbdefio;
        struct fb_deferred_io_pageref *pageref;
+       unsigned long offset = vmf->pgoff << PAGE_SHIFT;
        vm_fault_t ret;

        /* protect against the workqueue changing the page list */
@@ -182,20 +221,34 @@ static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long
        }

        /*
-        * We want the page to remain locked from ->page_mkwrite until
-        * the PTE is marked dirty to avoid mapping_wrprotect_range()
-        * being called before the PTE is updated, which would leave
-        * the page ignored by defio.
-        * Do this by locking the page here and informing the caller
-        * about it with VM_FAULT_LOCKED.
+        * The PTE must be marked writable before the defio deferred work runs
+        * again and potentially marks the PTE write-protected. If the order
+        * should be switched, the PTE would become writable without defio
+        * tracking the page, leaving the page forever ignored by defio.
+        *
+        * For vmalloc() framebuffers, the associated struct page is locked
+        * before releasing the defio lock. mm will later mark the PTE writable
+        * and release the struct page lock. The struct page lock prevents
+        * the page from being prematurely marked write-protected.
+        *
+        * For FBINFO_KMEMFB framebuffers, mm assumes there is no struct page,
+        * so the PTE must be marked writable while the defio lock is held.
         */
-       lock_page(pageref->page);
+       if (info->flags & FBINFO_KMEMFB) {
+               unsigned long pfn = page_to_pfn(pageref->page);
+
+               ret = vmf_insert_mixed_mkwrite(vmf->vma, vmf->address,
+                                              __pfn_to_pfn_t(pfn, PFN_SPECIAL));

Will the VMA have VM_PFNMAP or VM_MIXEDMAP set? PFN_SPECIAL is a
horrible hack.

In another thread, you mention that you use PFN_SPECIAL to bypass the
check in vm_mixed_ok(), so VM_MIXEDMAP is likely not set?

The VMA has VM_PFNMAP set, not VM_MIXEDMAP.  It seemed like
VM_MIXEDMAP is somewhat of a superset of VM_PFNMAP, but maybe that's
a wrong impression.

VM_PFNMAP: nothing is refcounted except anon pages

VM_MIXEDMAP: anything with a "struct page" (pfn_valid()) is refcounted


pte_special() is a way for GUP-fast to distinguish refcounted (can GUP) from non-refcounted (cannot GUP) pages mapped by PTEs without any locks or the VMA being available.


Setting pte_special() in a VM_MIXEDMAP mapping on PTEs that map a "struct page" (pfn_valid()) is likely very bogus.

vm_mixed_ok() does a thorough job of validating the
use of __vm_insert_mixed(), and since what I did was allowed, I thought
perhaps it was OK. Your feedback has set me straight, and that's what I
needed. :-)

What exactly are you trying to achieve? :)

If it's mapping a page with a "struct page" and *not* refcounting it, then vmf_insert_pfn() is the current way to achieve that in a VM_PFNMAP mapping. It will set pte_special() automatically for you.
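
To make that concrete, roughly (untested sketch; the lookup helper is
made up):

	static vm_fault_t example_pfnmap_fault(struct vm_fault *vmf)
	{
		/* The backing page has a struct page but is deliberately
		 * not refcounted; example_lookup_page() is hypothetical. */
		struct page *page = example_lookup_page(vmf);

		/* Inserts the PFN and marks the PTE special; no get_page(). */
		return vmf_insert_pfn(vmf->vma, vmf->address, page_to_pfn(page));
	}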


But the whole approach is moot with Alistair Popple's patch set that
eliminates pfn_t. Is there an existing mm API that will do mkwrite on a
special PTE in a VM_PFNMAP VMA? I didn't see one, but maybe I missed
it. If there's not one, I'll take a crack at adding it in the next version of my
patch set.

I assume you'd want vmf_insert_pfn_mkwrite(), correct? Probably vmf_insert_pfn_prot() can be used by adding PAGE_WRITE to pgprot. (maybe :) )
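
Maybe something like this (completely untested sketch of that idea;
vmf_insert_pfn_mkwrite() does not exist today, and vm_get_page_prot() is
used here to build a writable pgprot for the VMA):

	static vm_fault_t example_pfn_mkwrite(struct vm_fault *vmf, unsigned long pfn)
	{
		/* Derive this VMA's page protection with the write bit set. */
		pgprot_t prot = vm_get_page_prot(vmf->vma->vm_flags | VM_WRITE);

		/* Install a writable special PTE for the PFN. */
		return vmf_insert_pfn_prot(vmf->vma, vmf->address, pfn, prot);
	}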

--
Cheers,

David / dhildenb
