From: David Hildenbrand <da...@redhat.com> Sent: Monday, June 2, 2025 2:48 AM
> 
> On 23.05.25 18:15, mhkelle...@gmail.com wrote:
> > From: Michael Kelley <mhkli...@outlook.com>
> >
> > Current defio code works only for framebuffer memory that is allocated
> > with vmalloc(). The code assumes that the underlying page refcount can
> > be used by the mm subsystem to manage each framebuffer page's lifecycle,
> > including freeing the page if the refcount goes to 0. This approach is
> > consistent with vmalloc'ed memory, but not with contiguous kernel memory
> > allocated via alloc_pages() or similar. Pages allocated the latter way
> > usually have a refcount of 0, and would be incorrectly freed
> > page-by-page if used with defio. That freeing corrupts the memory
> > free lists and Linux eventually panics. Simply bumping the refcount after
> > allocation doesn't work because when the framebuffer memory is freed,
> > __free_pages() complains about non-zero refcounts.
> >
> > Commit 37b4837959cb ("video: deferred io with physically contiguous
> > memory") from the year 2008 purported to add support for contiguous
> > kernel memory framebuffers. The motivating device, sh_mobile_lcdcfb, uses
> > dma_alloc_coherent() to allocate framebuffer memory, which is likely to
> > use alloc_pages(). It's unclear to me how this commit actually worked at
> > the time, unless dma_alloc_coherent() was pulling from a CMA pool instead
> > of alloc_pages(). Or perhaps alloc_pages() worked differently then, or
> > works differently on the arm32 architecture on which sh_mobile_lcdcfb is used.
> >
> > In any case, for x86 and arm64 today, commit 37b4837959cb9 is not
> > sufficient to support contiguous kernel memory framebuffers. The problem
> > can be seen with the hyperv_fb driver, which may allocate the framebuffer
> > memory using vmalloc() or alloc_pages(), depending on the configuration
> > of the Hyper-V guest VM (Gen 1 vs. Gen 2) and the size of the framebuffer.
> >
> > Fix this limitation by adding defio support for contiguous kernel memory
> > framebuffers. A driver with a framebuffer allocated from contiguous
> > kernel memory must set the FBINFO_KMEMFB flag to indicate such.
> >
> > Tested with the hyperv_fb driver in both configurations -- with a vmalloc()
> > framebuffer and with an alloc_pages() framebuffer on x86. Also verified a
> > vmalloc() framebuffer on arm64. Hardware is not available to me to verify
> > that the older arm32 devices still work correctly, but the path for
> > vmalloc() framebuffers is essentially unchanged.
> >
> > Even with these changes, defio does not support framebuffers in MMIO
> > space, as defio code depends on framebuffer memory pages having
> > corresponding 'struct page's.
> >
> > Fixes: 3a6fb6c4255c ("video: hyperv: hyperv_fb: Use physical memory for fb on HyperV Gen 1 VMs.")
> > Signed-off-by: Michael Kelley <mhkli...@outlook.com>
> > ---
> > Changes in v3:
> > * Moved definition of FBINFO_KMEMFB flag to a separate patch
> >    preceding this one in the patch set [Helge Deller]
> > Changes in v2:
> > * Tweaked code comments regarding framebuffers allocated with
> >    dma_alloc_coherent() [Christoph Hellwig]
> >
> >   drivers/video/fbdev/core/fb_defio.c | 128 +++++++++++++++++++++++-----
> >   1 file changed, 108 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> > index 4fc93f253e06..f8ae91a1c4df 100644
> > --- a/drivers/video/fbdev/core/fb_defio.c
> > +++ b/drivers/video/fbdev/core/fb_defio.c
> > @@ -8,11 +8,40 @@
> >    * for more details.
> >    */
> >
> > +/*
> > + * Deferred I/O ("defio") allows framebuffers that are mmap()'ed to user space
> > + * to batch user space writes into periodic updates to the underlying
> > + * framebuffer hardware or other implementation (such as with a virtualized
> > + * framebuffer in a VM). At each batch interval, a callback is invoked in the
> > + * framebuffer's kernel driver, and the callback is supplied with a list of
> > + * pages that have been modified in the preceding interval. The callback can
> > + * use this information to update the framebuffer hardware as necessary. The
> > + * batching can improve performance and reduce the overhead of updating the
> > + * hardware.
> > + *
> > + * Defio is supported on framebuffers allocated using vmalloc() and allocated
> > + * as contiguous kernel memory using alloc_pages() or kmalloc(). These
> > + * memory allocations all have corresponding "struct page"s. Framebuffers
> > + * allocated using dma_alloc_coherent() should not be used with defio.
> > + * Such allocations should be treated as a black box owned by the DMA
> > + * layer, and should not be deconstructed into individual pages as defio
> > + * does. Framebuffers in MMIO space are *not* supported because MMIO space
> > + * does not have corresponding "struct page"s.
> > + *
> > + * For framebuffers allocated using vmalloc(), struct fb_info must have
> > + * "screen_buffer" set to the vmalloc address of the framebuffer. For
> > + * framebuffers allocated from contiguous kernel memory, FBINFO_KMEMFB must
> > + * be set, and "fix.smem_start" must be set to the physical address of the
> > + * frame buffer. In both cases, "fix.smem_len" must be set to the framebuffer
> > + * size in bytes.
> > + */
> > +
> >   #include <linux/module.h>
> >   #include <linux/kernel.h>
> >   #include <linux/errno.h>
> >   #include <linux/string.h>
> >   #include <linux/mm.h>
> > +#include <linux/pfn_t.h>
> >   #include <linux/vmalloc.h>
> >   #include <linux/delay.h>
> >   #include <linux/interrupt.h>
> > @@ -37,7 +66,7 @@ static struct page *fb_deferred_io_get_page(struct fb_info *info, unsigned long
> >     else if (info->fix.smem_start)
> >             page = pfn_to_page((info->fix.smem_start + offs) >> PAGE_SHIFT);
> >
> > -   if (page)
> > +   if (page && !(info->flags & FBINFO_KMEMFB))
> >             get_page(page);
> >
> >     return page;
> > @@ -137,6 +166,15 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
> >
> >     BUG_ON(!info->fbdefio->mapping);
> >
> > +   if (info->flags & FBINFO_KMEMFB)
> > +           /*
> > +            * In this path, the VMA is marked VM_PFNMAP, so mm assumes
> > +            * there is no struct page associated with the page. The
> > +            * PFN must be directly inserted and the created PTE will be
> > +            * marked "special".
> > +            */
> > +           return vmf_insert_pfn(vmf->vma, vmf->address, page_to_pfn(page));
> > +
> >     vmf->page = page;
> >     return 0;
> >   }
> > @@ -163,13 +201,14 @@ EXPORT_SYMBOL_GPL(fb_deferred_io_fsync);
> >
> >   /*
> >    * Adds a page to the dirty list. Call this from struct
> > - * vm_operations_struct.page_mkwrite.
> > + * vm_operations_struct.page_mkwrite or .pfn_mkwrite.
> >    */
> > -static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long offset,
> > +static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, struct vm_fault *vmf,
> >                                         struct page *page)
> >   {
> >     struct fb_deferred_io *fbdefio = info->fbdefio;
> >     struct fb_deferred_io_pageref *pageref;
> > +   unsigned long offset = vmf->pgoff << PAGE_SHIFT;
> >     vm_fault_t ret;
> >
> >     /* protect against the workqueue changing the page list */
> > @@ -182,20 +221,34 @@ static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long
> >     }
> >
> >     /*
> > -    * We want the page to remain locked from ->page_mkwrite until
> > -    * the PTE is marked dirty to avoid mapping_wrprotect_range()
> > -    * being called before the PTE is updated, which would leave
> > -    * the page ignored by defio.
> > -    * Do this by locking the page here and informing the caller
> > -    * about it with VM_FAULT_LOCKED.
> > +    * The PTE must be marked writable before the defio deferred work runs
> > +    * again and potentially marks the PTE write-protected. If the order
> > +    * should be switched, the PTE would become writable without defio
> > +    * tracking the page, leaving the page forever ignored by defio.
> > +    *
> > +    * For vmalloc() framebuffers, the associated struct page is locked
> > +    * before releasing the defio lock. mm will later mark the PTE writable
> > +    * and release the struct page lock. The struct page lock prevents
> > +    * the page from being prematurely marked write-protected.
> > +    *
> > +    * For FBINFO_KMEMFB framebuffers, mm assumes there is no struct page,
> > +    * so the PTE must be marked writable while the defio lock is held.
> >      */
> > -   lock_page(pageref->page);
> > +   if (info->flags & FBINFO_KMEMFB) {
> > +           unsigned long pfn = page_to_pfn(pageref->page);
> > +
> > +           ret = vmf_insert_mixed_mkwrite(vmf->vma, vmf->address,
> > +                                          __pfn_to_pfn_t(pfn, PFN_SPECIAL));
> 
> Will the VMA have VM_PFNMAP or VM_MIXEDMAP set? PFN_SPECIAL is a
> horrible hack.
> 
> In another thread, you mention that you use PFN_SPECIAL to bypass the
> check in vm_mixed_ok(), so VM_MIXEDMAP is likely not set?

The VMA has VM_PFNMAP set, not VM_MIXEDMAP.  It seemed like
VM_MIXEDMAP was somewhat of a superset of VM_PFNMAP, but maybe that's
a wrong impression. vm_mixed_ok() does a thorough job of validating the
use of __vm_insert_mixed(), and since what I did was allowed, I thought
perhaps it was OK. Your feedback has set me straight, and that's what I 
needed. :-)

But the whole approach is moot with Alistair Popple's patch set that
eliminates pfn_t. Is there an existing mm API that will do mkwrite on a
special PTE in a VM_PFNMAP VMA? I didn't see one, but maybe I missed
it. If there's not one, I'll take a crack at adding it in the next version of my
patch set.
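
For concreteness, the kind of helper I have in mind would look roughly
like the sketch below. This is hypothetical: the name
vmf_insert_pfn_mkwrite is my invention, not an existing mm API, and
insert_pfn() here refers to the static helper in mm/memory.c, which
would need to be exposed or open-coded.

```c
/*
 * Hypothetical sketch only: like vmf_insert_pfn(), but marks the
 * resulting special PTE writable and dirty, for use from a
 * .pfn_mkwrite handler on a VM_PFNMAP VMA. Details (especially the
 * call into mm/memory.c's static insert_pfn()) are guesses.
 */
vm_fault_t vmf_insert_pfn_mkwrite(struct vm_fault *vmf, unsigned long pfn)
{
	struct vm_area_struct *vma = vmf->vma;

	/* This helper is only meaningful for VM_PFNMAP mappings */
	if (WARN_ON_ONCE(!(vma->vm_flags & VM_PFNMAP)))
		return VM_FAULT_SIGBUS;

	/* Re-insert the PFN with the write/dirty bits set */
	return insert_pfn(vma, vmf->address, pfn,
			  vma->vm_page_prot, true /* mkwrite */);
}
```

With something like that available, the FBINFO_KMEMFB path in
fb_deferred_io_track_page() could drop the vmf_insert_mixed_mkwrite()
plus PFN_SPECIAL combination entirely.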

Michael
