From: David Hildenbrand <da...@redhat.com>
Sent: Monday, June 2, 2025 2:48 AM
>
> On 23.05.25 18:15, mhkelle...@gmail.com wrote:
> > From: Michael Kelley <mhkli...@outlook.com>
> >
> > Current defio code works only for framebuffer memory that is allocated
> > with vmalloc(). The code assumes that the underlying page refcount can
> > be used by the mm subsystem to manage each framebuffer page's lifecycle,
> > including freeing the page if the refcount goes to 0. This approach is
> > consistent with vmalloc'ed memory, but not with contiguous kernel memory
> > allocated via alloc_pages() or similar. The latter such memory pages
> > usually have a refcount of 0 when allocated, and would be incorrectly
> > freed page-by-page if used with defio. That freeing corrupts the memory
> > free lists and Linux eventually panics. Simply bumping the refcount after
> > allocation doesn't work because when the framebuffer memory is freed,
> > __free_pages() complains about non-zero refcounts.
> >
> > Commit 37b4837959cb ("video: deferred io with physically contiguous
> > memory") from the year 2008 purported to add support for contiguous
> > kernel memory framebuffers. The motivating device, sh_mobile_lcdcfb,
> > uses dma_alloc_coherent() to allocate framebuffer memory, which is
> > likely to use alloc_pages(). It's unclear to me how this commit actually
> > worked at the time, unless dma_alloc_coherent() was pulling from a CMA
> > pool instead of alloc_pages(). Or perhaps alloc_pages() worked
> > differently on the arm32 architecture on which sh_mobile_lcdcfb is used.
> >
> > In any case, for x86 and arm64 today, commit 37b4837959cb is not
> > sufficient to support contiguous kernel memory framebuffers. The problem
> > can be seen with the hyperv_fb driver, which may allocate the
> > framebuffer memory using vmalloc() or alloc_pages(), depending on the
> > configuration of the Hyper-V guest VM (Gen 1 vs. Gen 2) and the size of
> > the framebuffer.
> >
> > Fix this limitation by adding defio support for contiguous kernel
> > memory framebuffers. A driver with a framebuffer allocated from
> > contiguous kernel memory must set the FBINFO_KMEMFB flag to indicate
> > such.
> >
> > Tested with the hyperv_fb driver in both configurations -- with a
> > vmalloc() framebuffer and with an alloc_pages() framebuffer on x86.
> > Also verified a vmalloc() framebuffer on arm64. Hardware is not
> > available to me to verify that the older arm32 devices still work
> > correctly, but the path for vmalloc() framebuffers is essentially
> > unchanged.
> >
> > Even with these changes, defio does not support framebuffers in MMIO
> > space, as defio code depends on framebuffer memory pages having
> > corresponding 'struct page's.
> >
> > Fixes: 3a6fb6c4255c ("video: hyperv: hyperv_fb: Use physical memory for fb on HyperV Gen 1 VMs.")
> > Signed-off-by: Michael Kelley <mhkli...@outlook.com>
> > ---
> > Changes in v3:
> > * Moved definition of FBINFO_KMEMFB flag to a separate patch
> >   preceding this one in the patch set [Helge Deller]
> > Changes in v2:
> > * Tweaked code comments regarding framebuffers allocated with
> >   dma_alloc_coherent() [Christoph Hellwig]
> >
> >  drivers/video/fbdev/core/fb_defio.c | 128 +++++++++++++++++++++++-----
> >  1 file changed, 108 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> > index 4fc93f253e06..f8ae91a1c4df 100644
> > --- a/drivers/video/fbdev/core/fb_defio.c
> > +++ b/drivers/video/fbdev/core/fb_defio.c
> > @@ -8,11 +8,40 @@
> >   * for more details.
> >   */
> >
> > +/*
> > + * Deferred I/O ("defio") allows framebuffers that are mmap()'ed to user
> > + * space to batch user space writes into periodic updates to the underlying
> > + * framebuffer hardware or other implementation (such as with a virtualized
> > + * framebuffer in a VM).
> > + * At each batch interval, a callback is invoked in the framebuffer's
> > + * kernel driver, and the callback is supplied with a list of pages that
> > + * have been modified in the preceding interval. The callback can use
> > + * this information to update the framebuffer hardware as necessary. The
> > + * batching can improve performance and reduce the overhead of updating
> > + * the hardware.
> > + *
> > + * Defio is supported on framebuffers allocated using vmalloc() and
> > + * allocated as contiguous kernel memory using alloc_pages() or kmalloc().
> > + * These memory allocations all have corresponding "struct page"s.
> > + * Framebuffers allocated using dma_alloc_coherent() should not be used
> > + * with defio. Such allocations should be treated as a black box owned by
> > + * the DMA layer, and should not be deconstructed into individual pages
> > + * as defio does. Framebuffers in MMIO space are *not* supported because
> > + * MMIO space does not have corresponding "struct page"s.
> > + *
> > + * For framebuffers allocated using vmalloc(), struct fb_info must have
> > + * "screen_buffer" set to the vmalloc address of the framebuffer. For
> > + * framebuffers allocated from contiguous kernel memory, FBINFO_KMEMFB
> > + * must be set, and "fix.smem_start" must be set to the physical address
> > + * of the frame buffer. In both cases, "fix.smem_len" must be set to the
> > + * framebuffer size in bytes.
> > + */
> > +
> >  #include <linux/module.h>
> >  #include <linux/kernel.h>
> >  #include <linux/errno.h>
> >  #include <linux/string.h>
> >  #include <linux/mm.h>
> > +#include <linux/pfn_t.h>
> >  #include <linux/vmalloc.h>
> >  #include <linux/delay.h>
> >  #include <linux/interrupt.h>
> > @@ -37,7 +66,7 @@ static struct page *fb_deferred_io_get_page(struct fb_info *info, unsigned long
> >  	else if (info->fix.smem_start)
> >  		page = pfn_to_page((info->fix.smem_start + offs) >> PAGE_SHIFT);
> >
> > -	if (page)
> > +	if (page && !(info->flags & FBINFO_KMEMFB))
> >  		get_page(page);
> >
> >  	return page;
> > @@ -137,6 +166,15 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
> >
> >  	BUG_ON(!info->fbdefio->mapping);
> >
> > +	if (info->flags & FBINFO_KMEMFB)
> > +		/*
> > +		 * In this path, the VMA is marked VM_PFNMAP, so mm assumes
> > +		 * there is no struct page associated with the page. The
> > +		 * PFN must be directly inserted and the created PTE will be
> > +		 * marked "special".
> > +		 */
> > +		return vmf_insert_pfn(vmf->vma, vmf->address, page_to_pfn(page));
> > +
> >  	vmf->page = page;
> >  	return 0;
> >  }
> > @@ -163,13 +201,14 @@ EXPORT_SYMBOL_GPL(fb_deferred_io_fsync);
> >
> >  /*
> >   * Adds a page to the dirty list. Call this from struct
> > - * vm_operations_struct.page_mkwrite.
> > + * vm_operations_struct.page_mkwrite or .pfn_mkwrite.
> >   */
> > -static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long offset,
> > +static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, struct vm_fault *vmf,
> >  					    struct page *page)
> >  {
> >  	struct fb_deferred_io *fbdefio = info->fbdefio;
> >  	struct fb_deferred_io_pageref *pageref;
> > +	unsigned long offset = vmf->pgoff << PAGE_SHIFT;
> >  	vm_fault_t ret;
> >
> >  	/* protect against the workqueue changing the page list */
> > @@ -182,20 +221,34 @@ static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long
> >  	}
> >
> >  	/*
> > -	 * We want the page to remain locked from ->page_mkwrite until
> > -	 * the PTE is marked dirty to avoid mapping_wrprotect_range()
> > -	 * being called before the PTE is updated, which would leave
> > -	 * the page ignored by defio.
> > -	 * Do this by locking the page here and informing the caller
> > -	 * about it with VM_FAULT_LOCKED.
> > +	 * The PTE must be marked writable before the defio deferred work runs
> > +	 * again and potentially marks the PTE write-protected. If the order
> > +	 * should be switched, the PTE would become writable without defio
> > +	 * tracking the page, leaving the page forever ignored by defio.
> > +	 *
> > +	 * For vmalloc() framebuffers, the associated struct page is locked
> > +	 * before releasing the defio lock. mm will later mark the PTE writable
> > +	 * and release the struct page lock. The struct page lock prevents
> > +	 * the page from being prematurely marked write-protected.
> > +	 *
> > +	 * For FBINFO_KMEMFB framebuffers, mm assumes there is no struct page,
> > +	 * so the PTE must be marked writable while the defio lock is held.
> >  	 */
> > -	lock_page(pageref->page);
> > +	if (info->flags & FBINFO_KMEMFB) {
> > +		unsigned long pfn = page_to_pfn(pageref->page);
> > +
> > +		ret = vmf_insert_mixed_mkwrite(vmf->vma, vmf->address,
> > +					       __pfn_to_pfn_t(pfn, PFN_SPECIAL));
>
> Will the VMA have VM_PFNMAP or VM_MIXEDMAP set?
> PFN_SPECIAL is a horrible hack.
>
> In another thread, you mention that you use PFN_SPECIAL to bypass the
> check in vm_mixed_ok(), so VM_MIXEDMAP is likely not set?
The VMA has VM_PFNMAP set, not VM_MIXEDMAP. It seemed like VM_MIXEDMAP is
somewhat of a superset of VM_PFNMAP, but maybe that's a wrong impression.
vm_mixed_ok() does a thorough job of validating the use of
__vm_insert_mixed(), and since what I did was allowed, I thought perhaps it
was OK. Your feedback has set me straight, and that's what I needed. :-)

But the whole approach is moot with Alistair Popple's patch set that
eliminates pfn_t.

Is there an existing mm API that will do mkwrite on a special PTE in a
VM_PFNMAP VMA? I didn't see one, but maybe I missed it. If there's not one,
I'll take a crack at adding it in the next version of my patch set.

Michael
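[Editor's note: for concreteness, the kind of mm helper contemplated above
might look roughly like the following. This is a hypothetical sketch only:
vmf_insert_pfn_mkwrite() is an illustrative name patterned after the
existing vmf_insert_pfn() and vmf_insert_mixed_mkwrite() APIs, and is not
an existing kernel function as of this thread.]

```c
/*
 * Hypothetical sketch: mkwrite on a special PTE in a VM_PFNMAP VMA.
 * Name and signature are illustrative, modeled on vmf_insert_pfn();
 * the difference would be that the inserted/upgraded PTE is also
 * marked writable and dirty, so the write that triggered .pfn_mkwrite
 * can proceed.
 */
vm_fault_t vmf_insert_pfn_mkwrite(struct vm_area_struct *vma,
				  unsigned long addr, unsigned long pfn);

/* Possible use in fb_deferred_io_track_page() for FBINFO_KMEMFB: */
	if (info->flags & FBINFO_KMEMFB) {
		/*
		 * Make the PTE writable while the defio lock is held,
		 * with no pfn_t or PFN_SPECIAL involved.
		 */
		ret = vmf_insert_pfn_mkwrite(vmf->vma, vmf->address,
					     page_to_pfn(pageref->page));
	} else {
		/* vmalloc() path: unchanged from the patch above */
		lock_page(pageref->page);
		ret = VM_FAULT_LOCKED;
	}
```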