On Mon, 2024-04-22 at 10:51 +0800, Tiwei Bie wrote:
> On 4/18/24 5:23 PM, benja...@sipsolutions.net wrote:
> > diff --git a/arch/um/include/asm/mmu.h b/arch/um/include/asm/mmu.h
> > index 37eb6e89e79a..bf8da736609c 100644
> > --- a/arch/um/include/asm/mmu.h
> > +++ b/arch/um/include/asm/mmu.h
> > @@ -10,6 +10,10 @@
> >  
> >  typedef struct mm_context {
> >  	struct mm_id id;
> > +
> > +	/* Address range in need of a TLB sync */
> > +	long int sync_tlb_range_from;
> > +	long int sync_tlb_range_to;
> 
> Why not "unsigned long"?

Oops, yes, it should be "unsigned long".
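
i.e. the hunk will then read something like this (only the type changes):

	typedef struct mm_context {
		struct mm_id id;

		/* Address range in need of a TLB sync */
		unsigned long sync_tlb_range_from;
		unsigned long sync_tlb_range_to;
	} mm_context_t;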

> >  } mm_context_t;
> >  
> >  extern void __switch_mm(struct mm_id * mm_idp);
> > diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
> > index e1ece21dbe3f..5bb397b65efb 100644
> > --- a/arch/um/include/asm/pgtable.h
> > +++ b/arch/um/include/asm/pgtable.h
> > @@ -244,6 +244,38 @@ static inline void set_pte(pte_t *pteptr, pte_t pteval)
> >  
> >  #define PFN_PTE_SHIFT		PAGE_SHIFT
> >  
> > +static inline void um_tlb_mark_sync(struct mm_struct *mm, unsigned long start,
> > +				     unsigned long end)
> > +{
> > +	if (!mm->context.sync_tlb_range_to) {
> > +		mm->context.sync_tlb_range_from = start;
> > +		mm->context.sync_tlb_range_to = end;
> > +	} else {
> > +		if (start < mm->context.sync_tlb_range_from)
> > +			mm->context.sync_tlb_range_from = start;
> > +		if (end > mm->context.sync_tlb_range_to)
> > +			mm->context.sync_tlb_range_to = end;
> > +	}
> > +}
> 
> IIUC, in some cases, the range [sync_tlb_range_from, sync_tlb_range_to)
> might become very large when merging non-adjacent ranges? Could that
> be an issue?

I figured it is not a big problem. It will result in scanning the entire
page table once to check whether the NEW_PAGE bit is set on any PTE. I am
assuming that this will happen almost never and that scanning the page
table (without doing any syscalls) is reasonably cheap in the end.
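
To illustrate what I mean (made-up addresses, mm being some process'
mm_struct): two distant markings just widen the single pending range,

	/* pending range is empty here, i.e. sync_tlb_range_to == 0 */
	um_tlb_mark_sync(mm, 0x1000, 0x2000);     /* pending: [0x1000, 0x2000)   */
	um_tlb_mark_sync(mm, 0x400000, 0x401000); /* pending: [0x1000, 0x401000) */

so um_tlb_sync() later walks that whole span once, which is the scan
mentioned above.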

> > diff --git a/arch/um/include/asm/tlbflush.h b/arch/um/include/asm/tlbflush.h
> > index d7cf82023b74..62816f6f1c91 100644
> > --- a/arch/um/include/asm/tlbflush.h
> > +++ b/arch/um/include/asm/tlbflush.h
> > @@ -9,24 +9,50 @@
> >  #include <linux/mm.h>
> >  
> >  /*
> > - * TLB flushing:
> > + * In UML, we need to sync the TLB over by using mmap/munmap/mprotect syscalls
> > + * from the process handling the MM (which can be the kernel itself).
> > + *
> > + * To track updates, we can hook into set_ptes and flush_tlb_*. With set_ptes
> > + * we catch all PTE transitions where memory that was unusable becomes usable.
> > + * While with flush_tlb_* we can track any memory that becomes unusable and
> > + * even if a higher layer of the page table was modified.
> > + *
> > + * So, we simply track updates using both methods and mark the memory area to
> > + * be synced later on. The only special case is that flush_tlb_kern_* needs to
> > + * be executed immediately as there is no good synchronization point in that
> > + * case. In contrast, in the set_ptes case we can wait for the next kernel
> > + * segfault before we do the synchronization.
> >   *
> > - * - flush_tlb() flushes the current mm struct TLBs
> >   * - flush_tlb_all() flushes all processes TLBs
> >   * - flush_tlb_mm(mm) flushes the specified mm context TLB's
> >   * - flush_tlb_page(vma, vmaddr) flushes one page
> > - * - flush_tlb_kernel_vm() flushes the kernel vm area
> >   * - flush_tlb_range(vma, start, end) flushes a range of pages
> > + * - flush_tlb_kernel_range(start, end) flushes a range of kernel pages
> >   */
> >  
> > +extern int um_tlb_sync(struct mm_struct *mm);
> > +
> >  extern void flush_tlb_all(void);
> >  extern void flush_tlb_mm(struct mm_struct *mm);
> > -extern void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
> > -			    unsigned long end);
> > -extern void flush_tlb_page(struct vm_area_struct *vma, unsigned long address);
> > -extern void flush_tlb_kernel_vm(void);
> > -extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
> > -extern void __flush_tlb_one(unsigned long addr);
> > +
> > +static void flush_tlb_page(struct vm_area_struct *vma, unsigned long address)
> > +{
> > +	um_tlb_mark_sync(vma->vm_mm, address, address + PAGE_SIZE);
> > +}
> > +
> > +static void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
> > +			    unsigned long end)
> > +{
> > +	um_tlb_mark_sync(vma->vm_mm, start, end);
> > +}
> > +
> > +static void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> > +{
> > +	um_tlb_mark_sync(&init_mm, start, end);
> > +
> > +	/* Kernel needs to be synced immediately */
> > +	um_tlb_sync(&init_mm);
> > +}
> 
> Nit: this is a header file, these functions should be defined as inline
> functions.

Yup, thanks!

> > diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c
> > index c137ff6f84dd..232aa7601d5d 100644
> > --- a/arch/um/kernel/tlb.c
> > +++ b/arch/um/kernel/tlb.c
> [...]
> > 
> > -void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> > -{
> > -	flush_tlb_kernel_range_common(start, end);
> > -}
> > -
> > -void flush_tlb_kernel_vm(void)
> > -{
> > -	flush_tlb_kernel_range_common(start_vm, end_vm);
> > -}
> 
> The build breaks with this change, as there is still a call to
> flush_tlb_kernel_vm() in ubd.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/um/drivers/ubd_kern.c?id=fb5d1d389c9e78d68f1f71f926d6251017579f5b#n774

Oh, thanks for the pointer! I do not see a good reason for that call to
even exist. My best theory right now is that it existed to avoid later
page faults for new memory regions (the vmalloc?), i.e. a workaround
that is not needed anymore with this patch.

Benjamin