On 22/04/2024 08:22, Benjamin Berg wrote:
On Mon, 2024-04-22 at 10:51 +0800, Tiwei Bie wrote:
On 4/18/24 5:23 PM, benja...@sipsolutions.net wrote:
diff --git a/arch/um/include/asm/mmu.h b/arch/um/include/asm/mmu.h
index 37eb6e89e79a..bf8da736609c 100644
--- a/arch/um/include/asm/mmu.h
+++ b/arch/um/include/asm/mmu.h
@@ -10,6 +10,10 @@
typedef struct mm_context {
struct mm_id id;
+
+ /* Address range in need of a TLB sync */
+ long int sync_tlb_range_from;
+ long int sync_tlb_range_to;
Why not "unsigned long"?
Oops, yes, it should be "unsigned long".
} mm_context_t;
extern void __switch_mm(struct mm_id * mm_idp);
diff --git a/arch/um/include/asm/pgtable.h
b/arch/um/include/asm/pgtable.h
index e1ece21dbe3f..5bb397b65efb 100644
--- a/arch/um/include/asm/pgtable.h
+++ b/arch/um/include/asm/pgtable.h
@@ -244,6 +244,38 @@ static inline void set_pte(pte_t *pteptr,
pte_t pteval)
#define PFN_PTE_SHIFT PAGE_SHIFT
+static inline void um_tlb_mark_sync(struct mm_struct *mm, unsigned
long start,
+ unsigned long end)
+{
+ if (!mm->context.sync_tlb_range_to) {
+ mm->context.sync_tlb_range_from = start;
+ mm->context.sync_tlb_range_to = end;
+ } else {
+ if (start < mm->context.sync_tlb_range_from)
+ mm->context.sync_tlb_range_from = start;
+ if (end > mm->context.sync_tlb_range_to)
+ mm->context.sync_tlb_range_to = end;
+ }
+}
IIUC, in some cases, the range [sync_tlb_range_from, sync_tlb_range_to)
might become very large when merging non-adjacent ranges? Could that
be an issue?
I figured it is not a big problem. It will result in scanning the
entire page table once to check whether the NEW_PAGE bit is set on any
PTE. I am assuming that this will happen almost never and scanning the
page table (but not doing syscalls) is reasonably cheap at the end.
diff --git a/arch/um/include/asm/tlbflush.h b/arch/um/include/asm/tlbflush.h
index d7cf82023b74..62816f6f1c91 100644
--- a/arch/um/include/asm/tlbflush.h
+++ b/arch/um/include/asm/tlbflush.h
@@ -9,24 +9,50 @@
#include <linux/mm.h>
/*
- * TLB flushing:
+ * In UML, we need to sync the TLB over by using mmap/munmap/mprotect syscalls
+ * from the process handling the MM (which can be the kernel itself).
+ *
+ * To track updates, we can hook into set_ptes and flush_tlb_*. With set_ptes
+ * we catch all PTE transitions where memory that was unusable becomes usable.
+ * While with flush_tlb_* we can track any memory that becomes unusable and
+ * even if a higher layer of the page table was modified.
+ *
+ * So, we simply track updates using both methods and mark the memory area to
+ * be synced later on. The only special case is that flush_tlb_kern_* needs to
+ * be executed immediately as there is no good synchronization point in that
+ * case. In contrast, in the set_ptes case we can wait for the next kernel
+ * segfault before we do the synchornization.
*
- * - flush_tlb() flushes the current mm struct TLBs
* - flush_tlb_all() flushes all processes TLBs
* - flush_tlb_mm(mm) flushes the specified mm context TLB's
* - flush_tlb_page(vma, vmaddr) flushes one page
- * - flush_tlb_kernel_vm() flushes the kernel vm area
* - flush_tlb_range(vma, start, end) flushes a range of pages
+ * - flush_tlb_kernel_range(start, end) flushes a range of kernel pages
*/
+extern int um_tlb_sync(struct mm_struct *mm);
+
extern void flush_tlb_all(void);
extern void flush_tlb_mm(struct mm_struct *mm);
-extern void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
- unsigned long end);
-extern void flush_tlb_page(struct vm_area_struct *vma, unsigned long address);
-extern void flush_tlb_kernel_vm(void);
-extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
-extern void __flush_tlb_one(unsigned long addr);
+
+static void flush_tlb_page(struct vm_area_struct *vma, unsigned long address)
+{
+ um_tlb_mark_sync(vma->vm_mm, address, address + PAGE_SIZE);
+}
+
+static void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
+ unsigned long end)
+{
+ um_tlb_mark_sync(vma->vm_mm, start, end);
+}
+
+static void flush_tlb_kernel_range(unsigned long start, unsigned long end)
+{
+ um_tlb_mark_sync(&init_mm, start, end);
+
+ /* Kernel needs to be synced immediately */
+ um_tlb_sync(&init_mm);
+}
Nit: this is a header file, these functions should be defined as inline
functions.
Yup, thanks!
diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c
index c137ff6f84dd..232aa7601d5d 100644
--- a/arch/um/kernel/tlb.c
+++ b/arch/um/kernel/tlb.c
[...]
-void flush_tlb_kernel_range(unsigned long start, unsigned long
end)
-{
- flush_tlb_kernel_range_common(start, end);
-}
-
-void flush_tlb_kernel_vm(void)
-{
- flush_tlb_kernel_range_common(start_vm, end_vm);
-}
The build breaks with this change, as there is still a call to
flush_tlb_kernel_vm() in ubd.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/um/drivers/ubd_kern.c?id=fb5d1d389c9e78d68f1f71f926d6251017579f5b#n774
Oh, thanks for the pointer!
I do not see a good reason for that call to even exist. My best theory
right now is that it existed to avoid later pagefaults for new memory
regions (the vmalloc?). So a workaround that is not needed anymore with
this patch.
It is there since prehistoric times.
No idea why and what it's doing. IMHO it is not needed.
Benjamin
--
Anton R. Ivanov
https://www.kot-begemot.co.uk/