Re: [RFC PATCH v3 08/15] sched/clock, x86: Make __sched_clock_stable forceful

2024-11-21 Thread Peter Zijlstra
On Wed, Nov 20, 2024 at 05:34:32PM +0100, Valentin Schneider wrote: > On 20/11/24 15:59, Peter Zijlstra wrote: > > On Tue, Nov 19, 2024 at 04:34:55PM +0100, Valentin Schneider wrote: > >> Later commits will cause objtool to warn about non __ro_after_init static > >> keys being used in .noinstr sect

Re: [RFC PATCH v3 13/15] context_tracking,x86: Add infrastructure to defer kernel TLBI

2024-11-21 Thread Peter Zijlstra
On Wed, Nov 20, 2024 at 06:24:56PM +0100, Valentin Schneider wrote: > > Oh gawd, just having looked at xen_write_cr3() this might not be > > entirely trivial to mark noinstr :/ > > ... I hadn't even seen that. > > AIUI the CR3 RMW is not "enough" if we have PGE enabled, because then > global pag

Re: [RFC PATCH v3 13/15] context_tracking,x86: Add infrastructure to defer kernel TLBI

2024-11-21 Thread Dave Hansen
On 11/21/24 03:12, Peter Zijlstra wrote: >> I see e.g. ds_clear_cea() clears PTEs that can have the _PAGE_GLOBAL flag, >> and it correctly uses the non-deferrable flush_tlb_kernel_range(). > > I always forget what we use global pages for, dhansen might know, but > let me try and have a look. > >

Re: [RFC PATCH v3 06/15] jump_label: Add forceful jump label type

2024-11-21 Thread Valentin Schneider
On 21/11/24 12:00, Peter Zijlstra wrote: > On Wed, Nov 20, 2024 at 08:55:15AM -0800, Josh Poimboeuf wrote: >> On Wed, Nov 20, 2024 at 03:57:46PM +0100, Peter Zijlstra wrote: >> > On Wed, Nov 20, 2024 at 03:56:49PM +0100, Peter Zijlstra wrote: >> > >> > > But I think we can make the fall-back safer,

Re: [RFC PATCH v3 06/15] jump_label: Add forceful jump label type

2024-11-21 Thread Josh Poimboeuf
On Thu, Nov 21, 2024 at 12:00:20PM +0100, Peter Zijlstra wrote: > But yeah, this is not quite the same as not marking anything and simply > forcing the IPI when the target address is noinstr. > > And having written all that; perhaps that is the better solution, it > sticks the logic in text_poke a

[RFC][PATCH v4 3/9] ima: Add digest_cache_measure/appraise boot-time built-in policies

2024-11-21 Thread Roberto Sassu
From: Roberto Sassu Specify the 'digest_cache_measure' boot-time policy with 'ima_policy=' in the kernel command line to add the following rule at the beginning of the IMA policy, before other rules: measure func=DIGEST_LIST_CHECK pcr=12 which will measure digest lists into PCR 12 (or the value

Re: [RFC PATCH v3 06/15] jump_label: Add forceful jump label type

2024-11-21 Thread Peter Zijlstra
On Wed, Nov 20, 2024 at 08:55:15AM -0800, Josh Poimboeuf wrote: > On Wed, Nov 20, 2024 at 03:57:46PM +0100, Peter Zijlstra wrote: > > On Wed, Nov 20, 2024 at 03:56:49PM +0100, Peter Zijlstra wrote: > > > > > But I think we can make the fall-back safer, we can simply force the IPI > > > when we pok

Re: [RFC PATCH v3 13/15] context_tracking,x86: Add infrastructure to defer kernel TLBI

2024-11-21 Thread Peter Zijlstra
On Thu, Nov 21, 2024 at 07:07:44AM -0800, Dave Hansen wrote: > On 11/21/24 03:12, Peter Zijlstra wrote: > >> I see e.g. ds_clear_cea() clears PTEs that can have the _PAGE_GLOBAL flag, > >> and it correctly uses the non-deferrable flush_tlb_kernel_range(). > > > > I always forget what we use global

Re: [RFC PATCH v3 13/15] context_tracking,x86: Add infrastructure to defer kernel TLBI

2024-11-21 Thread Peter Zijlstra
On Tue, Nov 19, 2024 at 04:35:00PM +0100, Valentin Schneider wrote: > @@ -418,9 +419,20 @@ static inline void cpu_tlbstate_update_lam(unsigned long > lam, u64 untag_mask) > #endif > #endif /* !MODULE */ > > +#define __NATIVE_TLB_FLUSH_GLOBAL(suffix, cr4) \ > + native_write_c

[PATCH v3 07/25] fs/dax: Ensure all pages are idle prior to filesystem unmount

2024-11-21 Thread Alistair Popple
File systems call dax_break_mapping() prior to reallocating file system blocks to ensure the page is not undergoing any DMA or other accesses. Generally this is needed when a file is truncated to ensure that if a block is reallocated nothing is writing to it. However filesystems currently don't cal

[PATCH v3 06/25] fs/dax: Always remove DAX page-cache entries when breaking layouts

2024-11-21 Thread Alistair Popple
Prior to any truncation operations, file systems call dax_break_mapping() to ensure pages in the range are not undergoing DMA. Later, DAX page-cache entries will be removed by truncate_folio_batch_exceptionals() in the generic page-cache code. However, this makes it possible for folios to be removed

[PATCH v3 08/25] fs/dax: Remove PAGE_MAPPING_DAX_SHARED mapping flag

2024-11-21 Thread Alistair Popple
PAGE_MAPPING_DAX_SHARED is the same as PAGE_MAPPING_ANON. This isn't currently a problem because FS DAX pages are treated specially. However a future change will make FS DAX pages more like normal pages, so folio_test_anon() must not return true for a FS DAX page. We could explicitly test for a FS

[PATCH v3 11/25] mm: Allow compound zone device pages

2024-11-21 Thread Alistair Popple
Zone device pages are used to represent various types of device memory managed by device drivers. Currently compound zone device pages are not supported. This is because MEMORY_DEVICE_FS_DAX pages are the only user of higher order zone device pages and have their own page reference counting. A futu

[PATCH v3 10/25] pci/p2pdma: Don't initialise page refcount to one

2024-11-21 Thread Alistair Popple
The reference counts for ZONE_DEVICE private pages should be initialised by the driver when the page is actually allocated by the driver allocator, not when they are first created. This is currently the case for MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_COHERENT pages but not MEMORY_DEVICE_PCI_P2PDMA

[PATCH v3 12/25] mm/memory: Enhance insert_page_into_pte_locked() to create writable mappings

2024-11-21 Thread Alistair Popple
In preparation for using insert_page() for DAX, enhance insert_page_into_pte_locked() to handle establishing writable mappings. Recall that DAX returns VM_FAULT_NOPAGE after installing a PTE which bypasses the typical set_pte_range() in finish_fault. Signed-off-by: Alistair Popple Suggested-by:

[PATCH v3 15/25] huge_memory: Allow mappings of PMD sized pages

2024-11-21 Thread Alistair Popple
Currently DAX folio/page reference counts are managed differently to normal pages. To allow these to be managed the same as normal pages introduce vmf_insert_folio_pmd. This will map the entire PMD-sized folio and take references as it would for a normally mapped page. This is distinct from the cu

[PATCH v3 13/25] mm/memory: Add vmf_insert_page_mkwrite()

2024-11-21 Thread Alistair Popple
Currently to map a DAX page the DAX driver calls vmf_insert_pfn. This creates a special devmap PTE entry for the pfn but does not take a reference on the underlying struct page for the mapping. This is because DAX page refcounts are treated specially, as indicated by the presence of a devmap entry.

[PATCH v3 16/25] memremap: Add is_device_dax_page() and is_fsdax_page() helpers

2024-11-21 Thread Alistair Popple
Add helpers to determine if a page or folio is a device dax or fs dax page or folio. Signed-off-by: Alistair Popple --- include/linux/memremap.h | 22 ++ 1 file changed, 22 insertions(+) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 0256a42..f2a8d13

Re: [PATCH v3 17/25] gup: Don't allow FOLL_LONGTERM pinning of FS DAX pages

2024-11-21 Thread John Hubbard
On 11/21/24 5:40 PM, Alistair Popple wrote: Longterm pinning of FS DAX pages should already be disallowed by various pXX_devmap checks. However a future change will cause these checks to be invalid for FS DAX pages so make folio_is_longterm_pinnable() return false for FS DAX pages. Signed-off-by

[PATCH v3 00/25] fs/dax: Fix ZONE_DEVICE page reference counts

2024-11-21 Thread Alistair Popple
Main updates since v2:
 - Rename the DAX specific dax_insert_XXX functions to vmf_insert_XXX and have them pass the vmf struct.
 - Separate out the device DAX changes.
 - Restore the page share mapping counting and associated warnings.
 - Rework truncate to require file-systems to have previ

[PATCH v3 01/25] fuse: Fix dax truncate/punch_hole fault path

2024-11-21 Thread Alistair Popple
FS DAX requires file systems to call into the DAX layout prior to unlinking inodes to ensure there is no ongoing DMA or other remote access to the direct mapped page. The fuse file system implements fuse_dax_break_layouts() to do this which includes a comment indicating that passing dmap_end == 0 l

[PATCH v3 02/25] fs/dax: Return unmapped busy pages from dax_layout_busy_page_range()

2024-11-21 Thread Alistair Popple
dax_layout_busy_page_range() is used by file systems to scan the DAX page-cache to unmap mapped pages from user-space and to determine if any pages in the given range are busy, either due to ongoing DMA or other get_user_pages() usage. Currently it checks to see if the file mapping is mapped into us

[PATCH v3 03/25] fs/dax: Don't skip locked entries when scanning entries

2024-11-21 Thread Alistair Popple
Several functions internal to FS DAX use the following pattern when trying to obtain an unlocked entry:

    xas_for_each(&xas, entry, end_idx) {
        if (dax_is_locked(entry))
            entry = get_unlocked_entry(&xas, 0);

This is problematic because get_unlocked_entry() will get the next pr

[PATCH v3 04/25] fs/dax: Refactor wait for dax idle page

2024-11-21 Thread Alistair Popple
A FS DAX page is considered idle when its refcount drops to one. This is currently open-coded in all file systems supporting FS DAX. Move the idle detection to a common function to make future changes easier. Signed-off-by: Alistair Popple Reviewed-by: Jan Kara Reviewed-by: Christoph Hellwig Re

[PATCH v3 05/25] fs/dax: Create a common implementation to break DAX layouts

2024-11-21 Thread Alistair Popple
Prior to freeing a block, file systems supporting FS DAX must check that the associated pages are both unmapped from user-space and not undergoing DMA or other access from e.g. get_user_pages(). This is achieved by unmapping the file range and scanning the FS DAX page-cache to see if any pages within

[PATCH v3 21/25] fs/dax: Properly refcount fs dax pages

2024-11-21 Thread Alistair Popple
Currently fs dax pages are considered free when the refcount drops to one and their refcounts are not increased when mapped via PTEs or decreased when unmapped. This requires special logic in mm paths to detect that these pages should not be properly refcounted, and to detect when the refcount drop

[PATCH v3 18/25] proc/task_mmu: Ignore ZONE_DEVICE pages

2024-11-21 Thread Alistair Popple
The procfs mmu files such as smaps currently ignore device dax and fs dax pages because these pages are considered special. To maintain existing behaviour once these pages are treated as normal pages and returned from vm_normal_page() add tests to explicitly skip them. Signed-off-by: Alistair Popp

[PATCH v3 17/25] gup: Don't allow FOLL_LONGTERM pinning of FS DAX pages

2024-11-21 Thread Alistair Popple
Longterm pinning of FS DAX pages should already be disallowed by various pXX_devmap checks. However a future change will cause these checks to be invalid for FS DAX pages so make folio_is_longterm_pinnable() return false for FS DAX pages. Signed-off-by: Alistair Popple --- include/linux/mm.h | 4

[PATCH v3 20/25] mm/mlock: Skip ZONE_DEVICE PMDs during mlock

2024-11-21 Thread Alistair Popple
At present mlock skips ptes mapping ZONE_DEVICE pages. A future change to remove pmd_devmap will allow pmd_trans_huge_lock() to return ZONE_DEVICE folios so make sure we continue to skip those. Signed-off-by: Alistair Popple --- mm/mlock.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm

[PATCH v3 19/25] memcontrol-v1: Ignore ZONE_DEVICE pages

2024-11-21 Thread Alistair Popple
memcontrol currently ignores device dax and fs dax pages because these pages are considered special. To maintain existing behaviour once these pages are treated as normal pages and returned from vm_normal_page() add a test to explicitly skip charging them. Signed-off-by: Alistair Popple --- mm/m

[PATCH v3 23/25] mm: Remove pXX_devmap callers

2024-11-21 Thread Alistair Popple
The devmap PTE special bit was used to detect mappings of FS DAX pages. This tracking was required to ensure the generic mm did not manipulate the page reference counts as FS DAX implemented its own reference counting scheme. Now that FS DAX pages have their references counted the same way as nor

[PATCH v3 22/25] device/dax: Properly refcount device dax pages when mapping

2024-11-21 Thread Alistair Popple
Device DAX pages are currently not reference counted when mapped, instead relying on the devmap PTE bit to ensure mapping code will not get/put references. This requires special handling in various page table walkers, particularly GUP, to manage references on the underlying pgmap to ensure the page

[PATCH v3 24/25] mm: Remove devmap related functions and page table bits

2024-11-21 Thread Alistair Popple
Now that DAX and all other reference counts to ZONE_DEVICE pages are managed normally there is no need for the special devmap PTE/PMD/PUD page table bits. So drop all references to these, freeing up a software defined page table bit on architectures supporting it. Signed-off-by: Alistair Popple A

[PATCH v3 25/25] Revert "riscv: mm: Add support for ZONE_DEVICE"

2024-11-21 Thread Alistair Popple
DEVMAP PTEs are no longer required to support ZONE_DEVICE so remove them. Signed-off-by: Alistair Popple Suggested-by: Chunyan Zhang --- arch/riscv/Kconfig| 1 - arch/riscv/include/asm/pgtable-64.h | 20 arch/riscv/include/asm/pgtable-bits.h | 1 - a

[PATCH v3 14/25] huge_memory: Allow mappings of PUD sized pages

2024-11-21 Thread Alistair Popple
Currently DAX folio/page reference counts are managed differently to normal pages. To allow these to be managed the same as normal pages introduce vmf_insert_folio_pud. This will map the entire PUD-sized folio and take references as it would for a normally mapped page. This is distinct from the cu

[PATCH v3 09/25] mm/gup.c: Remove redundant check for PCI P2PDMA page

2024-11-21 Thread Alistair Popple
PCI P2PDMA pages are not mapped with pXX_devmap PTEs, therefore the check in __gup_device_huge() is redundant. Remove it. Signed-off-by: Alistair Popple Reviewed-by: Jason Gunthorpe Reviewed-by: Dan Williams Acked-by: David Hildenbrand --- mm/gup.c | 5 - 1 file changed, 5 deletions(-) diff

Re: [PATCH v3 05/25] fs/dax: Create a common implementation to break DAX layouts

2024-11-21 Thread John Hubbard
On 11/21/24 5:40 PM, Alistair Popple wrote: Prior to freeing a block file systems supporting FS DAX must check that the associated pages are both unmapped from user-space and not undergoing DMA or other access from eg. get_user_pages(). This is achieved by unmapping the file range and scanning th

Re: [PATCH v3 05/25] fs/dax: Create a common implementation to break DAX layouts

2024-11-21 Thread Alistair Popple
John Hubbard writes: > On 11/21/24 5:40 PM, Alistair Popple wrote: >> Prior to freeing a block file systems supporting FS DAX must check >> that the associated pages are both unmapped from user-space and not >> undergoing DMA or other access from eg. get_user_pages(). This is >> achieved by unm

Re: [RFC PATCH v3 06/15] jump_label: Add forceful jump label type

2024-11-21 Thread Josh Poimboeuf
On Thu, Nov 21, 2024 at 04:51:09PM +0100, Valentin Schneider wrote: > Okay so forcing the IPI for .noinstr patching lets us get rid of all the > force_ipi faff; however I would still want the special marking to tell > objtool "yep we're okay with this one", and still get warnings when a new > .noin

Fixing Spelling Mistake in rculist_nulls.rst

2024-11-21 Thread Vyshnav Ajith
NULL is a special marker, not a "maker", I believe. Fixing the typo with this patch. Signed-off-by: Vyshnav Ajith --- Documentation/RCU/rculist_nulls.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst inde