From: "Paul E. McKenney"
The rcu_barrier_sched(), synchronize_sched(), and synchronize_rcu_bh()
RCU API members have been gone for many years. This commit therefore
removes non-historical instances of them.
Reported-by: Joe Perches
Signed-off-by: Paul E. McKenney
Signed-off-by: Boqun Feng
--
From: "Paul E. McKenney"
This commit wordsmiths the RCU_LAZY and RCU_LAZY_DEFAULT_OFF Kconfig
options' help text.
Signed-off-by: Paul E. McKenney
Signed-off-by: Boqun Feng
---
kernel/rcu/Kconfig | 20 +---
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/kernel/r
From: "Paul E. McKenney"
This commit causes the call_srcu() kernel-doc header to reference that
of call_rcu() for detailed memory-ordering guarantees.
Signed-off-by: Paul E. McKenney
Signed-off-by: Boqun Feng
---
kernel/rcu/srcutree.c | 8 ++--
1 file changed, 6 insertions(+), 2 deletions
From: "Paul E. McKenney"
Most of the this_cpu_*() operations may be used in preemptible code,
but not this_cpu_ptr(), and for good reasons. Therefore, better explain
the reasons and call out raw_cpu_ptr() as an alternative in certain very
special cases.
Signed-off-by: Paul E. McKenney
Cc: Jona
From: "Paul E. McKenney"
This commit documents the fact that a given RCU callback function can
repost itself.
Reported-by: Jens Axboe
Signed-off-by: Paul E. McKenney
Signed-off-by: Boqun Feng
---
kernel/rcu/tree.c | 8 +++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/ker
From: "Paul E. McKenney"
Currently, stallwarn.rst does not mention the fact that timer bugs can
result in false-positive RCU CPU stall warnings. This commit therefore
adds this to the list.
Signed-off-by: Paul E. McKenney
Signed-off-by: Boqun Feng
---
Documentation/RCU/stallwarn.rst | 7
Hi,
Please find the upcoming changes in RCU documentation for v6.15. The
changes can also be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux.git
docs.2025.02.04a
Regards,
Boqun
Paul E. McKenney (7):
doc: Add broken-timing possibility to stallwarn.rst
docs: Imp
From: "Paul E. McKenney"
This commit adds a description of the energy-efficiency delays that
call_rcu() can impose, along with a pointer to call_rcu_hurry() for
latency-sensitive kernel code.
Signed-off-by: Paul E. McKenney
Signed-off-by: Boqun Feng
---
kernel/rcu/tree.c | 7 +++
1 file c
> From: Nicolin Chen
> Sent: Saturday, January 25, 2025 8:31 AM
>
> When attaching a device to a vIOMMU-based nested domain, vdev_id must be
> present. Add a piece of code hard-requesting it, preparing for a vEVENTQ
> support in the following patch. Then, update the TEST_F.
>
> A HWPT-based ne
> From: Nicolin Chen
> Sent: Saturday, January 25, 2025 8:31 AM
>
> Similar to iommu_report_device_fault, this allows IOMMU drivers to report
> vIOMMU events from threaded IRQ handlers to user space hypervisors.
>
> Reviewed-by: Lu Baolu
> Signed-off-by: Nicolin Chen
Reviewed-by: Kevin Tian
> From: Nicolin Chen
> Sent: Saturday, January 25, 2025 8:31 AM
>
> There is a DoS concern on the shared hardware event queue among devices
> passed through to VMs, that too many translation failures that belong to
> VMs could overflow the shared hardware event queue if those VMs or their
> VMMs
> From: Nicolin Chen
> Sent: Saturday, January 25, 2025 8:31 AM
>
> Aside from the IOPF framework, iommufd provides an additional pathway to
> report hardware events, via the vEVENTQ of vIOMMU infrastructure.
>
> Define an iommu_vevent_arm_smmuv3 uAPI structure, and report stage-1
> events
> in
> From: Nicolin Chen
> Sent: Saturday, January 25, 2025 8:31 AM
>
> Trigger vEVENTs by feeding an idev ID and validating that the returned output
> virt_ids equal the value that was set to the vDEVICE.
>
> Signed-off-by: Nicolin Chen
Reviewed-by: Kevin Tian
> From: Nicolin Chen
> Sent: Saturday, January 25, 2025 8:31 AM
>
> The handler will get vDEVICE object from the given mdev and convert it to
> its per-vIOMMU virtual ID to mimic a real IOMMU driver.
>
> Signed-off-by: Nicolin Chen
Reviewed-by: Kevin Tian
> From: Nicolin Chen
> Sent: Saturday, January 25, 2025 8:31 AM
> +
> +/*
> + * An iommufd_veventq object represents an interface to deliver vIOMMU events to
> + * the user space. It is created/destroyed by the user space and associated with
> + * vIOMMU object(s) during the allocations.
s/ob
> From: Nicolin Chen
> Sent: Saturday, January 25, 2025 8:31 AM
>
> There is no need to keep them in the header. The vEVENTQ version of these
> two functions will turn out to be a different implementation and will not
> share with this fault version. Thus, move them out of the header.
>
> Signed
Prior to freeing a block, file systems supporting FS DAX must check
that the associated pages are both unmapped from user-space and not
undergoing DMA or other access from e.g. get_user_pages(). This is
achieved by unmapping the file range and scanning the FS DAX
page-cache to see if any pages within
dax_layout_busy_page_range() is used by file systems to scan the DAX
page-cache to unmap mapping pages from user-space and to determine if
any pages in the given range are busy, either due to ongoing DMA or
other get_user_pages() usage.
Currently it checks to see the file mapping is mapped into us
Longterm pinning of FS DAX pages should already be disallowed by
various pXX_devmap checks. However a future change will cause these
checks to be invalid for FS DAX pages so make
folio_is_longterm_pinnable() return false for FS DAX pages.
Signed-off-by: Alistair Popple
Reviewed-by: John Hubbard
Currently DAX folio/page reference counts are managed differently to normal
pages. To allow these to be managed the same as normal pages introduce
vmf_insert_folio_pmd. This will map the entire PMD-sized folio and take
references as it would for a normally mapped page.
This is distinct from the cu
From: Dan Williams
The dcssblk driver has long needed special case support to enable
limited dax operation, so-called CONFIG_FS_DAX_LIMITED. This mode
works around the incomplete support for ZONE_DEVICE on s390 by forgoing
the ability of dax-mapped pages to support GUP.
Now, pending cleanups to
Device DAX pages are currently not reference counted when mapped,
instead relying on the devmap PTE bit to ensure mapping code will not
get/put references. This requires special handling in various page
table walkers, particularly GUP, to manage references on the
underlying pgmap to ensure the page
Currently fs dax pages are considered free when the refcount drops to
one and their refcounts are not increased when mapped via PTEs or
decreased when unmapped. This requires special logic in mm paths to
detect that these pages should not be properly refcounted, and to
detect when the refcount drop
In preparation for using insert_page() for DAX, enhance
insert_page_into_pte_locked() to handle establishing writable
mappings. Recall that DAX returns VM_FAULT_NOPAGE after installing a
PTE which bypasses the typical set_pte_range() in finish_fault.
Signed-off-by: Alistair Popple
Suggested-by:
The rmap doesn't currently support adding a PUD mapping of a
folio. This patch adds support for entire PUD mappings of folios,
primarily to allow for more standard refcounting of device DAX
folios. Currently DAX is the only user of this and it doesn't require
support for partially mapped PUD-sized
Currently to map a DAX page the DAX driver calls vmf_insert_pfn. This
creates a special devmap PTE entry for the pfn but does not take a
reference on the underlying struct page for the mapping. This is
because DAX page refcounts are treated specially, as indicated by the
presence of a devmap entry.
Currently DAX folio/page reference counts are managed differently to
normal pages. To allow these to be managed the same as normal pages
introduce vmf_insert_folio_pud. This will map the entire PUD-sized folio
and take references as it would for a normally mapped page.
This is distinct from the cu
File systems call dax_break_mapping() prior to reallocating file system
blocks to ensure the page is not undergoing any DMA or other
accesses. Generally this is needed when a file is truncated to ensure that
if a block is reallocated nothing is writing to it. However filesystems
currently don't cal
Currently ZONE_DEVICE page reference counts are initialised by core
memory management code in __init_zone_device_page() as part of the
memremap() call which driver modules make to obtain ZONE_DEVICE
pages. This initialises page refcounts to 1 before returning them to
the driver.
This was presumabl
Prior to any truncation operations file systems call
dax_break_mapping() to ensure pages in the range are not undergoing
DMA. Later DAX page-cache entries will be removed by
truncate_folio_batch_exceptionals() in the generic page-cache code.
However this makes it possible for folios to be removed
PCI P2PDMA pages are not mapped with pXX_devmap PTEs, therefore the
check in __gup_device_huge() is redundant. Remove it.
Signed-off-by: Alistair Popple
Reviewed-by: Jason Gunthorpe
Reviewed-by: Dan Williams
Acked-by: David Hildenbrand
---
mm/gup.c | 5 -
1 file changed, 5 deletions(-)
diff
Zone device pages are used to represent various types of device memory
managed by device drivers. Currently compound zone device pages are
not supported. This is because MEMORY_DEVICE_FS_DAX pages are the only
user of higher order zone device pages and have their own page
reference counting.
A futu
The page ->mapping pointer can have magic values like
PAGE_MAPPING_DAX_SHARED and PAGE_MAPPING_ANON for page owner specific
usage. Currently PAGE_MAPPING_DAX_SHARED and PAGE_MAPPING_ANON alias to the
same value. This isn't a problem because FS DAX pages are never seen by the
anonymous mapping code
Several functions internal to FS DAX use the following pattern when
trying to obtain an unlocked entry:
	xas_for_each(&xas, entry, end_idx) {
		if (dax_is_locked(entry))
			entry = get_unlocked_entry(&xas, 0);
This is problematic because get_unlocked_entry() will get the next
pr
A FS DAX page is considered idle when its refcount drops to one. This
is currently open-coded in all file systems supporting FS DAX. Move
the idle detection to a common function to make future changes easier.
Signed-off-by: Alistair Popple
Reviewed-by: Jan Kara
Reviewed-by: Christoph Hellwig
Re
FS DAX requires file systems to call into the DAX layout prior to unlinking
inodes to ensure there is no ongoing DMA or other remote access to the
direct mapped page. The fuse file system implements
fuse_dax_break_layouts() to do this which includes a comment indicating
that passing dmap_end == 0 l
Main updates since v6:
- Clean ups and fixes based on feedback from David and Dan.
- Rebased from next-20241216 to v6.14-rc1. No conflicts.
- Dropped the PTE bit removals and clean-ups - will post this as a
separate series to be merged after this one as Dan wanted it split
up more and t
This patch adjusts the example code for the following two purposes:
* reduce the confusion on not releasing e->lock
* emphasize e is valid and not stale with e->lock held
Signed-off-by: Wei Yang
CC: Boqun Feng
CC: Alan Huang
---
v2:
* add the missing parameter *key
* make function return s
On Mon, Feb 17, 2025 at 02:30:59PM -0800, Boqun Feng wrote:
>On Mon, Feb 17, 2025 at 09:18:42AM +, Wei Yang wrote:
>> On Mon, Feb 17, 2025 at 04:02:53PM +0800, Alan Huang wrote:
>> >On Feb 17, 2025, at 15:41, Wei Yang wrote:
>> >>
>> >> On Mon, Feb 17, 2025 at 10:22:53AM +0800, Alan Huang wro
On Thu, Feb 13, 2025 at 5:17 AM Pavel Begunkov wrote:
>
> On 2/12/25 19:18, Mina Almasry wrote:
> > On Wed, Feb 12, 2025 at 7:52 AM Pavel Begunkov
> > wrote:
> >>
> >> On 2/10/25 21:09, Mina Almasry wrote:
> >>> On Wed, Feb 5, 2025 at 4:20 AM Pavel Begunkov
> >>> wrote:
>
> On 2/3/25
On Mon, Feb 17, 2025 at 09:18:42AM +, Wei Yang wrote:
> On Mon, Feb 17, 2025 at 04:02:53PM +0800, Alan Huang wrote:
> >On Feb 17, 2025, at 15:41, Wei Yang wrote:
> >>
> >> On Mon, Feb 17, 2025 at 10:22:53AM +0800, Alan Huang wrote:
> >>> On Feb 17, 2025, at 10:12, Boqun Feng wrote:
>
>
On 17.02.25 05:29, Alistair Popple wrote:
On Mon, Feb 10, 2025 at 07:45:09PM +0100, David Hildenbrand wrote:
On 04.02.25 23:48, Alistair Popple wrote:
Currently DAX folio/page reference counts are managed differently to normal
pages. To allow these to be managed the same as normal pages introdu
On Mon, Feb 17, 2025 at 8:14 AM Usama Arif wrote:
>
>
>
> On 11/02/2025 00:40, Nico Pache wrote:
> > Now that we have mTHP support in khugepaged, let's add it to the
> > transhuge admin guide to provide proper guidance.
> >
>
> I think you should move this patch to the mTHP khugepaged series, and j
On Mon, Feb 17, 2025 at 8:04 AM Usama Arif wrote:
>
>
>
> On 11/02/2025 00:40, Nico Pache wrote:
> > The new transparent_hugepage=defer option allows for a more conservative
> > approach to THPs. Document its usage in the transhuge admin-guide.
> >
> > Signed-off-by: Nico Pache
> > ---
> > Docum
On Mon, Feb 17, 2025 at 7:59 AM Usama Arif wrote:
>
>
>
> On 11/02/2025 00:40, Nico Pache wrote:
> > setting /transparent_hugepages/enabled=always allows applications
> > to benefit from THPs without having to madvise. However, the pf handler
> > takes very few considerations to decide whether or
On Mon, Feb 17, 2025 at 7:54 AM Usama Arif wrote:
>
>
>
> On 11/02/2025 00:40, Nico Pache wrote:
> > This series is a follow-up to [1], which adds mTHP support to khugepaged.
> > mTHP khugepaged support was necessary for the global="defer" and
> > mTHP="inherit" case (and others) to make sense.
>
If included in patch descriptions, this will function much like the
--ignore flag.
It requires some rather obscure Git features to take advantage of
this, so provide some examples of how to do that.
Signed-off-by: Brendan Jackman
---
Documentation/dev-tools/checkpatch.rst | 46 +
Checkpatch sometimes has false positives. This makes it less useful for
automatic usage: tools like b4 [0] can run checkpatch on all of your
patches and give you a quick overview. When iterating on a branch, it's
tiresome to manually re-check that any errors are known false positives.
This patch a
Checkpatch sometimes has false positives. This makes it less useful for
automatic usage: tools like b4 [0] can run checkpatch on all of your
patches and give you a quick overview. When iterating on a branch, it's
tiresome to manually re-check that any errors are known false positives.
This patch a
On 11/02/2025 00:40, Nico Pache wrote:
> This series is a follow-up to [1], which adds mTHP support to khugepaged.
> mTHP khugepaged support was necessary for the global="defer" and
> mTHP="inherit" case (and others) to make sense.
>
Hi Nico,
Thanks for the patches!
Why is mTHP khugepaged a
On 11/02/2025 00:40, Nico Pache wrote:
> setting /transparent_hugepages/enabled=always allows applications
> to benefit from THPs without having to madvise. However, the pf handler
> takes very few considerations to decide whether or not to actually use a
> THP. This can lead to a lot of wasted
On 11/02/2025 00:40, Nico Pache wrote:
> The new transparent_hugepage=defer option allows for a more conservative
> approach to THPs. Document its usage in the transhuge admin-guide.
>
> Signed-off-by: Nico Pache
> ---
> Documentation/admin-guide/mm/transhuge.rst | 22 +-
>
On 11/02/2025 00:40, Nico Pache wrote:
> Now that we have mTHP support in khugepaged, let's add it to the
> transhuge admin guide to provide proper guidance.
>
I think you should move this patch to the mTHP khugepaged series, and just send
THP=defer separately from mTHP khugepaged.
> Signed-of
Show that the selftests are executed from a fairly "normal"
userspace context.
Signed-off-by: Thomas Weißschuh
---
lib/kunit/kunit-uapi-example.c | 40 +++-
1 file changed, 39 insertions(+), 1 deletion(-)
diff --git a/lib/kunit/kunit-uapi-example.c b/lib/kuni
Currently, testing of userspace and in-kernel APIs uses two different
frameworks: kselftests for the userspace ones and Kunit for the
in-kernel ones. Besides their different scopes, both have different
strengths and limitations:
Kunit:
* Tests are normal kernel code.
* They use the regular kernel too
Reuse the general CONFIG_WERROR to also apply to userprogs.
Signed-off-by: Thomas Weißschuh
---
scripts/Makefile.userprogs | 4
1 file changed, 4 insertions(+)
diff --git a/scripts/Makefile.userprogs b/scripts/Makefile.userprogs
index
f3a7e1ef3753b54303718fae97f4b3c9d4eac07c..debbf083bcfa
UAPI selftests may expect a "normal" userspace environment.
For example the normal kernel API pseudo-filesystems should be mounted.
This could be done from kernel code but it is non-idiomatic.
Add a preinit userspace executable which performs these setup steps
before running the final test executa
Userprogs are built with the regular kernel compiler $CC.
A kernel compiler does not necessarily contain a libc which is required
for a normal userspace application.
However the kernel tree does contain a minimal libc implementation
"nolibc" which can be used to build userspace applications.
Intro
Extend the example to show how to run a userspace executable.
Signed-off-by: Thomas Weißschuh
---
lib/kunit/Makefile | 8 +++-
lib/kunit/kunit-example-test.c | 17 +
lib/kunit/kunit-uapi-example.c | 20
3 files changed, 44 insertions(+), 1 de
If a subtest itself reports success, but the outer testcase fails,
the whole testcase should be reported as a failure.
However the status is recalculated based on the test counts,
overwriting the outer test result.
Synthesize a failed test in this case to make sure the failure is not
swallowed.
Si
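
The rule described in this snippet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Status`, `Test`, and `effective_status` names are hypothetical and are not taken from the kunit tool, and the patch itself synthesizes a failed test rather than computing the status this way.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    SUCCESS = "ok"
    FAILURE = "not ok"


@dataclass
class Test:
    # Hypothetical model of a parsed test: its own reported status
    # plus any nested subtests.
    name: str
    reported: Status
    subtests: List["Test"] = field(default_factory=list)


def effective_status(test: Test) -> Status:
    """Return FAILURE if the test itself or any nested subtest failed."""
    # The outer result is checked first, so passing subtests cannot
    # overwrite a failure reported by the outer testcase.
    if test.reported is Status.FAILURE:
        return Status.FAILURE
    if any(effective_status(sub) is Status.FAILURE for sub in test.subtests):
        return Status.FAILURE
    return Status.SUCCESS
```

With this model, an outer test that reports a failure stays failed even when every subtest passes, which is the case the snippet says is currently swallowed.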
Enable running UAPI tests as part of kunit.
The selftests are embedded into the kernel image and their output is
forwarded to kunit for unified reporting.
The implementation reuses parts of usermode drivers and usermode
helpers. However these frameworks are not used directly as they make it
imposs
userprogs sometimes need access to UAPI headers.
This is currently not possible for Usermode Linux, as UM is only
a pseudo architecture built on top of a regular architecture and does
not have its own UAPI.
Instead use the UAPI headers from the underlying regular architecture.
Signed-off-by: Thoma
Currently there is no test validating the result reporting from nested
tests. Add one; it will also be used to validate upcoming changes to the
nested test parsing.
Signed-off-by: Thomas Weißschuh
---
tools/testing/kunit/kunit_tool_test.py | 9 +
.../kunit/test_
Various subsystems embed non-code build artifacts into the kernel,
for example the initramfs, /proc/config.gz, vDSO image, etc.
Currently each user has their own implementation for that.
Add a common "blob" framework to provide this functionality.
It provides standard kbuild and C APIs to embed an
Skipped tests reported by kselftest.h use a different format than KTAP:
there is no explicit test name. Normally the test name is part of the
free-form string after the SKIP keyword:
ok 3 # SKIP test: some reason
Extend the parser to handle those correctly. Use the free-form string as
tes
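
As a rough illustration of the line format shown in this snippet, the Python sketch below extracts the test number and the free-form string from a kselftest.h-style SKIP line. The `SKIP_LINE` pattern and the `parse_skip_line` helper are hypothetical and are not taken from the kunit parser. What the parser then does with the free-form string is left open here, because the description above is cut off.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for lines such as "ok 3 # SKIP test: some reason".
SKIP_LINE = re.compile(r"^ok (\d+) # SKIP\s+(.*)$")


def parse_skip_line(line: str) -> Optional[Tuple[int, str]]:
    """Return (test number, free-form string), or None if not a SKIP line."""
    match = SKIP_LINE.match(line.strip())
    if match is None:
        return None
    # Only surrounding whitespace is trimmed; the free-form text is
    # otherwise returned unchanged for the caller to interpret.
    return int(match.group(1)), match.group(2).strip()
```

A caller could pass each result line through `parse_skip_line()` and fall back to the regular KTAP handling whenever it returns `None`.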
Nolibc does not support all architectures.
Add a kconfig option, so users can know where it is available.
The new option is maintained inside tools/include/nolibc/ as only that
directory is responsible for nolibc's availability.
Signed-off-by: Thomas Weißschuh
---
init/Kconfig
On Fri, 14 Feb 2025 15:13:52 +,
Mark Brown wrote:
>
> On Fri, Feb 14, 2025 at 09:24:03AM +, Marc Zyngier wrote:
> > Mark Brown wrote:
>
> > Just to be clear: I do not intend to review a series that doesn't
> > cover the full gamut of KVM from day 1. Protected mode is an absolute
> > req
On Mon, Feb 17, 2025 at 04:02:53PM +0800, Alan Huang wrote:
>On Feb 17, 2025, at 15:41, Wei Yang wrote:
>>
>> On Mon, Feb 17, 2025 at 10:22:53AM +0800, Alan Huang wrote:
>>> On Feb 17, 2025, at 10:12, Boqun Feng wrote:
Hi Wei,
The change looks good to me, thanks!
On Feb 17, 2025, at 15:41, Wei Yang wrote:
>
> On Mon, Feb 17, 2025 at 10:22:53AM +0800, Alan Huang wrote:
>> On Feb 17, 2025, at 10:12, Boqun Feng wrote:
>>>
>>> Hi Wei,
>>>
>>> The change looks good to me, thanks!
>>>
>>> I queued the patch for further reviews and tests with some changes in