On Thu, Jan 09, 2025 at 08:56:37AM +0100, Greg Kroah-Hartman wrote:
> The "pointless" penalty will go away once we convert all instances, and
> really, it's just one pointer check, sysfs files should NOT be a hot
> path for anything real, and one more pointer check should be cached and
> not measur
On Thu, Jan 09, 2025 at 12:06:09AM -0800, Christoph Hellwig wrote:
> On Thu, Jan 09, 2025 at 08:56:37AM +0100, Greg Kroah-Hartman wrote:
> > The "pointless" penalty will go away once we convert all instances, and
> > really, it's just one pointer check, sysfs files should NOT be a hot
> > path for
On Thu, Jan 09, 2025 at 09:12:03AM +0100, Greg Kroah-Hartman wrote:
> > Hey, when I duplicated the method to convert sysfs over to a proper
> > seq_file based approach that avoids buffer overflows you basically
> > came up with the same line that Alexei had here.
>
> I did? Sorry about that, I do
Alistair Popple wrote:
> Main updates since v5:
>
> - Reworked patch 1 based on Dan's feedback.
>
> - Fixed build issues on PPC and when CONFIG_PGTABLE_HAS_HUGE_LEAVES
>   is not defined.
>
> - Minor comment formatting and documentation fixes.
>
> - Remove PTE_DEVMAP definitions from Loonga
Main updates since v5:
- Reworked patch 1 based on Dan's feedback.
- Fixed build issues on PPC and when CONFIG_PGTABLE_HAS_HUGE_LEAVES
  is not defined.
- Minor comment formatting and documentation fixes.
- Remove PTE_DEVMAP definitions from Loongarch which were added since
this series w
dax_layout_busy_page_range() is used by file systems to scan the DAX
page-cache to unmap mapping pages from user-space and to determine if
any pages in the given range are busy, either due to ongoing DMA or
other get_user_pages() usage.
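A minimal user-space sketch of the busy-page scan described above, assuming a page counts as busy when its reference count is above one (the FS DAX idle convention stated elsewhere in this thread). struct model_page and model_busy_page_range() are hypothetical names, not kernel APIs, and the sketch omits the unmapping step the real function also performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct page behind a DAX page-cache entry. */
struct model_page {
	int refcount;
};

/*
 * Return the first page in [start, end] whose refcount is above one,
 * i.e. still referenced by DMA or get_user_pages(); NULL if all are idle.
 */
static struct model_page *model_busy_page_range(struct model_page *pages,
						size_t start, size_t end)
{
	assert(pages != NULL && start <= end);
	for (size_t i = start; i <= end; i++) {
		if (pages[i].refcount > 1)
			return &pages[i];
	}
	return NULL;
}
```

A file system would retry (or wait) while this returns non-NULL, and only proceed once every page in the range is idle.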
Currently it checks to see if the file mapping is mapped into us
FS DAX requires file systems to call into the DAX layout prior to unlinking
inodes to ensure there is no ongoing DMA or other remote access to the
direct mapped page. The fuse file system implements
fuse_dax_break_layouts() to do this which includes a comment indicating
that passing dmap_end == 0 l
Prior to any truncation operations, file systems call
dax_break_mapping() to ensure pages in the range are not undergoing
DMA. Later DAX page-cache entries will be removed by
truncate_folio_batch_exceptionals() in the generic page-cache code.
However this makes it possible for folios to be removed
A FS DAX page is considered idle when its refcount drops to one. This
is currently open-coded in all file systems supporting FS DAX. Move
the idle detection to a common function to make future changes easier.
Signed-off-by: Alistair Popple
Reviewed-by: Jan Kara
Reviewed-by: Christoph Hellwig
Re
File systems call dax_break_mapping() prior to reallocating file
system blocks to ensure the page is not undergoing any DMA or other
accesses. Generally this is needed when a file is truncated to ensure
that if a block is reallocated nothing is writing to it. However
filesystems currently don't cal
Prior to freeing a block file systems supporting FS DAX must check
that the associated pages are both unmapped from user-space and not
undergoing DMA or other access from e.g. get_user_pages(). This is
achieved by unmapping the file range and scanning the FS DAX
page-cache to see if any pages within
PCI P2PDMA pages are not mapped with pXX_devmap PTEs, therefore the
check in __gup_device_huge() is redundant. Remove it.
Signed-off-by: Alistair Popple
Reviewed-by: Jason Gunthorpe
Reviewed-by: Dan Williams
Acked-by: David Hildenbrand
---
mm/gup.c | 5 -
1 file changed, 5 deletions(-)
diff
Currently ZONE_DEVICE page reference counts are initialised by core
memory management code in __init_zone_device_page() as part of the
memremap() call which driver modules make to obtain ZONE_DEVICE
pages. This initialises page refcounts to 1 before returning them to
the driver.
This was presumabl
PAGE_MAPPING_DAX_SHARED is the same as PAGE_MAPPING_ANON. This isn't
currently a problem because FS DAX pages are treated
specially. However a future change will make FS DAX pages more like
normal pages, so folio_test_anon() must not return true for a FS DAX
page.
We could explicitly test for a FS
Several functions internal to FS DAX use the following pattern when
trying to obtain an unlocked entry:
	xas_for_each(&xas, entry, end_idx) {
		if (dax_is_locked(entry))
			entry = get_unlocked_entry(&xas, 0);
This is problematic because get_unlocked_entry() will get the next
pr
Zone device pages are used to represent various types of device memory
managed by device drivers. Currently compound zone device pages are
not supported. This is because MEMORY_DEVICE_FS_DAX pages are the only
user of higher order zone device pages and have their own page
reference counting.
A futu
In preparation for using insert_page() for DAX, enhance
insert_page_into_pte_locked() to handle establishing writable
mappings. Recall that DAX returns VM_FAULT_NOPAGE after installing a
PTE which bypasses the typical set_pte_range() in finish_fault.
Signed-off-by: Alistair Popple
Suggested-by:
Currently DAX folio/page reference counts are managed differently to
normal pages. To allow these to be managed the same as normal pages
introduce vmf_insert_folio_pmd. This will map the entire PMD-sized folio
and take references as it would for a normally mapped page.
This is distinct from the cu
The rmap doesn't currently support adding a PUD mapping of a
folio. This patch adds support for entire PUD mappings of folios,
primarily to allow for more standard refcounting of device DAX
folios. Currently DAX is the only user of this and it doesn't require
support for partially mapped PUD-sized
Currently DAX folio/page reference counts are managed differently to
normal pages. To allow these to be managed the same as normal pages
introduce vmf_insert_folio_pud. This will map the entire PUD-sized folio
and take references as it would for a normally mapped page.
This is distinct from the cu
Currently to map a DAX page the DAX driver calls vmf_insert_pfn. This
creates a special devmap PTE entry for the pfn but does not take a
reference on the underlying struct page for the mapping. This is
because DAX page refcounts are treated specially, as indicated by the
presence of a devmap entry.
"Dmitry V. Levin" writes:
> Similar to syscall_set_arguments() that complements
> syscall_get_arguments(), introduce syscall_set_nr()
> that complements syscall_get_nr().
>
> syscall_set_nr() is going to be needed along with
> syscall_set_arguments() on all HAVE_ARCH_TRACEHOOK
> architectures to
> On 7 Jan 2025, at 2:55 AM, Namhyung Kim wrote:
>
> On Fri, Dec 27, 2024 at 04:18:32PM +0530, Athira Rajeev wrote:
>>
>>
>>> On 23 Dec 2024, at 7:28 PM, Athira Rajeev
>>> wrote:
>>>
>>> When kernel is built without debuginfo, running perf record with
>>> --off-cpu results in segfault as
Alistair Popple wrote:
> On Wed, Jan 08, 2025 at 04:14:20PM -0800, Dan Williams wrote:
> > Alistair Popple wrote:
> > > Prior to freeing a block file systems supporting FS DAX must check
> > > that the associated pages are both unmapped from user-space and not
> > > undergoing DMA or other access f
This would seem like a very good idea. However, it is perhaps important
to realize that it doesn't fully eliminate the problems with 64-bit
arguments on 32-bit ABIs being handled differently (never mind
inconsistencies in system call ABIs etc.). There isn't all that much that
can be done about t
On Thu, 09 Jan 2025 12:23:00 -0600, Rob Herring (Arm) wrote:
> The use of of_property_read_bool() for non-boolean properties is
> deprecated in favor of of_property_present() when testing for property
> presence.
>
>
Applied to
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.gi
At present mlock skips ptes mapping ZONE_DEVICE pages. A future change
to remove pmd_devmap will allow pmd_trans_huge_lock() to return
ZONE_DEVICE folios so make sure we continue to skip those.
Signed-off-by: Alistair Popple
Acked-by: David Hildenbrand
---
mm/mlock.c | 2 ++
1 file changed, 2 i
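A minimal sketch of the skip described above, with a hypothetical folio model. The kernel has a real folio_is_zone_device() predicate; model_folio and model_mlock_range() are illustrative stand-ins, not the mlock code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical folio model: zone_device marks ZONE_DEVICE memory. */
struct model_folio {
	bool zone_device;
	bool mlocked;
};

/* Mark folios mlocked, skipping ZONE_DEVICE ones; returns how many
 * folios were locked. */
static size_t model_mlock_range(struct model_folio *folios, size_t n)
{
	size_t locked = 0;

	assert(folios != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (folios[i].zone_device)
			continue; /* never mlock device memory */
		folios[i].mlocked = true;
		locked++;
	}
	return locked;
}
```

The same check has to apply whether the folio was reached via a PTE or, after the pmd_devmap removal, via pmd_trans_huge_lock().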
Currently fs dax pages are considered free when the refcount drops to
one and their refcounts are not increased when mapped via PTEs or
decreased when unmapped. This requires special logic in mm paths to
detect that these pages should not be properly refcounted, and to
detect when the refcount drop
Device DAX pages are currently not reference counted when mapped,
instead relying on the devmap PTE bit to ensure mapping code will not
get/put references. This requires special handling in various page
table walkers, particularly GUP, to manage references on the
underlying pgmap to ensure the page
DEVMAP PTEs are no longer required to support ZONE_DEVICE so remove
them.
Signed-off-by: Alistair Popple
---
arch/loongarch/Kconfig| 1 -
arch/loongarch/include/asm/pgtable-bits.h | 6 ++
arch/loongarch/include/asm/pgtable.h | 19 ---
3 files change
Add helpers to determine if a page or folio is a devdax or fsdax page
or folio.
Signed-off-by: Alistair Popple
Acked-by: David Hildenbrand
---
Changes for v5:
- Renamed is_device_dax_page() to is_devdax_page() for consistency.
---
include/linux/memremap.h | 22 ++
1 file
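A minimal sketch of what such helpers test, assuming FS DAX pages are identified by an FS DAX pgmap type and device DAX pages by a generic device pgmap type. The enum, structs and model_is_* names are hypothetical user-space stand-ins for the memremap.h helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative subset of device pagemap types. */
enum model_memory_type {
	MODEL_DEVICE_PRIVATE,
	MODEL_DEVICE_FS_DAX,
	MODEL_DEVICE_GENERIC, /* assumed to back device DAX */
};

struct model_pgmap {
	enum model_memory_type type;
};

/* pgmap is NULL for ordinary (non-ZONE_DEVICE) pages. */
struct model_page {
	const struct model_pgmap *pgmap;
};

static bool model_is_fsdax_page(const struct model_page *page)
{
	assert(page != NULL);
	return page->pgmap && page->pgmap->type == MODEL_DEVICE_FS_DAX;
}

static bool model_is_devdax_page(const struct model_page *page)
{
	assert(page != NULL);
	return page->pgmap && page->pgmap->type == MODEL_DEVICE_GENERIC;
}
```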
Longterm pinning of FS DAX pages should already be disallowed by
various pXX_devmap checks. However a future change will cause these
checks to be invalid for FS DAX pages so make
folio_is_longterm_pinnable() return false for FS DAX pages.
Signed-off-by: Alistair Popple
Reviewed-by: John Hubbard
The procfs mmu files such as smaps and pagemap currently ignore devdax and
fsdax pages because these pages are considered special. A future change
will start treating these as normal pages, meaning they can be exposed via
smaps and pagemap.
The only difference is that devdax and fsdax pages can ne
On Wed, Jan 08, 2025 at 05:34:30PM -0800, Alison Schofield wrote:
> On Tue, Jan 07, 2025 at 02:42:16PM +1100, Alistair Popple wrote:
> > Main updates since v4:
> >
> > - Removed most of the devdax/fsdax checks in fs/proc/task_mmu.c. This
> >   means smaps/pagemap may contain DAX pages.
> >
> >
The devmap PTE special bit was used to detect mappings of FS DAX
pages. This tracking was required to ensure the generic mm did not
manipulate the page reference counts as FS DAX implemented its own
reference counting scheme.
Now that FS DAX pages have their references counted the same way as
nor
Now that DAX and all other reference counts to ZONE_DEVICE pages are
managed normally there is no need for the special devmap PTE/PMD/PUD
page table bits. So drop all references to these, freeing up a
software defined page table bit on architectures supporting it.
Signed-off-by: Alistair Popple
A
DEVMAP PTEs are no longer required to support ZONE_DEVICE so remove
them.
Signed-off-by: Alistair Popple
Suggested-by: Chunyan Zhang
Reviewed-by: Björn Töpel
---
arch/riscv/Kconfig| 1 -
arch/riscv/include/asm/pgtable-64.h | 20
arch/riscv/include/as
Currently, on book3s-hv, the capability KVM_CAP_SPAPR_TCE_VFIO is only
available for KVM guests running on PowerNV and not for the KVM guests
running on pSeries hypervisors. This prevents a pSeries hypervisor from
leveraging the in-kernel acceleration for H_PUT_TCE_INDIRECT and
H_STUFF_TCE hcalls t
Add the const qualifier to all the ctl_tables in the tree except the
ones in ./net dir. The "net" sysctl code is special as it modifies the
arrays before passing it on to the registration function.
Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the
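A minimal user-space sketch of why constification protects the handler pointers. struct model_ctl_table is a hypothetical, simplified stand-in for struct ctl_table; the point is only that a const table (and pointers to it) cannot be written through.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified ctl_table: a name plus a handler pointer. */
struct model_ctl_table {
	const char *procname;
	int (*proc_handler)(int write);
};

static int model_handler(int write)
{
	return write ? 1 : 0;
}

/*
 * A const table is typically placed in read-only data, and an assignment
 * such as model_table[0].proc_handler = NULL; no longer compiles.
 */
static const struct model_ctl_table model_table[] = {
	{ .procname = "model_knob", .proc_handler = model_handler },
};

/* Lookups hand out pointers to const, so callers cannot modify entries. */
static const struct model_ctl_table *model_find(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < sizeof(model_table) / sizeof(model_table[0]); i++)
		if (strcmp(model_table[i].procname, name) == 0)
			return &model_table[i];
	return NULL;
}
```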
On Thu, Jan 9, 2025 at 5:16 AM Joel Granados wrote:
>
[...]
> drivers/base/firmware_loader/fallback_table.c | 2 +-
> drivers/cdrom/cdrom.c | 2 +-
> drivers/char/hpet.c | 2 +-
> drivers/char/ipmi/ipmi_poweroff.c | 2 +-
> drivers/cha
The use of of_property_read_bool() for non-boolean properties is
deprecated in favor of of_property_present() when testing for property
presence.
Signed-off-by: Rob Herring (Arm)
---
sound/soc/fsl/fsl-asoc-card.c| 2 +-
sound/soc/fsl/imx-audmux.c | 2 +-
Greetings!!!
Observing Kernel Warnings on kernel 6.13.0-rc6-next-20250109, while
running fstests ext4/001.
Traces:
[ 433.607975] [ cut here ]
[ 433.607984] WARNING: CPU: 2 PID: 32051 at fs/debugfs/file.c:90
__debugfs_file_get+0xcc/0x274
[ 433.608002] Modules
On Thu, 09 Jan 2025 14:16:39 +0100
Joel Granados wrote:
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 2e113f8b13a2..489cbab3d64c 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -8786,7 +8786,7 @@ ftrace_enable_sysctl(const struct ctl_table *table, in
On Thu, 09 Jan 2025, Joel Granados wrote:
> diff --git a/drivers/gpu/drm/i915/i915_perf.c
> b/drivers/gpu/drm/i915/i915_perf.c
> index 2406cda75b7b..5384d1bb4923 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -4802,7 +4802,7 @@ int i915_perf_remove_c
Joel,
> Add the const qualifier to all the ctl_tables in the tree except the
> ones in ./net dir. The "net" sysctl code is special as it modifies the
> arrays before passing it on to the registration function.
Reviewed-by: Martin K. Petersen # SCSI
--
Martin K. Petersen Oracle Linux Eng
On Mon 18-11-24 10:25:17, Peter Zijlstra wrote:
> On Mon, Nov 18, 2024 at 10:04:18AM +0100, Michal Hocko wrote:
> > I do not see this patch staged in any tree (e.g. linux-next). Is this on
> > its way to be merged?
>
> I only now found it -- it doesn't look super urgent. I'll get it into a
> git t
On Mon, Dec 16, 2024 at 03:10:06PM +0100, Thomas Weißschuh wrote:
> The generic storage implementation provides the same features as the
> custom one. However it can be shared between architectures, making
> maintenance easier.
>
> Co-developed-by: Nam Cao
> Signed-off-by: Nam Cao
> Signed-off-b
Hi,
On Thu, Jan 09, 2025 at 02:29:19PM +0900, Akihiko Odaki wrote:
> On 2025/01/08 22:50, Dave Martin wrote:
> > On Wed, Jan 08, 2025 at 01:53:51PM +0900, Akihiko Odaki wrote:
> > > On 2025/01/08 1:17, Dave Martin wrote:
> > > > Hi,
> > > >
> > > > On Tue, Jan 07, 2025 at 09:45:56PM +0900, Akihik
On Thu, Jan 09, 2025 at 02:16:39PM +0100, Joel Granados wrote:
> Add the const qualifier to all the ctl_tables in the tree except the
> ones in ./net dir. The "net" sysctl code is special as it modifies the
> arrays before passing it on to the registration function.
>
...
> diff --git a/drivers/ch
On Thu, Jan 09, 2025 at 02:16:39PM +0100, Joel Granados wrote:
> Add the const qualifier to all the ctl_tables in the tree except the
> ones in ./net dir. The "net" sysctl code is special as it modifies the
> arrays before passing it on to the registration function.
>
> Constifying ctl_table struc
Hello Michael,
On 07/12/24 07:28, Michael Ellerman wrote:
> Avnish Chouhan writes:
>> Change RMA size from 512 MB to 768 MB which will result
>> in more RMA at boot time for PowerPC.
> Did you consider just increasing it to 1GB?
I see an impact of setting RMA to 1GB on fadump. Here’s how:
The minimu
On Wed, 8 Jan 2025, Yazen Ghannam wrote:
> On Wed, Dec 18, 2024 at 04:37:46PM +0200, Ilpo Järvinen wrote:
> > pcie_read_tlp_log() handles only 4 Header Log DWORDs but TLP Prefix Log
> > (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13) may also be present.
> >
> > Generalize pcie_read_tlp_log() and struct pc
Add the possibility of marking a page so that the UW and SW bits are
force-cleared. This is stored in the private info so that it persists
across multiple calls to kvmppc_e500_setup_stlbe.
Signed-off-by: Paolo Bonzini
---
arch/powerpc/kvm/e500.h | 2 ++
arch/powerpc/kvm/e500_mmu_host.
The new __kvm_faultin_pfn() function is upset by the fact that e500 KVM
ignores host page permissions - __kvm_faultin requires a "writable"
outgoing argument, but e500 KVM is nonchalantly passing NULL.
If the host page permissions do not include writability, the shadow
TLB entry is forcibly mapped
kvmppc_e500_ref_setup is returning whether the guest TLB entry is writable,
which is then passed to kvm_release_faultin_page. This makes little sense
for two reasons: first, because the function sets up the private data for
the page and the return value feels like it has been bolted on the side;
s
e500 KVM tries to bypass __kvm_faultin_pfn() in order to map VM_PFNMAP
VMAs as huge pages. This is a Bad Idea because VM_PFNMAP VMAs could
become noncontiguous as a result of calls to remap_pfn_range().
Instead, use the already existing host PTE lookup to retrieve a
valid host-side mapping level a
[Oliver/Will/Anup/Andrew, you're Cc'd because of an observation below
on VM_PFNMAP mappings. - Paolo]
The new __kvm_faultin_pfn() function is upset by the fact that e500
KVM ignores host page permissions - __kvm_faultin requires a "writable"
outgoing argument, but e500 KVM is passing NULL.
While
Avoid a NULL pointer dereference if the memslot table changes between the
exit and the call to kvmppc_e500_shadow_map().
Cc: sta...@vger.kernel.org
Signed-off-by: Paolo Bonzini
---
arch/powerpc/kvm/e500_mmu_host.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/arch/powerpc/kvm/e500_mmu
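A minimal sketch of the defensive pattern, assuming the fix amounts to re-checking the memslot lookup for NULL instead of trusting that the slot seen at exit time still exists. The memslot model, function names and the -1 error value are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memslot covering guest frames [base_gfn, base_gfn + npages). */
struct model_memslot {
	unsigned long base_gfn;
	unsigned long npages;
};

static const struct model_memslot *
model_find_memslot(const struct model_memslot *slots, size_t n,
		   unsigned long gfn)
{
	assert(slots != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (gfn >= slots[i].base_gfn &&
		    gfn - slots[i].base_gfn < slots[i].npages)
			return &slots[i];
	return NULL; /* the slot may have been removed since the exit */
}

/* Bail out instead of dereferencing a missing slot; -1 stands in for an
 * error code. */
static int model_shadow_map(const struct model_memslot *slots, size_t n,
			    unsigned long gfn)
{
	const struct model_memslot *slot = model_find_memslot(slots, n, gfn);

	if (!slot)
		return -1;
	return 0;
}
```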
On Thu, 09 Jan 2025 07:51:45 +0100, Christophe Leroy wrote:
> The following appears in kernel log at boot:
>
> fsl_spi b01004c0.spi: at 0x(ptrval) (irq = 51), QE mode
>
> This is useless, so remove the display of that virtual address and
> display the MMIO address instead, just like serial
On Thu, Jan 09, 2025, Paolo Bonzini wrote:
> @@ -483,7 +383,7 @@ static inline int kvmppc_e500_shadow_map(struct
> kvmppc_vcpu_e500 *vcpu_e500,
>* can't run hence pfn won't change.
>*/
> local_irq_save(flags);
> - ptep = find_linux_pte(pgdir, hva, NULL, NULL);
> + pte
From: Andrey Albershteyn
Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
extended attributes/flags. The syscalls take a parent directory FD and a
path to the child, together with a struct fsxattr.
This is an alternative to the FS_IOC_FSSETXATTR ioctl, with the difference
that the file doesn't n
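For contrast, a sketch of the existing ioctl path that the new syscalls are described as an alternative to. It uses only the real FS_IOC_FSGETXATTR ioctl and struct fsxattr from <linux/fs.h>; read_fsxattr() is a hypothetical helper name. Note that it must open the target inode itself, whereas the proposed syscalls take a parent directory FD plus a path.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* struct fsxattr, FS_IOC_FSGETXATTR */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Read the fsxattr of a path via the existing ioctl, which needs an open
 * file descriptor on the inode itself. Returns 0 or -errno.
 */
static int read_fsxattr(const char *path, struct fsxattr *fsx)
{
	int fd, ret = 0;

	assert(path != NULL && fsx != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	if (ioctl(fd, FS_IOC_FSGETXATTR, fsx) < 0)
		ret = -errno; /* e.g. the file system lacks support */
	close(fd);
	return ret;
}
```

Files that cannot be opened this way (for example special files) are one case where a path-based interface avoids the open step.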
On Thu, Jan 09, 2025, Paolo Bonzini wrote:
> Avoid a NULL pointer dereference if the memslot table changes between the
> exit and the call to kvmppc_e500_shadow_map().
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Paolo Bonzini
> ---
> arch/powerpc/kvm/e500_mmu_host.c | 5 +
> 1 file chang
On Thu, Jan 09, 2025, Paolo Bonzini wrote:
> kvmppc_e500_ref_setup is returning whether the guest TLB entry is writable,
> which is than passed to kvm_release_faultin_page. This makes little sense
s/than/then
> for two reasons: first, because the function sets up the private data for
> the page
On Thu, Jan 9, 2025, at 18:45, Andrey Albershteyn wrote:
>
> arch/alpha/kernel/syscalls/syscall.tbl | 2 +
> arch/m68k/kernel/syscalls/syscall.tbl | 2 +
> arch/microblaze/kernel/syscalls/syscall.tbl | 2 +
> arch/parisc/kernel/syscalls/syscall.tbl | 2 +
> arch/powerpc/kern