Re: [PATCH v12 26/39] arm64/ptrace: Expose GCS via ptrace and core files
On Mon, Sep 02, 2024 at 07:58:28PM +0100, Mark Brown wrote: > On Mon, Sep 02, 2024 at 07:56:32PM +0100, Catalin Marinas wrote: > > On Thu, Aug 29, 2024 at 12:27:42AM +0100, Mark Brown wrote: > > > > +static int gcs_get(struct task_struct *target, > > > +const struct user_regset *regset, > > > +struct membuf to) > > > > + if (target == current) > > > + gcs_preserve_current_state(); > > > What's the use-case where target == current and do we need something > > similar for gcs_set()? > > IIRC core file generation so no. Ah, true. Makes sense. -- Catalin
Re: [PATCH 1/2] arm64: mm: Restore mm_cpumask (revert commit 38d96287504a ("arm64: mm: kill mm_cpumask usage"))
Hi, I know Will is on the case but just expressing some thoughts of my own. On Mon, Jun 17, 2019 at 11:32:54PM +0900, Takao Indoh wrote: > From: Takao Indoh > > mm_cpumask was deleted by the commit 38d96287504a ("arm64: mm: kill > mm_cpumask usage") because it was not used at that time. Now this is needed > to find appropriate CPUs for TLB flush, so this patch reverts this commit. > > Signed-off-by: QI Fuli > Signed-off-by: Takao Indoh > --- > arch/arm64/include/asm/mmu_context.h | 7 ++- > arch/arm64/kernel/smp.c | 6 ++ > arch/arm64/mm/context.c | 2 ++ > 3 files changed, 14 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/include/asm/mmu_context.h > b/arch/arm64/include/asm/mmu_context.h > index 2da3e478fd8f..21ef11590bcb 100644 > --- a/arch/arm64/include/asm/mmu_context.h > +++ b/arch/arm64/include/asm/mmu_context.h > @@ -241,8 +241,13 @@ static inline void > switch_mm(struct mm_struct *prev, struct mm_struct *next, > struct task_struct *tsk) > { > - if (prev != next) > + unsigned int cpu = smp_processor_id(); > + > + if (prev != next) { > __switch_mm(next); > + cpumask_clear_cpu(cpu, mm_cpumask(prev)); > + local_flush_tlb_mm(prev); > + } That's not actually a revert as we've never flushed the TLBs on the switch_mm() path. Also, this flush is not sufficient on a CnP capable CPU since another thread of the same CPU could have the prev TTBR0_EL1 value set and loading the TLB back. -- Catalin
Re: [PATCH 2/2] arm64: tlb: Add boot parameter to disable TLB flush within the same inner shareable domain
On Mon, Jun 17, 2019 at 11:32:55PM +0900, Takao Indoh wrote: > From: Takao Indoh > > This patch adds new boot parameter 'disable_tlbflush_is' to disable TLB > flush within the same inner shareable domain for performance tuning. > > In the case of flush_tlb_mm() *without* this parameter, TLB entry is > invalidated by __tlbi(aside1is, asid). By this instruction, all CPUs within > the same inner shareable domain check if there are TLB entries which have > this ASID, this causes performance noise, especially at large-scale HPC > environment, which has more than thousand nodes with low latency > interconnect. > > When this new parameter is specified, TLB entry is invalidated by > __tlbi(aside1, asid) only on the CPUs specified by mm_cpumask(mm). > Therefore TLB flush is done on minimal CPUs and performance problem does > not occur. > > Signed-off-by: QI Fuli > Signed-off-by: Takao Indoh [...] > +void flush_tlb_mm(struct mm_struct *mm) > +{ > + if (disable_tlbflush_is) > + on_each_cpu_mask(mm_cpumask(mm), ipi_flush_tlb_mm, > + (void *)mm, true); > + else > + __flush_tlb_mm(mm); > +} Could we try instead to call a _nosync variant here when cpumask_weight() is 1 or the *IS if greater than 1 and avoid the IPI? Will tried this in the past but because of the task placement after fork()+execve(), I think we always ended up with a weight of 2 in the child process. Your first patch "solves" this by flushing the TLBs on context switch (bar the CnP case). Can you give it a try to see if it improves things? At least it's a starting point for further investigation. I fully agree with Will that we don't want two different TLB handling implementations in the arm64 kernel and even less desirable to have a command line option. Thanks. -- Catalin
Re: [PATCH v6 1/2] arm64: Define Documentation/arm64/tagged-address-abi.rst
Hi Dave, On Wed, Jul 31, 2019 at 09:43:46AM -0700, Dave Hansen wrote: > On 7/25/19 6:50 AM, Vincenzo Frascino wrote: > > With the relaxed ABI proposed through this document, it is now possible > > to pass tagged pointers to the syscalls, when these pointers are in > > memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap(). > > I don't see a lot of description of why this restriction is necessary. > What's the problem with supporting MAP_SHARED? We could support MAP_SHARED | MAP_ANONYMOUS (and based on some internal discussions, this would be fine with the hardware memory tagging as well). What we don't want in the ABI is to support file mmap() for top-byte-ignore (or MTE). If you see a use-case, please let us know. -- Catalin
[PATCH v7 0/2] arm64 tagged address ABI
Hi, Thanks for the feedback so far. This is an updated series documenting the AArch64 Tagged Address ABI as implemented by these patches: http://lkml.kernel.org/r/cover.1563904656.git.andreyk...@google.com Version 6 of the documentation series is available here: http://lkml.kernel.org/r/20190725135044.24381-1-vincenzo.frasc...@arm.com Changes in v7: - Dropped the MAP_PRIVATE requirements for tagged pointers for both anonymous and file mappings. One reason is that we can't enforce such restriction anyway. The other reason is that a future series implementing support for the hardware MTE will detect incompatibilities of the new PROT_MTE flag with various mmap() options. - As a consequence of the above, I removed Szabolcs ack as I'm not sure he's ok with the change. - Clarified the sysctl and prctl() interaction and reordered the descriptions. - Reworded the prctl(PR_SET_MM) restrictions. - Removed the description of the tag preservation from the first patch as it didn't really make sense (the syscall ABI has always preserved all registers other than x0 on return to user). - s/ARM64/AArch64/ for consistency with the tagged-pointers.rst document. - Other minor rewordings. Vincenzo Frascino (2): arm64: Define Documentation/arm64/tagged-address-abi.rst arm64: Relax Documentation/arm64/tagged-pointers.rst Documentation/arm64/tagged-address-abi.rst | 151 + Documentation/arm64/tagged-pointers.rst| 23 +++- 2 files changed, 167 insertions(+), 7 deletions(-) create mode 100644 Documentation/arm64/tagged-address-abi.rst
[PATCH v7 1/2] arm64: Define Documentation/arm64/tagged-address-abi.rst
From: Vincenzo Frascino On arm64 the TCR_EL1.TBI0 bit has been always enabled hence the userspace (EL0) is allowed to set a non-zero value in the top byte but the resulting pointers are not allowed at the user-kernel syscall ABI boundary. With the relaxed ABI proposed through this document, it is now possible to pass tagged pointers to the syscalls, when these pointers are in memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap(). This change in the ABI requires a mechanism to requires the userspace to opt-in to such an option. Specify and document the way in which sysctl and prctl() can be used in combination to allow the userspace to opt-in this feature. Cc: Will Deacon Cc: Andrey Konovalov Cc: Szabolcs Nagy Cc: Kevin Brodsky Signed-off-by: Vincenzo Frascino [catalin.mari...@arm.com: some rewording, dropped MAP_PRIVATE] Signed-off-by: Catalin Marinas --- Documentation/arm64/tagged-address-abi.rst | 151 + 1 file changed, 151 insertions(+) create mode 100644 Documentation/arm64/tagged-address-abi.rst diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst new file mode 100644 index ..f91a5d2ac865 --- /dev/null +++ b/Documentation/arm64/tagged-address-abi.rst @@ -0,0 +1,151 @@ +== +AArch64 TAGGED ADDRESS ABI +== + +Author: Vincenzo Frascino + +Date: 25 July 2019 + +This document describes the usage and semantics of the Tagged Address +ABI on AArch64 Linux. + +1. Introduction +--- + +On AArch64 the TCR_EL1.TBI0 bit has always been enabled, allowing userspace +(EL0) to perform memory accesses through 64-bit pointers with a non-zero +top byte. Such tagged pointers, however, were not allowed at the +user-kernel syscall ABI boundary. + +This document describes the relaxation of the syscall ABI that allows +userspace to pass certain tagged pointers to kernel syscalls, as described +in section 2. + +2. AArch64 Tagged Address ABI +- + +From the kernel syscall interface perspective and for the purposes of this +document, a "valid tagged pointer" is a pointer with a potentially non-zero +top-byte that references an address in the user process address space +obtained in one of the following ways: + +- mmap() done by the process itself (or its parent), where either: + + - flags have the **MAP_ANONYMOUS** bit set + - the file descriptor refers to a regular file (including those returned +by memfd_create()) or **/dev/zero** + +- brk() system call done by the process itself (i.e. the heap area between + the initial location of the program break at process creation and its + current location). + +- any memory mapped by the kernel in the address space of the process + during creation and with the same restrictions as for mmap() above (e.g. + data, bss, stack). + +The AArch64 Tagged Address ABI is an opt-in feature and an application can +control it via **prctl()** as follows: + +- **PR_SET_TAGGED_ADDR_CTRL**: enable or disable the AArch64 Tagged Address + ABI for the calling process. + + The (unsigned int) arg2 argument is a bit mask describing the control mode + used: + + - **PR_TAGGED_ADDR_ENABLE**: enable AArch64 Tagged Address ABI. Default +status is disabled. + + The arguments arg3, arg4, and arg5 are ignored. + +- **PR_GET_TAGGED_ADDR_CTRL**: get the status of the AArch64 Tagged Address + ABI for the calling process. + + The arguments arg2, arg3, arg4, and arg5 are ignored. + +The prctl(PR_SET_TAGGED_ADDR_CTRL, ...) will return -EINVAL if the +AArch64 Tagged Address ABI is not available +(CONFIG_ARM64_TAGGED_ADDR_ABI disabled or sysctl abi.tagged_addr=0). + +The ABI properties set by the mechanism described above are inherited by +threads of the same application and fork()'ed children but cleared by +execve(). + +Opting in (the prctl() option described above only) to or out of the +AArch64 Tagged Address ABI can be disabled globally at runtime using the +sysctl interface: + +- **abi.tagged_addr**: a new sysctl interface that can be used to prevent + applications from enabling or disabling the relaxed ABI. The sysctl + supports the following configuration options: + + - **0**: disable the prctl(PR_SET_TAGGED_ADDR_CTRL) option to +enable/disable the AArch64 Tagged Address ABI globally + + - **1** (Default): enable the prctl(PR_SET_TAGGED_ADDR_CTRL) option to +enable/disable the AArch64 Tagged Address ABI globally + + Note that this sysctl does not affect the status of the AArch64 Tagged + Address ABI of the running processes. + +When a process has successfully enabled the new ABI by invoking +prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE), the following +behaviours are guaranteed: + +- Every currently available syscall, except the cases mentioned in section + 3, can accept any valid tagged pointer. The same rule is applicable to + any syscall introduced in the future. +
[PATCH v7 2/2] arm64: Relax Documentation/arm64/tagged-pointers.rst
From: Vincenzo Frascino On arm64 the TCR_EL1.TBI0 bit has been always enabled hence the userspace (EL0) is allowed to set a non-zero value in the top byte but the resulting pointers are not allowed at the user-kernel syscall ABI boundary. With the relaxed ABI proposed in this set, it is now possible to pass tagged pointers to the syscalls, when these pointers are in memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap(). Relax the requirements described in tagged-pointers.rst to be compliant with the behaviours guaranteed by the ARM64 Tagged Address ABI. Cc: Will Deacon Cc: Andrey Konovalov Cc: Szabolcs Nagy Cc: Kevin Brodsky Signed-off-by: Vincenzo Frascino [catalin.mari...@arm.com: minor tweaks] Signed-off-by: Catalin Marinas --- Documentation/arm64/tagged-pointers.rst | 23 --- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/Documentation/arm64/tagged-pointers.rst b/Documentation/arm64/tagged-pointers.rst index 2acdec3ebbeb..82a3eff71a70 100644 --- a/Documentation/arm64/tagged-pointers.rst +++ b/Documentation/arm64/tagged-pointers.rst @@ -20,7 +20,8 @@ Passing tagged addresses to the kernel -- All interpretation of userspace memory addresses by the kernel assumes -an address tag of 0x00. +an address tag of 0x00, unless the application enables the AArch64 +Tagged Address ABI explicitly. This includes, but is not limited to, addresses found in: @@ -33,18 +34,23 @@ This includes, but is not limited to, addresses found in: - the frame pointer (x29) and frame records, e.g. when interpreting them to generate a backtrace or call graph. -Using non-zero address tags in any of these locations may result in an -error code being returned, a (fatal) signal being raised, or other modes -of failure. +Using non-zero address tags in any of these locations when the +userspace application did not enable the AArch64 Tagged Address ABI may +result in an error code being returned, a (fatal) signal being raised, +or other modes of failure. -For these reasons, passing non-zero address tags to the kernel via -system calls is forbidden, and using a non-zero address tag for sp is -strongly discouraged. +For these reasons, when the AArch64 Tagged Address ABI is disabled, +passing non-zero address tags to the kernel via system calls is +forbidden, and using a non-zero address tag for sp is strongly +discouraged. Programs maintaining a frame pointer and frame records that use non-zero address tags may suffer impaired or inaccurate debug and profiling visibility. +The AArch64 Tagged Address ABI description and the guarantees it +provides can be found in: Documentation/arm64/tagged-address-abi.rst. + Preserving tags --- @@ -59,6 +65,9 @@ be preserved. The architecture prevents the use of a tagged PC, so the upper byte will be set to a sign-extension of bit 55 on exception return. +This behaviour is preserved when the AArch64 Tagged Address ABI is +enabled. + Other considerations
Re: [PATCH v7 1/2] arm64: Define Documentation/arm64/tagged-address-abi.rst
On Wed, Aug 07, 2019 at 01:38:16PM -0700, Dave Hansen wrote: > On 8/7/19 8:53 AM, Catalin Marinas wrote: > > +- mmap() done by the process itself (or its parent), where either: > > + > > + - flags have the **MAP_ANONYMOUS** bit set > > + - the file descriptor refers to a regular file (including those returned > > +by memfd_create()) or **/dev/zero** > > What's a "regular file"? ;) We could make it more explicit like '(stat.st_mode & S_IFMT) == S_IFREG' but it gets too verbose ;). > > +- brk() system call done by the process itself (i.e. the heap area between > > + the initial location of the program break at process creation and its > > + current location). > > + > > +- any memory mapped by the kernel in the address space of the process > > + during creation and with the same restrictions as for mmap() above (e.g. > > + data, bss, stack). > > + > > +The AArch64 Tagged Address ABI is an opt-in feature and an application can > > +control it via **prctl()** as follows: > > + > > +- **PR_SET_TAGGED_ADDR_CTRL**: enable or disable the AArch64 Tagged Address > > + ABI for the calling process. > > + > > + The (unsigned int) arg2 argument is a bit mask describing the control > > mode > > + used: > > + > > + - **PR_TAGGED_ADDR_ENABLE**: enable AArch64 Tagged Address ABI. Default > > +status is disabled. > > + > > + The arguments arg3, arg4, and arg5 are ignored. > > For previous prctl()'s, we've found that it's best to require that the > unused arguments be 0. Without that, apps are free to put garbage > there, which makes extending the prctl to use other arguments impossible > in the future. We've had a bit of bikeshedding already: http://lkml.kernel.org/r/20190613110235.gw28...@e103592.cambridge.arm.com Extending the interface is still possible even with the current proposal, by changing arg2 etc. We also don't seem to be consistent in sys_prctl(). > Also, shouldn't this be converted over to an arch_prctl()? What do you mean by arch_prctl()? We don't have such thing, apart from maybe arch_prctl_spec_ctrl_*(). We achieve the same thing with the {SET,GET}_TAGGED_ADDR_CTRL macros. They could be renamed to arch_prctl_tagged_addr_{set,get} or something but I don't see much point. What would be better (for a separate patch series) is to clean up sys_prctl() and move the arch-specific options into separate arch_prctl() under arch/*/kernel/. But it's not really for this series. > > +The prctl(PR_SET_TAGGED_ADDR_CTRL, ...) will return -EINVAL if the > > +AArch64 Tagged Address ABI is not available > > +(CONFIG_ARM64_TAGGED_ADDR_ABI disabled or sysctl abi.tagged_addr=0). > > + > > +The ABI properties set by the mechanism described above are inherited by > > +threads of the same application and fork()'ed children but cleared by > > +execve(). > > What is the scope of these prctl()'s? Are they thread-scoped or > process-scoped? Can two threads in the same process run with different > tagging ABI modes? Good point. They are thread-scoped and this should be made clear in the doc. Two threads can have different modes. The expectation is that this is invoked early during process start (by the dynamic loader or libc init) while in single-thread mode and subsequent threads will inherit the same mode. However, other uses are possible. > > +Opting in (the prctl() option described above only) to or out of the > > +AArch64 Tagged Address ABI can be disabled globally at runtime using the > > +sysctl interface: > > + > > +- **abi.tagged_addr**: a new sysctl interface that can be used to prevent > > + applications from enabling or disabling the relaxed ABI. The sysctl > > + supports the following configuration options: > > + > > + - **0**: disable the prctl(PR_SET_TAGGED_ADDR_CTRL) option to > > +enable/disable the AArch64 Tagged Address ABI globally > > + > > + - **1** (Default): enable the prctl(PR_SET_TAGGED_ADDR_CTRL) option to > > +enable/disable the AArch64 Tagged Address ABI globally > > + > > + Note that this sysctl does not affect the status of the AArch64 Tagged > > + Address ABI of the running processes. > > Shouldn't the name be "abi.tagged_addr_control" or something? It > actually has *zero* direct effect on tagged addresses in the ABI. Yeah, we could add a _ctrl suffix. I usually lack inspiration when naming things. > What's the reason for allowing it to be toggled at runtime like this? > Wouldn't it make more sense to just have it be a boot option so you > *know* what the state of individual processes is?
Re: [PATCH v7 1/2] arm64: Define Documentation/arm64/tagged-address-abi.rst
On Mon, Aug 12, 2019 at 11:46:06AM +0100, Andrew Murray wrote: > On Thu, Aug 08, 2019 at 06:04:24PM +0100, Will Deacon wrote: > > On Wed, Aug 07, 2019 at 04:53:20PM +0100, Catalin Marinas wrote: > > > + > > > +- mmap() addr parameter. > > > + > > > +- mremap() new_address parameter. > > > + > > > +- prctl(PR_SET_MM, ``*``, ...) other than arg2 PR_SET_MM_MAP and > > > + PR_SET_MM_MAP_SIZE. > > > + > > > +- prctl(PR_SET_MM, PR_SET_MM_MAP{,_SIZE}, ...) struct prctl_mm_map > > > fields. > > > > How did you generate this list and who will keep it up to date? How do you > > know you haven't missed anything? > > What about shared memory system calls: shmat, shmdt? The latest "arm64: untag > user pointers passed to the kernel" series doesn't untag these, thus we should > indicate here that these too are no supported. Yes. We dropped them from a previous version of the series but they've never been documented. I'll add something (unless someone has a real use-case for using tags with shmat/shmdt). -- Catalin
Re: [PATCH v7 1/2] arm64: Define Documentation/arm64/tagged-address-abi.rst
On Fri, Aug 09, 2019 at 07:10:18AM -0700, Dave Hansen wrote: > On 8/8/19 10:27 AM, Catalin Marinas wrote: > > On Wed, Aug 07, 2019 at 01:38:16PM -0700, Dave Hansen wrote: > >> Also, shouldn't this be converted over to an arch_prctl()? > > > > What do you mean by arch_prctl()? We don't have such thing, apart from > > maybe arch_prctl_spec_ctrl_*(). We achieve the same thing with the > > {SET,GET}_TAGGED_ADDR_CTRL macros. They could be renamed to > > arch_prctl_tagged_addr_{set,get} or something but I don't see much > > point. > > Silly me. We have an x86-specific: > > SYSCALL_DEFINE2(arch_prctl, int , option, unsigned long , arg2) > > I guess there's no ARM equivalent so you're stuck with the generic one. > > > What would be better (for a separate patch series) is to clean up > > sys_prctl() and move the arch-specific options into separate > > arch_prctl() under arch/*/kernel/. But it's not really for this series. > > I think it does make sense for truly arch-specific features to stay out > of the arch-generic prctl(). Yes, I know I've personally violated this > in the past. :) Maybe Dave M could revive his prctl() clean-up patches which moves the arch specific cases to the corresponding arch/*/ code > >> What is the scope of these prctl()'s? Are they thread-scoped or > >> process-scoped? Can two threads in the same process run with different > >> tagging ABI modes? > > > > Good point. They are thread-scoped and this should be made clear in the > > doc. Two threads can have different modes. > > > > The expectation is that this is invoked early during process start (by > > the dynamic loader or libc init) while in single-thread mode and > > subsequent threads will inherit the same mode. However, other uses are > > possible. > > If that's the expectation, it would be really nice to codify it. > Basically, you can't enable the feature if another thread is already > been forked off. Well, that's my expectation but I'm not a userspace developer. I don't think there is any good reason to prevent it. It doesn't cost us anything to support in the kernel, other than making the documentation clearer. > >>> +When a process has successfully enabled the new ABI by invoking > >>> +prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE), the following > >>> +behaviours are guaranteed: > >>> + > >>> +- Every currently available syscall, except the cases mentioned in > >>> section > >>> + 3, can accept any valid tagged pointer. The same rule is applicable to > >>> + any syscall introduced in the future. > >>> + > >>> +- The syscall behaviour is undefined for non valid tagged pointers. > >> > >> Do you really mean "undefined"? I mean, a bad pointer is a bad pointer. > >> Why should it matter if it's a tagged bad pointer or an untagged bad > >> pointer? > > > > Szabolcs already replied here. We may have tagged pointers that can be > > dereferenced just fine but being passed to the kernel may not be well > > defined (e.g. some driver doing a find_vma() that fails unless it > > explicitly untags the address). It's as undefined as the current > > behaviour (without these patches) guarantees. > > It might just be nicer to point out what this features *changes* about > invalid pointer handling, which is nothing. :) Maybe: > > The syscall behaviour for invalid pointers is the same for both > tagged and untagged pointers. Good point. > >>> +- prctl(PR_SET_MM, ``*``, ...) other than arg2 PR_SET_MM_MAP and > >>> + PR_SET_MM_MAP_SIZE. > >>> + > >>> +- prctl(PR_SET_MM, PR_SET_MM_MAP{,_SIZE}, ...) struct prctl_mm_map > >>> fields. > >>> + > >>> +Any attempt to use non-zero tagged pointers will lead to undefined > >>> +behaviour. > >> > >> I wonder if you want to generalize this a bit. I think you're saying > >> that parts of the ABI that modify the *layout* of the address space > >> never accept tagged pointers. > > > > I guess our difficulty in specifying this may have been caused by > > over-generalising. For example, madvise/mprotect came under the same > > category but there is a use-case for malloc'ed pointers (and tagged) to > > the kernel (e.g. MADV_DONTNEED). If we can restrict the meaning to > > address space *layout* manipulation, we'd have mmap/mremap/munmap, > > brk/sbrk, prctl(PR_SET_MM). Did I miss anythin
[PATCH v8 0/2] arm64 tagged address ABI
Hi, This series contains an update to the arm64 tagged address ABI documentation posted here (v7): http://lkml.kernel.org/r/20190807155321.9648-1-catalin.mari...@arm.com together some adjustments to Andrey's patches (already queued through different trees) following the discussions on the ABI documents: http://lkml.kernel.org/r/cover.1563904656.git.andreyk...@google.com If there are not objections, I propose that that patch 1 (mm: untag user pointers in mmap...) goes via the mm tree while the other 4 are routed via the arm64 tree. Changes in v8: - removed mmap/munmap/mremap/brk from the list of syscalls not accepting tagged pointers - added ioctl() to the list of syscalls not accepting tagged pointers - added shmat/shmdt to a list of syscalls not accepting tagged pointers - prctl() now requires all unused arguments to be 0 - note about two-stage ABI relaxation since even without the prctl() opt-in, the tag is still ignored on a few syscalls (untagged_addr() in the kernel is unconditional) - compilable example code together with syscall use - added a note on tag preservation in the tagged-pointers.rst document - various rewordings and cleanups Catalin Marinas (3): mm: untag user pointers in mmap/munmap/mremap/brk arm64: Tighten the PR_{SET,GET}_TAGGED_ADDR_CTRL prctl() unused arguments arm64: Change the tagged_addr sysctl control semantics to only prevent the opt-in Vincenzo Frascino (2): arm64: Define Documentation/arm64/tagged-address-abi.rst arm64: Relax Documentation/arm64/tagged-pointers.rst Documentation/arm64/tagged-address-abi.rst | 155 + Documentation/arm64/tagged-pointers.rst| 23 ++- arch/arm64/kernel/process.c| 17 ++- kernel/sys.c | 4 + mm/mmap.c | 5 + mm/mremap.c| 6 +- 6 files changed, 191 insertions(+), 19 deletions(-) create mode 100644 Documentation/arm64/tagged-address-abi.rst
[PATCH v8 1/5] mm: untag user pointers in mmap/munmap/mremap/brk
There isn't a good reason to differentiate between the user address space layout modification syscalls and the other memory permission/attributes ones (e.g. mprotect, madvise) w.r.t. the tagged address ABI. Untag the user addresses on entry to these functions. Signed-off-by: Catalin Marinas --- mm/mmap.c | 5 + mm/mremap.c | 6 +- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/mm/mmap.c b/mm/mmap.c index 7e8c3e8ae75f..b766b633b7ae 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -201,6 +201,8 @@ SYSCALL_DEFINE1(brk, unsigned long, brk) bool downgraded = false; LIST_HEAD(uf); + brk = untagged_addr(brk); + if (down_write_killable(&mm->mmap_sem)) return -EINTR; @@ -1573,6 +1575,8 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len, struct file *file = NULL; unsigned long retval; + addr = untagged_addr(addr); + if (!(flags & MAP_ANONYMOUS)) { audit_mmap_fd(fd, flags); file = fget(fd); @@ -2874,6 +2878,7 @@ EXPORT_SYMBOL(vm_munmap); SYSCALL_DEFINE2(munmap, unsigned long, addr, size_t, len) { + addr = untagged_addr(addr); profile_munmap(addr); return __vm_munmap(addr, len, true); } diff --git a/mm/mremap.c b/mm/mremap.c index 64c9a3b8be0a..1fc8a29fbe3f 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -606,12 +606,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, LIST_HEAD(uf_unmap_early); LIST_HEAD(uf_unmap); - /* -* Architectures may interpret the tag passed to mmap as a background -* colour for the corresponding vma. For mremap we don't allow tagged -* new_addr to preserve similar behaviour to mmap. -*/ addr = untagged_addr(addr); + new_addr = untagged_addr(new_addr); if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE)) return ret;
[PATCH v8 3/5] arm64: Change the tagged_addr sysctl control semantics to only prevent the opt-in
First rename the sysctl control to abi.tagged_addr_disabled and make it default off (zero). When abi.tagged_addr_disabled == 1, only block the enabling of the TBI ABI via prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE). Getting the status of the ABI or disabling it is still allowed. Signed-off-by: Catalin Marinas --- arch/arm64/kernel/process.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 76b7c55026aa..03689c0beb34 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -579,17 +579,22 @@ void arch_setup_new_exec(void) /* * Control the relaxed ABI allowing tagged user addresses into the kernel. */ -static unsigned int tagged_addr_prctl_allowed = 1; +static unsigned int tagged_addr_disabled; long set_tagged_addr_ctrl(unsigned long arg) { - if (!tagged_addr_prctl_allowed) - return -EINVAL; if (is_compat_task()) return -EINVAL; if (arg & ~PR_TAGGED_ADDR_ENABLE) return -EINVAL; + /* +* Do not allow the enabling of the tagged address ABI if globally +* disabled via sysctl abi.tagged_addr_disabled. +*/ + if (arg & PR_TAGGED_ADDR_ENABLE && tagged_addr_disabled) + return -EINVAL; + update_thread_flag(TIF_TAGGED_ADDR, arg & PR_TAGGED_ADDR_ENABLE); return 0; @@ -597,8 +602,6 @@ long set_tagged_addr_ctrl(unsigned long arg) long get_tagged_addr_ctrl(void) { - if (!tagged_addr_prctl_allowed) - return -EINVAL; if (is_compat_task()) return -EINVAL; @@ -618,9 +621,9 @@ static int one = 1; static struct ctl_table tagged_addr_sysctl_table[] = { { - .procname = "tagged_addr", + .procname = "tagged_addr_disabled", .mode = 0644, - .data = &tagged_addr_prctl_allowed, + .data = &tagged_addr_disabled, .maxlen = sizeof(int), .proc_handler = proc_dointvec_minmax, .extra1 = &zero,
[PATCH v8 5/5] arm64: Relax Documentation/arm64/tagged-pointers.rst
From: Vincenzo Frascino On AArch64 the TCR_EL1.TBI0 bit is set by default, allowing userspace (EL0) to perform memory accesses through 64-bit pointers with a non-zero top byte. However, such pointers were not allowed at the user-kernel syscall ABI boundary. With the Tagged Address ABI patchset, it is now possible to pass tagged pointers to the syscalls. Relax the requirements described in tagged-pointers.rst to be compliant with the behaviours guaranteed by the AArch64 Tagged Address ABI. Cc: Will Deacon Cc: Andrey Konovalov Cc: Szabolcs Nagy Cc: Kevin Brodsky Signed-off-by: Vincenzo Frascino Co-developed-by: Catalin Marinas Signed-off-by: Catalin Marinas --- Documentation/arm64/tagged-pointers.rst | 23 --- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/Documentation/arm64/tagged-pointers.rst b/Documentation/arm64/tagged-pointers.rst index 2acdec3ebbeb..fd5306019e91 100644 --- a/Documentation/arm64/tagged-pointers.rst +++ b/Documentation/arm64/tagged-pointers.rst @@ -20,7 +20,9 @@ Passing tagged addresses to the kernel -- All interpretation of userspace memory addresses by the kernel assumes -an address tag of 0x00. +an address tag of 0x00, unless the application enables the AArch64 +Tagged Address ABI explicitly +(Documentation/arm64/tagged-address-abi.rst). This includes, but is not limited to, addresses found in: @@ -33,13 +35,15 @@ This includes, but is not limited to, addresses found in: - the frame pointer (x29) and frame records, e.g. when interpreting them to generate a backtrace or call graph. -Using non-zero address tags in any of these locations may result in an -error code being returned, a (fatal) signal being raised, or other modes -of failure. +Using non-zero address tags in any of these locations when the +userspace application did not enable the AArch64 Tagged Address ABI may +result in an error code being returned, a (fatal) signal being raised, +or other modes of failure. -For these reasons, passing non-zero address tags to the kernel via -system calls is forbidden, and using a non-zero address tag for sp is -strongly discouraged. +For these reasons, when the AArch64 Tagged Address ABI is disabled, +passing non-zero address tags to the kernel via system calls is +forbidden, and using a non-zero address tag for sp is strongly +discouraged. Programs maintaining a frame pointer and frame records that use non-zero address tags may suffer impaired or inaccurate debug and profiling @@ -59,6 +63,11 @@ be preserved. The architecture prevents the use of a tagged PC, so the upper byte will be set to a sign-extension of bit 55 on exception return. +This behaviour is maintained when the AArch64 Tagged Address ABI is +enabled. In addition, with the exceptions above, the kernel will +preserve any non-zero tags passed by the user via syscalls and stored in +kernel data structures (e.g. set_robust_list(), sigaltstack()). + Other considerations
[PATCH v8 4/5] arm64: Define Documentation/arm64/tagged-address-abi.rst
From: Vincenzo Frascino On AArch64 the TCR_EL1.TBI0 bit is set by default, allowing userspace (EL0) to perform memory accesses through 64-bit pointers with a non-zero top byte. Introduce the document describing the relaxation of the syscall ABI that allows userspace to pass certain tagged pointers to kernel syscalls. Cc: Will Deacon Cc: Andrey Konovalov Cc: Szabolcs Nagy Cc: Kevin Brodsky Signed-off-by: Vincenzo Frascino Co-developed-by: Catalin Marinas Signed-off-by: Catalin Marinas --- Documentation/arm64/tagged-address-abi.rst | 155 + 1 file changed, 155 insertions(+) create mode 100644 Documentation/arm64/tagged-address-abi.rst diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst new file mode 100644 index ..8808337775d6 --- /dev/null +++ b/Documentation/arm64/tagged-address-abi.rst @@ -0,0 +1,155 @@ +== +AArch64 TAGGED ADDRESS ABI +== + +Authors: Vincenzo Frascino + Catalin Marinas + +Date: 15 August 2019 + +This document describes the usage and semantics of the Tagged Address +ABI on AArch64 Linux. + +1. Introduction +--- + +On AArch64 the TCR_EL1.TBI0 bit is set by default, allowing userspace +(EL0) to perform memory accesses through 64-bit pointers with a non-zero +top byte. This document describes the relaxation of the syscall ABI that +allows userspace to pass certain tagged pointers to kernel syscalls. + +2. AArch64 Tagged Address ABI +- + +From the kernel syscall interface perspective and for the purposes of +this document, a "valid tagged pointer" is a pointer with a potentially +non-zero top-byte that references an address in the user process address +space obtained in one of the following ways: + +- mmap() done by the process itself (or its parent), where either: + + - flags have the **MAP_ANONYMOUS** bit set + - the file descriptor refers to a regular file (including those +returned by memfd_create()) or **/dev/zero** + +- brk() system call done by the process itself (i.e. the heap area + between the initial location of the program break at process creation + and its current location). + +- any memory mapped by the kernel in the address space of the process + during creation and with the same restrictions as for mmap() above + (e.g. data, bss, stack). + +The AArch64 Tagged Address ABI has two stages of relaxation depending +how the user addresses are used by the kernel: + +1. User addresses not accessed by the kernel but used for address space + management (e.g. mmap(), mprotect(), madvise()). The use of valid + tagged pointers in this context is always allowed. + +2. User addresses accessed by the kernel (e.g. write()). This ABI + relaxation is disabled by default and the application thread needs to + explicitly enable it via **prctl()** as follows: + + - **PR_SET_TAGGED_ADDR_CTRL**: enable or disable the AArch64 Tagged + Address ABI for the calling thread. + + The (unsigned int) arg2 argument is a bit mask describing the + control mode used: + + - **PR_TAGGED_ADDR_ENABLE**: enable AArch64 Tagged Address ABI. + Default status is disabled. + + Arguments arg3, arg4, and arg5 must be 0. + + - **PR_GET_TAGGED_ADDR_CTRL**: get the status of the AArch64 Tagged + Address ABI for the calling thread. + + Arguments arg2, arg3, arg4, and arg5 must be 0. + + The ABI properties described above are thread-scoped, inherited on + clone() and fork() and cleared on exec(). + + Calling prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0) + returns -EINVAL if the AArch64 Tagged Address ABI is globally disabled + by sysctl abi.tagged_addr_disabled=1. The default sysctl + abi.tagged_addr_disabled configuration is 0. + +When the AArch64 Tagged Address ABI is enabled for a thread, the +following behaviours are guaranteed: + +- All syscalls except the cases mentioned in section 3 can accept any + valid tagged pointer. + +- The syscall behaviour is undefined for invalid tagged pointers: it may + result in an error code being returned, a (fatal) signal being raised, + or other modes of failure. + +- A valid tagged pointer has the same semantics as the corresponding + untagged pointer. + +A definition of the meaning of tagged pointers on AArch64 can be found +in Documentation/arm64/tagged-pointers.rst. + +3. AArch64 Tagged Address ABI Exceptions +- + +The following system call parameters must be untagged regardless of the +ABI relaxation: + +- prctl() other than arguments pointing to user structures to be + accessed by the kernel. + +- ioctl() other than arguments pointing to user structures to be + accessed by the kernel. + +- shmat() and shmdt(). + +Any attempt to use non-zero tagged pointers may result in an error code +being returned, a (fatal) signal being raised, or other modes of +fai
[PATCH v8 2/5] arm64: Tighten the PR_{SET,GET}_TAGGED_ADDR_CTRL prctl() unused arguments
Require that arg{3,4,5} of the PR_{SET,GET}_TAGGED_ADDR_CTRL prctl and arg2 of the PR_GET_TAGGED_ADDR_CTRL prctl() are zero rather than ignored for future extensions. Signed-off-by: Catalin Marinas --- kernel/sys.c | 4 1 file changed, 4 insertions(+) diff --git a/kernel/sys.c b/kernel/sys.c index c6c4d5358bd3..ec48396b4943 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -2499,9 +2499,13 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, error = PAC_RESET_KEYS(me, arg2); break; case PR_SET_TAGGED_ADDR_CTRL: + if (arg3 || arg4 || arg5) + return -EINVAL; error = SET_TAGGED_ADDR_CTRL(arg2); break; case PR_GET_TAGGED_ADDR_CTRL: + if (arg2 || arg3 || arg4 || arg5) + return -EINVAL; error = GET_TAGGED_ADDR_CTRL(); break; default:
[PATCH v9 3/3] arm64: Relax Documentation/arm64/tagged-pointers.rst
From: Vincenzo Frascino On AArch64 the TCR_EL1.TBI0 bit is set by default, allowing userspace (EL0) to perform memory accesses through 64-bit pointers with a non-zero top byte. However, such pointers were not allowed at the user-kernel syscall ABI boundary. With the Tagged Address ABI patchset, it is now possible to pass tagged pointers to the syscalls. Relax the requirements described in tagged-pointers.rst to be compliant with the behaviours guaranteed by the AArch64 Tagged Address ABI. Cc: Will Deacon Cc: Szabolcs Nagy Cc: Kevin Brodsky Acked-by: Andrey Konovalov Signed-off-by: Vincenzo Frascino Co-developed-by: Catalin Marinas Signed-off-by: Catalin Marinas --- Documentation/arm64/tagged-pointers.rst | 23 --- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/Documentation/arm64/tagged-pointers.rst b/Documentation/arm64/tagged-pointers.rst index 2acdec3ebbeb..04f2ba9b779e 100644 --- a/Documentation/arm64/tagged-pointers.rst +++ b/Documentation/arm64/tagged-pointers.rst @@ -20,7 +20,9 @@ Passing tagged addresses to the kernel -- All interpretation of userspace memory addresses by the kernel assumes -an address tag of 0x00. +an address tag of 0x00, unless the application enables the AArch64 +Tagged Address ABI explicitly +(Documentation/arm64/tagged-address-abi.rst). This includes, but is not limited to, addresses found in: @@ -33,13 +35,15 @@ This includes, but is not limited to, addresses found in: - the frame pointer (x29) and frame records, e.g. when interpreting them to generate a backtrace or call graph. -Using non-zero address tags in any of these locations may result in an -error code being returned, a (fatal) signal being raised, or other modes -of failure. +Using non-zero address tags in any of these locations when the +userspace application did not enable the AArch64 Tagged Address ABI may +result in an error code being returned, a (fatal) signal being raised, +or other modes of failure. -For these reasons, passing non-zero address tags to the kernel via -system calls is forbidden, and using a non-zero address tag for sp is -strongly discouraged. +For these reasons, when the AArch64 Tagged Address ABI is disabled, +passing non-zero address tags to the kernel via system calls is +forbidden, and using a non-zero address tag for sp is strongly +discouraged. Programs maintaining a frame pointer and frame records that use non-zero address tags may suffer impaired or inaccurate debug and profiling @@ -59,6 +63,11 @@ be preserved. The architecture prevents the use of a tagged PC, so the upper byte will be set to a sign-extension of bit 55 on exception return. +This behaviour is maintained when the AArch64 Tagged Address ABI is +enabled. In addition, with the exceptions above, the kernel will +preserve any non-zero tags passed by the user via syscalls and stored in +kernel data structures (e.g. ``set_robust_list()``, ``sigaltstack()``). + Other considerations
[PATCH v9 2/3] arm64: Define Documentation/arm64/tagged-address-abi.rst
From: Vincenzo Frascino On AArch64 the TCR_EL1.TBI0 bit is set by default, allowing userspace (EL0) to perform memory accesses through 64-bit pointers with a non-zero top byte. Introduce the document describing the relaxation of the syscall ABI that allows userspace to pass certain tagged pointers to kernel syscalls. Cc: Will Deacon Cc: Andrey Konovalov Cc: Szabolcs Nagy Cc: Kevin Brodsky Signed-off-by: Vincenzo Frascino Co-developed-by: Catalin Marinas Signed-off-by: Catalin Marinas --- Documentation/arm64/tagged-address-abi.rst | 156 + 1 file changed, 156 insertions(+) create mode 100644 Documentation/arm64/tagged-address-abi.rst diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst new file mode 100644 index ..d4a85d535bf9 --- /dev/null +++ b/Documentation/arm64/tagged-address-abi.rst @@ -0,0 +1,156 @@ +== +AArch64 TAGGED ADDRESS ABI +== + +Authors: Vincenzo Frascino + Catalin Marinas + +Date: 21 August 2019 + +This document describes the usage and semantics of the Tagged Address +ABI on AArch64 Linux. + +1. Introduction +--- + +On AArch64 the ``TCR_EL1.TBI0`` bit is set by default, allowing +userspace (EL0) to perform memory accesses through 64-bit pointers with +a non-zero top byte. This document describes the relaxation of the +syscall ABI that allows userspace to pass certain tagged pointers to +kernel syscalls. + +2. AArch64 Tagged Address ABI +- + +From the kernel syscall interface perspective and for the purposes of +this document, a "valid tagged pointer" is a pointer with a potentially +non-zero top-byte that references an address in the user process address +space obtained in one of the following ways: + +- ``mmap()`` syscall where either: + + - flags have the ``MAP_ANONYMOUS`` bit set or + - the file descriptor refers to a regular file (including those +returned by ``memfd_create()``) or ``/dev/zero`` + +- ``brk()`` syscall (i.e. the heap area between the initial location of + the program break at process creation and its current location). + +- any memory mapped by the kernel in the address space of the process + during creation and with the same restrictions as for ``mmap()`` above + (e.g. data, bss, stack). + +The AArch64 Tagged Address ABI has two stages of relaxation depending +how the user addresses are used by the kernel: + +1. User addresses not accessed by the kernel but used for address space + management (e.g. ``mmap()``, ``mprotect()``, ``madvise()``). The use + of valid tagged pointers in this context is always allowed. + +2. User addresses accessed by the kernel (e.g. ``write()``). This ABI + relaxation is disabled by default and the application thread needs to + explicitly enable it via ``prctl()`` as follows: + + - ``PR_SET_TAGGED_ADDR_CTRL``: enable or disable the AArch64 Tagged + Address ABI for the calling thread. + + The ``(unsigned int) arg2`` argument is a bit mask describing the + control mode used: + + - ``PR_TAGGED_ADDR_ENABLE``: enable AArch64 Tagged Address ABI. + Default status is disabled. + + Arguments ``arg3``, ``arg4``, and ``arg5`` must be 0. + + - ``PR_GET_TAGGED_ADDR_CTRL``: get the status of the AArch64 Tagged + Address ABI for the calling thread. + + Arguments ``arg2``, ``arg3``, ``arg4``, and ``arg5`` must be 0. + + The ABI properties described above are thread-scoped, inherited on + clone() and fork() and cleared on exec(). + + Calling ``prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0)`` + returns ``-EINVAL`` if the AArch64 Tagged Address ABI is globally + disabled by ``sysctl abi.tagged_addr_disabled=1``. The default + ``sysctl abi.tagged_addr_disabled`` configuration is 0. + +When the AArch64 Tagged Address ABI is enabled for a thread, the +following behaviours are guaranteed: + +- All syscalls except the cases mentioned in section 3 can accept any + valid tagged pointer. + +- The syscall behaviour is undefined for invalid tagged pointers: it may + result in an error code being returned, a (fatal) signal being raised, + or other modes of failure. + +- The syscall behaviour for a valid tagged pointer is the same as for + the corresponding untagged pointer. + + +A definition of the meaning of tagged pointers on AArch64 can be found +in Documentation/arm64/tagged-pointers.rst. + +3. AArch64 Tagged Address ABI Exceptions +- + +The following system call parameters must be untagged regardless of the +ABI relaxation: + +- ``prctl()`` other than pointers to user data either passed directly or + indirectly as arguments to be accessed by the kernel. + +- ``ioctl()`` other than pointers to user data either passed directly or + indirectly as arguments to be accessed by the kernel. + +- ``shmat()`` and ``shmdt()``. + +Any attempt to us
[PATCH v9 1/3] mm: untag user pointers in mmap/munmap/mremap/brk
There isn't a good reason to differentiate between the user address space layout modification syscalls and the other memory permission/attributes ones (e.g. mprotect, madvise) w.r.t. the tagged address ABI. Untag the user addresses on entry to these functions. Acked-by: Will Deacon Acked-by: Andrey Konovalov Signed-off-by: Catalin Marinas --- mm/mmap.c | 5 + mm/mremap.c | 6 +- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/mm/mmap.c b/mm/mmap.c index 7e8c3e8ae75f..b766b633b7ae 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -201,6 +201,8 @@ SYSCALL_DEFINE1(brk, unsigned long, brk) bool downgraded = false; LIST_HEAD(uf); + brk = untagged_addr(brk); + if (down_write_killable(&mm->mmap_sem)) return -EINTR; @@ -1573,6 +1575,8 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len, struct file *file = NULL; unsigned long retval; + addr = untagged_addr(addr); + if (!(flags & MAP_ANONYMOUS)) { audit_mmap_fd(fd, flags); file = fget(fd); @@ -2874,6 +2878,7 @@ EXPORT_SYMBOL(vm_munmap); SYSCALL_DEFINE2(munmap, unsigned long, addr, size_t, len) { + addr = untagged_addr(addr); profile_munmap(addr); return __vm_munmap(addr, len, true); } diff --git a/mm/mremap.c b/mm/mremap.c index 64c9a3b8be0a..1fc8a29fbe3f 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -606,12 +606,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, LIST_HEAD(uf_unmap_early); LIST_HEAD(uf_unmap); - /* -* Architectures may interpret the tag passed to mmap as a background -* colour for the corresponding vma. For mremap we don't allow tagged -* new_addr to preserve similar behaviour to mmap. -*/ addr = untagged_addr(addr); + new_addr = untagged_addr(new_addr); if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE)) return ret;
[PATCH v9 0/3] arm64 tagged address ABI
Hi, This series is an update to the arm64 tagged address ABI documentation patches v8, posted here: http://lkml.kernel.org/r/20190815154403.16473-1-catalin.mari...@arm.com >From v8, I dropped patches 2 and 3 as they've been queued by Will via the arm64 tree. Reposting patch 1 (unmodified) as it should be merged via the mm tree. Changes in v9: - Replaced the emphasized/bold font with a typewriter one for function/constant names - Simplified the mmap/brk bullet points when describing the tagged pointer origin - Reworded expected syscall behaviour with valid tagged pointers - Reworded the prctl/ioctl restrictions to clarify the allowed tagged pointers w.r.t. user data access by the kernel Catalin Marinas (1): mm: untag user pointers in mmap/munmap/mremap/brk Vincenzo Frascino (2): arm64: Define Documentation/arm64/tagged-address-abi.rst arm64: Relax Documentation/arm64/tagged-pointers.rst Documentation/arm64/tagged-address-abi.rst | 156 + Documentation/arm64/tagged-pointers.rst| 23 ++- mm/mmap.c | 5 + mm/mremap.c| 6 +- 4 files changed, 178 insertions(+), 12 deletions(-) create mode 100644 Documentation/arm64/tagged-address-abi.rst
Re: [PATCH v9 3/3] arm64: Relax Documentation/arm64/tagged-pointers.rst
On Wed, Aug 21, 2019 at 07:46:51PM +0100, Dave P Martin wrote: > On Wed, Aug 21, 2019 at 06:33:53PM +0100, Will Deacon wrote: > > On Wed, Aug 21, 2019 at 05:47:30PM +0100, Catalin Marinas wrote: > > > @@ -59,6 +63,11 @@ be preserved. > > > The architecture prevents the use of a tagged PC, so the upper byte will > > > be set to a sign-extension of bit 55 on exception return. > > > > > > +This behaviour is maintained when the AArch64 Tagged Address ABI is > > > +enabled. In addition, with the exceptions above, the kernel will > > > +preserve any non-zero tags passed by the user via syscalls and stored in > > > +kernel data structures (e.g. ``set_robust_list()``, ``sigaltstack()``). > > sigaltstack() is interesting, since we don't support tagged stacks. We should support tagged SP with the new ABI as they'll be required for MTE. sigaltstack() and clone() are the two syscalls that come to mind here. > Do we keep the ss_sp tag in the kernel, but squash it when delivering > a signal to the alternate stack? We don't seem to be doing any untagging, so we just just use whatever the caller asked for. We may need a small test to confirm. That said, on_sig_stack() probably needs some untagging as it does user pointer arithmetics with potentially different tags. > > Hmm. I can see the need to provide this guarantee for things like > > set_robust_list(), but the problem is that the statement above is too broad > > and isn't strictly true: for example, mmap() doesn't propagate the tag of > > its address parameter into the VMA. > > > > So I think we need to nail this down a bit more, but I'm having a really > > hard time coming up with some wording :( > > Time for some creative vagueness? > > We can write a statement of our overall intent, along with examples of > a few cases where the tag should and should not be expected to emerge > intact. > > There is no foolproof rule, unless we can rewrite history... I would expect the norm to be the preservation of tags with a few exceptions. The only ones I think where we won't preserve the tags are mmap, mremap, brk (apart from the signal stuff already mentioned in the current tagged-pointers.rst doc). So I can remove this paragraph altogether and add a note in part 3 of the tagged-address-abi.rst document that mmap/mremap/brk do not preserve the tag information. -- Catalin
Re: [PATCH v9 3/3] arm64: Relax Documentation/arm64/tagged-pointers.rst
On Thu, Aug 22, 2019 at 05:37:23PM +0100, Dave P Martin wrote: > On Thu, Aug 22, 2019 at 04:55:32PM +0100, Catalin Marinas wrote: > > On Wed, Aug 21, 2019 at 07:46:51PM +0100, Dave P Martin wrote: > > > On Wed, Aug 21, 2019 at 06:33:53PM +0100, Will Deacon wrote: > > > > On Wed, Aug 21, 2019 at 05:47:30PM +0100, Catalin Marinas wrote: > > > > > @@ -59,6 +63,11 @@ be preserved. > > > > > The architecture prevents the use of a tagged PC, so the upper byte > > > > > will > > > > > be set to a sign-extension of bit 55 on exception return. > > > > > > > > > > +This behaviour is maintained when the AArch64 Tagged Address ABI is > > > > > +enabled. In addition, with the exceptions above, the kernel will > > > > > +preserve any non-zero tags passed by the user via syscalls and > > > > > stored in > > > > > +kernel data structures (e.g. ``set_robust_list()``, > > > > > ``sigaltstack()``). > > > > > > sigaltstack() is interesting, since we don't support tagged stacks. > > > > We should support tagged SP with the new ABI as they'll be required for > > MTE. sigaltstack() and clone() are the two syscalls that come to mind > > here. > > > > > Do we keep the ss_sp tag in the kernel, but squash it when delivering > > > a signal to the alternate stack? > > > > We don't seem to be doing any untagging, so we just just use whatever > > the caller asked for. We may need a small test to confirm. > > If we want to support tagged SP, then I guess we shouldn't be squashing > the tag anywhere. A test for that would be sensible to have. I hacked the sas.c kselftest to use a tagged stack and works fine, the SP register has a tagged address on the signal handler. > > > > Hmm. I can see the need to provide this guarantee for things like > > > > set_robust_list(), but the problem is that the statement above is too > > > > broad > > > > and isn't strictly true: for example, mmap() doesn't propagate the tag > > > > of > > > > its address parameter into the VMA. > > > > > > > > So I think we need to nail this down a bit more, but I'm having a really > > > > hard time coming up with some wording :( > > > > > > Time for some creative vagueness? > > > > > > We can write a statement of our overall intent, along with examples of > > > a few cases where the tag should and should not be expected to emerge > > > intact. > > > > > > There is no foolproof rule, unless we can rewrite history... > > > > I would expect the norm to be the preservation of tags with a few > > exceptions. The only ones I think where we won't preserve the tags are > > mmap, mremap, brk (apart from the signal stuff already mentioned in the > > current tagged-pointers.rst doc). > > > > So I can remove this paragraph altogether and add a note in part 3 of > > the tagged-address-abi.rst document that mmap/mremap/brk do not preserve > > the tag information. > > Deleting text is always a good idea ;) I'm going this route ;). -- Catalin
[PATCH v10 0/1] arm64 tagged address ABI
Hi, Minor update to the arm64 tagged address ABI documentation since v9, posted here: http://lkml.kernel.org/r/20190821164730.47450-1-catalin.mari...@arm.com The mmap/mremap/... patch (1/3) has been queued in the -mm tree and removed from this series. The tagged-address-abi.rst patch (2/3) has been queued in the arm64 for-next/core tree. There is only one patch left in this series (keeping the cover letter for consistency). Changes in v10: - Remove the tag preservation paragraph since the new ABI does not change the behaviour we already have. The only difference is that now the kernel can access tagged addresses (e.g. delivering a signal on a tagged alternate stack). Vincenzo Frascino (1): arm64: Relax Documentation/arm64/tagged-pointers.rst Documentation/arm64/tagged-pointers.rst | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-)
[PATCH v10 1/1] arm64: Relax Documentation/arm64/tagged-pointers.rst
From: Vincenzo Frascino On AArch64 the TCR_EL1.TBI0 bit is set by default, allowing userspace (EL0) to perform memory accesses through 64-bit pointers with a non-zero top byte. However, such pointers were not allowed at the user-kernel syscall ABI boundary. With the Tagged Address ABI patchset, it is now possible to pass tagged pointers to the syscalls. Relax the requirements described in tagged-pointers.rst to be compliant with the behaviours guaranteed by the AArch64 Tagged Address ABI. Cc: Will Deacon Cc: Szabolcs Nagy Cc: Kevin Brodsky Acked-by: Andrey Konovalov Signed-off-by: Vincenzo Frascino Co-developed-by: Catalin Marinas Signed-off-by: Catalin Marinas --- Documentation/arm64/tagged-pointers.rst | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/Documentation/arm64/tagged-pointers.rst b/Documentation/arm64/tagged-pointers.rst index 2acdec3ebbeb..eab4323609b9 100644 --- a/Documentation/arm64/tagged-pointers.rst +++ b/Documentation/arm64/tagged-pointers.rst @@ -20,7 +20,9 @@ Passing tagged addresses to the kernel -- All interpretation of userspace memory addresses by the kernel assumes -an address tag of 0x00. +an address tag of 0x00, unless the application enables the AArch64 +Tagged Address ABI explicitly +(Documentation/arm64/tagged-address-abi.rst). This includes, but is not limited to, addresses found in: @@ -33,13 +35,15 @@ This includes, but is not limited to, addresses found in: - the frame pointer (x29) and frame records, e.g. when interpreting them to generate a backtrace or call graph. -Using non-zero address tags in any of these locations may result in an -error code being returned, a (fatal) signal being raised, or other modes -of failure. +Using non-zero address tags in any of these locations when the +userspace application did not enable the AArch64 Tagged Address ABI may +result in an error code being returned, a (fatal) signal being raised, +or other modes of failure. -For these reasons, passing non-zero address tags to the kernel via -system calls is forbidden, and using a non-zero address tag for sp is -strongly discouraged. +For these reasons, when the AArch64 Tagged Address ABI is disabled, +passing non-zero address tags to the kernel via system calls is +forbidden, and using a non-zero address tag for sp is strongly +discouraged. Programs maintaining a frame pointer and frame records that use non-zero address tags may suffer impaired or inaccurate debug and profiling @@ -59,6 +63,9 @@ be preserved. The architecture prevents the use of a tagged PC, so the upper byte will be set to a sign-extension of bit 55 on exception return. +This behaviour is maintained when the AArch64 Tagged Address ABI is +enabled. + Other considerations
Re: [PATCH] docs: kmemleak: DEBUG_KMEMLEAK_EARLY_LOG_SIZE changed names
On Wed, Sep 25, 2019 at 02:31:14PM +, Jeremy Cline wrote: > Commit c5665868183f ("mm: kmemleak: use the memory pool for early > allocations") renamed CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE to > CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE. Update the documentation reference > to reflect that. > > Signed-off-by: Jeremy Cline I forgot about this. Thanks. Acked-by: Catalin Marinas
Re: [PATCH 0/4] arm64/cpufeature: Fix + doc update
On Fri, Oct 04, 2019 at 11:37:22AM +0100, Will Deacon wrote: > On Thu, Oct 03, 2019 at 12:12:07PM +0100, Julien Grall wrote: > > This patch fix an issue related to exposing the FRINT capability to > > userspace (see patch #1). The rest is documentation update. > > > For patches 2-4: > > Acked-by: Will Deacon > > Catalin can take them for 5.5, since I don't think they're urgent. Queued. Thanks. -- Catalin
Re: [PATCH 1/4] arm64: add support for the AMU extension v1
Hi Ionela, On Tue, Sep 17, 2019 at 02:42:25PM +0100, Ionela Voinescu wrote: > +#ifdef CONFIG_ARM64_AMU_EXTN > + > +/* > + * This per cpu variable only signals that the CPU implementation supports > the > + * AMU but does not provide information regarding all the events that it > + * supports. > + * When this amu_feat per CPU variable is true, the user of this feature can > + * only rely on the presence of the 4 fixed counters. But this does not > + * guarantee that the counters are enabled or access to these counters is > + * provided by code executed at higher exception levels. > + */ > +DEFINE_PER_CPU(bool, amu_feat) = false; > + > +static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap) > +{ > + if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) { > + pr_info("detected CPU%d: Activity Monitors extension\n", > + smp_processor_id()); > + this_cpu_write(amu_feat, true); > + } > +} Sorry if I missed anything as I just skimmed through this series. I can't see the amu_feat used anywhere in these patches, so on its own it doesn't make much sense. I also can't see the advantage of allowing mismatched CPU implementations for this feature. What's the intended use-case? Thanks. -- Catalin
Re: [PATCH 1/4] arm64: add support for the AMU extension v1
On Fri, Oct 11, 2019 at 11:31:40AM +0100, Ionela Voinescu wrote: > On 10/10/2019 18:20, Catalin Marinas wrote: > > On Tue, Sep 17, 2019 at 02:42:25PM +0100, Ionela Voinescu wrote: > >> +#ifdef CONFIG_ARM64_AMU_EXTN > >> + > >> +/* > >> + * This per cpu variable only signals that the CPU implementation > >> supports the > >> + * AMU but does not provide information regarding all the events that it > >> + * supports. > >> + * When this amu_feat per CPU variable is true, the user of this feature > >> can > >> + * only rely on the presence of the 4 fixed counters. But this does not > >> + * guarantee that the counters are enabled or access to these counters is > >> + * provided by code executed at higher exception levels. > >> + */ > >> +DEFINE_PER_CPU(bool, amu_feat) = false; > >> + > >> +static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap) > >> +{ > >> + if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) { > >> + pr_info("detected CPU%d: Activity Monitors extension\n", > >> + smp_processor_id()); > >> + this_cpu_write(amu_feat, true); > >> + } > >> +} > > > > Sorry if I missed anything as I just skimmed through this series. I > > can't see the amu_feat used anywhere in these patches, so on its own it > > doesn't make much sense. > > No worries, you are correct, at the moment the per-cpu amu_feat is not > yet used anywhere. But the intention is to use it to discover the > feature at CPU level as some CPUs might implement AMU and some might > not. > > I'll soon submit some patches using the counters for the scheduler's > frequency invariance - to discover the frequency the CPUs are actually > running at in case there is power or thermal mitigation happening > outside of the OS. Thanks for the explanation. I guess I'll wait for the other patches to see how all fits together. In general I'm not keen on per-CPU variables exposed to the rest of the kernel. For example, is it always read in a non-preemptible context? I'd rather have an accessor function with the corresponding WARN_ON_ONCE(preemptible()). This may come with the rest of the patches. > More practically, it's possible that we'll see big.LITTLE platforms > where the big CPUs only will implement activity monitors and for those > CPUs it will be useful to get more accurate information on the current > frequency, for example, as power and thermal mitigation is more > probable to happen in the power domain of the big CPUs. As long as that's a realistic possibility (not just a theoretical one) and the in-kernel code can handle such asymmetry, it's fine by me. This could be another reason to never expose the AMU counters to user-space or guests. You can control preemption in the kernel but can't run user-space in a non-preemptible context. -- Catalin
Re: [PATCH v4 1/2] arm64: Define Documentation/arm64/tagged-address-abi.txt
Hi Vincenzo, Some minor comments below but it looks fine to me overall. Cc'ing Szabolcs as well since I'd like a view from the libc people. On Wed, Jun 12, 2019 at 03:21:10PM +0100, Vincenzo Frascino wrote: > diff --git a/Documentation/arm64/tagged-address-abi.txt > b/Documentation/arm64/tagged-address-abi.txt > new file mode 100644 > index ..96e149e2c55c > --- /dev/null > +++ b/Documentation/arm64/tagged-address-abi.txt > @@ -0,0 +1,111 @@ > +ARM64 TAGGED ADDRESS ABI > + > + > +This document describes the usage and semantics of the Tagged Address > +ABI on arm64. > + > +1. Introduction > +--- > + > +On arm64 the TCR_EL1.TBI0 bit has been always enabled on the arm64 kernel, > +hence the userspace (EL0) is allowed to set a non-zero value in the top I'd be clearer here: "userspace (EL0) is allowed to perform a user memory access through a 64-bit pointer with a non-zero top byte" (or something along the lines). Otherwise setting a non-zero top byte is allowed on any architecture, dereferencing it is a problem. > +byte but the resulting pointers are not allowed at the user-kernel syscall > +ABI boundary. > + > +This document describes a relaxation of the ABI with which it is possible "relaxation of the ABI that makes it possible to..." > +to pass tagged tagged pointers to the syscalls, when these pointers are in > +memory ranges obtained as described in paragraph 2. "section 2" is better. There are a lot more paragraphs. > + > +Since it is not desirable to relax the ABI to allow tagged user addresses > +into the kernel indiscriminately, arm64 provides a new sysctl interface > +(/proc/sys/abi/tagged_addr) that is used to prevent the applications from > +enabling the relaxed ABI and a new prctl() interface that can be used to > +enable or disable the relaxed ABI. > + > +The sysctl is meant also for testing purposes in order to provide a simple > +way for the userspace to verify the return error checking of the prctl() > +command without having to reconfigure the kernel. > + > +The ABI properties are inherited by threads of the same application and > +fork()'ed children but cleared when a new process is spawn (execve()). "spawned". I guess you could drop these three paragraphs here and mention the inheritance properties when introducing the prctl() below. You can also mention the global sysctl switch after the prctl() was introduced. > + > +2. ARM64 Tagged Address ABI > +--- > + > +From the kernel syscall interface prospective, we define, for the purposes > +of this document, a "valid tagged pointer" as a pointer that either it has "either has" (no 'it') sounds slightly better but I'm not a native English speaker either. > +a zero value set in the top byte or it has a non-zero value, it is in memory > +ranges privately owned by a userspace process and it is obtained in one of > +the following ways: > + - mmap() done by the process itself, where either: > +* flags = MAP_PRIVATE | MAP_ANONYMOUS > +* flags = MAP_PRIVATE and the file descriptor refers to a regular > + file or "/dev/zero" > + - a mapping below sbrk(0) done by the process itself > + - any memory mapped by the kernel in the process's address space during > +creation and following the restrictions presented above (i.e. data, bss, > +stack). > + > +The ARM64 Tagged Address ABI is an opt-in feature, and an application can > +control it using the following prctl()s: > + - PR_SET_TAGGED_ADDR_CTRL: can be used to enable the Tagged Address ABI. enable or disable (not sure we need the latter but it doesn't heart). I'd add the arg2 description here as well. > + - PR_GET_TAGGED_ADDR_CTRL: can be used to check the status of the Tagged > + Address ABI. > + > +As a consequence of invoking PR_SET_TAGGED_ADDR_CTRL prctl() by an > applications, > +the ABI guarantees the following behaviours: > + > + - Every current or newly introduced syscall can accept any valid tagged > +pointers. > + > + - If a non valid tagged pointer is passed to a syscall then the behaviour > +is undefined. > + > + - Every valid tagged pointer is expected to work as an untagged one. > + > + - The kernel preserves any valid tagged pointers and returns them to the > +userspace unchanged in all the cases except the ones documented in the > +"Preserving tags" paragraph of tagged-pointers.txt. I'd think we need to qualify the context here in which the kernel preserves the tagged pointers. Did you mean on the syscall return? > + > +A definition of the meaning of tagged pointers on arm64 can be found in: > +Documentation/arm64/tagged-pointers.txt. > + > +3. ARM64 Tagged Address ABI Exceptions > +-- > + > +The behaviours described in paragraph 2, with particular reference to the "section 2" > +acceptance by the syscalls of any valid tagged pointer are not applicable > +to the following cases: > + - mmap
Re: [PATCH v4 2/2] arm64: Relax Documentation/arm64/tagged-pointers.txt
A couple of minor nits below. On Wed, Jun 12, 2019 at 03:21:11PM +0100, Vincenzo Frascino wrote: > --- a/Documentation/arm64/tagged-pointers.txt > +++ b/Documentation/arm64/tagged-pointers.txt > @@ -18,7 +18,8 @@ Passing tagged addresses to the kernel > -- > > All interpretation of userspace memory addresses by the kernel assumes > -an address tag of 0x00. > +an address tag of 0x00, unless the userspace opts-in the ARM64 Tagged > +Address ABI via the PR_SET_TAGGED_ADDR_CTRL prctl(). > > This includes, but is not limited to, addresses found in: > > @@ -31,18 +32,23 @@ This includes, but is not limited to, addresses found in: > - the frame pointer (x29) and frame records, e.g. when interpreting > them to generate a backtrace or call graph. > > -Using non-zero address tags in any of these locations may result in an > -error code being returned, a (fatal) signal being raised, or other modes > -of failure. > +Using non-zero address tags in any of these locations when the > +userspace application did not opt-in to the ARM64 Tagged Address ABI, Nitpick: drop the comma after "ABI," since a predicate follows. > +may result in an error code being returned, a (fatal) signal being raised, > +or other modes of failure. > > -For these reasons, passing non-zero address tags to the kernel via > -system calls is forbidden, and using a non-zero address tag for sp is > -strongly discouraged. > +For these reasons, when the userspace application did not opt-in, passing > +non-zero address tags to the kernel via system calls is forbidden, and using > +a non-zero address tag for sp is strongly discouraged. > > Programs maintaining a frame pointer and frame records that use non-zero > address tags may suffer impaired or inaccurate debug and profiling > visibility. > > +A definition of the meaning of ARM64 Tagged Address ABI and of the > +guarantees that the ABI provides when the userspace opts-in via prctl() > +can be found in: Documentation/arm64/tagged-address-abi.txt. > + > > Preserving tags > --- > @@ -57,6 +63,9 @@ be preserved. > The architecture prevents the use of a tagged PC, so the upper byte will > be set to a sign-extension of bit 55 on exception return. > > +This behaviours are preserved even when the the userspace opts-in the ARM64 "These" ... "opts in to" > +Tagged Address ABI via the PR_SET_TAGGED_ADDR_CTRL prctl(). > + > > Other considerations > > -- > 2.21.0 -- Catalin
Re: [PATCH v4 1/2] arm64: Define Documentation/arm64/tagged-address-abi.txt
On Thu, Jun 13, 2019 at 12:37:32PM +0100, Dave P Martin wrote: > On Thu, Jun 13, 2019 at 11:15:34AM +0100, Vincenzo Frascino wrote: > > On 12/06/2019 16:35, Catalin Marinas wrote: > > > On Wed, Jun 12, 2019 at 03:21:10PM +0100, Vincenzo Frascino wrote: > > >> + - PR_GET_TAGGED_ADDR_CTRL: can be used to check the status of the > > >> Tagged > > >> + Address ABI. [...] > Is there a canonical way to detect whether this whole API/ABI is > available? (i.e., try to call this prctl / check for an HWCAP bit, > etc.) The canonical way is a prctl() call. HWCAP doesn't make sense since it's not a hardware feature. If you really want a different way of detecting this (which I don't think it's worth), we can reinstate the AT_FLAGS bit. -- Catalin
Re: [PATCH v4 1/2] arm64: Define Documentation/arm64/tagged-address-abi.txt
On Thu, Jun 13, 2019 at 02:23:43PM +0100, Dave P Martin wrote: > On Thu, Jun 13, 2019 at 01:28:21PM +0100, Catalin Marinas wrote: > > On Thu, Jun 13, 2019 at 12:37:32PM +0100, Dave P Martin wrote: > > > On Thu, Jun 13, 2019 at 11:15:34AM +0100, Vincenzo Frascino wrote: > > > > On 12/06/2019 16:35, Catalin Marinas wrote: > > > > > On Wed, Jun 12, 2019 at 03:21:10PM +0100, Vincenzo Frascino wrote: > > > > >> + - PR_GET_TAGGED_ADDR_CTRL: can be used to check the status of the > > > > >> Tagged > > > > >> + Address ABI. > > [...] > > > Is there a canonical way to detect whether this whole API/ABI is > > > available? (i.e., try to call this prctl / check for an HWCAP bit, > > > etc.) > > > > The canonical way is a prctl() call. HWCAP doesn't make sense since it's > > not a hardware feature. If you really want a different way of detecting > > this (which I don't think it's worth), we can reinstate the AT_FLAGS > > bit. > > Sure, I think this probably makes sense -- I'm still getting my around > which parts of the design are directly related to MTE and which aren't. > > I was a bit concerned about the interaction between > PR_SET_TAGGED_ADDR_CTRL and the sysctl: the caller might conclude that > this API is unavailable when actually tagged addresses are stuck on. > > I'm not sure whether this matters, but it's a bit weird. > > One option would be to change the semantics, so that the sysctl just > forbids turning tagging from off to on. Alternatively, we could return > a different error code to distinguish this case. This is the intention, just to forbid turning tagging on. We could return -EPERM instead, though my original intent was to simply pretend that the prctl does not exist like in an older kernel version. -- Catalin
Re: [PATCH v4 1/2] arm64: Define Documentation/arm64/tagged-address-abi.txt
Hi Szabolcs, On Wed, Jun 12, 2019 at 05:30:34PM +0100, Szabolcs Nagy wrote: > On 12/06/2019 15:21, Vincenzo Frascino wrote: > > +2. ARM64 Tagged Address ABI > > +--- > > + > > +From the kernel syscall interface prospective, we define, for the purposes > ^^^ > perspective > > > +of this document, a "valid tagged pointer" as a pointer that either it has > > +a zero value set in the top byte or it has a non-zero value, it is in > > memory > > +ranges privately owned by a userspace process and it is obtained in one of > > +the following ways: > > + - mmap() done by the process itself, where either: > > +* flags = MAP_PRIVATE | MAP_ANONYMOUS > > +* flags = MAP_PRIVATE and the file descriptor refers to a regular > > + file or "/dev/zero" > > this does not make it clear if MAP_FIXED or other flags are valid > (there are many map flags i don't know, but at least fixed should work > and stack/growsdown. i'd expect anything that's not incompatible with > private|anon to work). Just to clarify, this document tries to define the memory ranges from where tagged addresses can be passed into the kernel in the context of TBI only (not MTE); that is for hwasan support. FIXED or GROWSDOWN should not affect this. > > + - a mapping below sbrk(0) done by the process itself > > doesn't the mmap rule cover this? IIUC it doesn't cover it as that's memory mapped by the kernel automatically on access vs a pointer returned by mmap(). The statement above talks about how the address is obtained by the user. > > + - any memory mapped by the kernel in the process's address space during > > +creation and following the restrictions presented above (i.e. data, > > bss, > > +stack). > > OK. > > Can a null pointer have a tag? > (in case NULL is valid to pass to a syscall) Good point. I don't think it can. We may change this for MTE where we give a hint tag but no hint address, however, this document only covers TBI for now. > > +The ARM64 Tagged Address ABI is an opt-in feature, and an application can > > +control it using the following prctl()s: > > + - PR_SET_TAGGED_ADDR_CTRL: can be used to enable the Tagged Address ABI. > > + - PR_GET_TAGGED_ADDR_CTRL: can be used to check the status of the Tagged > > + Address ABI. > > + > > +As a consequence of invoking PR_SET_TAGGED_ADDR_CTRL prctl() by an > > applications, > > +the ABI guarantees the following behaviours: > > + > > + - Every current or newly introduced syscall can accept any valid tagged > > +pointers. > > + > > + - If a non valid tagged pointer is passed to a syscall then the behaviour > > +is undefined. > > + > > + - Every valid tagged pointer is expected to work as an untagged one. > > + > > + - The kernel preserves any valid tagged pointers and returns them to the > > +userspace unchanged in all the cases except the ones documented in the > > +"Preserving tags" paragraph of tagged-pointers.txt. > > OK. > > i guess pointers of another process are not "valid tagged pointers" > for the current one, so e.g. in ptrace the ptracer has to clear the > tags before PEEK etc. Another good point. Are there any pros/cons here or use-cases? When we add MTE support, should we handle this differently? > > +A definition of the meaning of tagged pointers on arm64 can be found in: > > +Documentation/arm64/tagged-pointers.txt. > > + > > +3. ARM64 Tagged Address ABI Exceptions > > +-- > > + > > +The behaviours described in paragraph 2, with particular reference to the > > +acceptance by the syscalls of any valid tagged pointer are not applicable > > +to the following cases: > > + - mmap() addr parameter. > > + - mremap() new_address parameter. > > + - prctl_set_mm() struct prctl_map fields. > > + - prctl_set_mm_map() struct prctl_map fields. > > i don't understand the exception: does it mean that passing a tagged > address to these syscalls is undefined? I'd say it's as undefined as it is right now without these patches. We may be able to explain this better in the document. -- Catalin
Re: [PATCH v5 1/2] arm64: Define Documentation/arm64/tagged-address-abi.txt
On Tue, Jun 18, 2019 at 02:13:01PM +0100, Kevin Brodsky wrote: > On 13/06/2019 16:51, Vincenzo Frascino wrote: > > +The ARM64 Tagged Address ABI is an opt-in feature, and an application can > > +control it using the following: > > + - /proc/sys/abi/tagged_addr: a new sysctl interface that can be used to > > +prevent the applications from enabling the relaxed ABI. > > +The sysctl is meant also for testing purposes in order to provide a > > +simple way for the userspace to verify the return error checking of > > +the prctl() commands without having to reconfigure the kernel. > > +The sysctl supports the following configuration options: > > + - 0: Disable ARM64 Tagged Address ABI for all the applications. > > + - 1 (Default): Enable ARM64 Tagged Address ABI for all the > > +applications. > > I find this very confusing, because it suggests that the default value of > PR_GET_TAGGED_ADDR_CTRL for new processes will be set to the value of this > sysctl, when in fact this sysctl is about restricting the *availability* of > the new ABI. Instead of disabling the ABI, I would talk about disabling > access to the new ABI here. This bullet point needs to be re-written. The sysctl is meant to disable opting in to the ABI. I'd also drop the "meant for testing" part. I put it in my commit log as justification but I don't think it should be part of the ABI document. > > + - prctl()s: > > + - PR_SET_TAGGED_ADDR_CTRL: can be used to enable or disable the Tagged > > +Address ABI. > > +The (unsigned int) arg2 argument is a bit mask describing the > > +control mode used: > > + - PR_TAGGED_ADDR_ENABLE: Enable ARM64 Tagged Address ABI. > > +The arguments arg3, arg4, and arg5 are ignored. > > Have we definitely decided that arg{3,4,5} are ignored? Catalin? I don't have a strong preference either way. If it's simpler for the user to ignore them, fine by me. I can see in the current prctl commands a mix if ignore vs forced zero. > > +the ABI guarantees the following behaviours: > > + > > + - Every current or newly introduced syscall can accept any valid tagged > > +pointers. > "pointer". Also, is it really useful to talk about newly introduced syscall? > New from which point of view? I think we should drop this guarantee. It would have made sense if we allowed tagged pointers everywhere but we already have some exceptions. > > +3. ARM64 Tagged Address ABI Exceptions > > +-- > > + > > +The behaviours described in section 2, with particular reference to the > > +acceptance by the syscalls of any valid tagged pointer are not applicable > > +to the following cases: > > + - mmap() addr parameter. > > + - mremap() new_address parameter. > > + - prctl_set_mm() struct prctl_map fields. > > + - prctl_set_mm_map() struct prctl_map fields. > > prctl_set_mm() and prctl_set_mm_map() are internal kernel functions, not > syscall names. IIUC, we don't want to allow any address field settable via > the PR_SET_MM prctl() to be tagged. Catalin, is that correct? I think this > needs rephrasing. I fully agree. It should talk about PR_SET_MM, PR_SET_MM_MAP, PR_SET_MM_MAP_SIZE. -- Catalin
Re: [PATCH 16/18] arm64: ptrace: handle ptrace_request differently for aarch32 and ilp32
On Fri, Jan 06, 2017 at 02:10:03AM +0530, Yury Norov wrote: > On Wed, Dec 07, 2016 at 09:40:13PM +0100, Arnd Bergmann wrote: > > On Wednesday, December 7, 2016 4:59:13 PM CET Catalin Marinas wrote: > > > On Tue, Dec 06, 2016 at 11:55:08AM +0530, Yury Norov wrote: > > > > On Mon, Dec 05, 2016 at 04:34:23PM +, Catalin Marinas wrote: > > > > > On Fri, Oct 21, 2016 at 11:33:15PM +0300, Yury Norov wrote: > > > > > > New aarch32 ptrace syscall handler is introduced to avoid run-time > > > > > > detection of the task type. > > > > > > > > > > What's wrong with the run-time detection? If it's just to avoid a > > > > > negligible overhead, I would rather keep the code simpler by avoiding > > > > > duplicating the generic compat_sys_ptrace(). > > > > > > > > Nothing wrong. This is how Arnd asked me to do. You already asked this > > > > question: http://lkml.iu.edu/hypermail/linux/kernel/1604.3/00930.html > > > > > > Hmm, I completely forgot about this ;). There is still an advantage to > > > doing run-time checking if we avoid touching core code (less acks to > > > gather and less code duplication). > > > > > > Let's see what Arnd says but the initial patch looked simpler. > > > > I don't currently have either version of the patch in my inbox > > (the archive is on a different machine), but in general I'd still > > think it's best to avoid the runtime check for aarch64-ilp32 > > altogether. I'd have to look at the overall kernel source to > > see if it's worth avoiding one or two instances though, or > > if there are an overwhelming number of other checks that we > > can't avoid at all. > > > > Regarding ptrace, I notice that arch/tile doesn't even use > > the compat entry point for its ilp32 user space on 64-bit > > kernels, it just calls the regular 64-bit one. Would that > > help here? > > ILP32 tasks has unique context that is not like aarch64 or aarch32, > so we have to have unique ptrace handler. I prepared the patch for > ptrace with runtime ABI detection, as Catalin said, see there: > https://github.com/norov/linux/commit/1f66dc22a4450b192e83458f2c3cc0e79f53e670 > > If it's OK, I'd like to update submission. This looks better to me (and even better if you no longer need to touch the generic ptrace code). -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC3 nowrap: PATCH v7 00/18] ILP32 for ARM64
On Sun, Dec 18, 2016 at 12:38:23PM +0530, Yury Norov wrote: > On Fri, Oct 21, 2016 at 11:32:59PM +0300, Yury Norov wrote: > > This series enables aarch64 with ilp32 mode, and as supporting work, > > introduces ARCH_32BIT_OFF_T configuration option that is enabled for > > existing 32-bit architectures but disabled for new arches (so 64-bit > > off_t is is used by new userspace). > > > > This version is based on kernel v4.9-rc1. It works with glibc-2.24, > > and tested with LTP. > > Hi Arnd, Catalin > > For last few days I'm trying to rebase this series on current master, > and I see significant conflicts and regressions. In fact, every time > I rebase on next rc1, I feel like I play a roulette. > > This is not a significant problem now because it's almost for sure > that this series will not get into 4.10, for reasons not related to > kernel code. And I have time to deal with regressions. But in general, > I'd like to try my patches on top of other candidates for next merge > window. I cannot read all emails in LKML, but I can easily detect > problems and join to the discussion at early stage if I see any problem. > > This is probably a noob question, and there are well-known branches, > like Andrew Morton's one. But at this stage it's very important to > have this series prepared for merge, and I'd prefer to ask about it. I'm not entirely sure what the question is. For development, you could base your series on a final release, e.g. 4.9. For reviews and especially if you are targeting a certain merging window, it's useful to rebase your patches on a fairly recent -rc, e.g. 4.10-rc3. I would entirely skip any non-tagged kernel states (like middle of the merging window) or out of tree branches. There may be a case to rebase on some other developer's branch but only if there is a dependency that can't be avoided and usually with prior agreement from both the respective developer (as not to rebase the branch) and the involved maintainers. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/18] arm64: ilp32: introduce binfmt_ilp32.c
On Thu, Dec 22, 2016 at 12:26:40AM +0530, Yury Norov wrote: > On Mon, Dec 05, 2016 at 03:38:01PM +0000, Catalin Marinas wrote: > > On Fri, Oct 21, 2016 at 11:33:09PM +0300, Yury Norov wrote: > > > binfmt_ilp32.c is needed to handle ILP32 binaries > > > > > > Signed-off-by: Yury Norov > > > Signed-off-by: Bamvor Zhang Jian > > > --- > > > arch/arm64/include/asm/elf.h | 6 +++ > > > arch/arm64/kernel/Makefile | 1 + > > > arch/arm64/kernel/binfmt_ilp32.c | 97 > > > > > > 3 files changed, 104 insertions(+) > > > create mode 100644 arch/arm64/kernel/binfmt_ilp32.c > > > > > > diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h > > > index f259fe8..be29dde 100644 > > > --- a/arch/arm64/include/asm/elf.h > > > +++ b/arch/arm64/include/asm/elf.h > > > @@ -175,10 +175,16 @@ extern int arch_setup_additional_pages(struct > > > linux_binprm *bprm, > > > > > > #define COMPAT_ELF_ET_DYN_BASE (2 * TASK_SIZE_32 / 3) > > > > > > +#ifndef USE_AARCH64_GREG > > > /* AArch32 registers. */ > > > #define COMPAT_ELF_NGREG 18 > > > typedef unsigned int compat_elf_greg_t; > > > typedef compat_elf_greg_t > > > compat_elf_gregset_t[COMPAT_ELF_NGREG]; > > > +#else /* AArch64 registers for AARCH64/ILP32 */ > > > +#define COMPAT_ELF_NGREG ELF_NGREG > > > +#define compat_elf_greg_telf_greg_t > > > +#define compat_elf_gregset_t elf_gregset_t > > > +#endif > > > > I think you only need compat_elf_gregset_t definition here and leave the > > other two undefined. > > I checked everything here again, and found that almost all compat defines > may be moved to corresponding binfmt files. If everything is OK, I'll > incorporate next patch to the series It seems fine at a quick look but I'll have to see the final patch. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
Some minor comments below, nothing fundamental (as long as you say the new sequence doesn't have the speculative TLB load problem I mentioned on a previous version). On Wed, Jan 11, 2017 at 09:41:15AM -0500, Christopher Covington wrote: > diff --git a/Documentation/arm64/silicon-errata.txt > b/Documentation/arm64/silicon-errata.txt > index 405da11..7151aed 100644 > --- a/Documentation/arm64/silicon-errata.txt > +++ b/Documentation/arm64/silicon-errata.txt > @@ -42,24 +42,25 @@ file acts as a registry of software workarounds in the > Linux Kernel and > will be updated when new workarounds are committed and backported to > stable kernels. > > -| Implementor| Component | Erratum ID | Kconfig > | > -++-+-+-+ > -| ARM| Cortex-A53 | #826319 | ARM64_ERRATUM_826319 > | > -| ARM| Cortex-A53 | #827319 | ARM64_ERRATUM_827319 > | > -| ARM| Cortex-A53 | #824069 | ARM64_ERRATUM_824069 > | > -| ARM| Cortex-A53 | #819472 | ARM64_ERRATUM_819472 > | > -| ARM| Cortex-A53 | #845719 | ARM64_ERRATUM_845719 > | > -| ARM| Cortex-A53 | #843419 | ARM64_ERRATUM_843419 > | > -| ARM| Cortex-A57 | #832075 | ARM64_ERRATUM_832075 > | > -| ARM| Cortex-A57 | #852523 | N/A > | > -| ARM| Cortex-A57 | #834220 | ARM64_ERRATUM_834220 > | > -| ARM| Cortex-A72 | #853709 | N/A > | > -| ARM| MMU-500 | #841119,#826419 | N/A > | > -|| | | > | > -| Cavium | ThunderX ITS| #22375, #24313 | CAVIUM_ERRATUM_22375 > | > -| Cavium | ThunderX ITS| #23144 | CAVIUM_ERRATUM_23144 > | > -| Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 > | > -| Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 > | > -| Cavium | ThunderX SMMUv2 | #27704 | N/A | > -|| | | > | > -| Freescale/NXP | LS2080A/LS1043A | A-008585| FSL_ERRATUM_A008585 > | > +| Implementor | Component | Erratum ID | Kconfig > | > ++---+-+-+--+ > +| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 > | > +| ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 > | > +| ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 > | > +| ARM | Cortex-A53 | #819472 | ARM64_ERRATUM_819472 > | > +| ARM | Cortex-A53 | #845719 | ARM64_ERRATUM_845719 > | > +| ARM | Cortex-A53 | #843419 | ARM64_ERRATUM_843419 > | > +| ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 > | > +| ARM | Cortex-A57 | #852523 | N/A > | > +| ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 > | > +| ARM | Cortex-A72 | #853709 | N/A > | > +| ARM | MMU-500 | #841119,#826419 | N/A > | > +| | | | > | > +| Cavium| ThunderX ITS| #22375, #24313 | CAVIUM_ERRATUM_22375 > | > +| Cavium| ThunderX ITS| #23144 | CAVIUM_ERRATUM_23144 > | > +| Cavium| ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 > | > +| Cavium| ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 > | > +| Cavium| ThunderX SMMUv2 | #27704 | N/A > | > +| | | | > | > +| Freescale/NXP | LS2080A/LS1043A | A-008585| FSL_ERRATUM_A008585 > | > +| Qualcomm | Falkor v1 | E1003 | > QCOM_FALKOR_ERRATUM_1003 | Please don't change the "Implementor" column width, there is no point and it makes the patch harder to read (i.e. this hunk should only have one line). > diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c > index 4c63cb1..5a0a82a 100644 > --- a/arch/arm64/mm/context.c > +++ b/arch/arm64/mm/context.c > @@ -87,6 +87,11 @@ static void flush_context(unsigned int cpu) > /* Update the list of reserved ASIDs and the ASID bitmap. */ > bitmap_clear(asid_map, 0, NUM_USER_ASIDS); > > + /* Reserve ASID for Falkor erratum 1003 */ > + if (IS_ENABLED(CONFIG_QCOM_FALKOR_ERRATUM_1003) && >
Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
On Wed, Jan 11, 2017 at 06:37:39PM +, Mark Rutland wrote: > On Wed, Jan 11, 2017 at 12:35:55PM -0600, Timur Tabi wrote: > > On 01/11/2017 12:33 PM, Mark Rutland wrote: > > >It'll need to affect all lines since the kconfig column needs to expand > > >by at least one character to fit QCOM_FALKOR_ERRATUM_1003. > > > > Or we can make the macro shorter. > > The name, as it is, is perfectly descriptive. > > Let's not sacrifice legibility over a non-issue. I agree, I didn't realise that the we expand the last column already. It's a non-issue indeed. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
On Wed, Jan 11, 2017 at 06:40:52PM +, Mark Rutland wrote: > On Wed, Jan 11, 2017 at 06:22:08PM +, Marc Zyngier wrote: > > On 11/01/17 18:06, Catalin Marinas wrote: > > > On Wed, Jan 11, 2017 at 09:41:15AM -0500, Christopher Covington wrote: > > >> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S > > >> index 32682be..9ee46df 100644 > > >> --- a/arch/arm64/mm/proc.S > > >> +++ b/arch/arm64/mm/proc.S > > >> @@ -23,6 +23,7 @@ > > >> #include > > >> #include > > >> #include > > >> +#include > > >> #include > > >> #include > > >> #include > > >> @@ -140,6 +141,18 @@ ENDPROC(cpu_do_resume) > > >> ENTRY(cpu_do_switch_mm) > > >> mmidx1, x1 // get mm->context.id > > >> bfi x0, x1, #48, #16// set the ASID > > >> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003 > > >> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1003 > > >> +mrs x2, ttbr0_el1 > > >> +mov x3, #FALKOR_RESERVED_ASID > > >> +bfi x2, x3, #48, #16// reserved ASID + old > > >> BADDR > > >> +msr ttbr0_el1, x2 > > >> +isb > > >> +bfi x2, x0, #0, #48 // reserved ASID + new > > >> BADDR > > >> +msr ttbr0_el1, x2 > > >> +isb > > >> +alternative_else_nop_endif > > >> +#endif > > >> msr ttbr0_el1, x0 // set TTBR0 > > >> isb > > >> post_ttbr0_update_workaround > > > > > > Please move the above hunk to a pre_ttbr0_update_workaround macro for > > > consistency with post_ttbr0_update_workaround. > > > > In which case (and also for consistency), should we add that pre_ttbr0 > > macro to entry.S, just before __uaccess_ttbr0_enable? It may not be > > needed in the SW pan case, but it is probably worth entertaining the > > idea that there may be something to do there... > > Likewise, I beleive we may need to modify cpu_set_reserved_ttbr0(). This may be fine if my assumptions about this erratum are correct. In the cpu_set_reserved_ttbr0() case we set TTBR0_EL1 to a table without any entries, so no new entries could be tagged with the old ASID. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
On Wed, Jan 11, 2017 at 06:22:08PM +, Marc Zyngier wrote: > On 11/01/17 18:06, Catalin Marinas wrote: > > On Wed, Jan 11, 2017 at 09:41:15AM -0500, Christopher Covington wrote: > >> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S > >> index 32682be..9ee46df 100644 > >> --- a/arch/arm64/mm/proc.S > >> +++ b/arch/arm64/mm/proc.S > >> @@ -23,6 +23,7 @@ > >> #include > >> #include > >> #include > >> +#include > >> #include > >> #include > >> #include > >> @@ -140,6 +141,18 @@ ENDPROC(cpu_do_resume) > >> ENTRY(cpu_do_switch_mm) > >>mmidx1, x1 // get mm->context.id > >>bfi x0, x1, #48, #16// set the ASID > >> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003 > >> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1003 > >> + mrs x2, ttbr0_el1 > >> + mov x3, #FALKOR_RESERVED_ASID > >> + bfi x2, x3, #48, #16// reserved ASID + old BADDR > >> + msr ttbr0_el1, x2 > >> + isb > >> + bfi x2, x0, #0, #48 // reserved ASID + new BADDR > >> + msr ttbr0_el1, x2 > >> + isb > >> +alternative_else_nop_endif > >> +#endif > >>msr ttbr0_el1, x0 // set TTBR0 > >>isb > >>post_ttbr0_update_workaround > > > > Please move the above hunk to a pre_ttbr0_update_workaround macro for > > consistency with post_ttbr0_update_workaround. > > In which case (and also for consistency), should we add that pre_ttbr0 > macro to entry.S, just before __uaccess_ttbr0_enable? It may not be > needed in the SW pan case, but it is probably worth entertaining the > idea that there may be something to do there... It may actually be needed in entry.S as well. With SW PAN, we move the context switching from cpu_do_switch_mm to the kernel_exit macro when returning to user. In this case we are switching from the reserved ASID 0 and reserved TTBR0_EL1 (pointing to a zeroed page) to the user's TTBR0_EL1 and ASID. If the ASID switch isn't taken into account, we may end up with new TLB entries being tagged with the reserved ASID. Apart from a potential loss of protection with TTBR0 PAN, is there anything else that could go wrong? Maybe a TLB conflict if we mix TLBs from multiple address spaces tagged with the same reserved ASID. If the above is an issue, we would need to patch __uaccess_ttbr0_enable() as well, though I'm more inclined to make this erratum not selectable when TTBR0 PAN is enabled. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/4] arm64: Work around Falkor erratum 1003
On Wed, Feb 01, 2017 at 04:33:58PM +, Will Deacon wrote: > On Wed, Feb 01, 2017 at 11:29:22AM -0500, Christopher Covington wrote: > > On 01/31/2017 12:56 PM, Marc Zyngier wrote: > > > Given that all ARMv8 CPUs can support SW_PAN, it is more likely to be > > > enabled than the ARMv8.1 PAN. I'd vote for supporting the workaround in > > > that case too, and hope that people do enable the HW version. > > > > Okay, I'll do my best to add support for the SW PAN case. I rebased and > > submitted v6 of the E1009 patch [1] so that it no longer depends on this > > patch landing first, if you all are inclined to pick it up while work on > > this E1003 patch continues. > > The alternative is not enabling SW_PAN (at runtime) if this errata is > present, along with a warning stating that hardware-PAN should be > enabled in kconfig instead. Not sure what distributions will make of that > though. The problem with this patch is that when ARM64_SW_TTBR0_PAN is enabled and in the absence of hardware PAN (or ARM64_PAN disabled), cpu_do_switch_mm is no longer called for user process switching, so the workaround is pretty much useless. I'm ok with adding the Kconfig dependency below to QCOM_FALKOR_ERRATUM_1003: depends on !ARM64_SW_TTBR0_PAN || ARM64_PAN together with a run-time warning if ARM64_SW_TTBR0_PAN is being used. I'm not keen on adding the workaround to the uaccess macros. They are complex enough already and people who do want SW PAN (and single Image) would end up with 6-8 extra unnecessary nops on each side of a uaccess (and nops are not entirely free). -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/4] arm64: Work around Falkor erratum 1003
On Wed, Feb 01, 2017 at 05:41:05PM +, Will Deacon wrote: > On Wed, Feb 01, 2017 at 05:36:09PM +0000, Catalin Marinas wrote: > > On Wed, Feb 01, 2017 at 04:33:58PM +, Will Deacon wrote: > > > On Wed, Feb 01, 2017 at 11:29:22AM -0500, Christopher Covington wrote: > > > > On 01/31/2017 12:56 PM, Marc Zyngier wrote: > > > > > Given that all ARMv8 CPUs can support SW_PAN, it is more likely to be > > > > > enabled than the ARMv8.1 PAN. I'd vote for supporting the workaround > > > > > in > > > > > that case too, and hope that people do enable the HW version. > > > > > > > > Okay, I'll do my best to add support for the SW PAN case. I rebased and > > > > submitted v6 of the E1009 patch [1] so that it no longer depends on this > > > > patch landing first, if you all are inclined to pick it up while work on > > > > this E1003 patch continues. > > > > > > The alternative is not enabling SW_PAN (at runtime) if this errata is > > > present, along with a warning stating that hardware-PAN should be > > > enabled in kconfig instead. Not sure what distributions will make of that > > > though. > > > > The problem with this patch is that when ARM64_SW_TTBR0_PAN is enabled > > and in the absence of hardware PAN (or ARM64_PAN disabled), > > cpu_do_switch_mm is no longer called for user process switching, so the > > workaround is pretty much useless. > > Oh, I see what you mean now. > > > I'm ok with adding the Kconfig dependency below to > > QCOM_FALKOR_ERRATUM_1003: > > > > depends on !ARM64_SW_TTBR0_PAN || ARM64_PAN > > > > together with a run-time warning if ARM64_SW_TTBR0_PAN is being used. > > That makes it look like hardware-PAN is the cause of the erratum. With the right Kconfig comment we could make this clearer. > Maybe > just select ARM64_PAN if the erratum workaround is selected, then > runtime warning if we find that the h/w doesn't have PAN but does have > the erratum (which should never fire)? You still need this workaround even if you don't want any PAN (both sw and hw PAN disabled). I wouldn't want to select ARM64_PAN since it's not a dependency. It's more like if you do need a PAN, make sure you only use the hw one. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/4] arm64: Work around Falkor erratum 1003
On Wed, Feb 01, 2017 at 05:49:34PM +, Catalin Marinas wrote: > On Wed, Feb 01, 2017 at 05:41:05PM +, Will Deacon wrote: > > On Wed, Feb 01, 2017 at 05:36:09PM +, Catalin Marinas wrote: > > > On Wed, Feb 01, 2017 at 04:33:58PM +, Will Deacon wrote: > > > > On Wed, Feb 01, 2017 at 11:29:22AM -0500, Christopher Covington wrote: > > > > > On 01/31/2017 12:56 PM, Marc Zyngier wrote: > > > > > > Given that all ARMv8 CPUs can support SW_PAN, it is more likely to > > > > > > be > > > > > > enabled than the ARMv8.1 PAN. I'd vote for supporting the > > > > > > workaround in > > > > > > that case too, and hope that people do enable the HW version. > > > > > > > > > > Okay, I'll do my best to add support for the SW PAN case. I rebased > > > > > and > > > > > submitted v6 of the E1009 patch [1] so that it no longer depends on > > > > > this > > > > > patch landing first, if you all are inclined to pick it up while work > > > > > on > > > > > this E1003 patch continues. > > > > > > > > The alternative is not enabling SW_PAN (at runtime) if this errata is > > > > present, along with a warning stating that hardware-PAN should be > > > > enabled in kconfig instead. Not sure what distributions will make of > > > > that > > > > though. > > > > > > The problem with this patch is that when ARM64_SW_TTBR0_PAN is enabled > > > and in the absence of hardware PAN (or ARM64_PAN disabled), > > > cpu_do_switch_mm is no longer called for user process switching, so the > > > workaround is pretty much useless. > > > > Oh, I see what you mean now. > > > > > I'm ok with adding the Kconfig dependency below to > > > QCOM_FALKOR_ERRATUM_1003: > > > > > > depends on !ARM64_SW_TTBR0_PAN || ARM64_PAN > > > > > > together with a run-time warning if ARM64_SW_TTBR0_PAN is being used. > > > > That makes it look like hardware-PAN is the cause of the erratum. > > With the right Kconfig comment we could make this clearer. > > > Maybe > > just select ARM64_PAN if the erratum workaround is selected, then > > runtime warning if we find that the h/w doesn't have PAN but does have > > the erratum (which should never fire)? > > You still need this workaround even if you don't want any PAN (both sw > and hw PAN disabled). I wouldn't want to select ARM64_PAN since it's not > a dependency. It's more like if you do need a PAN, make sure you only > use the hw one. Alternatively, your select idea could be refined to: select ARM64_PAN if ARM64_SW_TTBR0_PAN but we still need a proper comment as people would wonder what this is for. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/4] arm64: Work around Falkor erratum 1003
On Wed, Feb 01, 2017 at 05:59:48PM +, Will Deacon wrote: > On Wed, Feb 01, 2017 at 05:49:34PM +0000, Catalin Marinas wrote: > > On Wed, Feb 01, 2017 at 05:41:05PM +, Will Deacon wrote: > > > On Wed, Feb 01, 2017 at 05:36:09PM +0000, Catalin Marinas wrote: > > > > On Wed, Feb 01, 2017 at 04:33:58PM +, Will Deacon wrote: > > > > > On Wed, Feb 01, 2017 at 11:29:22AM -0500, Christopher Covington wrote: > > > > > > On 01/31/2017 12:56 PM, Marc Zyngier wrote: > > > > > > > Given that all ARMv8 CPUs can support SW_PAN, it is more likely > > > > > > > to be > > > > > > > enabled than the ARMv8.1 PAN. I'd vote for supporting the > > > > > > > workaround in > > > > > > > that case too, and hope that people do enable the HW version. > > > > > > > > > > > > Okay, I'll do my best to add support for the SW PAN case. I rebased > > > > > > and > > > > > > submitted v6 of the E1009 patch [1] so that it no longer depends on > > > > > > this > > > > > > patch landing first, if you all are inclined to pick it up while > > > > > > work on > > > > > > this E1003 patch continues. > > > > > > > > > > The alternative is not enabling SW_PAN (at runtime) if this errata is > > > > > present, along with a warning stating that hardware-PAN should be > > > > > enabled in kconfig instead. Not sure what distributions will make of > > > > > that > > > > > though. > > > > > > > > The problem with this patch is that when ARM64_SW_TTBR0_PAN is enabled > > > > and in the absence of hardware PAN (or ARM64_PAN disabled), > > > > cpu_do_switch_mm is no longer called for user process switching, so the > > > > workaround is pretty much useless. > > > > > > Oh, I see what you mean now. > > > > > > > I'm ok with adding the Kconfig dependency below to > > > > QCOM_FALKOR_ERRATUM_1003: > > > > > > > > depends on !ARM64_SW_TTBR0_PAN || ARM64_PAN > > > > > > > > together with a run-time warning if ARM64_SW_TTBR0_PAN is being used. > > > > > > That makes it look like hardware-PAN is the cause of the erratum. > > > > With the right Kconfig comment we could make this clearer. > > It's not just a comment though, the kconfig option for the workaround > will disappear from menuconfig as long as the dependencies aren't met. > The dependency is really that SW_PAN depends on !ERRATUM_1003, but that > doesn't work for the distributions. I agree. > > > Maybe > > > just select ARM64_PAN if the erratum workaround is selected, then > > > runtime warning if we find that the h/w doesn't have PAN but does have > > > the erratum (which should never fire)? > > > > You still need this workaround even if you don't want any PAN (both sw > > and hw PAN disabled). I wouldn't want to select ARM64_PAN since it's not > > a dependency. It's more like if you do need a PAN, make sure you only > > use the hw one. > > True, in the case that all PAN options are disabled we still want this > to work. How about: > > select ARM64_PAN if ARM64_SW_TTBR0_PAN As I replied to myself, the above would work for me as well, so let's go for this. > In fact, what's the reason for supporting SW_PAN and ARM64_PAN as a > config combination? Why not just have "PAN" that enables them both and > uses the hardware feature if it's there? Because SW PAN has a non-trivial performance hit. You would enable SW PAN only if you are paranoid about security. HW PAN, OTOH, is very cheap and I wouldn't want to miss enabling it in a single Image supporting ARMv8.0 and ARMv8.1 just because SW PAN is slow on ARMv8.0. IOW, ARM64_PAN is default y while ARM64_SW_TTBR0_PAN is default n. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/4] arm64: Work around Falkor erratum 1003
On Wed, Feb 01, 2017 at 06:34:01PM +, Will Deacon wrote: > On Wed, Feb 01, 2017 at 06:22:44PM +0000, Catalin Marinas wrote: > > On Wed, Feb 01, 2017 at 05:59:48PM +, Will Deacon wrote: > > > On Wed, Feb 01, 2017 at 05:49:34PM +0000, Catalin Marinas wrote: > > > > On Wed, Feb 01, 2017 at 05:41:05PM +, Will Deacon wrote: > > > > > Maybe > > > > > just select ARM64_PAN if the erratum workaround is selected, then > > > > > runtime warning if we find that the h/w doesn't have PAN but does have > > > > > the erratum (which should never fire)? > > > > > > > > You still need this workaround even if you don't want any PAN (both sw > > > > and hw PAN disabled). I wouldn't want to select ARM64_PAN since it's not > > > > a dependency. It's more like if you do need a PAN, make sure you only > > > > use the hw one. > > > > > > True, in the case that all PAN options are disabled we still want this > > > to work. How about: > > > > > > select ARM64_PAN if ARM64_SW_TTBR0_PAN > > > > As I replied to myself, the above would work for me as well, so let's go > > for this. > > > > > In fact, what's the reason for supporting SW_PAN and ARM64_PAN as a > > > config combination? Why not just have "PAN" that enables them both and > > > uses the hardware feature if it's there? > > > > Because SW PAN has a non-trivial performance hit. You would enable SW > > PAN only if you are paranoid about security. HW PAN, OTOH, is very cheap > > and I wouldn't want to miss enabling it in a single Image supporting > > ARMv8.0 and ARMv8.1 just because SW PAN is slow on ARMv8.0. > > > > IOW, ARM64_PAN is default y while ARM64_SW_TTBR0_PAN is default n. > > Ok, in that case, then how about another permutation: we make > ARM64_SW_TTBR0_PAN depend on ARM64_PAN? Then when you select "PAN Support" > you get a new menu option underneath it for the emulation? I think that > solves the erratum case and the use-case above. The problem is that ARM64_PAN is an ARMv8.1 feature and currently grouped accordingly in Kconfig. ARM64_SW_TTBR0_PAN is complementary (and even not recommended on ARMv8.1). We can do this if we break the ARMv8.x feature grouping but just for this erratum, I don't think it's worth. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] arm64: Work around Falkor erratum 1003
On Tue, Feb 07, 2017 at 07:35:16PM -0500, Christopher Covington wrote: > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -480,6 +480,18 @@ config CAVIUM_ERRATUM_27456 > > If unsure, say Y. > > +config QCOM_FALKOR_ERRATUM_1003 > + bool "Falkor E1003: Incorrect translation due to ASID change" > + default y > + select ARM64_PAN if ARM64_SW_TTBR0_PAN > + help > + On Falkor v1, an incorrect ASID may be cached in the TLB when ASID > + and BADDR are changed together in TTBRx_EL1. The workaround for this > + issue is to use a reserved ASID in cpu_do_switch_mm() before > + switching to the new ASID. > + > + If unsure, say Y. It would be good to have a comment here on why PAN is selected. > --- a/arch/arm64/mm/context.c > +++ b/arch/arm64/mm/context.c > @@ -79,6 +79,13 @@ void verify_cpu_asid_bits(void) > } > } > > +static void set_reserved_asid_bits(void) > +{ > + if (IS_ENABLED(CONFIG_QCOM_FALKOR_ERRATUM_1003) && > + cpus_have_cap(ARM64_WORKAROUND_QCOM_FALKOR_E1003)) > + __set_bit(FALKOR_RESERVED_ASID, asid_map); > +} You should use cpus_have_const_cap() as it would be optimised using jump labels. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7] arm64: Work around Falkor erratum 1003
On Wed, Feb 08, 2017 at 03:08:37PM -0500, Christopher Covington wrote: > The Qualcomm Datacenter Technologies Falkor v1 CPU may allocate TLB entries > using an incorrect ASID when TTBRx_EL1 is being updated. When the erratum > is triggered, page table entries using the new translation table base > address (BADDR) will be allocated into the TLB using the old ASID. All > circumstances leading to the incorrect ASID being cached in the TLB arise > when software writes TTBRx_EL1[ASID] and TTBRx_EL1[BADDR], a memory > operation is in the process of performing a translation using the specific > TTBRx_EL1 being written, and the memory operation uses a translation table > descriptor designated as non-global. EL2 and EL3 code changing the EL1&0 > ASID is not subject to this erratum because hardware is prohibited from > performing translations from an out-of-context translation regime. > > Consider the following pseudo code. > > write new BADDR and ASID values to TTBRx_EL1 > > Replacing the above sequence with the one below will ensure that no TLB > entries with an incorrect ASID are used by software. > > write reserved value to TTBRx_EL1[ASID] > ISB > write new value to TTBRx_EL1[BADDR] > ISB > write new value to TTBRx_EL1[ASID] > ISB > > When the above sequence is used, page table entries using the new BADDR > value may still be incorrectly allocated into the TLB using the reserved > ASID. Yet this will not reduce functionality, since TLB entries incorrectly > tagged with the reserved ASID will never be hit by a later instruction. > > Based on work by Shanker Donthineni > > Signed-off-by: Christopher Covington Reviewed-by: Catalin Marinas -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 resend 00/20] ILP32 for ARM64
On Mon, Apr 10, 2017 at 11:47:40PM +0400, Yury Norov wrote: > According to latest plans figured out on Linaro Connect, ILP32 should > be taken in 4.12 merge window. Sorry, I wasn't present at Linaro Connect, so definitely not involved in such decision. BTW, it would be nice to have Arnd's ack on patch 2 (32-bit ABI: introduce ARCH_32BIT_OFF_T config option). > The window will be opened in less than a month, so I'd like to remind > it to you, and ask if you have any questions/requests related to > ILP32. Is it still realistic idea to take patches in 4.12? 4.12 is not realistic and I wouldn't commit to a specific kernel version. Given the intrusiveness, such patches should sit in -next for at least 3-4 weeks (i.e. merged in the arch tree around -rc3). Anyway, I don't think the plan has changed since last time I stated it: https://lkml.org/lkml/2016/12/5/333 I haven't got the chance to test these patches yet, run benchmarks (step 4). Also, the latest benchmarks I've seen were mostly for user space while I'm more concerned with the user-kernel interface (https://marc.info/?l=linux-arm-kernel&m=148690490713310&w=2). Is there an up to date pre-built toolchain and a filesystem for ILP32? On the glibc testing side, have the regressions been identified/fixed? -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/20] arm64: rename COMPAT to AARCH32_EL0 in Kconfig
On Sun, Jun 04, 2017 at 02:59:54PM +0300, Yury Norov wrote: > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -402,7 +402,7 @@ config ARM64_ERRATUM_834220 > > config ARM64_ERRATUM_845719 > bool "Cortex-A53: 845719: a load might read incorrect data" > - depends on COMPAT > + depends on AARCH32_EL0 > default y > help > This option adds an alternative code sequence to work around ARM > @@ -784,7 +784,7 @@ config FORCE_MAX_ZONEORDER > > menuconfig ARMV8_DEPRECATED > bool "Emulate deprecated/obsolete ARMv8 instructions" > - depends on COMPAT > + depends on AARCH32_EL0 > help > Legacy software support may require certain instructions > that have been deprecated or obsoleted in the architecture. > @@ -1062,9 +1062,15 @@ menu "Userspace binary formats" > source "fs/Kconfig.binfmt" > > config COMPAT > + bool > + depends on AARCH32_EL0 You could just use "def_bool y" here > + > +config AARCH32_EL0 > bool "Kernel support for 32-bit EL0" > + def_bool y > depends on ARM64_4K_PAGES || EXPERT > select COMPAT_BINFMT_ELF if BINFMT_ELF > + select COMPAT and avoid the explicit select. > select HAVE_UID16 > select OLD_SIGSUSPEND3 > select COMPAT_OLD_SIGACTION [...] > --- a/arch/arm64/kernel/cpuinfo.c > +++ b/arch/arm64/kernel/cpuinfo.c > @@ -139,15 +139,17 @@ static int c_show(struct seq_file *m, void *v) >*/ > seq_puts(m, "Features\t:"); > if (compat) { > -#ifdef CONFIG_COMPAT > - for (j = 0; compat_hwcap_str[j]; j++) > - if (compat_elf_hwcap & (1 << j)) > - seq_printf(m, " %s", > compat_hwcap_str[j]); > - > - for (j = 0; compat_hwcap2_str[j]; j++) > - if (compat_elf_hwcap2 & (1 << j)) > - seq_printf(m, " %s", > compat_hwcap2_str[j]); > -#endif /* CONFIG_COMPAT */ > +#ifdef CONFIG_AARCH32_EL0 > + if (personality(current->personality) == PER_LINUX32) { > + for (j = 0; compat_hwcap_str[j]; j++) > + if (compat_elf_hwcap & (1 << j)) > + seq_printf(m, " %s", > compat_hwcap_str[j]); > + > + for (j = 0; compat_hwcap2_str[j]; j++) > + if (compat_elf_hwcap2 & (1 << j)) > + seq_printf(m, " %s", > compat_hwcap2_str[j]); > + } > +#endif /* CONFIG_AARCH32_EL0 */ I don't understand this hunk. Why do you need another check on personality? "compat" is already true if PER_LINUX32. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 13/20] arm64: ilp32: share aarch32 syscall handlers
On Sun, Jun 04, 2017 at 03:00:02PM +0300, Yury Norov wrote: > off_t is passed in register pair just like in aarch32. > In this patch corresponding aarch32 handlers are shared to > ilp32 code. Is the comment here relevant? IOW, do we have any AArch64/ILP32 syscall where off_t is used as an argument? AFAICT, the *64 syscalls use loff_t or loff_t *. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/20] 32-bit ABI: introduce ARCH_32BIT_OFF_T config option
On Sun, Jun 04, 2017 at 02:59:51PM +0300, Yury Norov wrote: > All new 32-bit architectures should have 64-bit off_t type, but existing > architectures has 32-bit ones. > > To handle it, new config option is added to arch/Kconfig that defaults > ARCH_32BIT_OFF_T to be disabled for non-64 bit architectures. All existing > 32-bit architectures enable it explicitly here. > > New option affects force_o_largefile() behaviour. Namely, if off_t is > 64-bits long, we have no reason to reject user to open big files. > > Note that even if architectures has only 64-bit off_t in the kernel > (arc, c6x, h8300, hexagon, metag, nios2, openrisc, tile32 and unicore32), > a libc may use 32-bit off_t, and therefore want to limit the file size > to 4GB unless specified differently in the open flags. > > Signed-off-by: Yury Norov > Acked-by: Arnd Bergmann [...] > diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h > index 1b48d9c9a561..297993c92490 100644 > --- a/include/linux/fcntl.h > +++ b/include/linux/fcntl.h > @@ -11,7 +11,7 @@ >O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE) > > #ifndef force_o_largefile > -#define force_o_largefile() (BITS_PER_LONG != 32) > +#define force_o_largefile() (!IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T)) > #endif I may have confused myself with which off_t is 64-bit here for new 32-bit architectures. Are we referring to the glibc definition, the kernel one or simply that force_o_largefile() is true by default. Because the type off_t for 32-bit kernel builds is still, well, 32-bit. Otherwise it seems that the first paragraph in the description above should read "all new 32-bit ABIs on a 64-bit kernel..." but then AArch64/ILP32 is no longer the same as a new, pure 32-bit architecture. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 19/20] arm64:ilp32: add vdso-ilp32 and use for signal return
On Sun, Jun 04, 2017 at 03:00:08PM +0300, Yury Norov wrote: > From: Philipp Tomsich > > ILP32 VDSO exports following symbols: > __kernel_rt_sigreturn; > __kernel_gettimeofday; > __kernel_clock_gettime; > __kernel_clock_getres. > > What shared object to use, kernel selects depending on result of > is_ilp32_compat_task() in arch/arm64/kernel/vdso.c, so it substitutes > correct pages and spec. > > Adjusted to move the data page before code pages in sync with > commit 601255ae3c98 ("arm64: vdso: move data page before code pages") > > Signed-off-by: Philipp Tomsich > Signed-off-by: Christoph Muellner > Signed-off-by: Yury Norov > Signed-off-by: Bamvor Jian Zhang > --- > arch/arm64/Makefile | 3 + > arch/arm64/include/asm/vdso.h | 6 ++ > arch/arm64/kernel/Makefile| 1 + > arch/arm64/kernel/asm-offsets.c | 7 ++ > arch/arm64/kernel/signal.c| 2 + > arch/arm64/kernel/vdso-ilp32/.gitignore | 2 + > arch/arm64/kernel/vdso-ilp32/Makefile | 80 ++ > arch/arm64/kernel/vdso-ilp32/vdso-ilp32.S | 33 ++ > arch/arm64/kernel/vdso-ilp32/vdso-ilp32.lds.S | 95 > +++ > arch/arm64/kernel/vdso.c | 65 +++--- > arch/arm64/kernel/vdso/gettimeofday.S | 20 +- > arch/arm64/kernel/vdso/vdso.S | 6 +- > 12 files changed, 304 insertions(+), 16 deletions(-) > create mode 100644 arch/arm64/kernel/vdso-ilp32/.gitignore > create mode 100644 arch/arm64/kernel/vdso-ilp32/Makefile > create mode 100644 arch/arm64/kernel/vdso-ilp32/vdso-ilp32.S > create mode 100644 arch/arm64/kernel/vdso-ilp32/vdso-ilp32.lds.S Does this patch get simpler with Andrew Pinski's vdso in C proposal? I have to read the other thread in detail, Will followed up already. > diff --git a/arch/arm64/include/asm/vdso.h b/arch/arm64/include/asm/vdso.h > index 839ce0031bd5..649a9a416500 100644 > --- a/arch/arm64/include/asm/vdso.h > +++ b/arch/arm64/include/asm/vdso.h > @@ -29,6 +29,12 @@ > > #include > > +#ifdef CONFIG_ARM64_ILP32 > +#include > +#else > +#define vdso_offset_sigtramp_ilp32 > +#endif BTW, here you could do something like: #define vdso_offset_sigtramp_ilp32 ({ BUILD_BUG(); 0; }) -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/20] arm64: rename COMPAT to AARCH32_EL0 in Kconfig
On Fri, Jun 09, 2017 at 01:40:59AM +0300, Yury Norov wrote: > On Thu, Jun 08, 2017 at 03:09:12PM +0100, Catalin Marinas wrote: > > On Sun, Jun 04, 2017 at 02:59:54PM +0300, Yury Norov wrote: > > > --- a/arch/arm64/Kconfig > > > +++ b/arch/arm64/Kconfig > > > @@ -402,7 +402,7 @@ config ARM64_ERRATUM_834220 > > > > > > config ARM64_ERRATUM_845719 > > > bool "Cortex-A53: 845719: a load might read incorrect data" > > > - depends on COMPAT > > > + depends on AARCH32_EL0 > > > default y > > > help > > > This option adds an alternative code sequence to work around ARM > > > @@ -784,7 +784,7 @@ config FORCE_MAX_ZONEORDER > > > > > > menuconfig ARMV8_DEPRECATED > > > bool "Emulate deprecated/obsolete ARMv8 instructions" > > > - depends on COMPAT > > > + depends on AARCH32_EL0 > > > help > > > Legacy software support may require certain instructions > > > that have been deprecated or obsoleted in the architecture. > > > @@ -1062,9 +1062,15 @@ menu "Userspace binary formats" > > > source "fs/Kconfig.binfmt" > > > > > > config COMPAT > > > + bool > > > + depends on AARCH32_EL0 > > > > You could just use "def_bool y" here > > > > > + > > > +config AARCH32_EL0 > > > bool "Kernel support for 32-bit EL0" > > > + def_bool y > > > depends on ARM64_4K_PAGES || EXPERT > > > select COMPAT_BINFMT_ELF if BINFMT_ELF > > > + select COMPAT > > > > and avoid the explicit select. > > in patch 20 COMPAT becomes depending also on ARM64_ILP32, like this: > - depends on AARCH32_EL0 > + depends on AARCH32_EL0 || ARM64_ILP32 > > So this is a preparation for it. If it looks confusing, I think it's > better to underline it in the description to the patch in addition to > this: > > > From now, AARCH32_EL0 (former COMPAT) config option means the support of > > AARCH32 userspace, ARM64_ILP32 - support of ILP32 ABI (see next patches), > > and COMPAT indicates that one of them, or both, is enabled. > > But if you prefer, I can do like you suggested here and make COMPAT > depend on AARCH32_EL0 in the last patch. What I meant is that if you define COMPAT as "def_bool y", you no longer need the explicit "select COMPAT". When AARCH32_EL0 is disabled, COMPAT would automatically be disabled because of the "depends on AARCH32_EL0" line. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/20] 32-bit ABI: introduce ARCH_32BIT_OFF_T config option
On Tue, Jun 13, 2017 at 02:04:11PM +0300, Yury Norov wrote: > On Thu, Jun 08, 2017 at 04:09:50PM +0100, Catalin Marinas wrote: > > On Sun, Jun 04, 2017 at 02:59:51PM +0300, Yury Norov wrote: > > > All new 32-bit architectures should have 64-bit off_t type, but existing > > > architectures has 32-bit ones. > > > > > > To handle it, new config option is added to arch/Kconfig that defaults > > > ARCH_32BIT_OFF_T to be disabled for non-64 bit architectures. All existing > > > 32-bit architectures enable it explicitly here. > > > > > > New option affects force_o_largefile() behaviour. Namely, if off_t is > > > 64-bits long, we have no reason to reject user to open big files. > > > > > > Note that even if architectures has only 64-bit off_t in the kernel > > > (arc, c6x, h8300, hexagon, metag, nios2, openrisc, tile32 and unicore32), > > > a libc may use 32-bit off_t, and therefore want to limit the file size > > > to 4GB unless specified differently in the open flags. > > > > > > Signed-off-by: Yury Norov > > > Acked-by: Arnd Bergmann > > [...] > > > diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h > > > index 1b48d9c9a561..297993c92490 100644 > > > --- a/include/linux/fcntl.h > > > +++ b/include/linux/fcntl.h > > > @@ -11,7 +11,7 @@ > > >O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE) > > > > > > #ifndef force_o_largefile > > > -#define force_o_largefile() (BITS_PER_LONG != 32) > > > +#define force_o_largefile() (!IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T)) > > > #endif > > > > I may have confused myself with which off_t is 64-bit here for new > > 32-bit architectures. Are we referring to the glibc definition, the > > kernel one or simply that force_o_largefile() is true by default. > > Because the type off_t for 32-bit kernel builds is still, well, 32-bit. > > > > Otherwise it seems that the first paragraph in the description above > > should read "all new 32-bit ABIs on a 64-bit kernel..." but then > > AArch64/ILP32 is no longer the same as a new, pure 32-bit architecture. > > This is all about userspace off_t types, like Arnd told in the comment > to patch 13. I'll underline it in the comment to the patch. If it's > not enough, I can also rename the config option to > CONFIG_ARCH_32BIT_USER_OFF_T or similar. For me it's too much, but if > you find it reasonable, I'll do it. Just let me know. Thanks for clarification. I had the impression that it should match the kernel's off_t (which is exported in the kernel headers as 32-bit) but compiling with -mabi=ilp32 indeed shows sizeof(off_t) == 8. So that's just a user decision to use loff_t instead and such port shouldn't use any of the syscalls that pass the kernel's off_t. I would rather see the comment in the arch/Kconfig help entry in this patch for future reference. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 00/20] ILP32 for ARM64
Hi Yury, On Mon, Jun 19, 2017 at 06:49:43PM +0300, Yury Norov wrote: > This series enables aarch64 with ilp32 mode. Thanks for putting this series together, I do appreciate the effort. There are still some review comments coming in but I'm happy with how the ABI looks now. I did some LTP testing (AArch64/LP64, AArch64/ILP32, AArch32) and benchmarking and didn't see any regressions (apart from an LTP bug with sync_file_range2). James Morse is working on reproducing similar testing in ARM. Szabolcs reported some glibc test-suite regressions on the libc-alpha list which I assume will be followed up. VDSO in C is another issue I'd like sorted but this is not strictly specific to ILP32 and can be done as a follow up. Note that I didn't run any big-endian tests, though this is something that needs doing. Now, having agreed on the ABI and implementation very close to being ready doesn't necessarily make the code suitable for upstream. With my maintainer hat on, I'm trying to see where ILP32 will be in 2-5-10 years, whether anyone still cares about it in this time frame. The difference from a driver or SoC support is that ABIs are very hard to revert, though are as (or even more) likely to bit-rot when not in use or regularly tested (we have the big-endian experience here). There are two main aspects to make the code upstream-worthy: 1. Actual/real users (current, future). I don't mean just a few distros showing that it can be done but actual/planned real deployments 2. Long term testing/maintenance plan. This is not about kernel code maintenance but a healthy ILP32 ecosystem: a) readily available toolchains (x86-hosted and AArch64-hosted) b) filesystems (can be large distros like openSUSE or more embedded-oriented like Yocto or OpenEmbedded) c) suitable continuous regression testing (kernel + userland) d) commitment from all parties involved (including ARM Ltd) to treat the ILP32 ABI as a (nearly) first class citizen It is pretty clear from private discussions that there are potential users but at the moment I can't tell if those would turn into real deployments of production systems. As for (2), the long term plans are not convincing (or I haven't spotted them yet), so I'd like to see the interested parties putting a plan together (something along the lines of kernelci.org + LTP, glibc buildbot). What I'd like to propose is that Will and I (as arm64 maintainers, maybe with with the help of others including this series' authors) take over the series and push it to a staging branch under the arm64 kernel on git.kernel.org. This is aimed as a commitment to keep the ABI *stable* and will be rebased with every kernel release (starting with 4.13). The decision to merge upstream will be revisited every 6 months, assessing the progress on the points I mentioned above, with a time limit of 2 years when, if still not upstream, we will stop maintaining such branch. I am aware that the above proposal has an impact on the glibc patches since they will not merge a new ABI upstream until officially supported by the kernel. I cc'ed some of the glibc developers and they will follow up on the libc-alpha list. > As supporting work, it introduces ARCH_32BIT_OFF_T configuration > option that is enabled for existing 32-bit architectures but disabled > for new arches (so 64-bit off_t userspace type is used by new userspace). > Also it deprecates getrlimit and setrlimit syscalls prior to prlimit64. [...] > Patches 1, 2, 3 and 8 are general, and may be applied separately. These 4 patches should be merged independently, I don't see a point in carrying them with the ILP32 series. Arnd, are you ok to push them upstream? BTW, patch 3 seems to never make it to the linux-arm-kernel list, I guess too many on cc. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 00/20] ILP32 for ARM64
Hi Yury, Just a quick reply as I'm about to go on holiday for the next two weeks. On Fri, Jul 07, 2017 at 12:59:02AM +0300, Yury Norov wrote: > On Thu, Jun 29, 2017 at 05:10:36PM +0100, Catalin Marinas wrote: > > On Mon, Jun 19, 2017 at 06:49:43PM +0300, Yury Norov wrote: > > > This series enables aarch64 with ilp32 mode. > > > > What I'd like to propose is that Will and I (as arm64 maintainers, maybe > > with with the help of others including this series' authors) take over > > the series and push it to a staging branch under the arm64 kernel on > > git.kernel.org. This is aimed as a commitment to keep the ABI *stable* > > and will be rebased with every kernel release (starting with 4.13). The > > decision to merge upstream will be revisited every 6 months, assessing > > the progress on the points I mentioned above, with a time limit of 2 > > years when, if still not upstream, we will stop maintaining such branch. > > Thanks for the email. I appreciate your concern about long-term > support load for a new ABI. I also think that stabilizing the ABI > is a good idea. > > At this point, most people expect the ABI to not change unless > critical issues are uncovered. IMO, if there is a good technical reason > to change the ABI -- the change will happen even on the "staging" branch. What I would like is that we change the ABI *only* if there is a serious bug, otherwise we should try hard to keep it stable. The "good technical reason" can be subjective (i.e. let's pass 64-bit arguments in a single register because some benchmark is slightly faster; with a wider user base, we may get more suggestions for ABI changes that we should reject). > And vise-versa, if there is no need for a change, the ABI will be stable > on my local branch, just like on staging branch you propose. I think it > will be that way until there will appear strong community of users who > will resist the changing of ABI. From this point of view, I don't see > major difference for ILP32 where to host the patchset. If we don't treat the ABI as stable now, regardless of the number of users, then it is not ready for upstream (we already have a user in the openSUSE build). The arm64 git.kernel.org suggestion was more of an endorsement from the maintainers that the ABI is stable. If you want to maintain it in your own tree, that's fine by me. If you want wider visibility, we could mirror it to git.kernel.org (though given how many trees are there, it may not mean much). > Is my understanding correct that you, Will and me will be responsible > for rebasing and maintainance of patches? To be clear, this it not an > automatic task - sometimes simple rebase take the whole day of my time, > and I rebase almost every week. The rebase in 2-month timeframe may become > unpredictable task, by time and amount of work. I think you understand what > I mean, as once before you already told that the series is too intrusive. I am fully aware there is significant effort to rebase the series and if you can help maintain such branch it would be great. I don't see the point of rebasing weekly though and it's not just the git handling process but doing the validation of such branch, regression testing etc. If it's too time consuming, we could do it only for LTS releases. > To make it more easy for maintainance, I would suggest to split the series > to 3 parts: > - arch and generic patches that not related to ilp32 or arm64 (already >done); That's fine. > - arm64 patches that do cleanup and refactoring in aim to apply >ilp32 patches smoothly, but not ilp32-specific; If there are such changes, that's fine. However, I wouldn't merge the AARCH32_EL0 #ifdef'ery since it's unnecessary if we never merge the ILP32 patches. > If we'll follow your suggestion, does it mean that you expect the 4.12-based > branch from me soon to put on staging? We can create one for 4.12 if you want (in about 2-3 weeks time when I'm back). If there is no rush we could aim for 4.13 (there are some non-trivial conflicts in the sigframe handling code as we are preparing it for SVE support). > > The decision to merge upstream will be revisited every 6 months, > > assessing the progress on the points I mentioned above, with a time > > limit of 2 years > > IIUC, this is your personal decision based on responses and comments > from community? Yes, as arm64 kernel maintainer. > If so, I would like to ask you to do the first ILP32 community poll > now, not in 6 months. So we'll collect opinions and requests from > people interested in ILP32, and in 6 month will be able to check the > progress. I would like to see this thread public because if we are not > taking ILP32
Re: [PATCH v8 00/20] ILP32 for ARM64
Hi Yury, On Thu, Nov 16, 2017 at 02:11:30PM +0300, Yury Norov wrote: > This is ILP32 patches on top of 4.14 kernel: > https://github.com/norov/linux/commits/ilp32-4.14 > > I tested the series with LTP lite built by Linaro toolchain, and no > regressions found. Thanks. I gave it a try as well with LTP and pushed a staging/ilp32-4.14 branch to the arm64 tree on git.kernel.org. BTW, you've added two random patches on top of this branch (which I removed). > By the way, do you have plans to upstream arch subseries? > https://lkml.org/lkml/2017/9/25/574 I was hoping Arnd would pick them up since they are rather UAPI specific than arm64. BTW, I wonder whether the nds32 patches actually follow the proposed defaults in your patches like force_o_largefile() and get/setrlimit: https://lkml.org/lkml/2017/11/27/474 (I haven't reviewed the nds32 patches in detail though) -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] arm64: v8.4: Support for new floating point multiplication instructions
On Fri, Jan 05, 2018 at 04:22:24PM +0800, gengdongjiu wrote: > On 2018/1/5 15:57, Greg KH wrote: > > On Fri, Jan 05, 2018 at 09:22:54AM +0800, gengdongjiu wrote: > >> Hi will/catalin > >> > >> On 2017/12/13 18:09, Suzuki K Poulose wrote: > >>> On 13/12/17 10:13, Dongjiu Geng wrote: > ARM v8.4 extensions add new neon instructions for performing a > multiplication of each FP16 element of one vector with the corresponding > FP16 element of a second vector, and to add or subtract this without an > intermediate rounding to the corresponding FP32 element in a third > vector. > > This patch detects this feature and let the userspace know about it via a > HWCAP bit and MRS emulation. > > Cc: Dave Martin > Cc: Suzuki K Poulose > Signed-off-by: Dongjiu Geng > Reviewed-by: Dave Martin > >>> > >>> Looks good to me. > >>> > >>> Reviewed-by: Suzuki K Poulose > >> > >> sorry to disturb you. Reminder, hope this patch can be applied to Linux > >> 4.15-rc7. > > > > New features should not be going into 4.15-rc, that should be a 4.16-rc1 > > thing, right? > > It is also great if it can be applied to 4.16-rc1. Thanks a lot! I will queue it for 4.16-rc1. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 0/8] arm64: untag user pointers passed to the kernel
Hi Andrey, On Thu, Nov 08, 2018 at 03:48:10PM +0100, Andrey Konovalov wrote: > On Thu, Nov 8, 2018 at 3:36 PM, Andrey Konovalov > wrote: > > Changes in v8: > > - Rebased onto 65102238 (4.20-rc1). > > - Added a note to the cover letter on why syscall wrappers/shims that untag > > user pointers won't work. > > - Added a note to the cover letter that this patchset has been merged into > > the Pixel 2 kernel tree. > > - Documentation fixes, in particular added a list of syscalls that don't > > support tagged user pointers. > > I've changed the documentation to be more specific, please take a look. > > I haven't done anything about adding a way for the user to find out > that the kernel supports this ABI extension. I don't know what would > the the preferred way to do this, and we haven't received any comments > on that from anybody else. Probing "on some innocuous syscall > currently returning -EFAULT on tagged pointer arguments" works though, > as you mentioned. We've had some internal discussions and also talked to some people at Plumbers. I think the best option is to introduce an AT_FLAGS bit to describe the ABI relaxation on tagged pointers. Vincenzo is going to propose a patch on top of this series. > As mentioned in the cover letter, this patchset has been merged into > the Pixel 2 kernel tree. I just hope it's not enabled on production kernels, it would introduce a user ABI that may differ from what ends up upstream. -- Catalin
Re: [PATCH v8 1/8] arm64: add type casts to untagged_addr macro
On Thu, Nov 08, 2018 at 03:36:08PM +0100, Andrey Konovalov wrote: > This patch makes the untagged_addr macro accept all kinds of address types > (void *, unsigned long, etc.) and allows not to specify type casts in each > place where it is used. This is done by using __typeof__. > > Signed-off-by: Andrey Konovalov > --- > arch/arm64/include/asm/uaccess.h | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/include/asm/uaccess.h > b/arch/arm64/include/asm/uaccess.h > index 07c34087bd5e..c1325271e368 100644 > --- a/arch/arm64/include/asm/uaccess.h > +++ b/arch/arm64/include/asm/uaccess.h > @@ -101,7 +101,8 @@ static inline unsigned long __range_ok(const void __user > *addr, unsigned long si > * up with a tagged userland pointer. Clear the tag to get a sane pointer to > * pass on to access_ok(), for instance. > */ > -#define untagged_addr(addr) sign_extend64(addr, 55) > +#define untagged_addr(addr) \ > + ((__typeof__(addr))sign_extend64((__u64)(addr), 55)) Nitpick: same comment as here (use u64): http://lkml.kernel.org/r/20181123173739.osgvnnhmptdgt...@lakrids.cambridge.arm.com Acked-by: Catalin Marinas (not acking the whole series just yet, only specific patches to remember what I reviewed)
Re: [PATCH v8 2/8] uaccess: add untagged_addr definition for other arches
On Thu, Nov 08, 2018 at 03:36:09PM +0100, Andrey Konovalov wrote: > diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h > index efe79c1cdd47..c045b4eff95e 100644 > --- a/include/linux/uaccess.h > +++ b/include/linux/uaccess.h > @@ -13,6 +13,10 @@ > > #include > > +#ifndef untagged_addr > +#define untagged_addr(addr) addr > +#endif Nitpick: add braces around (addr). Otherwise: Acked-by: Catalin Marinas
Re: [PATCH v8 3/8] arm64: untag user addresses in access_ok and __uaccess_mask_ptr
On Thu, Nov 08, 2018 at 03:36:10PM +0100, Andrey Konovalov wrote: > copy_from_user (and a few other similar functions) are used to copy data > from user memory into the kernel memory or vice versa. Since a user can > provided a tagged pointer to one of the syscalls that use copy_from_user, > we need to correctly handle such pointers. > > Do this by untagging user pointers in access_ok and in __uaccess_mask_ptr, > before performing access validity checks. > > Signed-off-by: Andrey Konovalov Reviewed-by: Catalin Marinas
Re: [PATCH v8 0/8] arm64: untag user pointers passed to the kernel
On Thu, Dec 06, 2018 at 01:44:24PM +0100, Andrey Konovalov wrote: > On Thu, Nov 29, 2018 at 7:16 PM Catalin Marinas > wrote: > > On Thu, Nov 08, 2018 at 03:48:10PM +0100, Andrey Konovalov wrote: > > > On Thu, Nov 8, 2018 at 3:36 PM, Andrey Konovalov > > > wrote: > > > > Changes in v8: > > > > - Rebased onto 65102238 (4.20-rc1). > > > > - Added a note to the cover letter on why syscall wrappers/shims that > > > > untag > > > > user pointers won't work. > > > > - Added a note to the cover letter that this patchset has been merged > > > > into > > > > the Pixel 2 kernel tree. > > > > - Documentation fixes, in particular added a list of syscalls that don't > > > > support tagged user pointers. > > > > > > I've changed the documentation to be more specific, please take a look. > > > > > > I haven't done anything about adding a way for the user to find out > > > that the kernel supports this ABI extension. I don't know what would > > > the the preferred way to do this, and we haven't received any comments > > > on that from anybody else. Probing "on some innocuous syscall > > > currently returning -EFAULT on tagged pointer arguments" works though, > > > as you mentioned. > > > > We've had some internal discussions and also talked to some people at > > Plumbers. I think the best option is to introduce an AT_FLAGS bit to > > describe the ABI relaxation on tagged pointers. Vincenzo is going to > > propose a patch on top of this series. > > So should I wait for a patch from Vincenzo before posting v9 or post > it as is? Or try to develop this patch myself? The reason Vincenzo hasn't posted his patches yet is that we are still debating internally how to document which syscalls accept non-zero top-byte, what to do with future syscalls for which we don't know the semantics. Happy to take the discussion to the public list if Vincenzo posts his patches. The conclusion of the ABI discussion may have an impact on the actual implementation that you are proposing in this series. -- Catalin
Re: [RFC][PATCH 0/3] arm64 relaxed ABI
Hi Andrey, On Wed, Dec 12, 2018 at 03:23:25PM +0100, Andrey Konovalov wrote: > On Mon, Dec 10, 2018 at 3:31 PM Vincenzo Frascino > wrote: > > On arm64 the TCR_EL1.TBI0 bit has been set since Linux 3.x hence > > the userspace (EL0) is allowed to set a non-zero value in the top > > byte but the resulting pointers are not allowed at the user-kernel > > syscall ABI boundary. > > > > This patchset proposes a relaxation of the ABI and a mechanism to > > advertise it to the userspace via an AT_FLAGS. > > > > The rationale behind the choice of AT_FLAGS is that the Unix System V > > ABI defines AT_FLAGS as "flags", leaving some degree of freedom in > > interpretation. > > There are two previous attempts of using AT_FLAGS in the Linux Kernel > > for different reasons: the first was more generic and was used to expose > > the support for the GNU STACK NX feature [1] and the second was done for > > the MIPS architecture and was used to expose the support of "MIPS ABI > > Extension for IEEE Std 754 Non-Compliant Interlinking" [2]. > > Both the changes are currently _not_ merged in mainline. > > The only architecture that reserves some of the bits in AT_FLAGS is > > currently MIPS, which introduced the concept of platform specific ABI > > (psABI) reserving the top-byte [3]. > > > > When ARM64_AT_FLAGS_SYSCALL_TBI is set the kernel is advertising > > to the userspace that a relaxed ABI is supported hence this type > > of pointers are now allowed to be passed to the syscalls when they are > > in memory ranges obtained by anonymous mmap() or brk(). > > > > The userspace _must_ verify that the flag is set before passing tagged > > pointers to the syscalls allowed by this relaxation. > > > > More in general, exposing the ARM64_AT_FLAGS_SYSCALL_TBI flag and mandating > > to the software to check that the feature is present, before using the > > associated functionality, it provides a degree of control on the decision > > of disabling such a feature in future without consequently breaking the > > userspace. [...] > Acked-by: Andrey Konovalov Thanks for the ack. However, if we go ahead with this ABI proposal it means that your patches need to be reworked to allow a non-zero top byte in all syscalls, including mmap() and friends, ioctl(). There are ABI concerns in either case but I'd rather have this discussion in the open. It doesn't necessarily mean that I endorse this proposal, I would like feedback and not just from kernel developers but user space ones. The summary of our internal discussions (mostly between kernel developers) is that we can't properly describe a user ABI that covers future syscalls or syscall extensions while not all syscalls accept tagged pointers. So we tweaked the requirements slightly to only allow tagged pointers back into the kernel *if* the originating address is from an anonymous mmap() or below sbrk(0). This should cover some of the ioctls or getsockopt(TCP_ZEROCOPY_RECEIVE) where the user passes a pointer to a buffer obtained via mmap() on the device operations. (sorry for not being clear on what Vincenzo's proposal implies) -- Catalin
Re: [RFC][PATCH 0/3] arm64 relaxed ABI
On Tue, Dec 18, 2018 at 04:03:38PM +0100, Andrey Konovalov wrote: > On Wed, Dec 12, 2018 at 4:02 PM Catalin Marinas > wrote: > > The summary of our internal discussions (mostly between kernel > > developers) is that we can't properly describe a user ABI that covers > > future syscalls or syscall extensions while not all syscalls accept > > tagged pointers. So we tweaked the requirements slightly to only allow > > tagged pointers back into the kernel *if* the originating address is > > from an anonymous mmap() or below sbrk(0). This should cover some of the > > ioctls or getsockopt(TCP_ZEROCOPY_RECEIVE) where the user passes a > > pointer to a buffer obtained via mmap() on the device operations. > > > > (sorry for not being clear on what Vincenzo's proposal implies) > > OK, I see. So I need to make the following changes to my patchset AFAIU. > > 1. Make sure that we only allow tagged user addresses that originate > from an anonymous mmap() or below sbrk(0). How exactly should this > check be performed? I don't think we should perform such checks. That's rather stating that the kernel only guarantees that the tagged pointers work if they originated from these memory ranges. > 2. Allow tagged addressed passed to memory syscalls (as long as (1) is > satisfied). Do I understand correctly that this means that I need to > locate all find_vma() callers outside of mm/ and fix them up as well? Yes (unless anyone as a better idea or objections to this approach). BTW, I'll be off until the new year, so won't be able to follow up. -- Catalin
Re: [PATCH v9 0/8] arm64: untag user pointers passed to the kernel
Hi Dave, On Wed, Dec 12, 2018 at 05:01:12PM +, Dave P Martin wrote: > On Mon, Dec 10, 2018 at 01:50:57PM +0100, Andrey Konovalov wrote: > > arm64 has a feature called Top Byte Ignore, which allows to embed pointer > > tags into the top byte of each pointer. Userspace programs (such as > > HWASan, a memory debugging tool [1]) might use this feature and pass > > tagged user pointers to the kernel through syscalls or other interfaces. [...] > It looks like there's been a lot of progress made here towards smoking > out most of the sites in the kernel where pointers need to be untagged. In summary, based on last summer's analysis, there are two main (and rather broad) scenarios of __user pointers use in the kernel: (a) uaccess macros, together with access_ok() checks and (b) identifying of user address ranges (find_vma() and related, some ioctls). The patches here handle the former by allowing sign-extension in access_ok() and subsequent uaccess routines work fine with tagged pointers. Identifying the latter is a bit more problematic and the approach we took was tracking down pointer to long conversion which seems to cover the majority of cases. However, this approach doesn't scale as, for example, we'd rather change get_user_pages() to sign-extend the address rather than all the callers. In lots of other cases we don't even need untagging as we don't expect user space to tag such pointers (i.e. mmap() of device memory). We might be able to improve the static analysis by introducing a virt_addr_t but that's significant effort and we still won't cover all cases (e.g. it doesn't necessarily catch tcp_zerocopy_receive() which wouldn't use a pointer, just a u64 for address). > However, I do think that we need a clear policy for how existing kernel > interfaces are to be interpreted in the presence of tagged pointers. > Unless we have that nailed down, we are likely to be able to make only > vague guarantees to userspace about what works, and the ongoing risk > of ABI regressions and inconsistencies seems high. I agree. > Can we define an opt-in for tagged-pointer userspace, that rejects all > syscalls that we haven't checked and whitelisted (or that are > uncheckable like ioctl)? Defining an opt-in is not a problem, however, rejecting all syscalls that we haven't whitelisted is not feasible. We can have an opt-in per process (that's what we were going to do with MTE) but the only thing we can reasonably do is change the behaviour of access_ok(). That's too big a knob and a new syscall that we haven't got around to whitelist may just work. This eventually leads to de-facto ABI and our whitelist would simply be ignored. I'm not really keen on a big syscall shim in the arm64 kernel which checks syscall arguments, including in-struct values. If we are to do this, I'd rather keep it in user space as part of the C library. > In the meantime, I think we really need to nail down the kernel's > policies on > > * in the default configuration (without opt-in), is the presence of > non-address bits in pointers exchanged with the kernel simply > considered broken? (Even with this series, the de factor answer > generally seems to be "yes", although many specific things will now > work fine) Without these patches, passing non-address bits in pointers is considered broken. I couldn't find a case where it would still work with non-zero tag but maybe I haven't looked hard enough. > * if not, how do we tighten syscall / interface specifications to > describe what happens with pointers containing non-address bits, while > keeping the existing behaviour for untagged pointers? > > We would want a general recipe that gives clear guidance on what > userspace should expect an arbitrarily chosen syscall to do with its > pointers, without having to enumerate each and every case. That's what we are aiming with the pointer origins, to move away from a syscall whitelist to a generic definition. That said, the two approaches are orthogonal, we can use the pointer origins as the base rule for which syscalls can be whitelisted. If we step back a bit to look at the use-case for TBI (and MTE), the normal application programmer shouldn't really care about this ABI (well, most of the time). The app gets a tagged pointer from the C library as a result of a malloc()/realloc() (possibly alloca()) call and it expects to be able to pass it back into the kernel (usually via the C library) without any awareness of the non-address bits. Now, we can't define a user/kernel ABI based on the provenance of the pointer in user space (i.e. we only support tags for heap and stack), so we are trying to generalise this based where the pointer originated from in the kernel (e.g. anonymous mmap()). > There may already be some background on these topics -- can you throw me > a link if so? That's an interesting sub-thread to read: https://lore.kernel.org/lkml/5d54526e5ff2e5ad63d0dfdd9ab17cf359afa4f2.1535629099.git.andreyk...@google.com/ -- Catal
Re: [RFC][PATCH 0/3] arm64 relaxed ABI
On Mon, Feb 11, 2019 at 12:32:55PM -0800, Evgenii Stepanov wrote: > On Mon, Feb 11, 2019 at 9:28 AM Kevin Brodsky wrote: > > On 19/12/2018 12:52, Dave Martin wrote: > > > Really, the kernel should do the expected thing with all "non-weird" > > > memory. > > > > > > In lieu of a proper definition of "non-weird", I think we should have > > > some lists of things that are explicitly included, and also excluded: > > > > > > OK: > > > kernel-allocated process stack > > > brk area > > > MAP_ANONYMOUS | MAP_PRIVATE > > > MAP_PRIVATE mappings of /dev/zero > > > > > > Not OK: > > > MAP_SHARED > > > mmaps of non-memory-like devices > > > mmaps of anything that is not a regular file > > > the VDSO > > > ... > > > > > > In general, userspace can tag memory that it "owns", and we do not assume > > > a transfer of ownership except in the "OK" list above. Otherwise, it's > > > the kernel's memory, or the owner is simply not well defined. > > > > Agreed on the general idea: a process should be able to pass tagged > > pointers at the > > syscall interface, as long as they point to memory privately owned by the > > process. I > > think it would be possible to simplify the definition of "non-weird" memory > > by using > > only this "OK" list: > > - mmap() done by the process itself, where either: > >* flags = MAP_PRIVATE | MAP_ANONYMOUS > >* flags = MAP_PRIVATE and fd refers to a regular file or a well-defined > > list of > > device files (like /dev/zero) > > - brk() done by the process itself > > - Any memory mapped by the kernel in the new process's address space during > > execve(), > > with the same restrictions as above ([vdso]/[vvar] are therefore excluded) Sounds reasonable. > > > * Userspace should set tags at the point of allocation only. > > > > Yes, tags are only supposed to be set at the point of either allocation or > > deallocation/reallocation. However, allocators can in principle be nested, > > so an > > allocator could take a region allocated by malloc() as input and subdivide > > it > > (changing tags in the process). That said, this suballocator must not > > free() that > > region until all the suballocations themselves have been freed (thereby > > restoring the > > tags initially set by malloc()). > > > > > * If you don't know how an object was allocated, you cannot modify the > > > tag, period. > > > > Agreed, allocators that tag memory can only operate on memory with a > > well-defined > > provenance (for instance anonymous mmap() or malloc()). > > > > > * A single C object should be accessed using a single, fixed pointer tag > > > throughout its entire lifetime. > > > > Agreed. Allocators themselves may need to be excluded though, depending on > > how they > > represent their managed memory. > > > > > * Tags can be changed only when there are no outstanding pointers to > > > the affected object or region that may be used to access the object > > > or region (i.e., if the object were allocated from the C heap and > > > is it safe to realloc() it, then it is safe to change the tag; for > > > other types of allocation, analogous arguments can be applied). > > > > Tags can only be changed at the point of deallocation/reallocation. > > Pointers to the > > object become invalid and cannot be used after it has been deallocated; > > memory > > tagging allows to catch such invalid usage. All the above sound well but that's mostly a guideline on what a C library can do. It doesn't help much with defining the kernel ABI. Anyway, it's good to clarify the use-cases. > > > * When the kernel dereferences a pointer on userspace's behalf, it > > > shall behave equivalently to userspace dereferencing the same pointer, > > > including use of the same tag (where passed by userspace). > > > > > > * Where the pointer tag affects pointer dereference behaviour (i.e., > > > with hardware memory colouring) the kernel makes no guarantee to > > > honour pointer tags correctly for every location a buffer based on a > > > pointer passed by userspace to the kernel. > > > > > > (This means for example that for a read(fd, buf, size), we can check > > > the tag for a single arbitrary location in *(char (*)[size])buf > > > before passing the buffer to get_user_pages(). Hopefully this could > > > be done in get_user_pages() itself rather than hunting call sites. > > > For userspace, it means that you're on your own if you ask the > > > kernel to operate on a buffer than spans multiple, independently- > > > allocated objects, or a deliberately striped single object.) > > > > I think both points are reasonable. It is very valuable for the kernel to > > access > > userspace memory using the user-provided tag, because it enables kernel > > accesses to > > be checked in the same way as user accesses, allowing to detect bugs that > > are > > potentially hard to find. For insta
Re: [RFC][PATCH 0/3] arm64 relaxed ABI
Hi Szabolcs, Thanks for looking into this. Comments below. On Tue, Feb 19, 2019 at 06:38:31PM +, Szabolcs Nagy wrote: > i think these rules work for the cases i care about, a more > tricky question is when/how to check for the new syscall abi > and when/how the TCR_EL1.TBI0 setting may be turned off. I don't think turning TBI0 off is critical (it's handy for PAC with 52-bit VA but then it's short-lived if you want more security features like MTE). > consider the following cases (tb == top byte): > > binary 1: user tb = any, syscall tb = 0 > tbi is on, "legacy binary" > > binary 2: user tb = any, syscall tb = any > tbi is on, "new binary using tb" > for backward compat it needs to check for new syscall abi. > > binary 3: user tb = 0, syscall tb = 0 > tbi can be off, "new binary", > binary is marked to indicate unused tb, > kernel may turn tbi off: additional pac bits. > > binary 4: user tb = mte, syscall tb = mte > like binary 3, but with mte, "new binary using mte" > does it have to check for new syscall abi? > or MTE HWCAP would imply it? > (is it possible to use mte without new syscall abi?) I think MTE HWCAP should imply it. > in userspace we want most binaries to be like binary 3 and 4 > eventually, i.e. marked as not-relying-on-tbi, if a dso is > loaded that is unmarked (legacy or new tb user), then either > the load fails (e.g. if mte is already used? or can we turn > mte off at runtime?) or tbi has to be enabled (prctl? does > this work with pac? or multi-threads?). We could enable it via prctl. That's the plan for MTE as well (in addition maybe to some ELF flag). > as for checking the new syscall abi: i don't see much semantic > difference between AT_HWCAP and AT_FLAGS (either way, the user > has to check a feature flag before using the feature of the > underlying system and it does not matter much if it's a syscall > abi feature or cpu feature), but i don't see anything wrong > with AT_FLAGS if the kernel prefers that. The AT_FLAGS is aimed at capturing binary 2 case above, i.e. the relaxation of the syscall ABI to accept tb = any. The MTE support will have its own AT_HWCAP, likely in addition to AT_FLAGS. Arguably, AT_FLAGS is either redundant here if MTE implies it (and no harm in keeping it around) or the meaning is different: a tb != 0 may be checked by the kernel against the allocation tag (i.e. get_user() could fail, the tag is not entirely ignored). > the discussion here was mostly about binary 2, That's because passing tb != 0 into the syscall ABI is the main blocker here that needs clearing out before merging the MTE support. There is, of course, a variation of binary 1 for MTE: binary 5: user tb = mte, syscall tb = 0 but this requires a lot of C lib changes to support properly. > but for > me the open question is if we can make binary 3/4 work. > (which requires some elf binary marking, that is recognised > by the kernel and dynamic loader, and efficient handling of > the TBI0 bit, ..if it's not possible, then i don't see how > mte will be deployed). If we ignore binary 3, we can keep TBI0 = 1 permanently, whether we have MTE or not. > and i guess on the kernel side the open question is if the > rules 1/2/3/4 can be made to work in corner cases e.g. when > pointers embedded into structs are passed down in ioctl. We've been trying to track these down since last summer and we came to the conclusion that it should be (mostly) fine for the non-weird memory described above. -- Catalin
Re: [PATCH v5 01/10] arm64: Provide a command line to disable spectre_v2 mitigation
On Thu, Feb 28, 2019 at 06:14:34PM +, Suzuki K Poulose wrote: > On 27/02/2019 01:05, Jeremy Linton wrote: > > There are various reasons, including bencmarking, to disable spectrev2 > > mitigation on a machine. Provide a command-line to do so. > > > > Signed-off-by: Jeremy Linton > > Cc: Jonathan Corbet > > Cc: linux-doc@vger.kernel.org > > > > diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c > > index 9950bb0cbd52..d2b2c69d31bb 100644 > > --- a/arch/arm64/kernel/cpu_errata.c > > +++ b/arch/arm64/kernel/cpu_errata.c > > @@ -220,6 +220,14 @@ static void qcom_link_stack_sanitization(void) > > : "=&r" (tmp)); > > } > > +static bool __nospectre_v2; > > +static int __init parse_nospectre_v2(char *str) > > +{ > > + __nospectre_v2 = true; > > + return 0; > > +} > > +early_param("nospectre_v2", parse_nospectre_v2); > > + > > static void > > enable_smccc_arch_workaround_1(const struct arm64_cpu_capabilities *entry) > > { > > @@ -231,6 +239,11 @@ enable_smccc_arch_workaround_1(const struct > > arm64_cpu_capabilities *entry) > > if (!entry->matches(entry, SCOPE_LOCAL_CPU)) > > return; > > + if (__nospectre_v2) { > > + pr_info_once("spectrev2 mitigation disabled by command line > > option\n"); > > + return; > > + } > > + > > Could we not disable the "cap" altogether instead, rather than disabling the > work around ? Or do we need that information ? There are a few ideas here but I think we settled on always reporting in sysfs even if the mitigation is disabled in .config. So I guess we need the "cap" around for the reporting part. -- Catalin
Re: [PATCH v10 07/12] fs, arm64: untag user pointers in fs/userfaultfd.c
On Tue, Feb 26, 2019 at 03:39:08PM +0100, Andrey Konovalov wrote: > On Sat, Feb 23, 2019 at 12:06 AM Dave Hansen wrote: > > > > On 2/22/19 4:53 AM, Andrey Konovalov wrote: > > > userfaultfd_register() and userfaultfd_unregister() use provided user > > > pointers for vma lookups, which can only by done with untagged pointers. > > > > So, we have to patch all these sites before the tagged values get to the > > point of hitting the vma lookup functions. Dumb question: Why don't we > > just patch the vma lookup functions themselves instead of all of these > > callers? > > That might be a working approach as well. We'll still need to fix up > places where the vma fields are accessed directly. Catalin, what do > you think? Most callers of find_vma*() always follow it by a check of vma->vma_start against some tagged address ('end' in the userfaultfd_(un)register()) case. So it's not sufficient to untag it in find_vma(). -- Catalin
Re: [PATCH v2] kmemleak: add oom= runtime parameter
On Mon, Jul 24, 2017 at 05:16:34PM +0800, shuw...@redhat.com wrote: > When running memory stress tests, kmemleak could be easily disabled in > function create_object as system is out of memory and kmemleak failed to > alloc from object_cache. Since there's no way to enable kmemleak after > it's off, simply ignore the object_cache alloc failure will just loses > track of some memory objects, but could increase the usability of kmemleak > under memory stress. I wonder how usable kmemleak is when not recording all the allocated objects. If some of these memory blocks contain references to others, such referenced objects could be reported as leaks (basically increasing the false positives rate). -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 00/20] ILP32 for ARM64
Hi Yury, On Mon, Jul 24, 2017 at 02:26:24PM +0300, Yury Norov wrote: > On Fri, Jul 07, 2017 at 06:11:36PM +0100, Catalin Marinas wrote: > > On Fri, Jul 07, 2017 at 12:59:02AM +0300, Yury Norov wrote: > > > If so, I would like to ask you to do the first ILP32 community poll > > > now, not in 6 months. So we'll collect opinions and requests from > > > people interested in ILP32, and in 6 month will be able to check the > > > progress. I would like to see this thread public because if we are not > > > taking ILP32 to official sources right now, this is the only way to > > > inform people that the project exists and is ready to use. > > > > That's an ongoing process, I'm not going to ask for people's opinion > > every 6 months. It's just that I will revisit periodically the progress > > on automated testing, public availability of a cross-toolchain, > > Tested/Acked/Reviewed-by tags on these patches from interested parties. > > Since I haven't seen any of these now, I don't see any point in asking. > > > > To be clear, I'm not really interested in "we need this too" opinions, I > > get lots of these via the marketing channels. I'm looking for actual > > users with a long-term view of making/keeping ILP32 a first class ABI. > > From my side, there are people who ask me for help with ilo32, and who > write from big companies mailservers. But they don't want to ask > their questions publicly for some reason. From my point of view, there > is not so big but stable interest in ILP32. It would be nice if they were more open about what they need/use. > This is the 4.12 and linux-next - based kernel patches: > https://github.com/norov/linux/tree/ilp32-4.12 > https://github.com/norov/linux/tree/ilp32-20170724 Thanks. I'll publish the 4.12-based branch sometime next week. At this stage I don't see much value in a linux-next based ILP32. I would rather like to see a 4.13-rc3 based one, in preparation for a 4.13 branch once released. > Should I resend kernel patches to LKML, or links above are enough for > you? The 4.12 link is ok. However, could you please post a 4.13-rcX based series, maybe split in two so that a few generic patches can be merged in 4.14? Given the reworking of the sigcontext code in 4.13, it would be good to review the ILP32 changes in this area again. Thanks. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 00/20] ILP32 for ARM64
On Mon, Jul 24, 2017 at 02:26:24PM +0300, Yury Norov wrote: > This is the 4.12 and linux-next - based kernel patches: > https://github.com/norov/linux/tree/ilp32-4.12 > https://github.com/norov/linux/tree/ilp32-20170724 I published the 4.12 branch here: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=staging/ilp32-4.12 (or git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git staging/ilp32-4.12) There are two patches on top, one to fix SET_PERSONALITY and the other to make ILP32 default y since you'd expect people using this branch to need such option enabled. I'll publish a 4.13-based branch when the corresponding kernel version is released. Thanks. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 00/20] ILP32 for ARM64
On Mon, Sep 04, 2017 at 02:54:50PM +0300, Yury Norov wrote: > This is 4.13-based and next-20170901-based ilp32 patches. > https://github.com/norov/linux/tree/ilp32-4.13 Thanks. I'll mirror it on kernel.org sometime this week after doing some tests (I've been mostly away for the past two weeks). Regarding the generic patches in this series, could you please post them to linux-arch (without the whole ILP32 series)? It would be good to get them merged for 4.15. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ILP32 for ARM64: testing with glibc testsuite
On Wed, Nov 16, 2016 at 03:22:26PM +0400, Maxim Kuvyrkov wrote: > Regarding ILP32 runtime, my opinion is that it is acceptable for ILP32 > to have extra failures compared to LP64, since these are not > regressions, but, rather, failures of a new configuration. I disagree with this. We definitely need to understand why they fail, otherwise we run the risk of potential glibc or kernel implementation bugs becoming ABI. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ILP32 for ARM64 - testing with lmbench
On Mon, Dec 05, 2016 at 06:16:09PM +0800, Zhangjian (Bamvor) wrote: > Do you have suggestion of next move of upstreaming ILP32? I mentioned the steps a few time before. I'm pasting them again here: 1. Complete the review of the Linux patches and ABI (no merge yet) 2. Review the corresponding glibc patches (no merge yet) 3. Ask (Linaro, Cavium) for toolchain + filesystem (pre-built and more than just busybox) to be able to reproduce the testing in ARM 4. More testing (LTP, trinity, performance regressions etc.) 5. Move the ILP32 PCS out of beta (based on the results from 4) 6. Check the market again to see if anyone still needs ILP32 7. Based on 6, decide whether to merge the kernel and glibc patches What's not explicitly mentioned in step 4 is glibc testing. Point 5 is ARM's responsibility (toolchain folk). > There are already the test results of lmbench and specint. Do you they > are ok or need more data to prove no regression? I would need to reproduce the tests myself, see step 3. > I have also noticed that there are ILP32 failures in glibc testsuite. > Is it the only blocker for merge ILP32(in technology part)? It's probably not the only blocker but I have to review the kernel patches again to make sure. I'd also like to see whether the libc-alpha community is ok with the glibc counterpart (but don't merge the patches until the ABI is agreed on both sides). On performance, I want to make sure there are no regressions on AArch32/compat and AArch64/LP64. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/18] arm64: introduce binfmt_elf32.c
On Fri, Oct 21, 2016 at 11:33:08PM +0300, Yury Norov wrote: > As we support more than one compat formats, it looks more reasonable > to not use fs/compat_binfmt.c. Custom binfmt_elf32.c allows to move aarch32 > specific definitions there and make code more maintainable and readable. Can you remind me why we need this patch (rather than using the default fs/compat_binfmt_elf.c which you include here anyway)? > --- /dev/null > +++ b/arch/arm64/kernel/binfmt_elf32.c > @@ -0,0 +1,31 @@ > +/* > + * Support for AArch32 Linux ELF binaries. > + */ > + > +/* AArch32 EABI. */ > +#define EF_ARM_EABI_MASK 0xff00 > + > +#define compat_start_thread compat_start_thread > +#define COMPAT_SET_PERSONALITY(ex) \ > +do { \ > + clear_thread_flag(TIF_32BIT_AARCH64); \ > + set_thread_flag(TIF_32BIT); \ > +} while (0) You introduce this here but it seems to still be present in asm/elf.h. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/18] arm64: ilp32: introduce binfmt_ilp32.c
On Fri, Oct 21, 2016 at 11:33:09PM +0300, Yury Norov wrote: > binfmt_ilp32.c is needed to handle ILP32 binaries > > Signed-off-by: Yury Norov > Signed-off-by: Bamvor Zhang Jian > --- > arch/arm64/include/asm/elf.h | 6 +++ > arch/arm64/kernel/Makefile | 1 + > arch/arm64/kernel/binfmt_ilp32.c | 97 > > 3 files changed, 104 insertions(+) > create mode 100644 arch/arm64/kernel/binfmt_ilp32.c > > diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h > index f259fe8..be29dde 100644 > --- a/arch/arm64/include/asm/elf.h > +++ b/arch/arm64/include/asm/elf.h > @@ -175,10 +175,16 @@ extern int arch_setup_additional_pages(struct > linux_binprm *bprm, > > #define COMPAT_ELF_ET_DYN_BASE (2 * TASK_SIZE_32 / 3) > > +#ifndef USE_AARCH64_GREG > /* AArch32 registers. */ > #define COMPAT_ELF_NGREG 18 > typedef unsigned int compat_elf_greg_t; > typedef compat_elf_greg_tcompat_elf_gregset_t[COMPAT_ELF_NGREG]; > +#else /* AArch64 registers for AARCH64/ILP32 */ > +#define COMPAT_ELF_NGREG ELF_NGREG > +#define compat_elf_greg_telf_greg_t > +#define compat_elf_gregset_t elf_gregset_t > +#endif I think you only need compat_elf_gregset_t definition here and leave the other two undefined. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 14/18] arm64: signal32: move ilp32 and aarch32 common code to separated file
On Fri, Oct 21, 2016 at 11:33:13PM +0300, Yury Norov wrote: > Signed-off-by: Yury Norov Please add some description, even if it means copying the subject. > --- > arch/arm64/include/asm/signal32.h| 3 + > arch/arm64/include/asm/signal32_common.h | 27 +++ > arch/arm64/kernel/Makefile | 2 +- > arch/arm64/kernel/signal32.c | 107 > arch/arm64/kernel/signal32_common.c | 135 > +++ > 5 files changed, 166 insertions(+), 108 deletions(-) > create mode 100644 arch/arm64/include/asm/signal32_common.h > create mode 100644 arch/arm64/kernel/signal32_common.c I wonder whether you can make such patches more readable by setting "diff.renames" to "copy" in your gitconfig (unless it's set already and Git cannot detect partial file code moving/copying). -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 16/18] arm64: ptrace: handle ptrace_request differently for aarch32 and ilp32
On Fri, Oct 21, 2016 at 11:33:15PM +0300, Yury Norov wrote: > New aarch32 ptrace syscall handler is introduced to avoid run-time > detection of the task type. What's wrong with the run-time detection? If it's just to avoid a negligible overhead, I would rather keep the code simpler by avoiding duplicating the generic compat_sys_ptrace(). -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 11/18] arm64: ilp32: share aarch32 syscall handlers
On Fri, Oct 21, 2016 at 11:33:10PM +0300, Yury Norov wrote: > off_t is passed in register pair just like in aarch32. > In this patch corresponding aarch32 handlers are shared to > ilp32 code. [...] > +/* > + * Note: off_4k (w5) is always in units of 4K. If we can't do the > + * requested offset because it is not page-aligned, we return -EINVAL. > + */ > +ENTRY(compat_sys_mmap2_wrapper) > +#if PAGE_SHIFT > 12 > + tst w5, #~PAGE_MASK >> 12 > + b.ne1f > + lsr w5, w5, #PAGE_SHIFT - 12 > +#endif > + b sys_mmap_pgoff > +1: mov x0, #-EINVAL > + ret > +ENDPROC(compat_sys_mmap2_wrapper) For compat sys_mmap2, the pgoff argument is in multiples of 4K. This was traditionally used for architectures where off_t is 32-bit to allow mapping files to 2^44. Since off_t is 64-bit with AArch64/ILP32, should we just pass the off_t as a 64-bit value in two different registers (w5 and w6)? -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 16/18] arm64: ptrace: handle ptrace_request differently for aarch32 and ilp32
On Tue, Dec 06, 2016 at 11:55:08AM +0530, Yury Norov wrote: > On Mon, Dec 05, 2016 at 04:34:23PM +0000, Catalin Marinas wrote: > > On Fri, Oct 21, 2016 at 11:33:15PM +0300, Yury Norov wrote: > > > New aarch32 ptrace syscall handler is introduced to avoid run-time > > > detection of the task type. > > > > What's wrong with the run-time detection? If it's just to avoid a > > negligible overhead, I would rather keep the code simpler by avoiding > > duplicating the generic compat_sys_ptrace(). > > Nothing wrong. This is how Arnd asked me to do. You already asked this > question: http://lkml.iu.edu/hypermail/linux/kernel/1604.3/00930.html Hmm, I completely forgot about this ;). There is still an advantage to doing run-time checking if we avoid touching core code (less acks to gather and less code duplication). Let's see what Arnd says but the initial patch looked simpler. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 16/18] arm64: ptrace: handle ptrace_request differently for aarch32 and ilp32
On Wed, Dec 07, 2016 at 09:40:13PM +0100, Arnd Bergmann wrote: > On Wednesday, December 7, 2016 4:59:13 PM CET Catalin Marinas wrote: > > On Tue, Dec 06, 2016 at 11:55:08AM +0530, Yury Norov wrote: > > > On Mon, Dec 05, 2016 at 04:34:23PM +0000, Catalin Marinas wrote: > > > > On Fri, Oct 21, 2016 at 11:33:15PM +0300, Yury Norov wrote: > > > > > New aarch32 ptrace syscall handler is introduced to avoid run-time > > > > > detection of the task type. > > > > > > > > What's wrong with the run-time detection? If it's just to avoid a > > > > negligible overhead, I would rather keep the code simpler by avoiding > > > > duplicating the generic compat_sys_ptrace(). > > > > > > Nothing wrong. This is how Arnd asked me to do. You already asked this > > > question: http://lkml.iu.edu/hypermail/linux/kernel/1604.3/00930.html > > > > Hmm, I completely forgot about this ;). There is still an advantage to > > doing run-time checking if we avoid touching core code (less acks to > > gather and less code duplication). > > > > Let's see what Arnd says but the initial patch looked simpler. > > I don't currently have either version of the patch in my inbox > (the archive is on a different machine), but in general I'd still > think it's best to avoid the runtime check for aarch64-ilp32 > altogether. I'd have to look at the overall kernel source to > see if it's worth avoiding one or two instances though, or > if there are an overwhelming number of other checks that we > can't avoid at all. Just in case you haven't found them already, current version: https://marc.info/?l=linux-arm-kernel&m=147708276818318&w=2 Original version: https://patchwork.kernel.org/patch/7980521/ The old one looks more readable and given that ptrace is not really a fast path, I'm not two worried about run-time checks > Regarding ptrace, I notice that arch/tile doesn't even use > the compat entry point for its ilp32 user space on 64-bit > kernels, it just calls the regular 64-bit one. Would that > help here? I don't know whether it would work, we have incompatible siginfo_t on AArch64/ILP32. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/6] arm64: untag user addresses in copy_from_user and others
On Wed, Apr 18, 2018 at 08:53:12PM +0200, Andrey Konovalov wrote: > @@ -238,12 +239,15 @@ static inline void uaccess_enable_not_uao(void) > /* > * Sanitise a uaccess pointer such that it becomes NULL if above the > * current addr_limit. > + * Also untag user pointers that have the top byte tag set. > */ > #define uaccess_mask_ptr(ptr) (__typeof__(ptr))__uaccess_mask_ptr(ptr) > static inline void __user *__uaccess_mask_ptr(const void __user *ptr) > { > void __user *safe_ptr; > > + ptr = untagged_addr(ptr); > + > asm volatile( > " bicsxzr, %1, %2\n" > " csel%0, %1, xzr, eq\n" First of all, passing a tagged user pointer throughout the kernel is safe with uaccess routines but not suitable for find_vma() etc. With this change, we may have an inconsistent behaviour on the tag masking, depending on whether the entry code uses __uaccess_mask_ptr() or not. We could preserve the tag with something like: diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h index e66b0fca99c2..ed15bfcbd797 100644 --- a/arch/arm64/include/asm/uaccess.h +++ b/arch/arm64/include/asm/uaccess.h @@ -244,10 +244,11 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr) void __user *safe_ptr; asm volatile( - " bicsxzr, %1, %2\n" + " bicsxzr, %3, %2\n" " csel%0, %1, xzr, eq\n" : "=&r" (safe_ptr) - : "r" (ptr), "r" (current_thread_info()->addr_limit) + : "r" (ptr), "r" (current_thread_info()->addr_limit), + "r" (untagged_addr(ptr)) : "cc"); csdb(); -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] mm, arm64: untag user addresses in mm/gup.c
On Wed, Apr 18, 2018 at 08:53:13PM +0200, Andrey Konovalov wrote: > diff --git a/mm/gup.c b/mm/gup.c > index 76af4cfeaf68..fb375de7d40d 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -386,6 +386,8 @@ struct page *follow_page_mask(struct vm_area_struct *vma, > struct page *page; > struct mm_struct *mm = vma->vm_mm; > > + address = untagged_addr(address); > + > *page_mask = 0; > > /* make this handle hugepd */ Does having a tagged address here makes any difference? I couldn't hit a failure with my simple tests (LD_PRELOAD a library that randomly adds tags to pointers returned by malloc). > @@ -647,6 +649,8 @@ static long __get_user_pages(struct task_struct *tsk, > struct mm_struct *mm, > if (!nr_pages) > return 0; > > + start = untagged_addr(start); > + > VM_BUG_ON(!!pages != !!(gup_flags & FOLL_GET)); > > /* > @@ -801,6 +805,8 @@ int fixup_user_fault(struct task_struct *tsk, struct > mm_struct *mm, > struct vm_area_struct *vma; > int ret, major = 0; > > + address = untagged_addr(address); > + > if (unlocked) > fault_flags |= FAULT_FLAG_ALLOW_RETRY; > > @@ -854,6 +860,8 @@ static __always_inline long > __get_user_pages_locked(struct task_struct *tsk, > long ret, pages_done; > bool lock_dropped; > > + start = untagged_addr(start); > + > if (locked) { > /* if VM_FAULT_RETRY can be returned, vmas become invalid */ > BUG_ON(vmas); Isn't __get_user_pages() untagging enough to cover this case as well? Can this function not cope with tagged pointers? > @@ -1751,6 +1759,8 @@ int __get_user_pages_fast(unsigned long start, int > nr_pages, int write, > unsigned long flags; > int nr = 0; > > + start = untagged_addr(start); > + > start &= PAGE_MASK; > addr = start; > len = (unsigned long) nr_pages << PAGE_SHIFT; > @@ -1803,6 +1813,8 @@ int get_user_pages_fast(unsigned long start, int > nr_pages, int write, > unsigned long addr, len, end; > int nr = 0, ret = 0; > > + start = untagged_addr(start); > + > start &= PAGE_MASK; > addr = start; > len = (unsigned long) nr_pages << PAGE_SHIFT; Have you hit a problem with the fast gup functions and tagged pointers? The page table walking macros (e.g. p*d_index()) should mask the tag out already. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] arm64: untag user pointers passed to the kernel
On Wed, Apr 25, 2018 at 04:45:37PM +0200, Andrey Konovalov wrote: > On Thu, Apr 19, 2018 at 11:33 AM, Kirill A. Shutemov > wrote: > > On Wed, Apr 18, 2018 at 08:53:09PM +0200, Andrey Konovalov wrote: > >> arm64 has a feature called Top Byte Ignore, which allows to embed pointer > >> tags into the top byte of each pointer. Userspace programs (such as > >> HWASan, a memory debugging tool [1]) might use this feature and pass > >> tagged user pointers to the kernel through syscalls or other interfaces. > >> > >> This patch makes a few of the kernel interfaces accept tagged user > >> pointers. The kernel is already able to handle user faults with tagged > >> pointers and has the untagged_addr macro, which this patchset reuses. > >> > >> We're not trying to cover all possible ways the kernel accepts user > >> pointers in one patchset, so this one should be considered as a start. > > > > How many changes do you anticipate? > > > > This patchset looks small and reasonable, but I see a potential to become a > > boilerplate. Would we need to change every driver which implements ioctl() > > to strip these bits? > > I've replied to somewhat similar question in one of the previous > versions of the patchset. > > """ > There are two different approaches to untagging the user pointers that I see: > > 1. Untag user pointers right after they are passed to the kernel. > > While this might be possible for pointers that are passed to syscalls > as arguments (Catalin's "hack"), this leaves user pointers, that are > embedded into for example structs that are passed to the kernel. Since > there's no specification of the interface between user space and the > kernel, different kernel parts handle user pointers differently and I > don't see an easy way to cover them all. > > 2. Untag user pointers where they are used in the kernel. > > Although there's no specification on the interface between the user > space and the kernel, the kernel still has to use one of a few > specific ways to access user data (copy_from_user, etc.). So the idea > here is to add untagging into them. This patchset mostly takes this > approach (with the exception of memory subsystem syscalls). > > If there's a better approach, I'm open to suggestions. > """ > > So if we go with the first way, we'll need to go through every syscall > and ioctl handler, which doesn't seem feasible. I agree with you that (1) isn't feasible. My hack is sufficient for the pointer arguments but doesn't help with pointers in user structures passed to the kernel. Now, since the hardware allows access to user pointers with non-zero top 8-bit, the kernel uaccess routines can also use such pointers directly. What's needed, as per your patches, is the access_ok() macro and whatever ends up calling find_vma() (at a first look, there may be other cases). I don't think drivers need changing as long as the in-kernel API they use performs the untagging (e.g. get_user_pages()). -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] mm, arm64: untag user addresses in mm/gup.c
On Wed, May 02, 2018 at 07:25:17PM +0200, Andrey Konovalov wrote: > On Wed, May 2, 2018 at 5:36 PM, Kirill A. Shutemov > wrote: > > On Wed, May 02, 2018 at 02:38:42PM +, Andrey Konovalov wrote: > >> > Does having a tagged address here makes any difference? I couldn't hit a > >> > failure with my simple tests (LD_PRELOAD a library that randomly adds > >> > tags to pointers returned by malloc). > >> > >> I think you're right, follow_page_mask is only called from > >> __get_user_pages, which already untagged the address. I'll remove > >> untagging here. > > > > It also called from follow_page(). Have you covered all its callers? > > Oh, missed that, will take a look. > > Thinking about that, would it make sense to add untagging to find_vma > (and others) instead of trying to cover all find_vma callers? I don't think adding the untagging to find_vma() is sufficient. In many cases the caller does a subsequent check like 'start < vma->vm_start' (see sys_msync() as an example, there are a few others as well). What I did in my tests was a WARN_ON_ONCE() in find_vma() if the address is tagged. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 11/11] arm64: annotate user pointers casts detected by sparse
Hi Andrey, (sorry for the delay) On Mon, Sep 17, 2018 at 07:01:00PM +0200, Andrey Konovalov wrote: > I took another look at the changes this patchset does to the kernel > and here are my thoughts: > > I see two ways how a (potentially tagged) user pointer gets into the kernel: > > 1. A pointer is passed to a syscall (directly as an argument or > indirectly as a struct field). > 2. A pointer is extracted from user context (registers, etc.) by some > kind of a trap/fault handler. > (Is there something else?) Not AFAICT. > In case 1 we also have a special case of a pointer passed to one of > the memory syscalls (mmap, mprotect, etc.). These syscalls "are not > doing memory accesses but rather dealing with the memory range, hence > an untagged pointer is better suited" as pointed out by Catalin (these > syscalls do not always use "unsigned long" instead of "void __user *" > though, for example shmat uses "void __user *"). If it makes things any simpler, we could revisit this though it seems to me more consistent not to pass a tagged pointer back into the kernel when the original one was untagged (i.e. mmap vs munmap). (if it wasn't for pointers in structures, I would have just left the problem entirely to the C library to do the untagging before calling the kernel) > Looking at patch #8 ("usb, arm64: untag user addresses in devio") in > this series, it seems that that devio ioctl actually accepts a pointer > into a vma, so we shouldn't actually be untagging its argument and the > patch needs to be dropped. You are right, the pointer seems to have originated from the kernel as already untagged (mmap() on the driver), so we would expect the user to pass it back an untagged pointer. > Otherwise there's quite a few more cases that needs to be changed > (like tcp_zerocopy_receive() for example, more can be found by > grepping find_vma() in generic code). Yes, it's similar to the devio one. > Regarding case 2, it seems that analyzing casts of __user pointers > won't really help, since the code (arch/arm64/mm/fault.c) doesn't > really use them. However all of this code is arch specific, so it > shouldn't really change over time (right?). It looks like dealing with > tags passed to the kernel through these fault handlers is already > resolved with these patches (and therefore patch #6 ("arm64: untag > user address in __do_user_fault") in this series is not actually > needed and can be dropped (need to test that)): I'm less worried about (2) since, as you say, it's under the arch control and, even if it changes slightly over time, we can be aware of this. > Now, I also see two cases when kernel behavior changes depending on > whether a pointer is tagged: > > 1. Kernel code checks that a pointer belongs to userspace by comparing > it with TASK_SIZE/addr_limit/user_addr_max()/USER_DS/... . > 2. A pointer gets passed to find_vma() or similar functions. > (Is there something else?) I think these are the main cases. > The initial thought that I had here is that the pointers that reach > find_vma() must be passed through memory syscalls and therefore > shouldn't be untagged and don't require any fixes. There are at least > two exceptions to this: 1. get_user_pages() (see patch #4 ("mm, arm64: > untag user addresses in mm/gup.c") in this patch series) and 2. > __do_page_fault() in arch/arm64/mm/fault.c. Are there any other > obvious exceptions? Vincenzo F did some more in-depth analysis, I'll let him answer whether he has found any more cases. At a quick grep, arm64_notify_segfault() (it changes the siginfo si_code). There are a few other places which assume that the address is already untagged but we don't seem to have a clear guideline on the ABI. For example, prctl(PR_SET_MM) takes user addresses and we assume they are untagged (as in the mmap() and friends). > I've tried adding BUG_ON(has_tag(addr)) to find_vma() and running a > modified syzkaller version that passes tagged pointers to the kernel > and failed to find anything else. I added a similar test with an LD_PRELOAD'ed malloc/free implementation on Debian and seemed alright but I'd rather want something more consistent when defining the user ABI, otherwise we have places where find_vma() is called but requires untagging. > As for case 1, the places where pointers are compared with TASK_SIZE > and others can be found with grep. Maybe it makes sense to introduce > some kind of routine like is_user_pointer() that handles tagged > pointers and refactor the existing code to use it? And maybe add a > rule to checkpatch.pl that forbids the direct usage of TASK_SIZE and > others. > > So I think detecting direct comparisons with TASK_SIZE and others > would more useful than finding __user pointer casts (it seems that the > latter requires a lot of annotations to be fixed/added), and I should > just drop this patch with annotations. I think point (1) is not too bad, usually found with grep. As I've said in my previous reply, I kind of came to the sa
Re: [PATCH v7 7/8] arm64: update Documentation/arm64/tagged-pointers.txt
On Tue, Oct 02, 2018 at 03:12:42PM +0200, Andrey Konovalov wrote: > diff --git a/Documentation/arm64/tagged-pointers.txt > b/Documentation/arm64/tagged-pointers.txt > index a25a99e82bb1..ae877d185fdb 100644 > --- a/Documentation/arm64/tagged-pointers.txt > +++ b/Documentation/arm64/tagged-pointers.txt > @@ -17,13 +17,21 @@ this byte for application use. > Passing tagged addresses to the kernel > -- > > -All interpretation of userspace memory addresses by the kernel assumes > -an address tag of 0x00. > +Some initial work for supporting non-zero address tags passed to the > +kernel has been done. As of now, the kernel supports tags in: With my maintainer hat on, the above statement leads me to think this new ABI is work in progress, so not yet suitable for upstream. Also, how is user space supposed to know that it can now pass tagged pointers into the kernel? An ABI change (or relaxation), needs to be advertised by the kernel, usually via a new HWCAP bit (e.g. HWCAP_TBI). Once we have a HWCAP bit in place, we need to be pretty clear about which syscalls can and cannot cope with tagged pointers. The "as of now" implies potential further relaxation which, again, would need to be advertised to user in some (additional) way. > -This includes, but is not limited to, addresses found in: > + - user fault addresses While the kernel currently supports this in some way (by clearing the tag exception entry, el0_da), the above implies (at least to me) that sigcontext.fault_address would contain the tagged address. That's not the case (unless I missed it in your patches). > - - pointer arguments to system calls, including pointers in structures > - passed to system calls, > + - pointer arguments (including pointers in structures), which don't > +describe virtual memory ranges, passed to system calls I think we need to be more precise here... > +All other interpretations of userspace memory addresses by the kernel > +assume an address tag of 0x00. This includes, but is not limited to, > +addresses found in: > + > + - pointer arguments (including pointers in structures), which describe > + virtual memory ranges, passed to memory system calls (mmap, mprotect, > + etc.) ...and probably a full list here. -- Catalin
Re: [PATCH v9 00/24] ILP32 for ARM64
On Wed, Oct 10, 2018 at 04:10:21PM +0200, Eugene Syromiatnikov wrote: > I have some questions regarding AArch64 ILP32 implementation for which I > failed to find an answer myself: > * How ptrace() tracer is supposed to distinguish between ILP32 and LP64 >tracees? For MIPS N32 and x32 this is possible based on syscall >number, but for AArch64 ILP32 I do not see such a sign. There's also >ARM_ip is employed for signalling entering/exiting, I wonder whether >it's possible to employ it also for signalling tracee's personality. With the current implementation, I don't think you can distinguish. From the kernel perspective, the register set is the same. What is the use-case for this? We could add a new regset to expose the ILP32 state (NT_ARM_..., I can't think of a name now but probably not PER* as this implies PER_LINUX_... which is independent from TIF_32BIT_*). > * What's the reasoning behind capping syscall arguments to 32 bit? x32 >and MIPS N32 do not have such a restriction (and do not need special >wrappers for syscalls that pass 64-bit values as a result, except >when they do, as it is the case for preadv2 on x32); moreover, that >would lead to insurmountable difficulties for AArch64 ILP32 tracers >that try to trace LP64 tracees, as it would be impossible to pass >64-bit addresses to process_vm_{read,write} or ptrace PEEK/POKE. We've attempted in earlier versions to allow a mix of 32 and 64-bit register values from ILP32 but it got pretty complicated. The entry code would need to know which registers need zeroing of the top 32-bit and the generic unistd.h wrapper hacks were not very nice. Some past discussions: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1211716.html -- Catalin
Re: [PATCH v2] Documentation/arm64: HugeTLB page implementation
On Tue, Oct 09, 2018 at 12:50:49PM +0100, Will Deacon wrote: > On Tue, Oct 09, 2018 at 11:02:01AM +0100, Punit Agrawal wrote: > > Randy Dunlap writes: > > > > > On 10/8/18 3:03 AM, Punit Agrawal wrote: > > >> Arm v8 architecture supports multiple page sizes - 4k, 16k and > > >> 64k. Based on the active page size, the Linux port supports > > >> corresponding hugepage sizes at PMD and PUD(4k only) levels. > > >> > > >> In addition, the architecture also supports caching larger sized > > >> ranges (composed of multiple entries) at the PTE and PMD level in the > > >> TLBs using the contiguous bit. The Linux port makes use of this > > >> architectural support to enable additional hugepage sizes. > > >> > > >> Describe the two different types of hugepages supported by the arm64 > > >> kernel and the hugepage sizes enabled by each. > > >> > > >> Signed-off-by: Punit Agrawal > > >> Cc: Catalin Marinas > > >> Cc: Will Deacon > > >> Cc: Jonathan Corbet > > > > > > Acked-by: Randy Dunlap > > > > Thanks! > > > > Catalin, Will - I assume you'll pick this up at some point? Or do arm64 > > documentation patches get routed by another tree? > > Acked-by: Will Deacon > > Catalin can pick this up for 4.20. Done. -- Catalin
Re: [PATCH v9 00/24] ILP32 for ARM64
On Sat, Oct 13, 2018 at 04:07:31AM +0200, Eugene Syromiatnikov wrote: > On Wed, Oct 10, 2018 at 03:39:07PM +0100, Szabolcs Nagy wrote: > > On 10/10/18 15:10, Eugene Syromiatnikov wrote: > > > * What's the reasoning behind capping syscall arguments to 32 bit? x32 > > >and MIPS N32 do not have such a restriction (and do not need special > > >wrappers for syscalls that pass 64-bit values as a result, except > > >when they do, as it is the case for preadv2 on x32); moreover, that > > >would lead to insurmountable difficulties for AArch64 ILP32 tracers > > >that try to trace LP64 tracees, as it would be impossible to pass > > >64-bit addresses to process_vm_{read,write} or ptrace PEEK/POKE. > > > > but that's necessarily the case for all ilp32 abis: > > the userspace syscall function receives 32bit > > arguments so even if the kernel abi takes 64bit > > args you cannot use that from c code. (the libc > > does not even know which args should be sign or > > zero extended.) > > glibc's syscall() prototype has kernel_ulong_t as its arguments (more > specifically, to __syscall_ulong_t, which is 64-bit wide on x32; it > should also have kernel_long_t as its return type instead of long, > but that's another story), so it works perfectly fine in case of x32. This would have been my preferred approach but the libc community were not entirely happy with it as it breaks POSIX compatibility: https://sourceware.org/bugzilla/show_bug.cgi?id=16437 http://lists.infradead.org/pipermail/linux-arm-kernel/2015-February/323348.html (there are other threads around x32 and people arguing whether POSIX is wrong) Some sometime around version 4 or 5 of this series, we made the move (back) to compat-like ABI. -- Catalin
Re: [PATCH v9 00/24] ILP32 for ARM64
On Sat, Oct 13, 2018 at 04:14:16AM +0200, Eugene Syromiatnikov wrote: > On Wed, Oct 10, 2018 at 04:36:56PM +0100, Catalin Marinas wrote: > > On Wed, Oct 10, 2018 at 04:10:21PM +0200, Eugene Syromiatnikov wrote: > > > I have some questions regarding AArch64 ILP32 implementation for which I > > > failed to find an answer myself: > > > * How ptrace() tracer is supposed to distinguish between ILP32 and LP64 > > >tracees? For MIPS N32 and x32 this is possible based on syscall > > >number, but for AArch64 ILP32 I do not see such a sign. There's also > > >ARM_ip is employed for signalling entering/exiting, I wonder whether > > >it's possible to employ it also for signalling tracee's personality. > > > > With the current implementation, I don't think you can distinguish. From > > the kernel perspective, the register set is the same. What is the > > use-case for this? > > Err, a ptrace()-based tracer trying to trace a process, for example? I first thought it wouldn't matter for ptrace-based tracers since the syscall numbers are (mostly) the same. But the arguments layout in register is indeed different, so I see your point now about having to distinguish. > > We could add a new regset to expose the ILP32 state (NT_ARM_..., I can't > > think of a name now but probably not PER* as this implies PER_LINUX_... > > which is independent from TIF_32BIT_*). > > So that would require an additional ptrace() call on each syscall stop, > is that correct? The ILP32 state does not change at run-time, so it could only do a ptrace() call once and save the information. No need to re-read it on each syscall stop. We could set a high bit in the syscall number reported to the ptrace caller (though not changing the syscall ABI) but I haven't thought of other consequences. For example, can the ptrace caller change the syscall number? > > > * What's the reasoning behind capping syscall arguments to 32 bit? x32 > > >and MIPS N32 do not have such a restriction (and do not need special > > >wrappers for syscalls that pass 64-bit values as a result, except > > >when they do, as it is the case for preadv2 on x32); moreover, that > > >would lead to insurmountable difficulties for AArch64 ILP32 tracers > > >that try to trace LP64 tracees, as it would be impossible to pass > > >64-bit addresses to process_vm_{read,write} or ptrace PEEK/POKE. > > > > We've attempted in earlier versions to allow a mix of 32 and 64-bit > > register values from ILP32 but it got pretty complicated. The entry code > > would need to know which registers need zeroing of the top 32-bit > > If kernel specifies 64-bit wide registers for syscalls, then it's the > caller's (libc's) responsibility to properly sign-extend arguments when > needed, and glibc, for example, already has proper type definitions that > aimed to handle this. We tried, see my other reply. -- Catalin
Re: [PATCH v9 00/24] ILP32 for ARM64
On Sun, Oct 14, 2018 at 09:49:01PM +0200, Arnd Bergmann wrote: > On Sat, Oct 13, 2018 at 9:36 PM Andy Lutomirski wrote: > > > > On Wed, May 16, 2018 at 1:19 AM Yury Norov > > wrote: > > > > > > This series enables AARCH64 with ILP32 mode. > > > > > > As supporting work, it introduces ARCH_32BIT_OFF_T configuration > > > option that is enabled for existing 32-bit architectures but disabled > > > for new arches (so 64-bit off_t userspace type is used by new userspace). > > > Also it deprecates getrlimit and setrlimit syscalls prior to prlimit64. > > > > Second, ILP32 user code is highly unlikely > > to end up with the same struct layout as ILP64 code. The latter seems > > like it should be solved entirely in userspace by adding a way to > > annotate a structure as being a kernel ABI structure and getting the > > toolchain to lay it out as if it were ILP64 even though the target is > > ILP32. > > The syscall ABI could be almost completely abstracted in glibc, the > main issue is ioctl and a couple of related interfaces that pass data > structures (read() on /dev/input/*, mmap on /dev/snd/* > or raw sockets, fcntl). There is another case on struct siginfo which has some pointers and it wouldn't look like an LP64 structure at all (and glibc doesn't normally intercept the sighandler call to rewrite the structure). We could add padding around void * members as the kernel zeros them, I don't recall the kernel reading these pointers from user. Anyway, using something that resembles compat_siginfo looked the simplest for ILP32. -- Catalin
Re: [PATCH v7 7/8] arm64: update Documentation/arm64/tagged-pointers.txt
On Wed, Oct 10, 2018 at 04:09:25PM +0200, Andrey Konovalov wrote: > On Wed, Oct 3, 2018 at 7:32 PM, Catalin Marinas > wrote: > > On Tue, Oct 02, 2018 at 03:12:42PM +0200, Andrey Konovalov wrote: [...] > > Also, how is user space supposed to know that it can now pass tagged > > pointers into the kernel? An ABI change (or relaxation), needs to be > > advertised by the kernel, usually via a new HWCAP bit (e.g. HWCAP_TBI). > > Once we have a HWCAP bit in place, we need to be pretty clear about > > which syscalls can and cannot cope with tagged pointers. The "as of now" > > implies potential further relaxation which, again, would need to be > > advertised to user in some (additional) way. > > How exactly should I do that? Something like this [1]? Or is it only > for hardware specific things and for this patchset I need to do > something else? > > [1] > https://github.com/torvalds/linux/commit/7206dc93a58fb76421c4411eefa3c003337bcb2d Thinking some more on this, we should probably keep the HWCAP_* bits for actual hardware features. Maybe someone else has a better idea (the linux-abi list?). An option would be to make use of AT_FLAGS auxv (currently 0) in Linux. I've seen some MIPS patches in the past but nothing upstream. Yet another option would be for the user to probe on some innocuous syscall currently returning -EFAULT on tagged pointer arguments but I don't particularly like this. > >> - - pointer arguments to system calls, including pointers in structures > >> - passed to system calls, > >> + - pointer arguments (including pointers in structures), which don't > >> +describe virtual memory ranges, passed to system calls > > > > I think we need to be more precise here... > > In what way? In the way of being explicit about which syscalls support tagged pointers, unless we find a good reason to support tagged pointers on all syscalls and avoid any lists. -- Catalin
Re: [PATCH v7 0/8] arm64: untag user pointers passed to the kernel
On Wed, Oct 17, 2018 at 01:25:42PM -0700, Evgenii Stepanov wrote: > On Wed, Oct 17, 2018 at 7:20 AM, Andrey Konovalov > wrote: > > On Wed, Oct 17, 2018 at 4:06 PM, Vincenzo Frascino > > wrote: > >> I have been thinking a bit lately on how to address the problem of > >> user tagged pointers passed to the kernel through syscalls, and > >> IMHO probably the best way we have to catch them all and make sure > >> that the approach is maintainable in the long term is to introduce > >> shims that tag/untag the pointers passed to the kernel. > >> > >> In details, what I am proposing can live either in userspace > >> (preferred solution so that we do not have to relax the ABI) or in > >> kernel space and can be summarized as follows: > >> - A shim is specific to a syscall and is called by the libc when > >> it needs to invoke the respective syscall. > >> - It is required only if the syscall accepts pointers. > >> - It saves the tags of a pointers passed to the syscall in memory > >> (same approach if the we are passing a struct that contains > >> pointers to the kernel, with the difference that all the tags of > >> the pointers in the struct need to be saved singularly) > >> - Untags the pointers > >> - Invokes the syscall > >> - Retags the pointers with the tags stored in memory > >> - Returns > >> > >> What do you think? > > > > If I correctly understand what you are proposing, I'm not sure if that > > would work with the countless number of different ioctl calls. For > > example when an ioctl accepts a struct with a bunch of pointer fields. > > In this case a shim like the one you propose can't live in userspace, > > since libc doesn't know about the interface of all ioctls, so it can't > > know which fields to untag. The kernel knows about those interfaces > > (since the kernel implements them), but then we would need a custom > > shim for each ioctl variation, which doesn't seem practical. > > The current patchset handles majority of pointers in a just a few > common places, like copy_from_user. Userspace shims will need to untag > & retag all pointer arguments - we are looking at hundreds if not > thousands of shims. They will also be located in a different code base > from the syscall / ioctl implementations, which would make them > impossible to keep up to date. I think ioctls are a good reason not to attempt such user-space shim layer (though it would have been much easier for the kernel ;)). -- Catalin
Re: [PATCH v10 08/22] kasan, arm64: untag address in __kimg_to_phys and _virt_addr_is_linear
On Tue, Nov 06, 2018 at 06:30:23PM +0100, Andrey Konovalov wrote: > --- a/arch/arm64/include/asm/memory.h > +++ b/arch/arm64/include/asm/memory.h > @@ -92,6 +92,15 @@ > #define KASAN_THREAD_SHIFT 0 > #endif > > +#ifdef CONFIG_KASAN_SW_TAGS > +#define KASAN_TAG_SHIFTED(tag) ((unsigned long)(tag) << 56) > +#define KASAN_SET_TAG(addr, tag) (((addr) & ~KASAN_TAG_SHIFTED(0xff)) | \ > + KASAN_TAG_SHIFTED(tag)) > +#define KASAN_RESET_TAG(addr)KASAN_SET_TAG(addr, 0xff) > +#else > +#define KASAN_RESET_TAG(addr)addr > +#endif I think we should reuse the untagged_addr() macro we have in uaccess.h (make it more general and move to another header file). -- Catalin
Re: [PATCH v10 12/22] kasan, arm64: fix up fault handling logic
On Tue, Nov 06, 2018 at 06:30:27PM +0100, Andrey Konovalov wrote: > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c > index 7d9571f4ae3d..d9a84d6f3343 100644 > --- a/arch/arm64/mm/fault.c > +++ b/arch/arm64/mm/fault.c > @@ -32,6 +32,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -141,6 +142,8 @@ void show_pte(unsigned long addr) > pgd_t *pgdp; > pgd_t pgd; > > + addr = (unsigned long)kasan_reset_tag((void *)addr); > + > if (addr < TASK_SIZE) { > /* TTBR0 */ > mm = current->active_mm; I think we should clear the tag earlier on in the fault handling code, before reaching show_pte(). -- Catalin
Re: [RESEND PATCH v2] ARM64: ACPI: Update documentation for latest specification version
On Wed, Mar 16, 2016 at 10:20:23AM +0530, Vikas Sajjan wrote: > On Wed, Mar 16, 2016 at 1:58 AM, Al Stone wrote: > > The ACPI 6.1 specification was recently released at the end of January 2016, > > but the arm64 kernel documentation for the use of ACPI was written for the > > 5.1 version of the spec. There were significant additions to the spec that > > had not yet been mentioned -- for example, the 6.0 mechanisms added to make > > it easier to define processors and low power idle states, as well as the > > 6.1 addition allowing regular interrupts (not just from GPIO) be used to > > signal ACPI general purpose events. > > > > This patch reflects going back through and examining the specs in detail > > and updating content appropriately. Whilst there, a few odds and ends of > > typos were caught as well. This brings the documentation up to date with > > ACPI 6.1 for arm64. > > > > RESEND: > >-- Corrected From: header and added missing Cc's > > > > Changes for v2: > >-- Clean up white space (Harb Abdulhahmid) > >-- Clarification on _CCA usage (Harb Abdulhamid) > >-- IORT moved to required from recommended (Hanjun Guo) > >-- Clarify IORT description (Hanjun Guo) > > > > Signed-off-by: Al Stone > > Cc: Catalin Marinas > > Cc: Will Deacon > > Cc: Jonathan Corbet > > --- > > Documentation/arm64/acpi_object_usage.txt | 445 > > ++ > > Documentation/arm64/arm-acpi.txt | 28 +- > > 2 files changed, 356 insertions(+), 117 deletions(-) > > > > diff --git a/Documentation/arm64/acpi_object_usage.txt > > b/Documentation/arm64/acpi_object_usage.txt > > index a6e1a18..29bc1a1 100644 > > --- a/Documentation/arm64/acpi_object_usage.txt > > +++ b/Documentation/arm64/acpi_object_usage.txt > > @@ -11,15 +11,16 @@ outside of the UEFI Forum (see Section 5.2.6 of the > > specification). > > > > For ACPI on arm64, tables also fall into the following categories: > > > > - -- Required: DSDT, FADT, GTDT, MADT, MCFG, RSDP, SPCR, XSDT > > + -- Required: DSDT, FADT, GTDT, IORT, MADT, MCFG, RSDP, SPCR, XSDT > > > > - -- Recommended: BERT, EINJ, ERST, HEST, SSDT > > + -- Recommended: BERT, EINJ, ERST, HEST, PCCT, SSDT > > > > - -- Optional: BGRT, CPEP, CSRT, DRTM, ECDT, FACS, FPDT, MCHI, MPST, > > - MSCT, RASF, SBST, SLIT, SPMI, SRAT, TCPA, TPM2, UEFI > > + -- Optional: BGRT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT, MCHI, > > + MPST, MSCT, NFIT, PMTT, RASF, SBST, SLIT, SPMI, SRAT, STAO, TCPA, > > + TPM2, UEFI, XENV > > > > - -- Not supported: BOOT, DBG2, DBGP, DMAR, ETDT, HPET, IBFT, IVRS, > > - LPIT, MSDM, RSDT, SLIC, WAET, WDAT, WDRT, WPBT > > + -- Not supported: BOOT, DBGP, DMAR, ETDT, HPET, IBFT, IVRS, LPIT, > > + MSDM, OEMx, PSDT, RSDT, SLIC, WAET, WDAT, WDRT, WPBT > > > > > > Table Usage for ARMv8 Linux > > @@ -50,7 +51,8 @@ CSRT Signature Reserved (signature == "CSRT") > > > > DBG2 Signature Reserved (signature == "DBG2") > > == DeBuG port table 2 == > > - Microsoft only table, will not be supported. > > + License has changed and should be usable. Optional if used instead > > + of earlycon= on the command line. > > > > DBGP Signature Reserved (signature == "DBGP") > > == DeBuG Port table == > > @@ -133,10 +135,11 @@ GTDT Section 5.2.24 (signature == "GTDT") > > > > HEST Section 18.3.2 (signature == "HEST") > > == Hardware Error Source Table == > > - Until further error source types are defined, use only types 6 (AER > > - Root Port), 7 (AER Endpoint), 8 (AER Bridge), or 9 (Generic Hardware > > - Error Source). Firmware first error handling is possible if and > > only > > - if Trusted Firmware is being used on arm64. > > + ARM-specific error sources have been defined; please use those or > > the > > + PCI types such as type 6 (AER Root Port), 7 (AER Endpoint), or 8 > > (AER > > + Bridge), or use type 9 (Generic Hardware Error Source). Firmware > > first > > + error handling is possible if and only if Trusted Firmware is being > > + used on arm64. > > > > Must be supplied if RAS support is provided by the platform. It > > is recommended this table be supplied. > > @@ -149,20 +152,27 @@ IBFT Signature Reserved (signature == "