From: Navneet Singh
Dynamic Capacity CXL regions must allow memory to be added or removed
dynamically. In addition to the quantity of memory available the
location of the memory within a DC partition is dynamic based on the
extents offered by a device. CXL DAX regions must accommodate the
spars
Dynamic Capacity Devices (DCD) require event interrupts to process
memory addition or removal. BIOS may have control over non-DCD event
processing. DCD interrupt configuration needs to be separate from
memory event interrupt configuration.
Factor out event interrupt setting validation.
Reviewed
Basics and overview
===
Software with larger attack surfaces (e.g. network facing apps like databases,
browsers or apps relying on browser runtimes) suffer from memory corruption
issues which can be utilized by attackers to bend control flow of the program
to eventually gain contro
Make an entry for cfi extensions in extensions.yaml.
Signed-off-by: Deepak Gupta
Acked-by: Rob Herring (Arm)
---
Documentation/devicetree/bindings/riscv/extensions.yaml | 14 ++
1 file changed, 14 insertions(+)
diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml
b
From: Samuel Holland
Some bits in the [ms]envcfg CSR, such as the CFI state and pointer
masking mode, need to be controlled on a per-thread basis. Support this
by keeping a copy of the CSR value in struct thread_struct and writing
it during context switches. It is safe to discard the old CSR valu
cxl_test provides a good way to ensure quick smoke and regression
testing. The complexity of Dynamic Capacity (DC) extent processing as
well as the complexity of the new sparse DAX regions can mostly be
tested through cxl_test. This includes management of sparse regions and
DAX devices on those r
Carves out space in arch specific thread struct for cfi status and shadow
stack in usermode on riscv.
This patch does following
- defines a new structure cfi_status with status bit for cfi feature
- defines shadow stack pointer, base and size in cfi_status structure
- defines offsets to new member
As discussed extensively in the changelog for the addition of this
syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall") the
existing mmap() and madvise() syscalls do not map entirely well onto the
security requirements for shadow stack memory since they lead to windows
where memory is a
`fork` implements copy on write (COW) by making pages readonly in child
and parent both.
ptep_set_wrprotect and pte_wrprotect clears _PAGE_WRITE in PTE.
Assumption is that page is readable and on fault copy on write happens.
To implement COW on shadow stack pages, clearing up W bit makes them XWR
zicfiss / zicfilp introduces a new exception to priv isa `software check
exception` with cause code = 18. This patch implements software check
exception.
Additionally it implements a cfi violation handler which checks for code
in xtval. If xtval=2, it means that sw check exception happened because
From: Andy Chiu
The function save_v_state() served two purposes. First, it saved
extension context into the signal stack. Then, it constructed the
extension header if there was no fault. The second part is independent
of the extension itself. As a result, we can pull that part out, so
future exte
Implement architecture agnostic prctls() interface for setting and getting
shadow stack status.
prctls implemented are PR_GET_SHADOW_STACK_STATUS,
PR_SET_SHADOW_STACK_STATUS and PR_LOCK_SHADOW_STACK_STATUS.
As part of PR_SET_SHADOW_STACK_STATUS/PR_GET_SHADOW_STACK_STATUS, only
PR_SHADOW_STACK_ENA
prctls implemented are:
PR_SET_INDIR_BR_LP_STATUS, PR_GET_INDIR_BR_LP_STATUS and
PR_LOCK_INDIR_BR_LP_STATUS.
On trap entry, ELP state is recorded in sstatus image on stack and SR_ELP
in CSR_STATUS is cleared.
Signed-off-by: Deepak Gupta
---
arch/riscv/include/asm/usercfi.h | 16 -
arch/
Three architectures (x86, aarch64, riscv) have support for indirect branch
tracking feature in a very similar fashion. On a very high level, indirect
branch tracking is a CPU feature where CPU tracks branches which uses
memory operand to perform control transfer in program. As part of this
tracking
Save shadow stack pointer in sigcontext structure while delivering signal.
Restore shadow stack pointer from sigcontext on sigreturn.
As part of save operation, kernel uses `ssamoswap` to save snapshot of
current shadow stack on shadow stack itself (can be called as a save
token). During restore o
Adding enumeration of zicfilp and zicfiss extensions in hwprobe syscall.
Signed-off-by: Deepak Gupta
---
arch/riscv/include/uapi/asm/hwprobe.h | 2 ++
arch/riscv/kernel/sys_hwprobe.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/arch/riscv/include/uapi/asm/hwprobe.h
b/arch/riscv/
Expose a new register type NT_RISCV_USER_CFI for risc-v cfi status and
state. Intentionally both landing pad and shadow stack status and state
are rolled into cfi state. Creating two different NT_RISCV_USER_XXX would
not be useful and wastage of a note type. Enabling or disabling of feature
is not
`arch_calc_vm_prot_bits` is implemented on risc-v to return VM_READ |
VM_WRITE if PROT_WRITE is specified. Similarly `riscv_sys_mmap` is
updated to convert all incoming PROT_WRITE to (PROT_WRITE | PROT_READ).
This is to make sure that any existing apps using PROT_WRITE still work.
Earlier `protect
From: Samuel Holland
Currently, we enable cbo.zero for usermode on each hart that supports
the Zicboz extension. This means that the [ms]envcfg CSR value may
differ between harts. Other features, such as pointer masking and CFI,
require setting [ms]envcfg bits on a per-thread basis. The combinati
From: Clément Léger
Add necessary SBI definitions to use the FWFT extension.
Signed-off-by: Clément Léger
---
arch/riscv/include/asm/sbi.h | 27 +++
1 file changed, 27 insertions(+)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 98f631b0
This patch adds support for detecting zicfiss and zicfilp. zicfiss and
zicfilp stands for unprivleged integer spec extension for shadow stack
and branch tracking on indirect branches, respectively.
This patch looks for zicfiss and zicfilp in device tree and accordinlgy
lights up bit in cpu feature
This commit adds a kernel command line option using which user cfi can be
disabled.
Signed-off-by: Deepak Gupta
---
arch/riscv/kernel/usercfi.c | 20
1 file changed, 20 insertions(+)
diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c
index 04b0305943b1..
Kernel will have to perform shadow stack operations on user shadow stack.
Like during signal delivery and sigreturn, shadow stack token must be
created and validated respectively. Thus shadow stack access for kernel
must be enabled.
In future when kernel shadow stacks are enabled for linux kernel,
Adds kselftest for RISC-V control flow integrity implementation for user
mode. There is not a lot going on in kernel for enabling landing pad for
user mode. cfi selftest are intended to be compiled with zicfilp and
zicfiss enabled compiler. Thus kselftest simply checks if landing pad and
shadow sta
This patch creates a config for shadow stack support and landing pad instr
support. Shadow stack support and landing instr support can be enabled by
selecting `CONFIG_RISCV_USER_CFI`. Selecting `CONFIG_RISCV_USER_CFI` wires
up path to enumerate CPU support and if cpu support exists, kernel will
sup
Adding documentation on shadow stack for user mode on riscv and kernel
interfaces exposed so that user tasks can enable it.
Signed-off-by: Deepak Gupta
---
Documentation/arch/riscv/index.rst | 1 +
Documentation/arch/riscv/zicfiss.rst | 176 +++
2 files change
Adding documentation on landing pad aka indirect branch tracking on riscv
and kernel interfaces exposed so that user tasks can enable it.
Signed-off-by: Deepak Gupta
---
Documentation/arch/riscv/index.rst | 1 +
Documentation/arch/riscv/zicfilp.rst | 115 +++
From: Samuel Holland
Now that the [ms]envcfg CSR value is maintained per thread, not per
hart, riscv_user_isa_enable() only needs to be called once during boot,
to set the value for the init task. This also allows it to be marked as
__init.
Reviewed-by: Andrew Jones
Reviewed-by: Conor Dooley
R
Userspace specifies CLONE_VM to share address space and spawn new thread.
`clone` allow userspace to specify a new stack for new thread. However
there is no way to specify new shadow stack base address without changing
API. This patch allocates a new shadow stack whenever CLONE_VM is given.
In cas
This patch implements creating shadow stack pte (on riscv). Creating
shadow stack PTE on riscv means that clearing RWX and then setting W=1.
Signed-off-by: Deepak Gupta
Reviewed-by: Alexandre Ghiti
---
arch/riscv/include/asm/pgtable.h | 10 ++
1 file changed, 10 insertions(+)
diff --gi
From: Mark Brown
Three architectures (x86, aarch64, riscv) have announced support for
shadow stacks with fairly similar functionality. While x86 is using
arch_prctl() to control the functionality neither arm64 nor riscv uses
that interface so this patch adds arch-agnostic prctl() support to
get
From: Navneet Singh
Extent information can be helpful to the user to coordinate memory usage
with the external orchestrator and FM.
Expose the details of region extents by creating the following
sysfs entries.
/sys/bus/cxl/devices/dax_regionX/extentX.Y
/sys/bus/cxl/devices/dax_r
Additional DCD region (partition) information is contained in the DSMAS
CDAT tables, including performance, read only, and shareable attributes.
Match DCD partitions with DSMAS tables and store the meta data.
Reviewed-by: Jonathan Cameron
Signed-off-by: Ira Weiny
---
Changes:
[Fan: remove unwan
Updating __show_regs to print captured shadow stack pointer as well.
On tasks where shadow stack is disabled, it'll simply print 0.
Signed-off-by: Deepak Gupta
Reviewed-by: Alexandre Ghiti
---
arch/riscv/kernel/process.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/ar
Dynamic Capacity Devices (DCD) require event interrupts to process
memory addition or removal. BIOS may have control over non-DCD event
processing. DCD interrupt configuration needs to be separate from
memory event interrupt configuration.
Split cxl_event_config_msgnums() from irq setup in prepa
create_pmem_region_store() and create_ram_region_store() are identical
with the exception of the region mode. With the addition of DC region
mode this would end up being 3 copies of the same code.
Refactor create_pmem_region_store() and create_ram_region_store() to use
a single common function to
pte_mkwrite creates PTEs with WRITE encodings for underlying arch.
Underlying arch can have two types of writeable mappings. One that can be
written using regular store instructions. Another one that can only be
written using specialized store instructions (like shadow stack stores).
pte_mkwrite ca
From: Navneet Singh
Dynamic Capacity Devices (DCD) support extent change notifications
through the event log mechanism. The interrupt mailbox commands were
extended in CXL 3.1 to support these notifications. Firmware can't
configure DCD events to be FW controlled but can retain control of
memor
From: Navneet Singh
Until now region modes and decoder modes were equivalent in that both
modes were either PMEM or RAM. The addition of Dynamic
Capacity partitions defines up to 8 DC partitions per device.
The region mode is thus no longer equivalent to the endpoint decoder
mode. IOW the endp
The event buffer does not need to be allocated if something has failed in
setting up event irq's.
In prep for adjusting event configuration for DCD events move the buffer
allocation to the end of the event configuration.
Reviewed-by: Davidlohr Bueso
Reviewed-by: Dave Jiang
Reviewed-by: Jonathan
From: Navneet Singh
Devices which optionally support Dynamic Capacity (DC) are configured
via mailbox commands. CXL 3.1 requires the host to issue the Get DC
Configuration command in order to properly configure DCDs. Without the
Get DC Configuration command DCD can't be supported.
Implement th
Additional DCD functionality is being added to this call which will be
simplified by the use of guard() with the cxl_dpa_rwsem.
Convert the function to use guard() prior to adding DCD functionality.
Suggested-by: Jonathan Cameron
Signed-off-by: Ira Weiny
---
Changes:
[Jonathan: new patch]
---
From: Navneet Singh
Per the CXL 3.1 specification software must check the Command Effects
Log (CEL) for dynamic capacity command support.
Detect support for the DCD commands while reading the CEL, including:
Get DC Config
Get DC Extent List
Add DC Response
Releas
From: Navneet Singh
To properly configure CXL regions on Dynamic Capacity Devices (DCD),
user space will need to know the details of the DC partitions available.
Expose dynamic capacity capabilities through sysfs.
Signed-off-by: Navneet Singh
Reviewed-by: Jonathan Cameron
Co-developed-by: Ira
From: Navneet Singh
Endpoint decoder mode is used to represent the partition the decoder
points to such as ram or pmem.
Expand the mode to allow a decoder to point to a specific DC partition
(Region).
Signed-off-by: Navneet Singh
Reviewed-by: Dave Jiang
Reviewed-by: Jonathan Cameron
Co-devel
From: Navneet Singh
One or more decoders each pointing to a Dynamic Capacity (DC) partition
form a CXL software region. The region mode reflects composition of
that entire software region. Decoder mode reflects a specific DC
partition. DC partitions are also known as DC regions per CXL
specifi
From: Navneet Singh
A dynamic capacity device (DCD) sends events to signal the host for
changes in the availability of Dynamic Capacity (DC) memory. These
events contain extents describing a DPA range and meta data for memory
to be added or removed. Events may be sent from the device at any tim
cxl_dpa_to_region() finds the region from a tuple.
The search involves finding the device endpoint decoder as well.
Dynamic capacity extent processing uses the endpoint decoder HPA
information to calculate the HPA offset. In addition, well behaved
extents should be contained within an endpoint d
VM_SHADOW_STACK (alias to VM_HIGH_ARCH_5) is used to encode shadow stack
VMA on three architectures (x86 shadow stack, arm GCS and RISC-V shadow
stack). In case architecture doesn't implement shadow stack, it's VM_NONE
Introducing a helper `is_shadow_stack_vma` to determine shadow stack vma
or not.
The event logs test was created as static arrays as an easy way to mock
events. Dynamic Capacity Device (DCD) test support requires events be
generated dynamically when extents are created or destroyed.
The current event log test has specific checks for the number of events
seen including log ove
From: Navneet Singh
CXL rev 3.1 section 8.2.9.2.1 adds the Dynamic Capacity Event Records.
User space can use trace events for debugging of DC capacity changes.
Add DC trace points to the trace log.
Signed-off-by: Navneet Singh
Reviewed-by: Jonathan Cameron
Reviewed-by: Dave Jiang
Reviewed-b
Dynamic Capacity regions must limit dev dax resources to those areas
which have extents backing real memory. Such DAX regions are dubbed
'sparse' regions. In order to manage where memory is available four
alternatives were considered:
1) Create a single region resource child on region creation w
The Coherent Device Attribute Table (CDAT) Device Scoped Memory Affinity
Structure (DSMAS) version 1.04 [1] defines flags to indicate if a DPA range
is read only and/or shared.
Add read only and shareable flag definitions.
This change was merged in ACPI via PR 976.[2]
Link:
https://uefi.org/sit
On Wed, Oct 09, 2024 at 12:28:03PM +0100, Mark Brown wrote:
On Tue, Oct 08, 2024 at 03:36:48PM -0700, Deepak Gupta wrote:
riscv will need an implementation for exit_thread to clean up shadow stack
when thread exits. If current thread had shadow stack enabled, shadow
stack is allocated by defaul
The device DAX structure is being enhanced to track additional DCD
information. Specifically the range tuple needs additional parameters.
The current range tuple is not fully documented and is large enough to
warrant its own definition.
Separate the struct dax_dev_range definition and document it
From: Navneet Singh
To support Dynamic Capacity Devices (DCD) endpoint decoders will need to
map DC partitions (regions). In addition to assigning the size of the
DC partition, the decoder must assign any skip value from the previous
decoder. This must be done within a contiguous DPA space.
Tw
zicfiss and zicfilp extension gets enabled via b3 and b2 in *envcfg CSR.
menvcfg controls enabling for S/HS mode. henvcfg control enabling for VS
while senvcfg controls enabling for U/VU mode.
zicfilp extension extends *status CSR to hold `expected landing pad` bit.
A trap or interrupt can occur b
On Tue, Oct 29, 2024 at 04:45:00PM +, Marc Zyngier wrote:
> Mark Brown wrote:
> > + ID_WRITABLE(ID_AA64ISAR3_EL1, (ID_AA64ISAR3_EL1_FPRCVT |
> > + ID_AA64ISAR3_EL1_FAMINMAX)),
> Please add the required sanitisation of the register so that we do not
> get an
On Mon, 28 Oct 2024 20:24:17 +,
Mark Brown wrote:
>
> ID_AA64ISAR3_EL1 is currently marked as unallocated in KVM but does have a
> number of bitfields defined in it. Expose FPRCVT and FAMINMAX, two simple
> instruction only extensions to guests.
>
> Signed-off-by: Mark Brown
> ---
> arch/a
On 9/12/24 13:55, Charlie Jenkins wrote:
> Add a new hwprobe key "RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0" which
> allows userspace to probe for the new RISCV_ISA_VENDOR_EXT_XTHEADVECTOR
> vendor extension.
I believe it's more advantageous to use RISCV_HWPROBE_KEY_VENDOR_EXT_0
to ensure that this k
From: Navneet Singh
DAX regions which map dynamic capacity partitions require that memory be
allowed to come and go. Recall sparse regions were created for this
purpose. Now that extents can be realized within DAX regions the DAX
region driver can start tracking sub-resource information.
The t
From: Mark Brown
Since multiple architectures have support for shadow stacks and we need to
select support for this feature in several places in the generic code
provide a generic config option that the architectures can select.
Suggested-by: David Hildenbrand
Acked-by: David Hildenbrand
Signe
From: Navneet Singh
Dynamic capacity device extents may be left in an accepted state on a
device due to an unexpected host crash. In this case it is expected
that the creation of a new region on top of a DC partition can read
those extents and surface them for continued use.
Once all endpoint d
Dan Williams writes:
> Alistair Popple wrote:
> [..]
>
>> >> > It follows that that the DMA-idle condition still needs to look for the
>> >> > case where the refcount is > 1 rather than 0 since refcount == 1 is the
>> >> > page-mapped-but-DMA-idle condition.
>>
>> Because if the DAX page-cache
To be able to switch VMware products running on Linux to KVM some minor
changes are required to let KVM run/resume unmodified VMware guests.
First allow enabling of the VMware backdoor via an api. Currently the
setting of the VMware backdoor is limited to kernel boot parameters,
which forces all V
VMware products handle hypercalls in userspace. Give KVM the ability
to run VMware guests unmodified by fowarding all hypercalls to the
userspace.
Enabling of the KVM_CAP_X86_VMWARE_HYPERCALL_ENABLE capability turns
the feature on - it's off by default. This allows vmx's built on top
of KVM to sup
Allow enabling of the vmware backdoor on a per-vm basis. The vmware
backdoor could only be enabled systemwide via the kernel parameter
kvm.enable_vmware_backdoor which required modifying the kernels boot
parameters.
Add the KVM_CAP_X86_VMWARE_BACKDOOR cap that enables the backdoor at the
hyperviso
Add a testcase to exercise KVM_CAP_X86_VMWARE_HYPERCALL and validate
that KVM exits to userspace on hypercalls and registers are correctly
preserved.
Signed-off-by: Zack Rusin
Cc: Doug Covelli
Cc: Paolo Bonzini
Cc: Jonathan Corbet
Cc: Sean Christopherson
Cc: Thomas Gleixner
Cc: Ingo Molnar
68 matches
Mail list logo