Dear Linux Hardening, Security, and Memory Management Mailing Lists,
This is primarily an FYI and an RFC. I have some code, included below,
that could be dropped into a *.ko for the 6.1.X kernel, but really this
mail is to query about ideas for acceptable upstream changes.
Thank you ahead of time for reading! If the title alone of this email
sticks out and makes sense immediately, feel free to skip the
introduction below.
INTRODUCTION
For the past few months, I have been sparring with recent CVE PoCs in
the kernel, applying monkey patches to dynamic data structure
allocations, attempting to prevent data-only attacks which use write
gadgets to modify dynamically allocated struct fields otherwise declared
constant.
I wanted to share, briefly, what I feel is a reasonable and general
solution to the standard contemporary exploit procedure. For those
unfamiliar with recent PoCs, see a case study of recent exploits in Man
Yue Mo's article here:
https://github.blog/security/vulnerability-research/the-android-kernel-mitigations-obstacle-race/
Particularly, understanding the "Running arbitrary root commands using
ret2kworker(TM)" section will give a general idea of the issue.
Summarizing, there are thousands of dynamic data structures alloc'd and
free'd in the kernel all the time, for files, for processes, and so
forth, and it is elementary to manipulate any instance of data, but hard
to protect every single one of them. These range from trng device
pointers to kworker queues---everything passing through vmalloc.
The strawman approach presented here is for security engineers to read
the CVE-XYZ-ABC PoC, identify the portion of the system being manipulated,
and patch the allocation handler to protect just that data at the
page-table layer, by:
- Reorganizing allocations of those structures so that they sit
adjacently on the same 2MB hugepage; otherwise the existing hardware
support for preventing their mutation (PTE flags) will also trigger for
unrelated data allocated alongside them.
- Writing a handler to ensure non-malicious modifications, e.g. keeping
"const" fields const, ensuring modifications to other fields happen at
the right physical PC values and the right pages, handling atomic
updates so that the exception fault on these values maintains ordering
under race conditions (maybe "doubling up" on atomic assembly operations
due to certain microarch issues at the chipset level, see below), and so
on, and so forth.
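To make the second bullet concrete, here is a minimal user-space sketch of the "keep const fields const" check a write-fault handler might perform. It is not kernel code: struct demo_obj, demo_mutable and demo_write_allowed() are hypothetical names invented for illustration, and a real handler would also need the PC, page, and atomicity checks described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a protected object; not a real kernel struct. */
struct demo_obj {
    const void *ops;   /* set once at allocation; must never change */
    uint32_t    mode;  /* set once at allocation; must never change */
    uint64_t    pos;   /* legitimately mutable at runtime */
};

/* Byte range inside the object that a write is allowed to touch. */
struct write_window {
    size_t start;
    size_t len;
};

/* Assumption: only `pos` may be written after initialization. */
static const struct write_window demo_mutable[] = {
    { offsetof(struct demo_obj, pos), sizeof(uint64_t) },
};

/*
 * Return true if a faulting write of `len` bytes at byte offset `off`
 * within the object falls entirely inside one mutable window.
 */
static bool demo_write_allowed(size_t off, size_t len)
{
    assert(len > 0);
    for (size_t i = 0; i < sizeof(demo_mutable) / sizeof(demo_mutable[0]); i++) {
        const struct write_window *w = &demo_mutable[i];

        /* Written so that off + len cannot overflow. */
        if (off >= w->start && len <= w->len &&
            off - w->start <= w->len - len)
            return true;
    }
    return false;
}
```

A handler built this way would let a write to `pos` proceed and reject any write that overlaps `ops` or `mode`. The table of windows is exactly the per-structure "encoded wisdom" that has to be maintained by hand.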
Eventually, this Sisyphean task amounts to a mountain's worth of
point-patches and encoded wisdom, valuable but absurd insofar as there
are a thousand more places for an exploit to manipulate instead of the
protected ones.
DATATYPE PARTITIONED VIRTUAL MEMORY ALLOCATION
The above process can be generalized by changing Linux's vmalloc to
behave more like seL4 (though not identically), by tying allocation
itself to the typing of an object:
https://docs.sel4.systems/Tutorials/untyped.html
but without the caveat that objects must be "allocated in order of size,
largest first, to avoid wasting memory."
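As a rough illustration of tying allocation to an object's type, the following user-space sketch gives each type its own region, so objects of different types never share one. Everything here is hypothetical (enum demo_type, REGION_BYTES, demo_typed_alloc()); it is a bump allocator with no free path, not a proposal for the actual vmalloc or kmem interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REGION_BYTES 4096u  /* assumed region size for this sketch */

/* Hypothetical type tags; a real design would cover many more types. */
enum demo_type { DEMO_FILE, DEMO_CRED, DEMO_NTYPES };

struct demo_region {
    _Alignas(16) unsigned char mem[REGION_BYTES];
    size_t used;
};

/* One dedicated region per type: the partitioning is by datatype. */
static struct demo_region demo_regions[DEMO_NTYPES];

/* Bump-allocate `size` bytes from the region reserved for type `t`. */
static void *demo_typed_alloc(enum demo_type t, size_t size)
{
    assert(t < DEMO_NTYPES);
    struct demo_region *r = &demo_regions[t];
    size_t rounded = (size + 15u) & ~(size_t)15u;  /* keep 16-byte alignment */

    if (size == 0 || rounded < size || rounded > REGION_BYTES - r->used)
        return NULL;

    void *p = r->mem + r->used;
    r->used += rounded;
    return p;
}

/* True if `p` lies inside the region reserved for type `t`. */
static int demo_in_region(enum demo_type t, const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)demo_regions[t].mem;

    return a >= lo && a < lo + REGION_BYTES;
}
```

Because every region holds a single type, a page-table-level protection applied to one region only ever affects objects of that type.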
I demonstrated something similar previously to prevent the intermixed
allocation of SECCOMP BPF code pages with data on ARM64's Android Kernel
here (with which you may be familiar):
https://lore.kernel.org/all/20240423095843.446565600-1-mbl...@motorola.com/
That said, the above patch does not do the same for other critical
dynamically allocated data.
So, for instance, to prevent struct file manipulation, I've written the
following code into an init-time loaded kernel (v6.1.x) module:
    filp_cachep_ind =
        (struct kmem_cache **)kallsyms_lookup_name_ind("filp_cachep");

    /* Just nix the existing file cache for one which is page-aligned */
    *filp_cachep_ind = kmem_cache_create(
        "filp", sizeof(struct file), PAGE_SIZE,
        SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT, NULL);
I.e. aligning cache allocations to PAGE_SIZE. See the appendix for
associated module code.
Of course, this is a little insane since:
(1) I'm effectively double allocating the cache to change how
the structs are allocated, because I can't change the kernel's
init process (part of this has to do with Google's GKI).
(2) The kmem infrastructure needs to be also monkey patched so
that this "PAGE_SIZE" alignment actually indicates that objects
can still be allocated next to each other at the originally
set alignment, reducing dead space due to wasted bytes (not
implemented). And, most importantly,
(3) struct file is just one case of thousands.
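To put a number on the dead space mentioned in (2), here is a small user-space sketch of the arithmetic. The 4096-byte page, the 232-byte object size, and the 64-byte cache-line alignment used in the usage note are assumptions for illustration, not values read from a 6.1 build, and the model ignores any per-slab metadata the real allocator keeps.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u  /* assumed page size */

/* Round n up to a multiple of align (align must be a power of two). */
static size_t demo_round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Objects that fit in one page when each object starts on an `align` boundary. */
static size_t demo_objs_per_page(size_t obj_size, size_t align)
{
    return DEMO_PAGE_SIZE / demo_round_up(obj_size, align);
}

/* Bytes in one page not occupied by object payload. */
static size_t demo_wasted_per_page(size_t obj_size, size_t align)
{
    return DEMO_PAGE_SIZE - demo_objs_per_page(obj_size, align) * obj_size;
}
```

Under these assumptions, a 232-byte object at 64-byte alignment packs 16 per page with 384 bytes unused, while the same object at PAGE_SIZE alignment packs 1 per page with 3864 bytes unused. That gap is the cost the unimplemented kmem change in (2) would recover.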
However, it seems fine for protecting a specific, given file allocation
targeted by something like:
https://github.com/chompie1337/s8_2019_2215_poc/blob/34f6481ed4ed4cff661b50ac465fc73655b82f64/poc/knox_bypass.c#L50
Given that you also have the appropriate protection handlers (see
appendix below), this works fine even without access to an HVCI system.
Hopefully the above reasoning is clear enough. If so, the proposal
(though it