On 12/17/24 16:11, Mark Johnston wrote:
On Tue, Dec 17, 2024 at 03:46:53PM -0600, Kyle Evans wrote:
On 12/17/24 15:19, Mark Johnston wrote:
We have a number of sysctls which are defined as tunables, whose values
cannot be changed after boot.  Some of these sysctls, such as net.fibs,
are per-VNET so could in principle be changed at jail creation time.
I'd find it useful to be able to pass a set of tunables to jail_set(2),
so that the corresponding VNET jail has its tunables set to the specified
values.  For instance, it'd be useful in test suites where I want to
exercise the network stack with different VNET sysctl settings, without
having to configure the test runner at boot time.

I think the implementation would involve passing an environment to
vnet_alloc(), which would copy the parent VNET context and then iterate
over all VNET tunables in the system, invoking
sysctl_load_tunable_by_oid_locked() in such a way that the custom
environment is used to update the tunable's value.
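
[Editor's sketch] The iteration described above can be modeled in userland C as follows. This is only a rough illustration, not kernel code: struct tunable_model, env_lookup() and apply_env_to_tunables() are hypothetical names invented here, and the real implementation would presumably go through sysctl_load_tunable_by_oid_locked() as the message suggests. The environment is assumed to be a NULL-terminated array of "name=value" strings.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a VNET tunable: a name plus per-VNET storage. */
struct tunable_model {
	const char *name;
	int *valuep;
};

/* Find "name=value" in a NULL-terminated array; return the value or NULL. */
static const char *
env_lookup(const char *const *env, const char *name)
{
	size_t len = strlen(name);

	for (; env != NULL && *env != NULL; env++) {
		if (strncmp(*env, name, len) == 0 && (*env)[len] == '=')
			return (*env + len + 1);
	}
	return (NULL);
}

/*
 * Walk every modeled tunable and, if the custom environment names it,
 * overwrite its value.  Entries that do not parse as an integer are
 * skipped.  Returns the number of tunables updated.
 */
static int
apply_env_to_tunables(const char *const *env, struct tunable_model *tun,
    size_t n)
{
	int updated = 0;

	for (size_t i = 0; i < n; i++) {
		const char *val = env_lookup(env, tun[i].name);
		char *end;
		long v;

		if (val == NULL)
			continue;
		v = strtol(val, &end, 0);
		if (end == val || *end != '\0')
			continue;	/* not a clean integer; leave as-is */
		*tun[i].valuep = (int)v;
		updated++;
	}
	return (updated);
}
```

In this model, tunables that the environment does not mention keep whatever value was copied from the parent context, which matches the "copy the parent VNET context, then apply overrides" order described above.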


Related-ish, I've wanted to float the idea of "virtualizing" kenv by making
it a property of struct prison instead of global.  Primarily, because:

   1) kenv today is super wide-open.  Unprivileged users and jails can all
view kenv, and while we do an OK-ish job of zapping privileged stuff from
it, we do have some notable exceptions that it'd be better to not leak.

   2) I can imagine some use-cases for products where kenv is read from
userland; being able to override those values on a per-jail basis for product
testing is generally a good thing (as an extension of the idea of just
sysctl tunables).

The idea being that kenv(2) could be used from within a jail, since it
modifies only that jail's kenv?  I'd worry a bit about the implications
of supporting that for variables that aren't explicitly virtualized,
like VNET tunables are.


I thought the priv(9) in use today for modification was allowed for jailed root, but I see that I hallucinated that. I don't see any reason we couldn't call it read-only -- it would still be an improvement over today, where a jailed user can use kenv(2) and dump the actual kenv from the host.

I guess it's probably okay since tunables are mostly consumed by
SYSINITs that run in prison0, so wouldn't be affected by per-jail
settings, but that'd need some more auditing.

We'd address #1 by just switching the targets for fetching/dumping in
kenv(2) to the jail's own kenv, and possibly keeping it immutable without a
priv(9).  With the right design, vnet_alloc() wouldn't need to become aware
of an environment; just the rest of your proposal.

I like the idea of referencing kenv via the current thread's prison.  It
doesn't seem too difficult to refactor existing kenvp[] references to
support that.

Yeah, I wouldn't think the implementation would be too bad. There are probably some debates to be had about semantics when fetching from the environment: whether all existing getenv() calls should just search the current prison's environment, or search all the way up the prison hierarchy until they hit prison0. KENV_GET would need special handling if getenv*() searches the hierarchy, but nothing too horrible.
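
[Editor's sketch] The two lookup semantics under discussion can be contrasted with a small userland model. Everything here is an assumption made for illustration: struct prison_model, kenv_lookup_local() and kenv_lookup_hier() are hypothetical names, not existing kernel interfaces, and the thread does not say what the special handling for KENV_GET would be.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a prison that carries its own environment. */
struct prison_model {
	const struct prison_model *parent;	/* NULL for the prison0 model */
	const char *const *env;			/* NULL-terminated "name=value" */
};

/* Search only this prison's environment. */
static const char *
kenv_lookup_local(const struct prison_model *pr, const char *name)
{
	size_t len = strlen(name);
	const char *const *e;

	for (e = pr->env; e != NULL && *e != NULL; e++) {
		if (strncmp(*e, name, len) == 0 && (*e)[len] == '=')
			return (*e + len + 1);
	}
	return (NULL);
}

/* Search this prison first, then each ancestor up to the root. */
static const char *
kenv_lookup_hier(const struct prison_model *pr, const char *name)
{
	for (; pr != NULL; pr = pr->parent) {
		const char *v = kenv_lookup_local(pr, name);

		if (v != NULL)
			return (v);
	}
	return (NULL);
}
```

One possible reading of the KENV_GET concern is that a hierarchical search would let a jail see host variables it does not define itself, so a userland-facing fetch might want the local-only variant while in-kernel getenv*() callers use the hierarchical one. That split is a guess, not something settled in the thread.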

Thanks,

Kyle Evans
