RFC: Jail Capsules

Kyle Evans Sun, 31 Aug 2025 20:27:14 -0700

Hi,

I've been toying around with this idea for a bit, and I wanted to solicit
some opinions on the design and on whether this seems like something that
FreeBSD would find useful outside of my own tree.


Background:  I've been thinking a lot lately about secure product design
and threat models, and wondering what kinds of things one could
incorporate into their design as a defense-in-depth kind of thing.  This
is, of course, not something I would pitch as a strong isolation
mechanism, but rather as a mechanism to protect against some less
sophisticated threats.

The basic idea that I'm proposing is the ability to seal a jail to turn
it into a 'capsule'.  You can either seal it at creation time, or while
it's already running.  If you create the jail as a capsule, you must
attach to it at the same time.  Sealing it later is a compromise to give
the system some runway to configure the jail first, presumably before
other user activity could start and try to compromise the capsule before
it's sealed.
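To make the two sealing paths concrete, here is a toy user-space model
in C.  Everything in it (struct capsule_model, model_create(),
model_seal(), model_unseal()) is hypothetical and only mirrors the rules
as described in this mail; it is not a FreeBSD kernel or libjail
interface, and it ignores everything a real jail(2) call would do.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the proposed capsule lifecycle.  All names are hypothetical;
 * this is not a FreeBSD kernel or libjail interface.
 */
struct capsule_model {
	bool exists;    /* the jail has been created */
	bool attached;  /* the creating process attached to it */
	bool sealed;    /* the jail is a capsule */
};

/* Create a jail; sealing at creation time is only allowed with attach. */
static int
model_create(struct capsule_model *j, bool seal, bool attach)
{
	if (seal && !attach)
		return (-1);    /* sealed creation must attach at the same time */
	j->exists = true;
	j->attached = attach;
	j->sealed = seal;
	return (0);
}

/* Seal a running jail after it has been configured.  Sealing is one-way. */
static int
model_seal(struct capsule_model *j)
{
	if (!j->exists)
		return (-1);
	j->sealed = true;
	return (0);
}

/* There is deliberately no way to unseal; this always fails. */
static int
model_unseal(struct capsule_model *j)
{
	(void)j;
	return (-1);
}
```

The "seal later" path is simply model_create() with seal == false,
followed by configuration, followed by model_seal(); the window between
those two calls is the runway (and the exposure) described above.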

Once sealed, the capsule has the following properties (that I've thought
about, at least):

 - The capsule may not be unsealed
 - Processes outside of the capsule may not attach to it
 - Unprivileged users in the parent cannot see or tamper with processes
   in the jail, regardless of the security.bsd.see_* sysctls.  persist
   and all of the allow.unprivileged_* jail knobs will be forcibly unset,
   and attempts to set them afterward will result in errors
 - Privileged processes may see and signal the processes in a capsule if
   securelevel is <= 0, but they cannot attach to, debug, or cpuset
   individual processes in a capsule at any securelevel
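The property list above boils down to an access decision for a caller
outside the capsule.  A minimal sketch of that decision table in C,
assuming a simplified view where the caller is either privileged or not
and securelevel is a plain integer; enum capsule_op and capsule_allows()
are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Operations a process outside the capsule might attempt (hypothetical). */
enum capsule_op {
	OP_SEE,       /* observe capsule processes, e.g. via ps */
	OP_SIGNAL,    /* send a signal to a capsule process */
	OP_ATTACH,    /* jail_attach(2) into the capsule */
	OP_DEBUG,     /* ptrace(2) and similar */
	OP_CPUSET,    /* change the cpuset of an individual capsule process */
};

/*
 * Decide whether a caller outside a sealed capsule may perform op.
 * Models only the rules listed in the RFC; hypothetical, not kernel code.
 */
static bool
capsule_allows(enum capsule_op op, bool privileged, int securelevel)
{
	switch (op) {
	case OP_SEE:
	case OP_SIGNAL:
		/* Unprivileged: never, regardless of security.bsd.see_*. */
		return (privileged && securelevel <= 0);
	case OP_ATTACH:
	case OP_DEBUG:
	case OP_CPUSET:
		/* Denied to everyone outside, at any securelevel. */
		return (false);
	}
	return (false);
}
```

The only securelevel-dependent branch is see/signal for privileged
callers; everything that would let the parent reach into a capsule
process (attach, debug, cpuset) is a flat denial.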

The premise of a capsule is that you (attempt to) seal off every access
point into the jail other than a well-defined (by the software in the
capsule) security boundary.  It is naturally not protected if the kernel
is compromised or in some other scenarios, but you eliminate a number of
threats where an attacker can manage to make syscalls but doesn't have
the tools available to escalate further.  Capsules would simply be a
building block to a larger secure design.

An obvious elephant in the room here is filesystem access.  A capsule
would force an attacker to get a little more creative if they want to
tamper with capsule processes, in particular if it's combined with a
heightened securelevel (or removal of other features like /dev/mem
entirely), but it does not stop an attacker from tampering with the
filesystem to disrupt capsule activities.  This leaves a huge part of
the capsule's protection up to application design, which arguably
eliminates many benefits of the idea.

I don't really have a good answer for how one might solve that.  The rest
of the design is fairly straightforward to implement, but I suspect it
would get hairy if you try to block off parts of the filesystem (even
from root, maybe contingent on securelevel) based on whether the path has
been used for a capsule or not.

Comments/questions/tomatoes welcome.  The idea was somewhat inspired by
enclaves and a design where one can slice off some CPUs to dedicate to
the capsule alone to try and mitigate some side-channel possibilities
from other user processes, but the initial capsule thought process
doesn't go to the extent of trying to carve out memory to dedicate to a
capsule.

Thanks,

Kyle Evans
