On 2025-07-11 02:15, Bjoern A. Zeeb wrote:
On Fri, 11 Jul 2025, Doug Rabson wrote:

I do have if_bridge loaded on the base system. With your examples, I can verify that creating bridges and epairs as well as adding some of those
epairs to the bridge works in a simple vnet=new jail. For the scenario
where network management for the host is delegated to a trusted jail, I
haven't been able to create a bridge:

jail -c host.hostname=foo vnet=inherit path=/ persist
jexec <JID>
root@foo:/ # ifconfig bridge create
ifconfig: socket(family 2,SOCK_DGRAM): Protocol not supported

...

Thanks for the feedback. It does seem that nesting Podman containers
should already work. I was debugging the vnet=inherit use case
and assumed vnet=new would behave the same.

I am a bit surprised too.  I was expecting PR_VNET to also be inherited,
and with it the privilege checks to be the same.  After all, the "parent"
says to a "child": 'you can have all I have'.

vnet is an exception to "all I have"; it comes from the pre-vnet
behavior of a jail inheriting the (only existing) network stack without
having these permissions by default.  With vnet=inherit, you inherit
the vnet, but only in the same way a regular jail inherits the regular
network.  So if you want extra protocol support, you still need
allow.socket_af.

Unlike most subsystem flags, vnet doesn't have a "disable" setting.
There's only "new", which is "your own vnet and all the freedom that
comes with it", or "inherit", which is "not your own vnet and all the
restrictions that come with it."  This is unlike host=inherit, where
the sub-jail is allowed to change the hostname in both jails, for example.
It might have been better to call it "novnet", but I wanted to stay
consistent with the rest of the subsystem flags.
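To make the new/inherit distinction concrete, here is a minimal user-space model of the two vnet modes. It is an illustration only: `struct model_vnet`, `struct model_prison`, `enum model_vnet_mode`, and `model_create_child()` are hypothetical names, not kernel code. The point is that vnet=inherit shares the parent's stack without making the jail its owner, while vnet=new gives the jail both its own stack and ownership of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model, not the kernel's struct prison:
 * a "vnet" is just an opaque network-stack object here.
 */
struct model_vnet {
	int id;
};

enum model_vnet_mode {
	MODEL_VNET_NEW,		/* vnet=new: the jail gets its own stack */
	MODEL_VNET_INHERIT	/* vnet=inherit: the jail shares the parent's stack */
};

struct model_prison {
	struct model_vnet *vnet;	/* stack this jail's sockets use */
	bool owns_vnet;			/* true only for vnet=new */
};

/*
 * Set up child under parent.  For vnet=new, fresh is the child's own
 * stack; for vnet=inherit it is ignored and may be NULL.
 */
static void
model_create_child(struct model_prison *child,
    const struct model_prison *parent, enum model_vnet_mode mode,
    struct model_vnet *fresh)
{
	assert(child != NULL && parent != NULL);
	if (mode == MODEL_VNET_NEW) {
		assert(fresh != NULL);
		child->vnet = fresh;
		child->owns_vnet = true;
	} else {
		/* Same stack as the parent, but none of the ownership. */
		child->vnet = parent->vnet;
		child->owns_vnet = false;
	}
}
```

In this model an inherited jail sees exactly the parent's stack (`child->vnet == parent->vnet`) yet reports `owns_vnet == false`, which is the flag the kernel's prison_owns_vnet() check keys on.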

socreate() does:

        if (prison_check_af(cred, prp->pr_domain->dom_family) != 0)
            return (EPROTONOSUPPORT);

and

    int
    prison_check_af(struct ucred *cred, int af)
    ...
    #ifdef VIMAGE
        /* Prisons with their own network stack are not limited. */
        if (prison_owns_vnet(cred))
            return (0);
    #endif

You can probably work around this using:

        default:
            if (!(pr->pr_allow & PR_ALLOW_SOCKET_AF))
                error = EAFNOSUPPORT;
        }
        return (error);

But allow.socket_af would likely have to be granted all the way up the
jail hierarchy, which is no good.

So, as I said, the first part is right: prison_owns_vnet() stops
the rest from being checked.

Yes, it looks like allow.socket_af would have to come all the way up,
and that's where the problem lies.  Currently, we bypass that permission
check for vnet jails, but it would make sense for vnet jails to actually
have that and other similar inet-safety flags set by default.
That doesn't change the behavior of the vnet jails themselves, since
they bypass those bits anyway.  It also doesn't change the default
behavior of their sub-jails, since those bits aren't inherited without
explicit mention.  But it would allow a vnet jail to provide its
children with those bits regardless of the vnet jail's own parent.

I think defining a set of permission bits that are always turned on for
vnet jails wouldn't cause any unpleasant surprises.

- Jamie
