Re: starting apache22 in a jail - name-based shared memory failure

2013-06-21 Thread Mateusz Guzik
On Sat, Jun 22, 2013 at 02:35:14AM +0200, Julian H. Stacey wrote:
> Hi all,
> Any ideas ?
> I have 9.1-RELEASE running in a jail, with a kernel as
> shown by uname -a
> 
>   FreeBSD land.berklix.org 9.1-RELEASE-p4 FreeBSD 9.1-RELEASE-p4
>   #0: Mon Jun 17 11:42:37 UTC 2013
>   r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
>   amd64
> 
> cd /usr/ports/www/apache22 ; su ; make install ;
> cd /usr/local/etc/rc.d # using default httpd.conf
> ./apache22 stop ; ./apache22 start
> tail -F /var/log/httpd-error.log
> 
>   [info] mod_unique_id: using ip addr 144.76.10.75
>   [info] Init: Seeding PRNG with 144 bytes of entropy
>   [info] Init: Generating temporary RSA private keys (512/1024 bits)
>   [info] Init: Generating temporary DH parameters (512/1024 bits)
>   [warn] Init: Session Cache is not configured [hint: SSLSessionCache]
>   [info] Init: Initializing (virtual) servers for SSL
>   [info] mod_ssl/2.2.23 compiled against Server: Apache/2.2.23, Library: OpenSSL/0.9.8x
>   [notice] Digest: generating secret for digest authentication ...
>   [notice] Digest: done
>   [info] mod_unique_id: using ip addr 144.76.10.75
>   [info] Init: Seeding PRNG with 144 bytes of entropy
>   [info] Init: Generating temporary RSA private keys (512/1024 bits)
>   [info] Init: Generating temporary DH parameters (512/1024 bits)
>   [info] Init: Initializing (virtual) servers for SSL
>   [info] mod_ssl/2.2.23 compiled against Server: Apache/2.2.23, Library: OpenSSL/0.9.8x
>   [warn] pid file /var/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
>   [crit] (78)Function not implemented: unable to create or access scoreboard "/var/run/httpd.scoreboard" (name-based shared memory failure)
> 

Can you include truss/ktrace output so that the actual failing syscall is
shown?

-- 
Mateusz Guzik 
___
freebsd-jail@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-jail
To unsubscribe, send any mail to "freebsd-jail-unsubscr...@freebsd.org"


Re: mergemaster and better support for ezjails

2014-07-12 Thread Mateusz Guzik
On Sat, Jul 12, 2014 at 08:08:52PM -0600, Warren Block wrote:
> A couple of patches to make mergemaster work better with ezjails.
> 
> These are only very superficially tested.  Feedback welcome.
> 
> 1. If /etc/mergemaster.rc exists in the jail, it is sourced.  This
>    allows IGNORE_FILES to be set in the jail.  And other settings, but
>    that's the one I wanted.
> 

How exactly does it work?

Is jailed root allowed to create /etc/mergemaster.rc?

If so, that would be a jail escape vector - an attacker puts commands they
want to execute inside and mergemaster sourcing the file will trigger
executing them.

In fact running mergemaster from "outside" on an untrusted jail seems
like a security weakness even without jailed-root controlled rc file
since they can try to do something fishy with symlinks which now resolve
to stuff on the host.

The following should be safe enough:
- have a dedicated RO jail
- mount to-be-updated jail under /mnt/jail or whatever
- mount sources/whatever RO under /usr/src or whatever
- run update process from inside dedicated RO jail

-- 
Mateusz Guzik 


Re: what are the differences freebsd jails and docker

2015-04-21 Thread Mateusz Guzik
On Tue, Apr 21, 2015 at 11:30:06AM -0400, Allan Jude wrote:
> On 2015-04-21 08:58, freekai wrote:
> > 
> > Nowadays, Docker is popular, but what are the differences between
> > FreeBSD jails and Docker?
> 
> Jails actually provide security and isolation. Docker, according to
> their documentation, does not.
> 
> If you want a nice GUI for your jails, try the Warden utility from
> PCBSD, it is in the FreeBSD ports tree.
> 

I would say this is grossly oversimplified and the question itself is
incorrect.

According to http://docs.docker.com/articles/security/ they do make some
claims about isolation and security.

*jail* is a mechanism in the kernel; Docker is just a set of scripts
using the Linux counterpart.

I don't know the full extent of what's possible with Linux containers.
Modulo some bugs and minor deficiencies on either front, I would expect
them to be roughly feature-comparable. In particular, I don't expect
either solution to have something inherently unfixable which would not
be present in the other solution as well.

Or in other words I would expect someone bored enough to be able to
implement docker on top of jails.

Docker folks definitely had some questionable stuff (like their
capability handling, not to be confused with capsicum in FreeBSD), but
that's standard with new projects and one could expect such issues to be
plugged for the most part.

The real security concern related to this stuff comes from the fact that
there is only one kernel, so a flaw allowing e.g. arbitrary code
execution within it results in a compromise of the entire machine.

So the question is what kernel exploitation prevention measures are put
in place, what is the general state of kernel security etc. (for
instance if you don't need a fully featured container and just want to
sandbox something, capsicum on FreeBSD gives you great flexibility,
which can be achieved to some extent with seccomp + selinux)

Or in other words, significant time and effort are needed to come up
with a reasonable comparison.

However, in the meantime you can reasonably safely assume either
solution will do the trick similarly well.

-- 
Mateusz Guzik 


Re: [patch] separate SysV IPC namespace for jail

2015-06-05 Thread Mateusz Guzik
On Sat, Jun 06, 2015 at 07:24:21AM +0900, kikuchan wrote:
> Hello,
> 
> I want to run multiple instances of PostgreSQL with jail.
> 
> Changing UID is not suitable for my case,
> so I created a simple patch for stable/10 to separate SysV IPC
> namespace for each jail.
> 
> In contrast to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=48471 ,
> this patch comes with;
>  - All objects are visible by ipcs(1) whether in jails or not.
>  - Trying to access the objects beyond the jail will be rejected with EACCES.
>  - Treat (key_t, prison) pair as the key for a named object.
>  - Very simple implementation; I just added to check
> msqkptr->cred->cr_prison == td->td_ucred->cr_prison, for example.
> 
> Is this approach suitable for FreeBSD kernel?
> 
> If you find it is useful, or bugs, please let me know.
> 
> P.S.
>  There is no way to know from userland which jails own the objects, so far.
> 

I don't like this approach.

I would expect completely separate namespaces.

Extending struct prison with relevant pointers and updating the code to
look at them is mostly mechanical work, but making it committable
requires fixing some deficiencies and answering some questions.
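
A minimal user-space sketch of the "mechanical" part, under stated
assumptions: the structures below (prison_model, shm_namespace) and the
function ns_shmget are hypothetical stand-ins, not the real kernel
layouts or API. Each jail carries a pointer to a namespace, and every
lookup goes through that pointer instead of a global table.

```c
#include <assert.h>
#include <stddef.h>

#define NS_MAXSEG 8

/* Hypothetical stand-ins for kernel structures; not the real layouts. */
struct shm_namespace {
    long keys[NS_MAXSEG];   /* key of each allocated segment */
    int  used[NS_MAXSEG];   /* nonzero if the slot is allocated */
};

struct prison_model {
    struct prison_model *parent;    /* NULL for the host */
    struct shm_namespace *shm_ns;   /* own namespace, or the parent's */
};

/*
 * Return the id for `key` in the namespace of jail `pr`, allocating a
 * slot if the key is not present yet.  Returns -1 if the table is full.
 */
int
ns_shmget(struct prison_model *pr, long key)
{
    struct shm_namespace *ns;
    int i, free_slot = -1;

    assert(pr != NULL && pr->shm_ns != NULL);
    ns = pr->shm_ns;
    for (i = 0; i < NS_MAXSEG; i++) {
        if (ns->used[i] && ns->keys[i] == key)
            return (i);
        if (!ns->used[i] && free_slot == -1)
            free_slot = i;
    }
    if (free_slot == -1)
        return (-1);
    ns->used[free_slot] = 1;
    ns->keys[free_slot] = key;
    return (free_slot);
}
```

With this shape, two jails that have distinct namespaces can use the same
key without seeing each other's objects, while jails that point at the
same namespace share them. The sketch says nothing about the races and
pre-existing mappings discussed below.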

First off, with the support for multi-level jails, jailing is no longer a
privileged operation. There are documented harmless races related to
that, but it is unclear if they transform into something serious with
sysvipc involved. Single-threading the process for jailing should be
fine.

Address space can be shared between multiple jails, what happens if such
a pair ends up in different jails? Preferably such a scenario would be
prohibited to avoid future accidents.

What about existing sysvshm mappings when jailing?

-- 
Mateusz Guzik 


Re: [patch] separate SysV IPC namespace for jail

2015-06-06 Thread Mateusz Guzik
On Sun, Jun 07, 2015 at 12:04:17AM +0900, kikuchan wrote:
> Sorry for cross-post to freebsd-stable, but I want to get more
> feedback for my patch.
> (The patch is; 
> http://lists.freebsd.org/pipermail/freebsd-jail/attachments/20150606/7736309b/attachment.bin)
> 
> 
> I believe this patch FIXES current SysV IPC for jail WITHOUT changing
> current kernel architecture.
> (so I hope it will be merged into stable/10)
> 
> Let me explain what happens currently, with and without my patch,
> since it's a little confusing.
> 
> 
> I use SysV IPC shared memory (SYSVSHM) as an example here, because
> it's easy to understand.
> Remember shmget / shmat / shmdt / shmctl, are syscalls of SYSVSHM.
> 
> Every normal process has its own virtual memory space, provided by the kernel.
> The backing unit of virtual memory is a page, which resides in real memory or
> on a swap device.
> 
> SYSVSHM provides a way to share memory segments on the page between
> processes on userland.
> A process can load the page into its own virtual memory space with
> shmat syscall.
> Once the page is loaded into the virtual memory space, the page is
> accessible until further shmdt syscall or exit of process.
> 
> Another process can obtain the exact same page, by calling shmat syscall.
> So, permission of shmat syscall is very important.
> 
> 
> > Address space can be shared between multiple jails
> 

This was a typo. Let me quote fixed version:

"Address space can be shared between multiple PROCESSES, what happens if
such a pair ends up in different jails? Preferably such a scenario would
be prohibited to avoid future accidents."

However, sysvipc namespace sharing is an ok feature esp. with
multi-level jails. In the simplest scenario upon jail creation you
decide whether it gets its own namespace or inherits it.

> > What about existing sysvshm mappings when jailing?
> 
> Real (not jailed) environment is treated as a jail with jid=0 in kernel.
> If you create a sysvshm memory segment before entering a jail, the
> segment is simply owned by jid=0.
> 

The point is you get a process with sysvshm segments from 2 different
jails. Looks like solid trouble potential.

> 
> > Extending struct prison with relevant pointers and updating the code to
> 
> You don't need to extend the struct to separate IPC namespaces.
> The word "namespaces" means a key (key_t) of IPC syscall, here.
> 
> Whether the struct should be extended or not, depends on how we want
> to control IPC resources for each jail.
> If you want to control SysV IPC resources by changing sysctl
> parameters from inside of jail for each jail,
> then it might be yes.
> But I think per-jail resource control should be done with RACCT, and
> it might be applied to my implementation too.
> 
> 
> The one missing feature is how to export information to userland.
> This should be discussed separately, even if my patch is rejected.
> (If visibility control is needed for ipcs, maybe it should use similar
> technique to ps or netstat?)
> 
> 
> Conclusion;
> I think my patch is better than the broken status quo. (SysV IPC + jail
> has been buggy for over 10 years!)
> 

The feature in question is definitely desirable, but your patch is a hack,
with the "hack" part visible to userspace.

As mentioned earlier, there are some things to do before any kind of
jail-aware ipcs lands in the tree. As a minimum this is singlethreading
when jailing, and preventing the jailing of processes with shared virtual
address spaces or with existing sysvshm mappings. All this is to reduce
the number of bugs one would have to deal with.

After the work is completed there is no problem whatsoever with
providing per-jail sysvipcs. This avoids information leaks (no id list
to look at) and conflicts.

Exporting is not a problem either - a dedicated sysctl grabs JID and
dumps its ipcs. It also gets a 'recursive' flag to know whether ipcs
for its own jails should be dumped as well (if different).
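
A rough sketch of what the 'recursive' flag could mean, as a user-space
model. Everything here (jail_model, ipc_ns, dump_count) is hypothetical
and only counts objects instead of dumping them. A child jail that
inherited its parent's namespace is not counted a second time, which is
the "(if different)" part.

```c
#include <assert.h>
#include <stddef.h>

#define MAXCHILD 4

/* Hypothetical model; the real struct prison looks different. */
struct ipc_ns {
    int nsegs;      /* number of objects in this namespace */
};

struct jail_model {
    int jid;
    struct ipc_ns *ns;                  /* own or inherited namespace */
    struct jail_model *child[MAXCHILD]; /* NULL-terminated if short */
};

/*
 * Count the objects a dump for `pr` would report.  With `recursive`
 * set, descend into child jails; a child sharing its parent's
 * namespace contributes only what its own descendants add.
 */
int
dump_count(const struct jail_model *pr, int recursive)
{
    int i, total;

    assert(pr != NULL && pr->ns != NULL);
    total = pr->ns->nsegs;
    if (!recursive)
        return (total);
    for (i = 0; i < MAXCHILD && pr->child[i] != NULL; i++) {
        const struct jail_model *c = pr->child[i];

        if (c->ns == pr->ns)
            total += dump_count(c, 1) - c->ns->nsegs;
        else
            total += dump_count(c, 1);
    }
    return (total);
}
```

The model assumes a namespace can only be inherited from the direct
parent, so comparing a child against its parent is enough to avoid
double counting.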

-- 
Mateusz Guzik 


Re: [patch] separate SysV IPC namespace for jail

2015-06-07 Thread Mateusz Guzik
On Sun, Jun 07, 2015 at 04:43:16PM +0900, kikuchan wrote:
>  Hi Mateusz,
> 
> Thanks for your reply!
> 
> First of all, I intend to *jail* SysV IPC user completely.
> (unless user really want to interact with each other between jails)
> 
> I think SysV IPC is simple but obsolete, so you can design whatever
> you want for jail system.
> Also, I want to keep everything simple.
> 
> My design (to be sure):
>  - Each entry of the list (shown in ipcs) belongs to a jail.
>  - Any operation to SHM/SEM/MSG attempted from another jail, will just
> fail with EACCES.
> 

But why? See below.

> 
> > "Address space can be shared between multiple PROCESSES, what happens if
> > such a pair ends up in different jails? Preferably such a scenario would
> > be prohibited to avoid future accidents."
> >
> > However, sysvipc namespace sharing is an ok feature esp. with
> > multi-level jails. In the simplest scenario upon jail creation you
> > decide whether it gets its own namespace or inherits it.
> >
> > > > What about existing sysvshm mappings when jailing?
> > >
> > > Real (not jailed) environment is treated as a jail with jid=0 in kernel.
> > > If you create a sysvshm memory segment before entering a jail, the
> > > segment is simply owned by jid=0.
> > >
> >
> > The point is you get a process with sysvshm segments from 2 different
> > jails. Looks like solid trouble potential.
> 
> Ok, I think I've got what you're concerned about.
> 
> In my design, setting up such processes would be difficult.
> This wouldn't normally happen, because shared memory segments
> should be obtained BEFORE entering a jail;
> 
>  1. Create a segment on jid=0 with shmget()
>  2. shmat() to attach (get void *ptr)
>  3. fork()
>  4. The child process enters jid=1 with jail_attach()
>  5. The child process and the parent process can share the address
> space (via *ptr).
>  6. If the child process does shmat() on the same ID again, it simply
> fails with EACCES.
> 
> It means there is NO way to obtain a segment created in another jail
> AFTER being jailed (even if you're root or obtaining the segment created on
> jid=0).


This is sharing a page, not an address space (see below).

This poses serious problems if actual separate namespaces are
implemented, otherwise it only leaves a potential for bugs for no real
gain.

> > As a minimum this is singlethreading
> > when jailing, prevention of jailing processes with shared virtual address
> > spaces and ones with existing sysvshm mappings. All this is to reduce
> > amount of bugs one would have to deal with.
> 
> Virtual memory allocation and related stuff are protected and done by
> kernel already, because it's an IPC (Inter Process Communication).
> Moreover, you cannot change an owner of the IPC entry after creation,
> so we don't need an additional protection in kernel.
> 

Here is an example race: on fork memory mappings are copied first,
sysvshm data is updated /later/. What happens if one of the calling
threads enters a jail while some other thread is forking? This may be
buggy as it is already, but that's roughly the scheme.

It looks like we have some weird miscommunication here, so let me
restate.

I do see great benefit in having jail-aware ipcs.

I do not believe the way to achieve it is to add jail-aware permission
checks. Support in question should provide support for separate
namespaces. There are several upsides, including lack of conflict between
jails and plugged infoleaks.

In general I don't understand why you insist on your approach; it does
not have any advantage over separate namespaces that I could see.

-- 
Mateusz Guzik 


Re: [patch] separate SysV IPC namespace for jail

2015-06-08 Thread Mateusz Guzik
On Mon, Jun 08, 2015 at 01:42:21AM +0900, kikuchan wrote:
> Out of curiosity, is my patch technically bad?
> Is there a race condition in it? Or could enabling key_t separation for
> jails trigger a race condition, perhaps?
> 

I only briefly looked at the patch. The fact that you perform the check
outside of ipcperm looks suspicious but may be harmless, so at best it's
bad style. If you need ipc mechanism-specific functions, make them call
ipcperm instead.

The jail check is too simplistic. Jails higher in the hierarchy should be
able to access whatever lower jails produced.
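
As a sketch of the difference, assuming a hypothetical jail_node
structure with nothing but a parent pointer: the patch compares the two
prisons for equality, whereas a hierarchy-aware check walks up from the
owner of the object and succeeds if it meets the accessing jail on the
way. The function name jail_can_access is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the jail tree; only the parent link matters. */
struct jail_node {
    struct jail_node *parent;   /* NULL for the host */
};

/*
 * Return 1 if `accessor` is the jail that owns the object or one of
 * its ancestors, 0 otherwise.
 */
int
jail_can_access(const struct jail_node *accessor,
    const struct jail_node *owner)
{
    const struct jail_node *p;

    assert(accessor != NULL && owner != NULL);
    for (p = owner; p != NULL; p = p->parent) {
        if (p == accessor)
            return (1);
    }
    return (0);
}
```

With host -> a -> b, both the host and a can reach objects created in b,
but b cannot reach objects created in a, and a sibling of a cannot reach
either.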

> 
> > I do see great benefit in having jail-aware ipcs.
> >
> > I do not believe the way to achieve it is to add jail-aware permission
> > checks. Support in question should provide support for separate
> > namespaces. There are several upsides, including lack of conflict between
> > jails and plugged infoleaks.
> 
> Sorry but I might misunderstand what your "separate namespaces" means.
> What namespace are you going to separate? key_t, shmid, kernel
> structure of shm, or others?
> What features do your "jail-aware ipcs" provide?
> 

Well, as I said in my first mail the idea is to make ipc code look at
structures assigned to given jail, so that we can have multiple jails
with only their own objects. No "well, this id is used by other jail",
unless the namespace is explicitly shared.

I did have a patch with a meh implementation doing this, but I lost it
along the way. It is easy to implement it for "private purposes" (i.e.
disregarding possible attacks with jailing processes). The real work is
making the whole business safe.

For instance back then I could not find any reliable mechanism to tell
me whether given process has a shared address space. There is only a
vm_refcnt counter in vmspace which is modified on various occasions,
thus is not suitable. Adding a separate counter sucks and adding a "once
set, never cleared flag" sucks as well. Maybe there is a good method.

-- 
Mateusz Guzik 


Re: [patch] separate SysV IPC namespace for jail

2015-06-09 Thread Mateusz Guzik
On Wed, Jun 10, 2015 at 01:43:59AM +0900, kikuc...@uranus.dti.ne.jp wrote:
> > I only briefly looked at the patch. The fact that you perform the check
> > outside of ipcperm looks suspicious but may be harmless, so at best it's
> > bad style. If you need ipc mechanism-specific functions, make them call
> > ipcperm instead.
> 
> Sorry, I guess EACCES misled you.
> I should have chosen other value and/or concealed information for each jail 
> completely.
> 
> I intended to demonstrate it's enough to achieve IPC key_t space separation
> (to make PostgreSQL work) for each jail without having a shmid_kernel struct
> for each jail.
> 

There is no technical problem with providing entirely separate ipcs
which would not have to be solved with this approach.

This approach is actually harder to get right and has no benefit that I
would see.

One example downside is resource limiting - implementing per-namespace
limits is a non-problem.

> > Well, as I said in my first mail the idea is to make ipc code look at
> > structures assigned to given jail, so that we can have multiple jails
> > with only their own objects. No "well, this id is used by other jail",
> > unless the namespace is explicitly shared.
> 
> Ok, now I've understood what the idea is, and maybe it's done by Nick once 
> before on https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=48471
> 

> There are two identifiers, ID and KEY (shmid and shmkey for SHM), on a SysV
> IPC object.
> The KEY space must be separated for each jail I think, but the ID is
> determined by the kernel and userland users don't care about its value. Do
> you really think it should be managed separately in the kernel?
> I agree that "multiple jails with only their own objects" is basically a
> good design, but if you want to support hierarchical jails, the
> objects will be referenced by multiple jails.
> If ID space separation is not so important, separating the internal
> namespace for each jail is too complicated for simple KEY space separation,
> I think.
> 

Not separating stuff is more complicated.

> I really should have implemented to conceal information instead of returning 
> EACCES, sorry. ;/
> 
> Before jumping to the conclusion, I want to know whether the *current* code
> related to SHM has any problems with sharing an underlying vm page between
> processes jailed to different jails, especially on fork and
> jail_attach. With a multithreaded process, perhaps?
> 

There is definitely no problem sharing /a page/. There may be a problem
sharing a page which was obtained from sysvshm.

> I'm also trying to port Nick's code to 10/stable

This patch is old and deals with the mostly mechanical part of the work.
In particular, it DOES NOT deal with any concerns I already expressed.
This is understandable to some extent since there were no multilevel
jails at the time and some people may have felt securing against host
root is not necessary.

The patch will likely have a lot of conflicts and it will be way faster
to write from scratch.

> I guess it did not happen on 4.8 because of the lack of jail_attach.
> For example, a process attaches to shmid = 65535 on jid=1, then the process
> changes its jail to jid=2, and if shmid = 65535 exists on jid=2, the process
> refers to the wrong vm mapping unless the shmmap_state data for the process
> is maintained on every jail_attach.

This is an example problem.
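
The problem can be shown with a small user-space model. The types below
(ns_model, map_model, proc_model) are hypothetical and only mimic the
idea that the per-process record stores a bare shmid, as shmmap_state
does. After the process switches namespaces, the saved id resolves to a
different object.

```c
#include <assert.h>
#include <stddef.h>

#define NSEG 4

/* Hypothetical model; not the real kernel structures. */
struct seg_model {
    int attach_count;
};

struct ns_model {
    struct seg_model segs[NSEG];    /* one table per namespace */
};

struct map_model {
    int shmid;      /* like shmmap_state: records only the id */
};

struct proc_model {
    struct ns_model *ns;    /* namespace of the current jail */
    struct map_model map;
};

/* Attach to `shmid` in the namespace the process is in right now. */
void
model_attach(struct proc_model *p, int shmid)
{
    assert(p != NULL && p->ns != NULL);
    assert(shmid >= 0 && shmid < NSEG);
    p->ns->segs[shmid].attach_count++;
    p->map.shmid = shmid;
}

/* Detach: resolves the saved id in the *current* namespace. */
void
model_detach(struct proc_model *p)
{
    assert(p != NULL && p->ns != NULL);
    assert(p->map.shmid >= 0 && p->map.shmid < NSEG);
    p->ns->segs[p->map.shmid].attach_count--;
}
```

If a process attaches in one namespace, has its ns pointer switched (the
model's stand-in for jail_attach), and then detaches, the count in the
original namespace is never dropped and the count in the new one is
decremented for a segment the process never attached there.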

> Maybe this behavior is something related to the race that you mentioned
> before?

It is not.

> 
> 
> > For instance back then I could not find any reliable mechanism to tell
> > me whether given process has a shared address space. There is only a
> > vm_refcnt counter in vmspace which is modified on various occasions,
> 
> Hmm, sorry I can't understand what the problem is here...
> I'm not good at kernel internals yet, so I don't know details of when the 
> processes share the address space, and I have no idea why you want to know 
> whether the process has a shared address space or not...
> 

rfork has a flag which makes the new process share the address space
with the parent. So when one of these processes jails somewhere, we can
end up with mappings from separate namespaces.

> 
> > It is easy to implement it for "private purposes" (i.e.
> > disregarding possible attacks with jailing processes). The real work is
> > making the whole business safe.
> 
> I agree.
> 
> Is there any project ongoing for this sysvipc issue?
> If any, what is needed to be done?
> 

I am unaware of any work being done in the area.

I stated what needs to be done in my first e-mail.

-- 
Mateusz Guzik 


Re: How to implement jail-aware SysV IPC (with my nasty patch)

2015-06-15 Thread Mateusz Guzik
On Mon, Jun 15, 2015 at 09:53:53AM +, Bjoern A. Zeeb wrote:
> Hi,
> 
> removed hackers, added virtualization.
> 
> 
> > On 12 Jun 2015, at 01:17 , kikuc...@uranus.dti.ne.jp wrote:
> > 
> > Hello,
> > 
> > I’m (still) trying to figure out how jail-aware SysV IPC mechanism should 
> > be.
> 
> The best way probably is to finally get the “common” VIMAGE framework into 
> HEAD to allow easy virtualisation of other services.  That work has been 
> sitting in perforce for a few years and simply needs updating for sysctls I 
> think.
> 
> Then use that to virtualise things and have a vipc like we have vnets.  The 
> good news is that you have identified most places and have the cleanup 
> functions already so it’d be a matter of transforming your changes (assuming 
> they are correct and working fine; haven’t actually read the patch in 
> detail;-)  to the different infrastructure.  And that’s the easiest part.
> 
> 

I have not looked at vimage too closely, maybe indeed it's the right way
to go. Would definitely be interested in seeing it cleaned up and in
action.

In the meantime, as I tried to explain in the previous thread, a
jail-aware sysvshm poses several questions which need to be
answered/taken care of before it can hit the tree. I doubt any
reasonable implementation can magically avoid the problems they pose and I
definitely want to get an analysis of how the proposed implementation
behaves (or how it prevents a given scenario from occurring).

Fundamentally the basic question is how does the implementation cope
with processes having sysvshm mappings obtained from 2 different jails
(provided they use different sysvshms).

Preferably the whole business would be /prevented/. Prevention mechanism
would have to deal with shared address spaces (rfork(2) + RFMEM),
threads and pre-existing mappings.
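
A sketch of what such a prevention check could look like, as a
user-space model only. The fields sharers and nshm are assumptions made
for the example; in particular, it assumes the kernel can tell reliably
how many processes share an address space, which is itself an open
question. Threads are not modeled. The name model_jail_attach is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; none of these fields mirror real kernel layouts. */
struct vmspace_model {
    int sharers;    /* processes using this address space (e.g. RFMEM) */
    int nshm;       /* sysvshm segments currently mapped */
};

struct proc_model {
    struct vmspace_model *vm;
    int jid;
};

/*
 * Attach `p` to jail `jid` only when doing so cannot leave mappings
 * from two namespaces in one address space.  Returns 0 on success and
 * -1 when the attach is refused.
 */
int
model_jail_attach(struct proc_model *p, int jid)
{
    assert(p != NULL && p->vm != NULL);
    if (p->vm->sharers > 1)     /* shared address space */
        return (-1);
    if (p->vm->nshm > 0)        /* pre-existing sysvshm mappings */
        return (-1);
    p->jid = jid;
    return (0);
}
```

Refusing up front keeps the later code simple: once a process is inside a
jail, every sysvshm mapping in its address space is known to come from
that jail's namespace.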

The patch posted here just puts permission checks in several places,
while leaving the namespace shared, which I find to be a user-visible
hack with no good justification. There is also no analysis of how this
behaves when presented with the aforementioned scenario. Even if it turns
out the result is harmless with the resulting code, this leaves us with a
very error-prone scheme.

There is no technical problem adding a pointer to struct prison and
dereferencing it instead of current global vars. Adding proper sysctls
dumping the content for given jail is trivial and so is providing
resource limits when creating a first-level jail with a separate
sysvshm. Something which cannot be as easily achieved with the patch in
question.

Possible later switch to vimage would be transparent to users.

-- 
Mateusz Guzik 

Re: How to implement jail-aware SysV IPC (with my nasty patch)

2015-06-15 Thread Mateusz Guzik
On Tue, Jun 16, 2015 at 03:45:34AM +0900, kikuc...@uranus.dti.ne.jp wrote:
> On Mon, 15 Jun 2015 12:49:16 +0200, Mateusz Guzik  wrote:
> > Fundamentally the basic question is how does the implementation cope
> > with processes having sysvshm mappings obtained from 2 different jails
> > (provided they use different sysvshms).
> > 
> > Preferably the whole business would be /prevented/. Prevention mechanism
> > would have to deal with shared address spaces (rfork(2) + RFMEM),
> > threads and pre-existing mappings.
> > 
> > The patch posted here just puts permission checks in several places,
> > while leaving the namespace shared, which I find to be a user-visible
> > hack with no good justification. There is also no analysis of how this
> > behaves when presented with the aforementioned scenario. Even if it turns
> > out the result is harmless with the resulting code, this leaves us with a
> > very error-prone scheme.
> > 
> > There is no technical problem adding a pointer to struct prison and
> > dereferencing it instead of current global vars. Adding proper sysctls
> > dumping the content for given jail is trivial and so is providing
> > resource limits when creating a first-level jail with a separate
> > sysvshm. Something which cannot be as easily achieved with the patch in
> > question.
> 
> Could you try the latest patch, please?
> I justify user-visibility, make it hierarchical jail friendly, and use EINVAL 
> instead of EACCES to conceal information leak.
> https://bz-attachments.freebsd.org/attachment.cgi?id=157661 (typo fixed)
> 
> 
> I realized my method is a bit better, when I'm trying to port/write the real 
> namespace separation.
> Let me explain (again) why I choose this method for sysv ipc, and could you 
> tell me how it should be, please?
> 
> struct shmmap_state {
>   vm_offset_t va;
>   int shmid;
> };
> 
> In sysv_shm.c, struct shmmap_state, exist per process as 
> p->p_vmspace->vm_shm, is a lookup-table for va -> shm object lookup.
> The shmmap_state entry holds a reference (here, shmid) to shm object for 
> further detach, and entries are simply copied on fork.
> 
> If you split namespace (includes shmid space) completely, shmid would be no 
> longer a unique identifier for IPC object in kernel.
> To make it unique, adding a reference to prison into shmmap_state like this;
> 
> struct shmmap_state {
>   vm_offset_t va;
>   struct prison *prison;
>   int shmid;
> };
> 
> would be bad idea, because after a process calls jail_attach(), the process 
> holds a reference to another (creator) prison, or copy the IPC object 
> completely on every jail_attach() occurs?

As I explained in the previous thread, with a separate namespace it is a
strict requirement to prevent sharing of sysvshm mappings. With the
requirement met, there is no issue. As you will see later in the mail,
even your approach would benefit greatly from having such a restriction.


> How do you deal with hierarchical jail?
> 

If proper resource limiting for hierarchical jails is implemented, the
new jail either inherits or gets a new namespace, depending on used
options.

With only simplistic support first level jails can inherit or get a new
namespace, the rest must inherit.

There is no issue here due to sharing prevention.

> My method didn't touch anything about the mapping stuff, thus it behaves 
> exactly the same as current FreeBSD behave on this point.
> 

Sure it did. As you noticed yourself it makes sense to clean up sysvshms
on jail destruction, which you do in sysvshm_cleanup_for_prison_myhook.

Your code does:
    if ((shmseg->u.shm_perm.mode & SHMSEG_ALLOCATED) &&
        shmseg->cred->cr_prison == pr) {
            shm_remove(shmseg, i);


which differs from what is executed by kern_shmdt_locked.

Now let's consider a process which rforks and shares the address space
with its child. The child enters a jail and grabs a sysvshm mapping,
then exits and we kill the jail.

In effect we got a process with an address space which used a mapping
created in a now-destroyed jail. Is this situation problematic? I don't
see any analysis provided.

Maybe it is, maybe it so happens it is not. The mere possibility of this
scenario needlessly complicates maintenance, and such a scenario
likely has no practical purpose. As such, it is best /prevented/.

With it prevented there is nothing positive about your approach that I
could see.

> I'm not sure I could understand properly what the shared address space 
> problem is, (Could someone help me to understand, perhaps in code?)
> and, I'm not sure whether the current FreeBSD has the shared address space 
> prob

Re: SHM objects cannot be isolated in jails, any evolution in future FreeBSD versions?

2016-03-19 Thread Mateusz Guzik
On Sat, Mar 12, 2016 at 12:05:57PM +0100, Simon wrote:
> The shm_open(2) function changed since FreeBSD 7.0: SHM object
> paths are now decoupled from the physical file system and have become
> just abstract objects. Probably due to this, the jail system does not
> provide any form of filtering regarding shared memory created using
> this function. Therefore:
> 
> - Anyone can create unauthorized communication channels between jails,
> - Users with enough privileges in any jail can access and modify any
> SHM objects system-wide, ie. shared memory objects created in any
> other jail and in the host system.
> 
> I've seen a few claims that SHM objects were being handled
> differently whether they were created inside or outside a jail.
> However, I tested on FreeBSD 10.1 and 9.3 but found no evidence of
> this: both versions were affected by the same issue.
> 
> A reference of such claim: 
> https://lists.freebsd.org/pipermail/freebsd-ports-bugs/2015-July/312665.html
> 
> My initial post on FreeBSD forum discussing the issue with more
> details: https://forums.freebsd.org/threads/55468/
> 
> Currently, there does not seem to be any way to prevent this.
> 
> I'm therefore wondering if there are any concrete plans to change
> this situation in future FreeBSD versions? Being able to block the
> currently free inter-jail SHM-based communication seems a minimum,
> however such a setting would also most likely prevent SHM-based
> applications from working.
> 
> Using file-based SHM objects in jails seemed a good idea, but it does
> not seem to be implemented this way, I don't know why. Is this planned, or
> are there any greater plans ongoing that also involve IPC's similar
> issue?
> 

Last time I checked there were no inherent problems preventing getting
this to work.

A half-assed implementation is trivial and boils down to semi-automatically
changing several places which reference a global pointer to use something
taken from struct prison, and then adding support to jail(8) and ipcs(1).

An acceptable implementation would first take several steps to prevent
foot-shooting, e.g. making jail_attach'ing processes single-threaded.

The preferred way would first go over the existing codebase and perform
necessary cleanups and bugfixes (if any).

Either way, there are no apparent actual problems to be solved here, or
in other words this looks like a moderate (option 2) to long-term effort
(option 3).

One unclear bit is involvement with VIMAGE. To be more exact it is quite
unclear what's the relationship between VNET and VIMAGE. If one had an
entire network stack for their jail, they would not mind separate ipcs
either. On the other hand having separate ipcs should not require a
separate network stack. As such, it does not look like this would duplicate
too much effort if VIMAGE was to become usable.

That said, this is definitely doable. I can have another look, but no
promises.

-- 
Mateusz Guzik 
___
freebsd-jail@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-jail
To unsubscribe, send any mail to "freebsd-jail-unsubscr...@freebsd.org"


Re: enforce_statfs showing leading path

2019-01-08 Thread Mateusz Guzik
On 1/8/19, Michael W. Lucas  wrote:
> Hi,
>
> I'm experimenting with enforce_statfs for the jails book, and have hit
> an inconsistency. Not sure if the bug should go to src or doc. Running
> last week's -current.
>
> According to jail(8):
>
>  When set to 1, only mount points below the jail's chroot
>  directory are visible.  In addition to that, the path to the
>  jail's chroot directory is removed from the front of their
>  pathnames.
>
> Seems pretty clear that I shouldn't see anything other than
>
> # jls -h name enforce_statfs
> ...
> ioc-www1 1
>
> So, as I read it, the jail's chroot directory should be stripped down
> to /. But inside the jail:
>
> root@www1:~ # mount
> iocage/iocage/jails/www1/root on / (zfs, local, nfsv4acls)
> devfs on /dev (devfs, local, multilabel)
> fdescfs on /dev/fd (fdescfs)
>
> I see the jail's chroot directory.
>
> This seems to contradict the man page, unless I'm misunderstanding.
>
> Is this a software bug? A ZFS thing? A doc bug? Or am I just an idiot?
>
> Also, should this path be stripped when enforce_statfs is set to 1 *or
> above*? Or is this strictly when set to 1? If I'm filing a bug, it
> might as well be complete...
>

The "path" you are seeing is the dataset name, which you made to resemble
the mount point.

Whether the full dataset name should be exposed or not is a very different
question. Does illumos do it?

Worst case, it should be trivial to add a sysctl to just obfuscate the name.
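
To make the distinction concrete, here is a minimal userland sketch, not the kernel code. It assumes that enforce_statfs=1 only rewrites the mounted-on path and leaves the "mounted from" name alone. The struct, the function name jail_view() and the paths used with it are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the two name fields reported per mount. */
struct mnt_names {
	char from[128];	/* what is mounted: device or ZFS dataset name */
	char on[128];	/* where it is mounted: a filesystem path */
};

/*
 * Model of enforce_statfs=1: strip the jail root from the mount point.
 * The "from" field is deliberately left alone; it is a name, not a path.
 * Returns 0 if the mount is visible in the jail, -1 if it lies outside.
 */
static int
jail_view(struct mnt_names *m, const char *jail_root)
{
	size_t len = strlen(jail_root);

	if (strncmp(m->on, jail_root, len) != 0)
		return (-1);
	if (m->on[len] == '\0') {
		strcpy(m->on, "/");	/* the jail root itself */
		return (0);
	}
	if (m->on[len] != '/')
		return (-1);		/* e.g. .../www10 vs .../www1 */
	memmove(m->on, m->on + len, strlen(m->on + len) + 1);
	return (0);
}
```

With a dataset named iocage/iocage/jails/www1/root mounted on a (hypothetical) /iocage/jails/www1/root, the jail sees "/" as the mount point while the dataset name is reported unchanged, which matches the mount output quoted above.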

-- 
Mateusz Guzik 


Re: Is it possible to employ epoch to simplify managing prison lifecycle

2022-12-16 Thread Mateusz Guzik
On 12/16/22, Zhenlei Huang  wrote:
> Hi,
>
> While hacking `sys/kern/kern_jail.c` I got lost.
>
> There're lots of ref / unref and flags to prevent visit invalid prison
> while
>  concurrent modification is possible and some refs looks weird.
>
> Is it possible to employ epoch(9) to simplify managing of prison lifecycle
> ?
>

Some of the ref/unref cycles are probably avoidable to begin with, but
ultimately the thing to do here is to employ per-cpu reference
counting, if at all needed.

I have a wip patch to provide such a mechanism, it may or may not land
this month.

-- 
Mateusz Guzik 



Re: Is it possible to employ epoch to simplify managing prison lifecycle

2022-12-23 Thread Mateusz Guzik
On 12/23/22, Alexander V. Chernikov  wrote:
>
>
>> On 16 Dec 2022, at 16:29, Mateusz Guzik  wrote:
>>
>> On 12/16/22, Zhenlei Huang  wrote:
>>> Hi,
>>>
>>> While hacking `sys/kern/kern_jail.c` I got lost.
>>>
>>> There're lots of ref / unref and flags to prevent visit invalid prison
>>> while
>>> concurrent modification is possible and some refs looks weird.
>>>
>>> Is it possible to employ epoch(9) to simplify managing of prison
>>> lifecycle
>>> ?
>>>
>>
>> Some of the ref/unref cycles are probably avoidable to begin with, but
>> ultimately the thing to do here is to employ per-cpu reference
>> counting, if at all needed.
>>
>> I have a wip patch to provide such a mechanism, it may or may not land
>> this month.
> That would be nice. I’d love to convert nexthops refcounting to that one.
> Do you envision similar semantics as Linux percpu_ref? I mean, does one need
> to explicitly mark “not in active use” stage?

There is *something* needed to disable per-cpu operation, otherwise how
can you ever know if the count is 0, apart from going over all cpus
every time, which defeats the point?

More specifically, I have an on/off switch for said per-cpu op. This is
modeled after what I did for counters in vfs, see vfs_ref et al.
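
A rough single-threaded sketch of the idea follows. It is neither the wip patch nor the vfs code; all names (pcpu_ref, NCPU and so on) are invented for illustration. A real implementation would need atomics and a way to wait out in-flight per-cpu operations before flipping the switch.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPU 4	/* hypothetical fixed CPU count for the sketch */

struct pcpu_ref {
	bool percpu;		/* the on/off switch */
	long local[NCPU];	/* per-CPU deltas, may be individually negative */
	long global;		/* authoritative count once percpu is off */
};

static void
pcpu_ref_init(struct pcpu_ref *r)
{
	/* Starts in per-CPU mode with one reference held by the creator. */
	*r = (struct pcpu_ref){ .percpu = true, .global = 1 };
}

/* Fast path: touch only this CPU's slot while per-CPU mode is on. */
static void
pcpu_ref_acquire(struct pcpu_ref *r, int cpu)
{
	if (r->percpu)
		r->local[cpu]++;
	else
		r->global++;
}

/* Returns true only when the count is known to have hit zero. */
static bool
pcpu_ref_release(struct pcpu_ref *r, int cpu)
{
	if (r->percpu) {
		r->local[cpu]--;
		return (false);	/* zero cannot be detected in this mode */
	}
	assert(r->global > 0);
	return (--r->global == 0);
}

/* Turn per-CPU operation off: fold all slots into the global count. */
static void
pcpu_ref_disable(struct pcpu_ref *r)
{
	for (int i = 0; i < NCPU; i++) {
		r->global += r->local[i];
		r->local[i] = 0;
	}
	r->percpu = false;
}
```

The point of the switch shows up in pcpu_ref_release(): while per-CPU mode is on, a release on one CPU may pair with an acquire on another, so no single slot can say whether the total is zero. Only after pcpu_ref_disable() has folded everything into one counter can the last release be detected.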

-- 
Mateusz Guzik 



Re: debian jail, setting max open files soft limit does not work

2022-12-27 Thread Mateusz Guzik
On 12/27/22, Mathias Picker  wrote:
> Hi all,
>
>
> I’ve set up a jail on 13.1 running debian stretch, and now a
> triplestore needing many openfiles for a data import.
>
> Since on Linux the soft limit is pretty hard :) I need to set the
> soft limit.
>
> I’ve edited /etc/security/limits.conf to set soft and hard limit
> to 2 (just to check), but after login the soft limit stays at
> 1024.
>
> Using prlimit I can change the limits of a running process, but
> that is not passed on to subprocesses, which the app creates
> constantly for import, there, the soft limit returns to 1024 :(
>
> Running prlimit --nofile 2 or prlimit --nofile 2:2
> does not work, either, the soft limit stays at 1024, and the
> import fails.
>
> Does anyone know a way to change the soft limit permanently?
>

The kernel code is buggy here. From a quick read, you should be able to
work around it by:

sysctl compat.linux.default_openfiles=-1
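
To check which limits a process (and whatever it spawns) actually ends up with, a small helper along these lines can be built and run inside the jail. This is only a sketch using the standard getrlimit(2) call; the helper name nofile_limits() is made up.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Report the open-files limits the calling process actually runs with.
 * Returns 0 on success, -1 on error (errno is set by getrlimit()).
 */
static int
nofile_limits(rlim_t *soft, rlim_t *hard)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return (-1);
	*soft = rl.rlim_cur;
	*hard = rl.rlim_max;
	return (0);
}
```

Calling it from a child of the import process shows whether the soft limit seen there has changed after the workaround is applied.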


-- 
Mateusz Guzik 



Re: kern/126368: Running ktrace/kdump in jail leads to stale jails

2008-08-08 Thread Mateusz Guzik
The following reply was made to PR kern/126368; it has been noted by GNATS.

From: "Mateusz Guzik" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc:  
Subject: Re: kern/126368: Running ktrace/kdump in jail leads to stale jails
Date: Fri, 8 Aug 2008 19:30:22 +0200

 Err, I made a mistake. crfree() will be called in case of failure
 (loop starting at line 959), so the following patch should be ok:
 
 --- sys/kern/kern_ktrace.c.orig	2008-08-08 16:37:45.0 +0200
 +++ sys/kern/kern_ktrace.c	2008-08-08 19:25:16.0 +0200
 @@ -933,12 +933,14 @@
  		error = VOP_WRITE(vp, &auio, IO_UNIT | IO_APPEND, cred);
  	VOP_UNLOCK(vp, 0, td);
  	vn_finished_write(mp);
  	vrele(vp);
  	VFS_UNLOCK_GIANT(vfslocked);
 -	if (!error)
 +	if (!error) {
 +		crfree(cred);
  		return;
 +	}
  	/*
  	 * If error encountered, give up tracing on this vnode.  We defer
  	 * all the vrele()'s on the vnode until after we are finished walking
  	 * the various lists to avoid needlessly holding locks.
  	 */


Re: kern/126368: Running ktrace/kdump in jail leads to stale jails

2008-08-09 Thread Mateusz Guzik
On Fri, Aug 08, 2008 at 06:43:38PM +, Bjoern A. Zeeb wrote:
> >The following reply was made to PR kern/126368; it has been noted by GNATS.
> >
> >From: "Mateusz Guzik" <[EMAIL PROTECTED]>
> >To: [EMAIL PROTECTED]
> >Cc:
> >Subject: Re: kern/126368: Running ktrace/kdump in jail leads to stale jails
> >Date: Fri, 8 Aug 2008 19:30:22 +0200
> >
> >Err, I made a mistake. crfree() will be called in case of failure
> >(loop starting at line 959), so the following patch should be ok:
> >
> >--- sys/kern/kern_ktrace.c.orig  2008-08-08 16:37:45.0 +0200
> >+++ sys/kern/kern_ktrace.c   2008-08-08 19:25:16.0 +0200
> >@@ -933,12 +933,14 @@
> > error = VOP_WRITE(vp, &auio, IO_UNIT | IO_APPEND, cred);
> > VOP_UNLOCK(vp, 0, td);
> > vn_finished_write(mp);
> > vrele(vp);
> > VFS_UNLOCK_GIANT(vfslocked);
> >-if (!error)
> >+if (!error) {
> >+crfree(cred);
> > return;
> >+}
> 
> that sounds more plausible w/o seeing the surrounding code. I had
> wondered already earlier today when I was pointed at.
> 
> I'll look into this.
> 

Sorry for the noise -- the first patch was right. ;)

ktr_writerequest() is called multiple times and it _always_ calls
crhold(), so crfree() must be called before it returns (even in case of
failure).

Also, in this function one can find:

[..]
crhold(cred)
[..]
if (vp == NULL) {
KASSERT(cred == NULL, ("ktr_writerequest: cred != NULL"));
return;
}

A `normal' kernel (one built without INVARIANTS, where KASSERT is compiled
out) might leak credentials in this case, so I believe crfree() should be
added there too.
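
The rule being applied can be shown with a toy model. This is a userland sketch with made-up names (struct cred, write_request(), and so on), not the ktrace code: a function that takes its own reference must drop it on every path out, including early returns and the error path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ucred: only the reference count. */
struct cred {
	int refs;
};

static void cred_hold(struct cred *c) { c->refs++; }
static void cred_free(struct cred *c) { assert(c->refs > 0); c->refs--; }

/*
 * Model of a write request: takes its own reference and must drop it on
 * every path out, whether the write succeeded or not.
 * write_error stands in for the result of the actual write.
 */
static int
write_request(struct cred *c, int have_vnode, int write_error)
{
	cred_hold(c);
	if (!have_vnode) {
		cred_free(c);		/* early return: still balanced */
		return (-1);
	}
	/* ... the write itself would happen here ... */
	cred_free(c);			/* dropped before the error check */
	if (write_error == 0)
		return (0);
	/* error handling continues without touching c */
	return (write_error);
}
```

Dropping the reference before the `if (!error)` test, as the attached patch does, covers the success and failure cases with a single call.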

Thanks, and again, sorry for the noise.
--
Mateusz Guzik
--- sys/kern/kern_ktrace.c.orig	2008-08-08 16:37:45.0 +0200
+++ sys/kern/kern_ktrace.c	2008-08-10 01:42:07.0 +0200
@@ -889,10 +889,12 @@
 	 * request, so just drop it.  Make sure the credential and vnode are
 	 * in sync: we should have both or neither.
 	 */
 	if (vp == NULL) {
 		KASSERT(cred == NULL, ("ktr_writerequest: cred != NULL"));
+		if (cred != NULL)
+			crfree(cred);
 		return;
 	}
 	KASSERT(cred != NULL, ("ktr_writerequest: cred == NULL"));
 
 	kth = &req->ktr_header;
@@ -933,10 +935,11 @@
 		error = VOP_WRITE(vp, &auio, IO_UNIT | IO_APPEND, cred);
 	VOP_UNLOCK(vp, 0, td);
 	vn_finished_write(mp);
 	vrele(vp);
 	VFS_UNLOCK_GIANT(vfslocked);
+	crfree(cred);
 	if (!error)
 		return;
 	/*
 	 * If error encountered, give up tracing on this vnode.  We defer
 	 * all the vrele()'s on the vnode until after we are finished walking


Re: kern/126368: Running ktrace/kdump in jail leads to stale jails

2008-09-03 Thread Mateusz Guzik
On Thu, Aug 14, 2008 at 08:16:38PM -0400, alexus wrote:
> where can I get latest patch? that I can apply to 7.0-RELEASE-p3 ?
> 

Sorry for the very late reply, you can grab it from here:
http://student.agh.edu.pl/~frag/kern_ktrace.diff

Thanks,
--
Mateusz Guzik


[patch] use-after-free in kern_jail_set and lock leak in prison_racct_modify

2012-05-19 Thread Mateusz Guzik
Hello,

I'm using -CURRENT as of r235649.

Bugs I'd like to report:

1. a use-after-free bug in kern_jail_set, triggerable by attempts to
clear the persist flag from an "empty" persistent jail.

[..]
	if (!created) {
		prison_deref(pr, (flags & JAIL_ATTACH)	/* free */
		    ? PD_DEREF
		    : PD_DEREF | PD_LIST_SLOCKED);

[..]
#ifdef RACCT
	if (!created)
		prison_racct_modify(pr);	/* dereference */
#endif

	td->td_retval[0] = pr->pr_id;	/* dereference */
[..]


2. function prison_racct_modify leaks the allprison and allproc locks when
the modifications don't cause a rename.

[..]
	sx_slock(&allproc_lock);
	sx_xlock(&allprison_lock);

	if (strcmp(pr->pr_name, pr->pr_prison_racct->prr_name) == 0)
		return;
[..]
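
Both patterns can be shown in a small userland model. This is a sketch only; the toy lock, struct obj and the function names are invented and this is not the proposed patch. For bug 1, whatever is needed from the object is copied before the reference is dropped. For bug 2, the function has a single exit path so the early-out cannot skip the unlock.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct lock { int held; };		/* hypothetical toy lock */
static void lock_acquire(struct lock *l) { assert(!l->held); l->held = 1; }
static void lock_release(struct lock *l) { assert(l->held); l->held = 0; }

struct obj {
	int id;
	int refs;
	char name[16];
	char acct_name[16];
};

/* Drop a reference; the object is freed when the last one goes away. */
static void
obj_deref(struct obj *o)
{
	if (--o->refs == 0)
		free(o);
}

/* Bug 1 pattern: copy what is needed *before* the reference is dropped. */
static int
obj_release_and_get_id(struct obj *o)
{
	int id = o->id;

	obj_deref(o);		/* o may be freed past this point */
	return (id);
}

/* Bug 2 pattern: one exit path, so the early-out cannot skip the unlock. */
static int
obj_sync_acct_name(struct obj *o, struct lock *l)
{
	int renamed = 0;

	lock_acquire(l);
	if (strcmp(o->name, o->acct_name) == 0)
		goto out;	/* nothing to rename */
	strcpy(o->acct_name, o->name);
	renamed = 1;
out:
	lock_release(l);
	return (renamed);
}
```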

=

How to reproduce:
jail -c persist=1
jail -n 1 -m persist=0 

or

jail -c path=/ command=/usr/bin/true

This causes panic:
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0xff8000e37010
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80562e0b
stack pointer   = 0x28:0xff807c995830
frame pointer   = 0x28:0xff807c995ad0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 23244 (jail)
[ thread pid 23244 tid 100077 ]
Stopped at  kern_jail_set+0x2dfb:   movslq  0x10(%r13),%r12
db> bt
Tracing pid 23244 tid 100077 td 0xfe0003075490
kern_jail_set() at kern_jail_set+0x2dfb
sys_jail_set() at sys_jail_set+0x62
amd64_syscall() at amd64_syscall+0x29e
Xfast_syscall() at Xfast_syscall+0xf7
--- syscall (507, FreeBSD ELF64, sys_jail_set), rip = 0x800ed9bdc, rsp = 0x7fffd718, rbp = 0x7fffd790 ---


Proposed trivial patch:
http://student.agh.edu.pl/~mjguzik/patches/jail-use-after-free.patch

-- 
Mateusz Guzik 


Re: [jail] Allowing root privledged users to renice

2012-05-28 Thread Mateusz Guzik
On Fri, May 25, 2012 at 10:23:53AM -0700, Julian Elischer wrote:
> On 5/25/12 10:04 AM, Bjoern A. Zeeb wrote:
> >On 25. May 2012, at 16:48 , Sean Bruno wrote:
> >
> >>I've been toying with the idea of letting jails renice processes ... how
> >>dangerous and/or stupid is this idea?
> >>
> >> //depot/yahoo/ybsd_9/src/sys/kern/kern_jail.c#5 -
> >>/home/seanbru/ybsd_9/src/sys/kern/kern_jail.c 
> >>270a271,275
> >>+ int   jail_allow_renice = 0;
> >>+ SYSCTL_INT(_security_jail, OID_AUTO, allow_renice, CTLFLAG_RW,
> >>+&jail_allow_renice, 0,
> >>+"Prison root can renice processes");
> >>
> >>3857a3863,3865
> >>+  case PRIV_SCHED_SETPRIORITY:
> >>+  if (!jail_allow_renice)
> >>+   return (EPERM);
> >
> >I think sysctls are a bad idea given jails have per-jail flags these days.
> >
> >Maybe also only allow re-nicing to be nicer but not less nice?
>    for sure !  start a jail with it's max priority and the
> root within can allow nicer priorities only..
> you can always add priority from the master (parent) environment outside.
> 

Unless I seriously misunderstood something, that's the case right now.

That is, PRIV_SCHED_SETPRIORITY matters only if resulting nice parameter
would be lower.
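
As a sketch of that rule (a simplified model, not the kernel function; renice_check() and its has_priv argument are made up, and the EACCES return value is an assumption):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Model of the renice permission check: a higher nice value means a
 * lower priority, so only a *decrease* of the value needs privilege.
 * has_priv stands in for a successful PRIV_SCHED_SETPRIORITY check.
 */
static int
renice_check(int cur_nice, int new_nice, bool has_priv)
{
	if (new_nice < cur_nice && !has_priv)
		return (EACCES);
	return (0);
}
```

So a jailed root without the privilege can already go from nice 0 to nice 10, but not back from 10 to 0.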

-- 
Mateusz Guzik 


Re: jail: unknown parameter: ip6.addr

2012-11-07 Thread Mateusz Guzik
On Wed, Nov 07, 2012 at 03:39:26PM -0500, Mike Jakubik wrote:
> Hello,
> 
> I just updated a server to latest stable and my jails no longer start,
> troubleshooting the startup script shows us that the parameter ip6.addr
> is unknown, this system is compiled without INET6.
[..]
> + tail +2 /tmp/jail.PJ5ji3QH/jail.8101
> jail: unknown parameter: ip6.addr

Try this (lightly tested):
http://people.freebsd.org/~mjg/patches/rc-jail-ip-arg.diff

Basically the idea is to pass ip4.addr and ip6.addr only when respective
addresses are specified in configuration.

-- 
Mateusz Guzik 


Re: misc/174436: [jail] Jails with numbers as names don't work

2012-12-14 Thread Mateusz Guzik
The following reply was made to PR misc/174436; it has been noted by GNATS.

From: Mateusz Guzik 
To: bug-follo...@freebsd.org, r...@bytecamp.net
Cc:  
Subject: Re: misc/174436: [jail] Jails with numbers as names don't work
Date: Fri, 14 Dec 2012 17:01:54 +0100

 Hi,
 
 can you provide a backtrace from this panic? Are you able to reproduce it?
 
 Does not panic for me.
 
 -- 
 Mateusz Guzik 


Re: misc/174436: [jail] Jails with numbers as names don't work

2012-12-17 Thread Mateusz Guzik
The following reply was made to PR kern/174436; it has been noted by GNATS.

From: Mateusz Guzik 
To: Robert Schulze 
Cc: bug-follo...@freebsd.org
Subject: Re: misc/174436: [jail] Jails with numbers as names don't work
Date: Mon, 17 Dec 2012 22:31:27 +0100

 On Mon, Dec 17, 2012 at 02:57:06PM +0100, Robert Schulze wrote:
 > Hello,
 > 
 > Am 14.12.2012 17:01, schrieb Mateusz Guzik:
 > >Hi,
 > >
 > >can you provide backtrace from this panic? Are you able to reproduce it?
 > 
 > I can reproduce this behaviour reliably.
 
 Can you send me your kernel or put it somewhere for download?
 
 If not, provide output of the following:
 # addr2line -e /boot/kernel/kernel 0x804debff
 # addr2line -e /boot/kernel/kernel 0x804dbef6
 
 Also I don't think that '0' has any significance here. Can you change it
 to something else and try again? Can you try with a different IP?
 Preferably 127.0.0.2 on lo0.
 
 > Here is a backtrace, I had to type that manually:
 > 
 > fault virtual address   = 0x110
 > fault code  = supervisor read data, page not present
 > instruction pointer = 0x20:0x804d9d54
 > stack pointer   = 0x28:0xff8489ce47d0
 > frame pointer   = 0x28:0xff8489ce47f0
 > code segment= base 0x0, limit 0xf, type 0x1b
 > = DPL 0, pres 1, long 1, def32 0, gran 1
 > processor eflags= interrupt enabled, resume, IOPL = 0
 > current process = 1058 (jail)
 > trap number = 12
 > panic: page fault
 > cpuid = 7
 > KDB: stack backtrace:
 > #0 0x8053de06 at kdb_backtrace+0x66
 > #1 0x80507c6e at panic+0x1ce
 > #2 0x807579f0 at trap_fatal+0x290
 > #3 0x80757d28 at trap_pfault+0x1e8
 > #4 0x8075832e at trap+0x3be
 > #5 0x80741bef at calltrap+0x8
 > #6 0x804dbef6 at prison_deref+0x1f6
 > #7 0x804debff at kern_jail_set+0x14af
 > #8 0x804e1282 at sys_jail_set+0x62
 > #9 0x807572d0 at amd64_syscall+0x540
 > #10 0x80741ed7 at Xfast_syscall+0xf7
 > 
 > with kind regards,
 > Robert Schulze
 
 -- 
 Mateusz Guzik 


Re: misc/174436: [jail] Jails with numbers as names don't work

2012-12-18 Thread Mateusz Guzik
The following reply was made to PR kern/174436; it has been noted by GNATS.

From: Mateusz Guzik 
To: Robert Schulze 
Cc: bug-follo...@freebsd.org
Subject: Re: misc/174436: [jail] Jails with numbers as names don't work
Date: Tue, 18 Dec 2012 11:18:37 +0100

 On Tue, Dec 18, 2012 at 10:48:18AM +0100, Robert Schulze wrote:
 > Hi,
 > 
 > Am 17.12.2012 22:31, schrieb Mateusz Guzik:
 > >
 > >Can you send me your kernel or put somewhere for download?
 > >
 > >If not, provide output of the following:
 > ># addr2line -e /boot/kernel/kernel 0x804debff
 > ># addr2line -e /boot/kernel/kernel 0x804dbef6
 > 
 > # addr2line -e /boot/kernel/kernel 0x804debff
 > /usr/src/sys/kern/kern_jail.c:1848
 > # addr2line -e /boot/kernel/kernel 0x804dbef6
 > /usr/src/sys/kern/kern_jail.c:4537
 > 
 > >Also I don't think that '0' has any significance here. Can you change it
 > >to something else and try again? Can you try with different IP?
 > >Preferably 127.0.0.2 on lo0.
 > 
 
 I was reading the wrong version of the rc.d script. The name is passed with
 the -n switch.
 
 Looks like we can get to prison_deref before RACCT is initialized for
 the given prison.
 
 Please test the following:
 diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c
 index 1dc43ab..7ca1d72 100644
 --- a/sys/kern/kern_jail.c
 +++ b/sys/kern/kern_jail.c
 @@ -2604,7 +2604,8 @@ prison_deref(struct prison *pr, int flags)
 cpuset_rel(pr->pr_cpuset);
 osd_jail_exit(pr);
  #ifdef RACCT
 -   prison_racct_detach(pr);
 +   if (pr->pr_prison_racct != NULL)
 +   prison_racct_detach(pr);
  #endif
 free(pr, M_PRISON);
 
 
 -- 
 Mateusz Guzik 


Re: misc/174436: [jail] Jails with numbers as names don't work

2012-12-18 Thread Mateusz Guzik
The following reply was made to PR kern/174436; it has been noted by GNATS.

From: Mateusz Guzik 
To: Robert Schulze 
Cc: bug-follo...@freebsd.org
Subject: Re: misc/174436: [jail] Jails with numbers as names don't work
Date: Tue, 18 Dec 2012 12:08:30 +0100

 On Tue, Dec 18, 2012 at 12:01:43PM +0100, Robert Schulze wrote:
 > Hi,
 > 
 > Am 18.12.2012 11:18, schrieb Mateusz Guzik:
 > >I was reading wrong version of rc.d script. Name is passed with -n switch.
 > >
 > >Looks like we can get to prison_deref before RACCT is initialized for
 > >given prison.
 > >
 > >Please test the following:
 > >diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c
 > >index 1dc43ab..7ca1d72 100644
 > >--- a/sys/kern/kern_jail.c
 > >+++ b/sys/kern/kern_jail.c
 > >@@ -2604,7 +2604,8 @@ prison_deref(struct prison *pr, int flags)
 > > cpuset_rel(pr->pr_cpuset);
 > > osd_jail_exit(pr);
 > >  #ifdef RACCT
 > >-   prison_racct_detach(pr);
 > >+   if (pr->pr_prison_racct != NULL)
 > >+   prison_racct_detach(pr);
 > >  #endif
 > > free(pr, M_PRISON);
 > >
 > 
 > this fixed the panic, but the jail can still not be started:
 > 
 > # /etc/rc.d/jail onestart 0
 > Configuring jails:.
 > Starting jails: cannot start jail "0":
 > .
 
 Forgot to add:
 '0' is explicitly forbidden. The underlying reason is that you already have
 jail 0 - your main system.
 
 The only problem here was that cleanup was incorrect. And possibly the
 documentation should note that '0' is already taken.
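
The cleanup problem is the usual one of tearing down a half-constructed object. A userland sketch, with invented names and not taken from the kernel source: teardown has to tolerate sub-objects that setup never got around to creating.

```c
#include <assert.h>
#include <stdlib.h>

struct acct { int attached; };		/* hypothetical accounting record */

struct obj {
	struct acct *acct;		/* NULL until setup gets that far */
};

static int detach_calls;		/* counts detach calls, for illustration */

static void
acct_detach(struct obj *o)
{
	detach_calls++;
	o->acct->attached = 0;		/* would fault if o->acct were NULL */
	free(o->acct);
	o->acct = NULL;
}

/* Teardown that tolerates an object whose setup failed half-way. */
static void
obj_destroy(struct obj *o)
{
	if (o->acct != NULL)
		acct_detach(o);
	free(o);
}
```

Here the rejected name makes creation fail before the accounting part exists, so the unconditional detach in the old code ran into a NULL pointer.
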
 -- 
 Mateusz Guzik 


Re: new jail(8) ignoring devfs_ruleset?

2013-02-18 Thread Mateusz Guzik
On Mon, Feb 18, 2013 at 09:26:42AM -0700, Jamie Gritton wrote:
> On 02/18/13 01:54, Harald Schmalzbauer wrote:
> >  schrieb Jamie Gritton am 16.02.2013 00:40 (localtime):
> >>On 02/15/13 09:27, Harald Schmalzbauer wrote:
> >>>   Hello,
> >>>
> >>>like already posted, on 9.1-R, I highly appreciate the new jail(8) and
> >>>jail.conf capabilities. Thanks for that extension!
> >>>
> >>>Accidentally I saw that "devfs_ruleset" seems to be ignored.
> >>>If I list /dev/ I see all the hosts disk devices etc.
> >>>I set "devfs_ruleset = 4;" and "enforce_statfs = 1;" in jail.conf.
> >>>Inside the jail,
> >>>sysctl security.jail.devfs_ruleset returnes "1".
> >>>But like mentioned, I can access all devices...
> >>>
> >>>Thanks for any help,
> >>>
> >>>-Harry
> >>
> >>devfs_ruleset is only used along with mount.devfs - do you also have
> >>that set in jail.conf?
> >
> >Thanks for your response.
> >
> >Yes, I have mount.devfs; set.
> >Otherwise I wouldn't have any device inside my jail. Verified - and like
> >intended, right?
> >Another notable discrepancy: The man page tells that devfs_rulset is "4"
> >by default.
> >But when I don't set devfs_rulset in jail.conf at all, inside the jail,
> >'sysctl security.jail.devfs_ruleset': 0
> >When set, like mentioned above, it returns the corresponding value, but
> >it doesn't have any effect.
> >How gets devfs_rulset handled? Does jail(8) do the whole job? I'd like
> >to help finding the source, but have missed the whole new jail evolution...
> >Inside my jails, I don't have a fstab, outside I have them defined and
> >enabled with "mount" - and noticed the non-reverted umounting.
> 
> I found the problem - I noticed you mentioned 9.1-R, and took a look at
> devfs(5). On CURRENT, there's a mount option "ruleset", that isn't there
> on 9.
> 
> So I'll have to get around it by running devfs(8) after the mount. I'll
> work on a patch for that.
> 

Why not MFC support for that mount option instead?

-- 
Mateusz Guzik 


automatic garbage collection of stuff mounted (etc.) by jailed root

2013-04-22 Thread Mateusz Guzik
Hello,

This is something that imho could be done by a GSoC student.

It is possible to allow jailed root to mount various filesystems. But
once all processes are dead, mounts done by jailed root that he didn't
clean up are still hanging around.

As time passes and more stuff gets jailable we should expect problems
like this in different subsystems.

So I propose that someone(tm) implements a solution which cleans this
stuff during jail destruction.

One idea how to do it: implement a list of clean up operations. Using the
mount example: you add a filesystem to be cleaned up after it is
mounted, you delete it after it is unmounted. When the jail is going to
die you just traverse the list backwards and call the cleaning functions,
in this case unmounting filesystems. Maybe this is a bad idea in the
first place and it is better to take a look at the mount tree and traverse
that, I don't know, you should investigate. :) Note that the code has to
be robust in case of errors (e.g. a given fs may not be unmountable
because someone from prison0 is inside).
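
To give a feel for the list idea, here is a rough userland sketch. Every name in it (cleanup_list, fake_fs, fake_unmount and so on) is made up; a kernel version would look different and would need locking. Cleanups run newest-first, and the ones that fail stay on the list so that a later retry can pick them up.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLEANUPS 16			/* hypothetical fixed capacity */

typedef int (*cleanup_fn)(void *arg);	/* returns 0 on success */

struct cleanup { cleanup_fn fn; void *arg; };
struct cleanup_list { struct cleanup items[MAX_CLEANUPS]; size_t n; };

/* Register a cleanup, e.g. right after a successful mount. */
static int
cleanup_add(struct cleanup_list *l, cleanup_fn fn, void *arg)
{
	if (l->n == MAX_CLEANUPS)
		return (-1);
	l->items[l->n++] = (struct cleanup){ fn, arg };
	return (0);
}

/* Forget a cleanup, e.g. after an explicit unmount. */
static int
cleanup_del(struct cleanup_list *l, cleanup_fn fn, void *arg)
{
	for (size_t i = l->n; i-- > 0;) {
		if (l->items[i].fn != fn || l->items[i].arg != arg)
			continue;
		for (size_t j = i; j + 1 < l->n; j++)
			l->items[j] = l->items[j + 1];
		l->n--;
		return (0);
	}
	return (-1);
}

/*
 * Run cleanups newest-first.  Entries that fail stay on the list, in
 * their original order, so a later retry can pick them up.
 * Returns the number of entries left.
 */
static size_t
cleanup_run(struct cleanup_list *l)
{
	struct cleanup failed[MAX_CLEANUPS];
	size_t nfailed = 0;

	while (l->n > 0) {
		struct cleanup c = l->items[--l->n];

		if (c.fn(c.arg) != 0)
			failed[nfailed++] = c;
	}
	/* failed[] is newest-first; put it back oldest-first. */
	while (nfailed > 0)
		l->items[l->n++] = failed[--nfailed];
	return (l->n);
}

/* Hypothetical stand-in for a mounted filesystem. */
struct fake_fs {
	int busy;	/* nonzero: unmount fails, as if someone were inside */
	int mounted;
};

static int
fake_unmount(void *arg)
{
	struct fake_fs *fs = arg;

	if (fs->busy)
		return (-1);
	fs->mounted = 0;
	return (0);
}
```

A busy filesystem simply stays on the list. Once whoever was inside has left, calling cleanup_run() again finishes the job.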

Again, the goal is to have jails clean up automatically after anything
jailed root was permitted to do.

Thoughts?
-- 
Mateusz Guzik 


Re: automatic garbage collection of stuff mounted (etc.) by jailed root

2013-04-24 Thread Mateusz Guzik
On Mon, Apr 22, 2013 at 12:29:38PM -0600, Jamie Gritton wrote:
> On 04/22/13 11:39, Miroslav Lachman wrote:
> >>This already happens when jails are created using a jail.conf file. Any
> >>mounts there are unmounted as part of the jail removal process. Just
> >>recently I fixed it to properly do this unmounting in reverse order.
> >
> >Do you mean mounts defined in jail.conf or all mounts manually done by
> >root user in jail?
> >
> 
> Ah, I see the difference. Yes, that's only for mounts in the jail.conf.
> For mounts done by the jail itself, I guess we would go off the mount
> record's credential. So is this something you expect to be happening
> entirely in the kernel?
> 

If we want to clean this up from userspace, we need to teach the kernel how
to export vnet and mount table of a jail and then it would be nice to teach
jls how to print it (or maybe create a separate tool - jstat?), and of
course jail(8) how to use this information to clean things up.

Bonus points if jail(8) -r is able to clean up the jail without looking at
config file.

I would prefer if the jail would be able to just die if no problems were
encountered, and that is easily done with a kernel-only implementation,
but this would still benefit from the features described above (the
difference would be that if someone wants to kill the jail, jail(8)
would only call jail_remove). If the jail could not die because some clean
up operations failed, jls (or jstat) would show what resources are
remaining along with an error message (say, fs could not be unmounted
because it was busy). And then the user can fix the problem and do
jail(8) -r to re-run the kernel clean up or clean up on his own (say, unmount
filesystems), which effectively should kill the jail.

Thoughts?

-- 
Mateusz Guzik 


Re: automatic garbage collection of stuff mounted (etc.) by jailed root

2013-04-25 Thread Mateusz Guzik
On Wed, Apr 24, 2013 at 07:40:21PM -0600, Jamie Gritton wrote:
> On 04/24/13 19:22, Mateusz Guzik wrote:
> >On Mon, Apr 22, 2013 at 12:29:38PM -0600, Jamie Gritton wrote:
> >>On 04/22/13 11:39, Miroslav Lachman wrote:
> >>>>This already happens when jails are created using a jail.conf file. Any
> >>>>mounts there are unmounted as part of the jail removal process. Just
> >>>>recently I fixed it to properly do this unmounting in reverse order.
> >>>
> >>>Do you mean mounts defined in jail.conf or all mounts manually done by
> >>>root user in jail?
> >>>
> >>
> >>Ah, I see the difference. Yes, that's only for mounts in the jail.conf.
> >>For mounts done by the jail itself, I guess we would go off the mount
> >>record's credential. So is this something you expect to be happening
> >>entirely in the kernel?
> >>
> >
> >If we want to clean this up from userspace, we need to teach the kernel how
> >to export vnet and mount table of a jail and then it would be nice to teach
> >jls how to print it (or maybe create a separate tool - jstat?), and of
> >course jail(8) how to use this information to clean things up.
> >
> >Bonus points if jail(8) -r is able to clean up the jail without looking at
> >config file.
> >
> >I would prefer if the jail would be able to just die if no problems were
> >encountered and that is easily done with a kernel-only implementation,
> >but this still would benefit from features described above (the
> >difference would be that if someone wants to kill the jail, jail(8)
> >would only call jail_remove). If jail could not die because some clean
> >up operations failed, jls (or jstat) would show what resources are
> >remaining along with error message (say, fs could not be unmounted
> >because it was busy). And then the user can fix the problem and do
> >jail(8) -r to re-run kernel clean up or clean on his own (say, unmount
> >filesystems), which effectively should kill the jail.
> >
> >Thoughts?
> 
> If the kernel was able to export vnet and mounts, I would want jls to be
> the tool to show it. At least I wouldn't want to add another tool; a "-j
> jailname" option to df and ifconfig is an intriguing option. If jail(8)
> can get this information, then I would definitely want jail -r to clean
> it up; it doesn't matter whether or not there's a config file, since
> we're talking about things that are done outside the config file anyway.
> 

Lack of precision here, my bad. Clearly, if we just started a jail there
is no problem making it record everything it did.

With bonus points I was thinking about a jail started with, say,
mount.devfs. IIRC jail(8) just mounts devfs but this is not stored anywhere
and when such a jail dies, we have an old mount no one knows about. So
bonus points for making a jail able to clean this up as well.

I'm fine with either jls or jstat.

> Vnet's a little tricky because there are two kinds of interfaces in a vnet
> jail: those that were imported into the jail, and those that the jail
> has created itself. I don't know if the kernel knows anything about the
> difference between them, but it would make sense for the former to be
> returned to the host (which is the case) and the latter to be deleted
> (which I have no idea about).
> 

That's for the project taker to investigate then.

> I still prefer that this be done in the kernel. For example, mount
> points have a credential attached, and that means that a removed jail
> will stick around as a zombie until it's unmounted.
> 

I prefer kernel implementation as well.

Since we seem to have an agreement on the usefulness of the project, would
you be willing to add it to IdeasPage as a proposed GSoC project and
mentor a student (if any) who wants to work on this? I'm no fit for
mentoring.

Details of actual implementation can be worked on later.

-- 
Mateusz Guzik 