Re: POSIX shared memory and dying jails
On Fri, 25 Jun 2021 20:18:39 -0700 James Gritton wrote:

> On 2021-06-25 09:58, Michael Gmelin wrote:
> > Another problem caused by the lack of jail ownership is that access
> > semantics are a bit strange. E.g., a jail based on / can easily list
> > (and remove) all memory allocations in the system, while for other
> > jails it depends. They can stat their own allocations like in:
> >
> >   # posixshmcontrol stat /xyz
> >   output as expected...
> >
> > But not list them:
> >
> >   # posixshmcontrol ls
> >   posixshmcontrol: cannot get kern.ipc.posix_shm_list length:
> >   Operation not permitted
> >
> > Probably related to matching the path of the allocation, I didn't
> > look into the code.
>
> That's just a case of the sysctl not being marked as jail-safe.
> Looking at the code, it's clear that it needs to be altered when
> called from within a jail, but preventing it is definitely not the
> right thing.
>
> > but having something automatic in the OS would be nice. Or being
> > able to run `posixshmcontrol -j shmtest ls`. Seems like this would
> > be quite some effort though to get it right - also in terms of who
> > can access what - right now, it's simply based on the path, which
> > also gives a lot of flexibility.
>
> Since access to the shared memory segments themselves is only on file
> permissions and pathnames, just making a "posixshmcontrol -j" also
> rely on pathnames actually makes sense.
>
> Put this into a bug report, and I'll take a closer look. Probably two
> different bugs for different issues (listing and automatic removal).

Hi Jamie,

I *finally* found the time to write the bug reports:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257554
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257555
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257556

I took the liberty to assign them to you.

Best,
Michael

--
Michael Gmelin
POSIX shared memory, jails, and (lack of) limits
Hi,

I've been playing a bit with POSIX shared memory and, unlike for SysV
shared memory, I couldn't find any way to limit its use by jails.

First, I looked at racct/rctl, but there is no resource for POSIX shared
memory and memoryuse/vmemoryuse don't seem to have an effect (which
makes sense).

Then I checked if there are jail parameters that could help, but there
doesn't seem to be anything like "allow.sysvshm" for POSIX shared
memory to limit access to the feature.

So, unless I'm missing something, it seems like all jails on a system
have unlimited access to POSIX shared memory and therefore any single
jail can use up the jailhost's virtual memory until the jailhost comes
to a grinding halt.

I wrote a little test program that keeps allocating POSIX shared memory
inside of a jail and it can easily bring the host down to its knees:

  login: Aug 2 12:12:09 test kernel: pid 11825 (getty), jid 0, uid 0, was killed: out of swap space
  Aug 2 12:12:10 test init[11827]: getty repeating too quickly on port /dev/ttyu0, sleeping 30 secs
  Aug 2 12:12:10 test kernel: pid 11826 (getty), jid 0, uid 0, was killed: out of swap space

Best,
Michael

--
Michael Gmelin
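The test program mentioned above was not posted to the list. As a rough idea of what such a program might look like, here is a minimal Python sketch that creates named POSIX shared memory segments and touches every page. It is deliberately bounded (a fixed segment count and size) so that it cannot exhaust the host; the function names, the `shmtest_` name prefix, and the sizes are all made up for illustration and are not taken from the original program.

```python
"""Bounded sketch: create named POSIX shared memory segments up to a cap."""
import os
from multiprocessing import shared_memory


def allocate_segments(count, size, prefix=None):
    """Create up to `count` segments of `size` bytes; return those created."""
    if prefix is None:
        # Hypothetical naming scheme; the pid keeps parallel runs apart.
        prefix = f"shmtest_{os.getpid()}_"
    segments = []
    for i in range(count):
        try:
            seg = shared_memory.SharedMemory(
                name=f"{prefix}{i}", create=True, size=size
            )
        except OSError:
            # Stop at the first failure (limit reached, name clash, no memory).
            break
        # Write to every byte so the pages must actually be backed.
        seg.buf[:size] = b"\x01" * size
        segments.append(seg)
    return segments


def release_segments(segments):
    """Unmap and unlink every segment so nothing outlives the run."""
    for seg in segments:
        seg.close()
        seg.unlink()


if __name__ == "__main__":
    segs = allocate_segments(count=8, size=1 << 20)
    print(f"created {len(segs)} segments")
    release_segments(segs)
```

Run inside a jail, raising `count` and skipping `release_segments` is what would approximate the unbounded behaviour described in the message; that is not something to try on a machine you care about.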
Re: POSIX shared memory, jails, and (lack of) limits
On Mon, Aug 02, 2021 at 02:19:00PM +0200, Michael Gmelin wrote:
> Hi,
>
> I've been playing a bit with POSIX shared memory and, unlike for SysV
> shared memory, I couldn't find any way to limit its use by jails.
>
> First, I looked at racct/rctl, but there is no resource for POSIX shared
> memory and memoryuse/vmemoryuse don't seem to have an effect (which
> makes sense).
>
> Then I checked if there are jail parameters that could help, but there
> doesn't seem to be anything like "allow.sysvshm" for POSIX shared
> memory to limit access to the feature.
>
> So, unless I'm missing something, it seems like all jails on a system
> have unlimited access to POSIX shared memory and therefore any single
> jail can use up the jailhost's virtual memory until the jailhost comes
> to a grinding halt.
>
> I wrote a little test program that keeps allocating POSIX shared memory
> inside of a jail and it can easily bring the host down to its knees:
>
> login: Aug 2 12:12:09 test kernel: pid 11825 (getty), jid 0, uid 0,
> was killed: out of swap space
> Aug 2 12:12:10 test init[11827]: getty repeating too quickly on port
> /dev/ttyu0, sleeping 30 secs
> Aug 2 12:12:10 test kernel: pid 11826 (getty), jid 0, uid 0, was
> killed: out of swap space

Posix shm is limited by the swap accounting. For non-jail consumers,
it is per-uid RLIMIT_SWAP. I do not know if other mechanisms make
RLIMIT_SWAP per-jail per-uid.
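To see which per-uid swap limit a process is currently running under, something along these lines could be used. This is only a sketch: Python's `resource` module exposes `RLIMIT_SWAP` on FreeBSD only, so the lookup is guarded, and the helper names `swap_limit` and `describe` are invented here. As far as I know, setrlimit(2) also documents that this limit is enforced only when bit 1 of the `vm.overcommit` sysctl is set, which is worth checking before drawing conclusions from a test.

```python
"""Sketch: report the RLIMIT_SWAP limit of the current process, if any."""
import resource

UNAVAILABLE = "RLIMIT_SWAP not available on this platform"


def swap_limit():
    """Return (soft, hard) for RLIMIT_SWAP, or None where it is undefined."""
    rlimit_swap = getattr(resource, "RLIMIT_SWAP", None)
    if rlimit_swap is None:
        return None
    return resource.getrlimit(rlimit_swap)


def describe(limit):
    """Render a limit pair from swap_limit() as readable text."""
    if limit is None:
        return UNAVAILABLE

    def fmt(value):
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value} bytes"

    soft, hard = limit
    return f"soft={fmt(soft)} hard={fmt(hard)}"


if __name__ == "__main__":
    print(describe(swap_limit()))
```

Running it as the same uid both on the host and inside a jail shows whether the two see the same limit values; it does not by itself show whether usage is accounted per jail.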
Re: POSIX shared memory, jails, and (lack of) limits
> On 2. Aug 2021, at 15:56, Konstantin Belousov wrote:
>
> On Mon, Aug 02, 2021 at 02:19:00PM +0200, Michael Gmelin wrote:
>> Hi,
>>
>> I've been playing a bit with POSIX shared memory and, unlike for SysV
>> shared memory, I couldn't find any way to limit its use by jails.
>>
>> First, I looked at racct/rctl, but there is no resource for POSIX shared
>> memory and memoryuse/vmemoryuse don't seem to have an effect (which
>> makes sense).
>>
>> Then I checked if there are jail parameters that could help, but there
>> doesn't seem to be anything like "allow.sysvshm" for POSIX shared
>> memory to limit access to the feature.
>>
>> So, unless I'm missing something, it seems like all jails on a system
>> have unlimited access to POSIX shared memory and therefore any single
>> jail can use up the jailhost's virtual memory until the jailhost comes
>> to a grinding halt.
>>
>> I wrote a little test program that keeps allocating POSIX shared memory
>> inside of a jail and it can easily bring the host down to its knees:
>>
>> login: Aug 2 12:12:09 test kernel: pid 11825 (getty), jid 0, uid 0,
>> was killed: out of swap space
>> Aug 2 12:12:10 test init[11827]: getty repeating too quickly on port
>> /dev/ttyu0, sleeping 30 secs
>> Aug 2 12:12:10 test kernel: pid 11826 (getty), jid 0, uid 0, was
>> killed: out of swap space
>
> Posix shm is limited by the swap accounting. For non-jail consumers,
> it is per-uid RLIMIT_SWAP. I do not know if other mechanisms make
> RLIMIT_SWAP per-jail per-uid.

Unfortunately it seems like POSIX shared memory is not linked to the
jail it was created in (we discussed this on this list in June and I
created a few PRs about that), so per jail rctl rules don’t apply (and
limiting uid 0 won’t have the desired effect ^_^).

Best
Michael
Re: POSIX shared memory, jails, and (lack of) limits
On Mon, Aug 02, 2021 at 05:06:43PM +0200, Michael Gmelin wrote:
> > On 2. Aug 2021, at 15:56, Konstantin Belousov wrote:
> >
> > On Mon, Aug 02, 2021 at 02:19:00PM +0200, Michael Gmelin wrote:
> >> Hi,
> >>
> >> I've been playing a bit with POSIX shared memory and, unlike for SysV
> >> shared memory, I couldn't find any way to limit its use by jails.
> >>
> >> First, I looked at racct/rctl, but there is no resource for POSIX shared
> >> memory and memoryuse/vmemoryuse don't seem to have an effect (which
> >> makes sense).
> >>
> >> Then I checked if there are jail parameters that could help, but there
> >> doesn't seem to be anything like "allow.sysvshm" for POSIX shared
> >> memory to limit access to the feature.
> >>
> >> So, unless I'm missing something, it seems like all jails on a system
> >> have unlimited access to POSIX shared memory and therefore any single
> >> jail can use up the jailhost's virtual memory until the jailhost comes
> >> to a grinding halt.
> >>
> >> I wrote a little test program that keeps allocating POSIX shared memory
> >> inside of a jail and it can easily bring the host down to its knees:
> >>
> >> login: Aug 2 12:12:09 test kernel: pid 11825 (getty), jid 0, uid 0,
> >> was killed: out of swap space
> >> Aug 2 12:12:10 test init[11827]: getty repeating too quickly on port
> >> /dev/ttyu0, sleeping 30 secs
> >> Aug 2 12:12:10 test kernel: pid 11826 (getty), jid 0, uid 0, was
> >> killed: out of swap space
> >
> > Posix shm is limited by the swap accounting. For non-jail consumers,
> > it is per-uid RLIMIT_SWAP. I do not know if other mechanisms make
> > RLIMIT_SWAP per-jail per-uid.
>
> Unfortunately it seems like POSIX shared memory is not linked to the jail it
> was created in (we discussed this on this list in June and I created a few
> PRs about that), so per jail rctl rules don’t apply (and limiting uid 0 won’t
> have the desired effect ^_^).
>

In what sense 'not linked'? The backing vm_object is created with the
current process credentials, which are jailed if creator belongs to a
jail.
Re: POSIX shared memory, jails, and (lack of) limits
On Mon, Aug 02, 2021 at 10:03:27PM +0300, Konstantin Belousov wrote:
> On Mon, Aug 02, 2021 at 05:06:43PM +0200, Michael Gmelin wrote:
> > > On 2. Aug 2021, at 15:56, Konstantin Belousov wrote:
> > >
> > > On Mon, Aug 02, 2021 at 02:19:00PM +0200, Michael Gmelin wrote:
> > >> Hi,
> > >>
> > >> I've been playing a bit with POSIX shared memory and, unlike for SysV
> > >> shared memory, I couldn't find any way to limit its use by jails.
> > >>
> > >> First, I looked at racct/rctl, but there is no resource for POSIX shared
> > >> memory and memoryuse/vmemoryuse don't seem to have an effect (which
> > >> makes sense).

Cyril has written a few patches for racct, including one which includes
POSIX shared memory objects in rctl's "nshm" and "shmsize" resources,
which currently only apply to SysV shm objects:
https://reviews.freebsd.org/D30775
We plan to get them committed in the next couple of weeks.

"memoryuse" and "vmemoryuse" only count objects that are mapped into
some process' address space, so they're not the right way to limit
allocations of POSIX shm objects, see below.

> > >>
> > >> Then I checked if there are jail parameters that could help, but there
> > >> doesn't seem to be anything like "allow.sysvshm" for POSIX shared
> > >> memory to limit access to the feature.
> > >>
> > >> So, unless I'm missing something, it seems like all jails on a system
> > >> have unlimited access to POSIX shared memory and therefore any single
> > >> jail can use up the jailhost's virtual memory until the jailhost comes
> > >> to a grinding halt.
> > >>
> > >> I wrote a little test program that keeps allocating POSIX shared memory
> > >> inside of a jail and it can easily bring the host down to its knees:
> > >>
> > >> login: Aug 2 12:12:09 test kernel: pid 11825 (getty), jid 0, uid 0,
> > >> was killed: out of swap space
> > >> Aug 2 12:12:10 test init[11827]: getty repeating too quickly on port
> > >> /dev/ttyu0, sleeping 30 secs
> > >> Aug 2 12:12:10 test kernel: pid 11826 (getty), jid 0, uid 0, was
> > >> killed: out of swap space
> > >
> > > Posix shm is limited by the swap accounting. For non-jail consumers,
> > > it is per-uid RLIMIT_SWAP. I do not know if other mechanisms make
> > > RLIMIT_SWAP per-jail per-uid.

racct/rctl provides the "swapuse" resource which should account for
this. It does not apply to largepage objects, though.

> > Unfortunately it seems like POSIX shared memory is not linked to the jail
> > it was created in (we discussed this on this list in June and I created a
> > few PRs about that), so per jail rctl rules don’t apply (and limiting uid 0
> > won’t have the desired effect ^_^).
> >
>
> In what sense 'not linked'? The backing vm_object is created with the
> current process credentials, which are jailed if creator belongs to a jail.

I believe the problem that Michael is referring to is that named POSIX
shm objects created within a jail do not disappear when the jail is
destroyed, and the vm object cred reference is leaked. But this is
unrelated to swap space accounting.
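For anyone wanting to try the "swapuse" suggestion, the following sketch builds a per-jail rctl(8) rule and, optionally, installs it with `rctl -a`. It assumes racct is enabled (`kern.racct.enable=1` in loader.conf) and uses the jail name `shmtest` and a 1g cap purely as placeholders; the helper names are invented. Whether such a rule actually constrains POSIX shm created in the jail is the open question in this thread, so treat this as a way to set up the experiment, not as a confirmed fix.

```python
"""Sketch: build and optionally apply a per-jail rctl(8) swapuse rule."""
import subprocess


def swapuse_rule(jail_name, amount):
    """Return an rctl rule denying swap use above `amount` for one jail."""
    if not jail_name or ":" in jail_name:
        # ':' separates rule fields, so it cannot appear in the subject id.
        raise ValueError("jail name must be non-empty and contain no ':'")
    return f"jail:{jail_name}:swapuse:deny={amount}"


def apply_rule(rule, dry_run=True):
    """Return the rctl command line; execute it only when dry_run is False."""
    cmd = ["rctl", "-a", rule]
    if not dry_run:
        # Needs root on a FreeBSD host with racct enabled.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    rule = swapuse_rule("shmtest", "1g")  # placeholder jail name and cap
    print(" ".join(apply_rule(rule, dry_run=True)))
```

The dry-run default only prints the command, which makes it easy to review the rule before installing it as root.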
Re: POSIX shared memory, jails, and (lack of) limits
On 8/2/21 9:40 PM, Mark Johnston wrote:
> Cyril has written a few patches for racct, including one which includes
> POSIX shared memory objects in rctl's "nshm" and "shmsize" resources,
> which currently only apply to SysV shm objects:
> https://reviews.freebsd.org/D30775
> We plan to get them committed in the next couple of weeks.

Hello,

I haven't looked at it for a bit, but the last time I tried to use
sysutils/jail_exporter to get graphs for jail resource usage, the graphs
for Postgres jails were hilariously wrong, which I believe I tracked
down to shared memory being counted more than once.

I gave up trying to figure out how to fix it and just lived with Grafana
telling me a Postgres jail on a 128 GB jailhost used 900 GB of memory.
But it sounds like the above might fix this?

Thanks!

Best regards,
Thomas Steen Rasmussen
Re: POSIX shared memory, jails, and (lack of) limits
> On 2. Aug 2021, at 21:40, Mark Johnston wrote:
>
> On Mon, Aug 02, 2021 at 10:03:27PM +0300, Konstantin Belousov wrote:
>>> On Mon, Aug 02, 2021 at 05:06:43PM +0200, Michael Gmelin wrote:
>>>
>>> On 2. Aug 2021, at 15:56, Konstantin Belousov wrote:
>>>> On Mon, Aug 02, 2021 at 02:19:00PM +0200, Michael Gmelin wrote:
>>>>> Hi,
>>>>>
>>>>> I've been playing a bit with POSIX shared memory and, unlike for SysV
>>>>> shared memory, I couldn't find any way to limit its use by jails.
>>>>>
>>>>> First, I looked at racct/rctl, but there is no resource for POSIX shared
>>>>> memory and memoryuse/vmemoryuse don't seem to have an effect (which
>>>>> makes sense).
>
> Cyril has written a few patches for racct, including one which includes
> POSIX shared memory objects in rctl's "nshm" and "shmsize" resources,
> which currently only apply to SysV shm objects:
> https://reviews.freebsd.org/D30775
> We plan to get them committed in the next couple of weeks.
>
> "memoryuse" and "vmemoryuse" only count objects that are mapped into
> some process' address space, so they're not the right way to limit
> allocations of POSIX shm objects, see below.
>
>>>>> Then I checked if there are jail parameters that could help, but there
>>>>> doesn't seem to be anything like "allow.sysvshm" for POSIX shared
>>>>> memory to limit access to the feature.
>>>>>
>>>>> So, unless I'm missing something, it seems like all jails on a system
>>>>> have unlimited access to POSIX shared memory and therefore any single
>>>>> jail can use up the jailhost's virtual memory until the jailhost comes
>>>>> to a grinding halt.
>>>>>
>>>>> I wrote a little test program that keeps allocating POSIX shared memory
>>>>> inside of a jail and it can easily bring the host down to its knees:
>>>>>
>>>>> login: Aug 2 12:12:09 test kernel: pid 11825 (getty), jid 0, uid 0,
>>>>> was killed: out of swap space
>>>>> Aug 2 12:12:10 test init[11827]: getty repeating too quickly on port
>>>>> /dev/ttyu0, sleeping 30 secs
>>>>> Aug 2 12:12:10 test kernel: pid 11826 (getty), jid 0, uid 0, was
>>>>> killed: out of swap space
>>>>
>>>> Posix shm is limited by the swap accounting. For non-jail consumers,
>>>> it is per-uid RLIMIT_SWAP. I do not know if other mechanisms make
>>>> RLIMIT_SWAP per-jail per-uid.
>
> racct/rctl provides the "swapuse" resource which should account for
> this. It does not apply to largepage objects, though.

I tried to limit swapuse for a jail and it doesn’t limit posix shared
memory created within the jail (I can still create shared memory
segments within the jail until the machine runs out of virtual memory).

Should I share the test case to make sure I didn’t mess up?

-m

>>> Unfortunately it seems like POSIX shared memory is not linked to the jail
>>> it was created in (we discussed this on this list in June and I created a
>>> few PRs about that), so per jail rctl rules don’t apply (and limiting uid 0
>>> won’t have the desired effect ^_^).
>>>
>>
>> In what sense 'not linked'? The backing vm_object is created with the
>> current process credentials, which are jailed if creator belongs to a jail.
>
> I believe the problem that Michael is referring to is that named POSIX
> shm objects created within a jail do not disappear when the jail is
> destroyed, and the vm object cred reference is leaked. But this is
> unrelated to swap space accounting.