Re: POSIX shared memory and dying jails

2021-08-02 Thread Michael Gmelin



On Fri, 25 Jun 2021 20:18:39 -0700
James Gritton  wrote:

> On 2021-06-25 09:58, Michael Gmelin wrote:
> > Another problem caused by the lack of jail ownership is that access
> > semantics are a bit strange. E.g., a jail based on / can easily list
> > (and remove) all memory allocations in the system, while for other
> > jails it depends. They can stat their own allocations like in:
> > 
> > # posixshmcontrol stat /xyz
> > output as expected...
> > 
> > But not list them:
> > 
> > # posixshmcontrol ls
> > posixshmcontrol: cannot get kern.ipc.posix_shm_list length:
> > Operation not permitted
> > 
> > Probably related to matching the path of the allocation, I didn't
> > look into the code.  
> 
> That's just a case of the sysctl not being marked as jail-safe.
> Looking at the code, it's clear that it needs to be altered when
> called from within a jail, but preventing it is definitely not the
> right thing.
> 
> > but having something automatic in the OS would be nice. Or being
> > able to run `posixshmcontrol -j shmtest ls`. Seems like this would
> > be quite some effort though to get it right - also in terms of who
> > can access what - right now, it's simply based on the path, which
> > also gives a lot of flexibility.
> 
> Since access to the shared memory segments themselves is only on file
> permissions and pathnames, just making a "posixshmcontrol -j" also
> rely on pathnames actually makes sense.
> 
> Put this into a bug report, and I'll take a closer look.  Probably two
> different bugs for different issues (listing and automatic removal).
> 

Hi Jamie,

I *finally* found the time to write the bug reports:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257554
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257555
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257556

I took the liberty of assigning them to you.

Best,
Michael

-- 
Michael Gmelin



POSIX shared memory, jails, and (lack of) limits

2021-08-02 Thread Michael Gmelin
Hi,

I've been playing a bit with POSIX shared memory and, unlike for SysV
shared memory, I couldn't find any way to limit its use by jails.

First, I looked at racct/rctl, but there is no resource for POSIX shared
memory and memoryuse/vmemoryuse don't seem to have an effect (which
makes sense).

Then I checked if there are jail parameters that could help, but there
doesn't seem to be anything like "allow.sysvshm" for POSIX shared
memory to limit access to the feature.

So, unless I'm missing something, it seems like all jails on a system
have unlimited access to POSIX shared memory and therefore any single
jail can use up the jailhost's virtual memory until the jailhost comes
to a grinding halt.

I wrote a little test program that keeps allocating POSIX shared memory
inside of a jail and it can easily bring the host down to its knees:

  login: Aug  2 12:12:09 test kernel: pid 11825 (getty), jid 0, uid 0,
  was killed: out of swap space
  Aug  2 12:12:10 test init[11827]: getty repeating too quickly on port
  /dev/ttyu0, sleeping 30 secs
  Aug  2 12:12:10 test kernel: pid 11826 (getty), jid 0, uid 0, was
  killed: out of swap space
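
The test program itself was not posted in this message. As a rough,
bounded sketch of the kind of allocation loop described above (the
function names, the naming scheme and the explicit count cap are all
assumptions made for illustration, not the original test code),
something like the following creates named POSIX shm segments and
touches their pages so that they are actually backed by memory or swap:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Build the name of the i-th test segment, e.g. "/shmtest.3". */
static void seg_name(char *buf, size_t len, const char *prefix, int i)
{
    snprintf(buf, len, "%s.%d", prefix, i);
}

/*
 * Create up to `count` named segments of `size` bytes each and dirty
 * every page.  The mapping and descriptor are released again, but the
 * named object (and its memory) stays allocated until it is unlinked.
 * Returns the number of segments created before the first failure.
 */
int shm_fill(const char *prefix, int count, size_t size)
{
    char name[128];
    int created = 0;

    for (int i = 0; i < count; i++) {
        seg_name(name, sizeof(name), prefix, i);
        int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
        if (fd < 0)
            break;
        if (ftruncate(fd, (off_t)size) != 0) {
            close(fd);
            shm_unlink(name);
            break;
        }
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            shm_unlink(name);
            break;
        }
        memset(p, 0xa5, size);   /* force the pages to be allocated */
        munmap(p, size);
        close(fd);
        created++;
    }
    return created;
}

/* Unlink the segments created by shm_fill(); returns how many were removed. */
int shm_cleanup(const char *prefix, int count)
{
    char name[128];
    int removed = 0;

    for (int i = 0; i < count; i++) {
        seg_name(name, sizeof(name), prefix, i);
        if (shm_unlink(name) == 0)
            removed++;
    }
    return removed;
}
```

Calling shm_fill("/shmtest", 16, 1 << 20) from a small main() inside a
jail would leave 16 MiB allocated after the process exits, until
shm_cleanup() (or "posixshmcontrol rm") removes the names. Removing the
cap gives the unbounded case described above, so that is only something
to try on a disposable test machine.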

Best,
Michael

-- 
Michael Gmelin



Re: POSIX shared memory, jails, and (lack of) limits

2021-08-02 Thread Konstantin Belousov
On Mon, Aug 02, 2021 at 02:19:00PM +0200, Michael Gmelin wrote:
> Hi,
> 
> I've been playing a bit with POSIX shared memory and, unlike for SysV
> shared memory, I couldn't find any way to limit its use by jails.
> 
> First, I looked at racct/rctl, but there is no resource for POSIX shared
> memory and memoryuse/vmemoryuse don't seem to have an effect (which
> makes sense).
> 
> Then I checked if there are jail parameters that could help, but there
> doesn't seem to be anything like "allow.sysvshm" for POSIX shared
> memory to limit access to the feature.
> 
> So, unless I'm missing something, it seems like all jails on a system
> have unlimited access to POSIX shared memory and therefore any single
> jail can use up the jailhost's virtual memory until the jailhost comes
> to a grinding halt.
> 
> I wrote a little test program that keeps allocating POSIX shared memory
> inside of a jail and it can easily bring the host down to its knees:
> 
>   login: Aug  2 12:12:09 test kernel: pid 11825 (getty), jid 0, uid 0,
>   was killed: out of swap space
>   Aug  2 12:12:10 test init[11827]: getty repeating too quickly on port
>   /dev/ttyu0, sleeping 30 secs
>   Aug  2 12:12:10 test kernel: pid 11826 (getty), jid 0, uid 0, was
>   killed: out of swap space

Posix shm is limited by the swap accounting.  For non-jail consumers,
it is per-uid RLIMIT_SWAP.  I do not know if other mechanisms make
RLIMIT_SWAP per-jail per-uid.
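
For reference, the per-uid limit mentioned here can be lowered from C
with setrlimit(2). The sketch below is an illustration only: the helper
name is made up, RLIMIT_SWAP is FreeBSD-specific (hence the #ifdef), and
according to setrlimit(2) the limit is only enforced when bit 1 of the
vm.overcommit sysctl is set. Whether this helps for a jailed uid 0 is
exactly the open question in this thread.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

/*
 * Lower the soft RLIMIT_SWAP of the calling process to at most `bytes`.
 * Returns 0 on success, -1 on failure with errno set.
 * On systems that do not define RLIMIT_SWAP it fails with ENOSYS.
 */
int set_swap_soft_limit(rlim_t bytes)
{
#ifdef RLIMIT_SWAP
    struct rlimit rl;

    if (getrlimit(RLIMIT_SWAP, &rl) != 0)
        return -1;
    /* The soft limit may not exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        bytes = rl.rlim_max;
    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_SWAP, &rl);
#else
    (void)bytes;
    errno = ENOSYS;
    return -1;
#endif
}
```

Resource limits are inherited across fork and exec, so a wrapper would
call this before starting the workload it wants to constrain.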



Re: POSIX shared memory, jails, and (lack of) limits

2021-08-02 Thread Michael Gmelin



> On 2. Aug 2021, at 15:56, Konstantin Belousov  wrote:
> 
> On Mon, Aug 02, 2021 at 02:19:00PM +0200, Michael Gmelin wrote:
>> Hi,
>> 
>> I've been playing a bit with POSIX shared memory and, unlike for SysV
>> shared memory, I couldn't find any way to limit its use by jails.
>> 
>> First, I looked at racct/rctl, but there is no resource for POSIX shared
>> memory and memoryuse/vmemoryuse don't seem to have an effect (which
>> makes sense).
>> 
>> Then I checked if there are jail parameters that could help, but there
>> doesn't seem to be anything like "allow.sysvshm" for POSIX shared
>> memory to limit access to the feature.
>> 
>> So, unless I'm missing something, it seems like all jails on a system
>> have unlimited access to POSIX shared memory and therefore any single
>> jail can use up the jailhost's virtual memory until the jailhost comes
>> to a grinding halt.
>> 
>> I wrote a little test program that keeps allocating POSIX shared memory
>> inside of a jail and it can easily bring the host down to its knees:
>> 
>>  login: Aug  2 12:12:09 test kernel: pid 11825 (getty), jid 0, uid 0,
>>  was killed: out of swap space
>>  Aug  2 12:12:10 test init[11827]: getty repeating too quickly on port
>>  /dev/ttyu0, sleeping 30 secs
>>  Aug  2 12:12:10 test kernel: pid 11826 (getty), jid 0, uid 0, was
>>  killed: out of swap space
> 
> Posix shm is limited by the swap accounting.  For non-jail consumers,
> it is per-uid RLIMIT_SWAP.  I do not know if other mechanisms make
> RLIMIT_SWAP per-jail per-uid.

Unfortunately, it seems that POSIX shared memory is not linked to the jail it
was created in (we discussed this on this list in June and I created a few PRs
about that), so per-jail rctl rules don’t apply (and limiting uid 0 won’t have
the desired effect ^_^).

Best
Michael





Re: POSIX shared memory, jails, and (lack of) limits

2021-08-02 Thread Konstantin Belousov
On Mon, Aug 02, 2021 at 05:06:43PM +0200, Michael Gmelin wrote:
> 
> 
> > On 2. Aug 2021, at 15:56, Konstantin Belousov  wrote:
> > 
> > On Mon, Aug 02, 2021 at 02:19:00PM +0200, Michael Gmelin wrote:
> >> Hi,
> >> 
> >> I've been playing a bit with POSIX shared memory and, unlike for SysV
> >> shared memory, I couldn't find any way to limit its use by jails.
> >> 
> >> First, I looked at racct/rctl, but there is no resource for POSIX shared
> >> memory and memoryuse/vmemoryuse don't seem to have an effect (which
> >> makes sense).
> >> 
> >> Then I checked if there are jail parameters that could help, but there
> >> doesn't seem to be anything like "allow.sysvshm" for POSIX shared
> >> memory to limit access to the feature.
> >> 
> >> So, unless I'm missing something, it seems like all jails on a system
> >> have unlimited access to POSIX shared memory and therefore any single
> >> jail can use up the jailhost's virtual memory until the jailhost comes
> >> to a grinding halt.
> >> 
> >> I wrote a little test program that keeps allocating POSIX shared memory
> >> inside of a jail and it can easily bring the host down to its knees:
> >> 
> >>  login: Aug  2 12:12:09 test kernel: pid 11825 (getty), jid 0, uid 0,
> >>  was killed: out of swap space
> >>  Aug  2 12:12:10 test init[11827]: getty repeating too quickly on port
> >>  /dev/ttyu0, sleeping 30 secs
> >>  Aug  2 12:12:10 test kernel: pid 11826 (getty), jid 0, uid 0, was
> >>  killed: out of swap space
> > 
> > Posix shm is limited by the swap accounting.  For non-jail consumers,
> > it is per-uid RLIMIT_SWAP.  I do not know if other mechanisms make
> > RLIMIT_SWAP per-jail per-uid.
> 
> Unfortunately it seems like POSIX shared memory is not linked to the jail it 
> was created in (we discussed this on this list in June and I created a few 
> PRs about that), so per jail rctl rules don’t apply (and limiting uid 0 won’t 
> have the desired effect ^_^).
> 

In what sense is it 'not linked'?  The backing vm_object is created with the
current process credentials, which are jailed if the creator belongs to a jail.



Re: POSIX shared memory, jails, and (lack of) limits

2021-08-02 Thread Mark Johnston
On Mon, Aug 02, 2021 at 10:03:27PM +0300, Konstantin Belousov wrote:
> On Mon, Aug 02, 2021 at 05:06:43PM +0200, Michael Gmelin wrote:
> > 
> > 
> > > On 2. Aug 2021, at 15:56, Konstantin Belousov  wrote:
> > > 
> > > On Mon, Aug 02, 2021 at 02:19:00PM +0200, Michael Gmelin wrote:
> > >> Hi,
> > >> 
> > >> I've been playing a bit with POSIX shared memory and, unlike for SysV
> > >> shared memory, I couldn't find any way to limit its use by jails.
> > >> 
> > >> First, I looked at racct/rctl, but there is no resource for POSIX shared
> > >> memory and memoryuse/vmemoryuse don't seem to have an effect (which
> > >> makes sense).

Cyril has written a few patches for racct, including one that counts
POSIX shared memory objects under rctl's "nshm" and "shmsize" resources,
which currently apply only to SysV shm objects:
https://reviews.freebsd.org/D30775
We plan to get them committed in the next couple of weeks.

"memoryuse" and "vmemoryuse" only count objects that are mapped into
some process' address space, so they're not the right way to limit
allocations of POSIX shm objects, see below.

> > >> 
> > >> Then I checked if there are jail parameters that could help, but there
> > >> doesn't seem to be anything like "allow.sysvshm" for POSIX shared
> > >> memory to limit access to the feature.
> > >> 
> > >> So, unless I'm missing something, it seems like all jails on a system
> > >> have unlimited access to POSIX shared memory and therefore any single
> > >> jail can use up the jailhost's virtual memory until the jailhost comes
> > >> to a grinding halt.
> > >> 
> > >> I wrote a little test program that keeps allocating POSIX shared memory
> > >> inside of a jail and it can easily bring the host down to its knees:
> > >> 
> > >>  login: Aug  2 12:12:09 test kernel: pid 11825 (getty), jid 0, uid 0,
> > >>  was killed: out of swap space
> > >>  Aug  2 12:12:10 test init[11827]: getty repeating too quickly on port
> > >>  /dev/ttyu0, sleeping 30 secs
> > >>  Aug  2 12:12:10 test kernel: pid 11826 (getty), jid 0, uid 0, was
> > >>  killed: out of swap space
> > > 
> > > Posix shm is limited by the swap accounting.  For non-jail consumers,
> > > it is per-uid RLIMIT_SWAP.  I do not know if other mechanisms make
> > > RLIMIT_SWAP per-jail per-uid.

racct/rctl provides the "swapuse" resource which should account for
this.  It does not apply to largepage objects, though.

> > Unfortunately it seems like POSIX shared memory is not linked to the jail 
> > it was created in (we discussed this on this list in June and I created a 
> > few PRs about that), so per jail rctl rules don’t apply (and limiting uid 0 
> > won’t have the desired effect ^_^).
> > 
> 
> In what sense 'not linked'?  The backing vm_object is created with the
> current process credentials, which are jailed if creator belongs to a jail.

I believe the problem that Michael is referring to is that named POSIX
shm objects created within a jail do not disappear when the jail is
destroyed, and the vm object cred reference is leaked.  But this is
unrelated to swap space accounting.
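
The lifetime issue can be seen with plain POSIX calls, independent of
jails. In the sketch below (the helper name and return convention are
illustrative assumptions), a named object is created and every
descriptor to it is closed, yet it can still be opened by name until
shm_unlink() is called. Nothing here exercises the jail teardown path;
it only shows why something has to unlink the names explicitly.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 1 if the named object outlived its last descriptor and
 * disappeared only after shm_unlink(), 0 if it behaved differently,
 * and -1 on an unexpected error.
 */
int shm_outlives_close(const char *name)
{
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, 4096) != 0) {
        close(fd);
        shm_unlink(name);
        return -1;
    }
    close(fd);                        /* no descriptor or mapping is left */

    fd = shm_open(name, O_RDWR, 0);   /* no O_CREAT: must already exist */
    if (fd < 0) {
        shm_unlink(name);
        return 0;                     /* object vanished on close */
    }
    close(fd);

    if (shm_unlink(name) != 0)
        return -1;

    fd = shm_open(name, O_RDWR, 0);
    if (fd >= 0) {
        close(fd);
        return 0;                     /* name still resolves after unlink */
    }
    return 1;
}
```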



Re: POSIX shared memory, jails, and (lack of) limits

2021-08-02 Thread Thomas Steen Rasmussen via jail

On 8/2/21 9:40 PM, Mark Johnston wrote:

> Cyril has written a few patches for racct, including one which includes
> POSIX shared memory objects in rctl's "nshm" and "shmsize" resources,
> which currently only apply to SysV shm objects:
> https://reviews.freebsd.org/D30775
> We plan to get them committed in the next couple of weeks.


Hello,

I haven't looked at it for a while, but the last time I tried to use
sysutils/jail_exporter to graph jail resource usage, the graphs
for Postgres jails were hilariously wrong, which I believe I tracked
down to shared memory being counted more than once.

I gave up trying to figure out how to fix it and just lived with Grafana
telling me a Postgres jail on a 128 GB jailhost used 900 GB of memory.


But it sounds like the above might fix this?

Thanks!

Best regards,

Thomas Steen Rasmussen




Re: POSIX shared memory, jails, and (lack of) limits

2021-08-02 Thread Michael Gmelin



> On 2. Aug 2021, at 21:40, Mark Johnston  wrote:
> 
> On Mon, Aug 02, 2021 at 10:03:27PM +0300, Konstantin Belousov wrote:
>>> On Mon, Aug 02, 2021 at 05:06:43PM +0200, Michael Gmelin wrote:
>>> 
>>> 
>>>> On 2. Aug 2021, at 15:56, Konstantin Belousov wrote:
>>>> 
>>>> On Mon, Aug 02, 2021 at 02:19:00PM +0200, Michael Gmelin wrote:
>>>>> Hi,
>>>>> 
>>>>> I've been playing a bit with POSIX shared memory and, unlike for SysV
>>>>> shared memory, I couldn't find any way to limit its use by jails.
>>>>> 
>>>>> First, I looked at racct/rctl, but there is no resource for POSIX shared
>>>>> memory and memoryuse/vmemoryuse don't seem to have an effect (which
>>>>> makes sense).
> 
> Cyril has written a few patches for racct, including one which includes
> POSIX shared memory objects in rctl's "nshm" and "shmsize" resources,
> which currently only apply to SysV shm objects:
> https://reviews.freebsd.org/D30775
> We plan to get them committed in the next couple of weeks.
> 
> "memoryuse" and "vmemoryuse" only count objects that are mapped into
> some process' address space, so they're not the right way to limit
> allocations of POSIX shm objects, see below.
> 
>>>>> 
>>>>> Then I checked if there are jail parameters that could help, but there
>>>>> doesn't seem to be anything like "allow.sysvshm" for POSIX shared
>>>>> memory to limit access to the feature.
>>>>> 
>>>>> So, unless I'm missing something, it seems like all jails on a system
>>>>> have unlimited access to POSIX shared memory and therefore any single
>>>>> jail can use up the jailhost's virtual memory until the jailhost comes
>>>>> to a grinding halt.
>>>>> 
>>>>> I wrote a little test program that keeps allocating POSIX shared memory
>>>>> inside of a jail and it can easily bring the host down to its knees:
>>>>> 
>>>>> login: Aug  2 12:12:09 test kernel: pid 11825 (getty), jid 0, uid 0,
>>>>> was killed: out of swap space
>>>>> Aug  2 12:12:10 test init[11827]: getty repeating too quickly on port
>>>>> /dev/ttyu0, sleeping 30 secs
>>>>> Aug  2 12:12:10 test kernel: pid 11826 (getty), jid 0, uid 0, was
>>>>> killed: out of swap space
>>>> 
>>>> Posix shm is limited by the swap accounting.  For non-jail consumers,
>>>> it is per-uid RLIMIT_SWAP.  I do not know if other mechanisms make
>>>> RLIMIT_SWAP per-jail per-uid.
> 
> racct/rctl provides the "swapuse" resource which should account for
> this.  It does not apply to largepage objects, though.

I tried limiting swapuse for a jail, but it doesn’t limit POSIX shared memory
created within the jail (I can still create shared memory segments within the
jail until the machine runs out of virtual memory).

Should I share the test case so we can make sure I didn’t mess up?

-m



> 
>>> Unfortunately it seems like POSIX shared memory is not linked to the jail 
>>> it was created in (we discussed this on this list in June and I created a 
>>> few PRs about that), so per jail rctl rules don’t apply (and limiting uid 0 
>>> won’t have the desired effect ^_^).
>>> 
>> 
>> In what sense 'not linked'?  The backing vm_object is created with the
>> current process credentials, which are jailed if creator belongs to a jail.
> 
> I believe the problem that Michael is referring to is that named POSIX
> shm objects created within a jail do not disappear when the jail is
> destroyed, and the vm object cred reference is leaked.  But this is
> unrelated to swap space accounting.