On Sun, Dec 22, 2024 at 6:11 AM Diego Nieto Cid <dnie...@gmail.com> wrote:
> I just didn't understand the hard/soft limits. It's better described
> by the structure members and not the comments:
>
>     struct rlimit {
>         rlim_t rlim_cur;  /* Soft limit */
>         rlim_t rlim_max;  /* Hard limit (ceiling for rlim_cur) */
>     };
>
> So `rlim_cur` is the limit that must be enforced and `rlim_max` is
> the maximum value an unprivileged process can set its `rlim_cur` to.
Hm, so now that I think of it, it could make sense to enforce soft limits in userland, if we care about "memory allocated" and not "size of address space" (because the latter is influenced by memory received in messages etc). You'd track the amount of memory allocated, increase it in mmap (when making anonymous or private mappings) and sbrk, and compare it with the soft value from _hurd_rlimits to reject the calls when they would exceed the limit; no gnumach changes needed. It's not a security mechanism, and nothing would prevent you from ignoring the limit should you want to, but it sounds like it would solve all the use cases mentioned (the zzuf testsuite, clueless programs that just try to malloc a lot of memory). Maybe I'm describing the same thing as RLIMIT_DATA? The Linux man page says Linux applies it to mmap as well as sbrk since 4.7.

> Currently, I have issues understanding how vm_map_copy_t works and how
> it affects the total memory of the process.

vm_map_copy_t is memory in transfer between maps; this is how a Mach message transferring memory is represented. By itself this memory is not owned by any particular map, and so shouldn't be accounted towards anyone's limit. But when you copy the copy object (vm_map_copy_t) *out* into a destination map (vm_map_copy_overwrite, vm_map_copyout, vm_map_copyout_page_list), the new memory appears in the destination map. vm_map_copyin* is how a copy object gets created, but that shouldn't increase the amount of address space used by anyone (but see about VM_PROT_NONE below).

> > Yes, with the host port being an optional parameter for the case when
> > the limit is getting requested to be increased.
>
> Great.

FWIW, this means that the caller would be potentially sending the host priv port to someone who's not necessarily the kernel.
That's fine if we're acting on mach_task_self (since if someone is interposing our task port, we can trust them), but not fine if we're a privileged server who's willing to raise the given task's memory allowance according to some policy.

> > > [vm_map] [task exec] map size: 0, requested size: 4294967296, hard
> > > limit: 2147483648
> >
> > It'd be useful to get a backtrace. You can make your grub use
> > /hurd/exec.static instead of /hurd/exec, and use kdb's trace/u command
> > to get the userland backtrace easily. You could also add mach_print()s
> > in exec.c. That is very likely from the 4 GB red zone, as Luca points out.
> > One thing is: it's a VM_PROT_NONE/VM_PROT_NONE area. We wouldn't really
> > want to make such an area account for RLIMIT_AS, as it is not meant to
> > store anything.
>
> This complicates the accounting a bit. I can keep a count of memory
> allocated with that protection. But I suppose I need to check for calls
> to `vm_protect` or its underlying implementation.

So, much like soft & hard limits, in Mach there is (current) protection and there is max protection. If you open a file with O_RDWR and mmap it with PROT_READ, you'd get a protection of VM_PROT_READ and a max protection of VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE, so you can then increase the protection with vm_protect. If you open the file with O_RDONLY however, you'd only get a max protection of VM_PROT_READ | VM_PROT_EXECUTE, and an attempt to vm_protect it to include VM_PROT_WRITE will fail with KERN_PROTECTION_FAILURE. (This KERN_PROTECTION_FAILURE should be mapped to the Unix EACCES, but I don't see glibc doing this.)

The 4 GB of memory that the exec server reserves is mapped with cur_protection = VM_PROT_NONE, max_protection = VM_PROT_NONE, so the protection can never be increased. This area is not usable as memory; it's a pure address space allocation, there to prevent other things from being allocated in this range of address space.
So Samuel is saying you could detect this case (of max_protection == VM_PROT_NONE) and avoid counting it towards the limit. But there's a complication: we still do want it to count towards map->size, so we may need yet another counter, or something. Similarly, when deallocating memory with max_protection == VM_PROT_NONE, you don't want to decrease the address space usage counter. And indeed, if vm_map_protect(new_prot = VM_PROT_NONE, set_max = TRUE) is called and it changes non-VM_PROT_NONE memory to VM_PROT_NONE, you'd want to subtract its size from the address space usage.

Sergey