On Mon, Sep 10, 2018 at 10:18 AM, Oleg Nesterov <o...@redhat.com> wrote: > On 09/10, Kees Cook wrote: >> >> > So get_arg_page() does >> > >> > /* >> > * Since the stack will hold pointers to the strings, we >> > * must account for them as well. >> > * >> > * The size calculation is the entire vma while each arg >> > page is >> > * built, so each time we get here it's calculating how >> > far it >> > * is currently (rather than each call being just the newly >> > * added size from the arg page). As a result, we need to >> > * always add the entire size of the pointers, so that on >> > the >> > * last call to get_arg_page() we'll actually have the >> > entire >> > * correct size. >> > */ >> > ptr_size = (bprm->argc + bprm->envc) * sizeof(void *); >> > if (ptr_size > ULONG_MAX - size) >> > goto fail; >> > size += ptr_size; >> > >> > OK, but >> > acct_arg_size(bprm, size / PAGE_SIZE); >> > >> > after that doesn't look exactly right. This additional space will be used >> > later >> > when the process already uses bprm->mm, right? so it shouldn't be >> > accounted by >> > acct_arg_size(). >> >> My understanding (based on the comment about acct_arg_size()) is that >> before exec_mmap() happens, the memory used to build the new arguments >> copy memory area gets accounted to the MM_ANONPAGES resource limit of >> the execing process. > > Yes, because otherwise oom-killer can't account the memory populated by > get_arg_page() in bprm->mm. > >> I couldn't find any place where the argc/envc >> pointers were being included in the count, > > But why??? To clarify, > > size += ptr_size; > > after acct_arg_size() is clear and correct, we are going to check rlim_stack > and thus the size should include the pointers we will add in > create_elf_tables(). > > But acct_arg_size() should only account the pages we allocate for bprm->mm, > nothing more. create_elf_tables() does not allocate the memory when it > populates > arg_start/arg_end/env_start/env_end. Plus at this time the process has already > switched to bprm->mm.
I've looked more closely now. So, while I agree with you about resource limits, there's a corner case that is better handled here: once we've called flush_old_exec(), we can no longer send errors back to the parent. We just segfault. So, I think it's better to give a resource limit error early, since it is able to do the math early. If we move acct_arg_size() earlier, then the "immediate" resource utilization is checked, but it means it can just segfault later. If we leave it as-is, we account for later memory allocations "too early", but we'll still not be able to run: but we can tell the parent why. I prefer leave it as-is. >> > Not to mention that ptr_size/PAGE_SIZE doesn't look right in any case... >> >> Hm? acct_arg_size() takes pages, not bytes. I think this is correct? >> What doesn't look right to you? > > Please forget. I meant that _if_ we actually wanted to account this additional > memory in bprm->pages, than we would probably need something like > acct_arg_size(size/PAGE_SIZE + DIV_ROUND_UP(ptr_size, PAGE_SIZE)). I'd need to study that more, but that change seems reasonable. :) -Kees -- Kees Cook Pixel Security