Re: [RFC 12/18] limits: track RLIMIT_MEMLOCK actual max
On 06/18/16 00:59, Doug Ledford wrote: > On 6/13/2016 3:44 PM, Topi Miettinen wrote: >> Track maximum size of locked memory, presented in /proc/self/limits. > > You should have probably Cc:ed everyone on the cover letter and probably > patch 1 of this series. This patch is hard to decipher without the > additional context of those items. However, that said, I think I see Yes, I didn't know to CC everybody involved, sorry about that. > what you are doing. But your wording of your comments below is bad: > >> diff --git a/include/linux/sched.h b/include/linux/sched.h >> index feb9bb7..d3f3c9f 100644 >> --- a/include/linux/sched.h >> +++ b/include/linux/sched.h >> @@ -3378,10 +3378,16 @@ static inline unsigned long rlimit_max(unsigned int >> limit) >> return task_rlimit_max(current, limit); >> } >> >> +static inline void task_bump_rlimit(struct task_struct *tsk, >> +unsigned int limit, unsigned long r) >> +{ >> +if (READ_ONCE(tsk->signal->rlim_curmax[limit]) < r) >> +tsk->signal->rlim_curmax[limit] = r; >> +} >> + >> static inline void bump_rlimit(unsigned int limit, unsigned long r) >> { >> -if (READ_ONCE(current->signal->rlim_curmax[limit]) < r) >> -current->signal->rlim_curmax[limit] = r; >> +return task_bump_rlimit(current, limit, r); >> } >> >> #ifdef CONFIG_CPU_FREQ >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c >> index 46ecce4..192001e 100644 >> --- a/kernel/bpf/syscall.c >> +++ b/kernel/bpf/syscall.c >> @@ -76,6 +76,9 @@ static int bpf_map_charge_memlock(struct bpf_map *map) >> return -EPERM; >> } >> map->user = user; >> +/* XXX resource limits apply per task, not per user */ >> +bump_rlimit(RLIMIT_MEMLOCK, atomic_long_read(&user->locked_vm) << >> +PAGE_SHIFT); > > No, these resource limits do not apply per task. They are per user. The problem could be that the manual pages do not say that but more to the opposite direction. For example, setrlimit(2) says that some limits (RLIMIT_MEMLOCK only for SHML_LOCK and others like RLIMIT_MSGQUEUE) apply indeed per user but others are per process. This note in mlock(2) could be also easily read as specifying a per process limit: "Since Linux 2.6.9, no limits are placed on the amount of memory that a privileged process can lock and the RLIMIT_MEMLOCK soft resource limit instead defines a limit on how much memory an unprivileged process may lock." It's also confusing (to me, at least) that the limit values are stored in per task structures, so the actual limits can be different for each process for the same user. The limits are also sometimes compared to per task current->mm->pinned_vm, in other places to current->mm->locked_vm and in still other places to per user user->locked_vm. How can the same limit apply to all of them at the same time? I'd think the user can actually lock many times the limit because of this. Anyway, assuming that the actual implementation is always correct and unchangeable due to ABI stability reasons, it's useless to add XXX comments like I did. > However, you are doing maximum usage accounting on a per-task basis by > adding a new counter to the signal struct of the task. Fine, but your > comments need to reflect that instead of the confusing comment above. > In addition, your function name is horrible for what you are doing. A > person reading this function will think that you are bumping the actual > rlimit on the task, which is not what you are doing. You are performing > per-task accounting of MEMLOCK memory. The actual permission checks are > per-user, and the primary accounting is per-user. So, really, this is > just a nice little feature that provides a more granular per-task usage > (but not control) so a user can see where their overall memlock memory > is being used. Fine. I would reword the comment something like this: > > /* XXX resource is tracked and limit enforced on a per user basis, >but we track it on a per-task basis as well so users can identify >hogs of this resource, stats can be found in /proc//limits */ > > And I would rename bump_rlimit and task_bump_rlimit to something like > account_rlimit and task_account_rlimit. Calling it bump just gives the > wrong idea entirely on first read. Right, others have also proposed better names. -Topi > >> return 0; >> } >> >> @@ -601,6 +604,9 @@ static int bpf_prog_charge_memlock(struct bpf_prog *prog) >> return -EPERM; >> } >> prog->aux->user = user; >> +/* XXX resource limits apply per task, not per user */ >> +bump_rlimit(RLIMIT_MEMLOCK, atomic_long_read(&user->locked_vm) << >> +PAGE_SHIFT); >> return 0; >> } > >> @@ -798,6 +802,9 @@ int user_shm_lock(size_t size, struct user_struct *user) >> get_uid(user); >> user->locked_shm += locked; >> allowed = 1; >> + >> +/* XXX resource limits apply per ta
Re: [PATCH v13 01/16] PCI: Let pci_mmap_page_range() take resource address
On Fri, Jun 17, 2016 at 07:24:46PM -0700, Yinghai Lu wrote: > In 8c05cd08a7 ("PCI: fix offset check for sysfs mmapped files"), try > to check exposed value with resource start/end in proc mmap path. > > |start = vma->vm_pgoff; > |size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1; > |pci_start = (mmap_api == PCI_MMAP_PROCFS) ? > |pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0; > |if (start >= pci_start && start < pci_start + size && > |start + nr <= pci_start + size) > > That breaks sparc that exposed value is BAR value, and need to be offseted > to resource address. I asked this same question of the v12 patch, but I don't think you answered it: I'm not quite sure what you're saying here. Are you saying that sparc is currently broken, and this patch fixes it? If so, what exactly is broken? Can you give a small example of an mmap that is currently broken? ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc/mm: Don't do debug print before we do feature fixup
The use feature fixup in segment and page fault path and we should not call any function that can cause either of these before we finish feature fixup. Calling into early debug routine can result in segment fault as updated in https://lkml.kernel.org/r/CAOJe8K2L8D1M_yZPmyiZ11JvJ+HAS0aFHOnsqaY=xlonqta...@mail.gmail.com Avoid the segment fault by removing the debug print. Also remove the matching closing debug print at the end of the function. Even though the problem existed for a long time, this became more apparent after commit caca285e5ab4 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code"). Before that commit we used to use feature fixup to finalize code path related to 1T and 256MB segment. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/kernel/setup_64.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 8222950f820f..da98f532d160 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -466,7 +466,6 @@ static void __init initialize_cache_info(void) */ void __init setup_system(void) { - DBG(" -> setup_system()\n"); /* Apply the CPUs-specific and firmware specific fixups to kernel * text (nop out sections not relevant to this CPU or this firmware) @@ -584,7 +583,6 @@ void __init setup_system(void) (unsigned long long)PHYSICAL_START); pr_info("-\n"); - DBG(" <- setup_system()\n"); } /* This returns the limit below which memory accesses to the linear -- 2.7.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: unrecoverable exception on G5 with CONFIG_PPC_EARLY_DEBUG enabled
Michael Ellerman writes: > On Fri, 2016-06-17 at 08:33 +1000, Benjamin Herrenschmidt wrote: >> On Thu, 2016-06-16 at 22:49 +0300, Denis Kirjanov wrote: >> > - >> > +BEGIN_MMU_FTR_SECTION >> > + b 2f >> > +END_MMU_FTR_SECTION_IFSET(MMU_FTR_RADIX) >> > andi. r10,r12,MSR_RI /* check for unrecoverable exception >> > */ >> > beq-2f >> >> Are we taking an SLB miss before we do the fixup maybe ? > > Yeah that's the only explanation that makes any sense. > > I think instead of patching down this low we should instead be redirecting SLB > misses to unknown_exception() when radix is enabled. Aneesh? > The 2f branch ends up doing unrecoverable exception. Or are you suggesting something else ? -aneesh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH V2] powerpc/mm: Don't do debug print before we do feature fixup
We use feature fixup in segment and page fault path and hence we should not call any function that can cause either of these before we finish feature fixup. Calling into early debug routine can result in segment fault as updated in https://lkml.kernel.org/r/CAOJe8K2L8D1M_yZPmyiZ11JvJ+HAS0aFHOnsqaY=xlonqta...@mail.gmail.com Avoid the segment fault by removing the debug print. Also remove the matching closing debug print at the end of the function. Even though the problem existed for a long time, this became more apparent after commit caca285e5ab4 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code"). Before that commit we used to use feature fixup to finalize code path related to 1T and 256MB segment. Cc: Denis Kirjanov Signed-off-by: Aneesh Kumar K.V --- Changes from V1: * update commit message * update Cc: list with correct email address arch/powerpc/kernel/setup_64.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 8222950f820f..da98f532d160 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -466,7 +466,6 @@ static void __init initialize_cache_info(void) */ void __init setup_system(void) { - DBG(" -> setup_system()\n"); /* Apply the CPUs-specific and firmware specific fixups to kernel * text (nop out sections not relevant to this CPU or this firmware) @@ -584,7 +583,6 @@ void __init setup_system(void) (unsigned long long)PHYSICAL_START); pr_info("-\n"); - DBG(" <- setup_system()\n"); } /* This returns the limit below which memory accesses to the linear -- 2.7.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V2] powerpc/mm: Don't do debug print before we do feature fixup
On Sat, 2016-06-18 at 23:57 +0530, Aneesh Kumar K.V wrote: > We use feature fixup in segment and page fault path and hence we should > not call any function that can cause either of these before we finish > feature fixup. > > Calling into early debug routine can result in segment fault as updated > in > https://lkml.kernel.org/r/CAOJe8K2L8D1M_yZPmyiZ11JvJ+HAS0aFHOnsqaY=xlonqta...@mail.gmail.com > > Avoid the segment fault by removing the debug print. Also remove the > matching closing debug print at the end of the function. Please add a comment at the beginning of setup_system() explaining the situation. IE, that until the fixups are done, not even debug printfs can be used. It used to work though, the fact that we used 256M segment in that case wasn't an issue, in part because btext only really existed on machines that didn't have 1T segments. It was only ever going to be a segment fault, the rest was bolted (early ioremap). Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev