Re: [RFC 12/18] limits: track RLIMIT_MEMLOCK actual max

2016-06-18 Thread Topi Miettinen
On 06/18/16 00:59, Doug Ledford wrote:
> On 6/13/2016 3:44 PM, Topi Miettinen wrote:
>> Track maximum size of locked memory, presented in /proc/self/limits.
> 
> You should have probably Cc:ed everyone on the cover letter and probably
> patch 1 of this series.  This patch is hard to decipher without the
> additional context of those items.  However, that said, I think I see

Yes, I didn't know to CC everybody involved, sorry about that.

> what you are doing.  But your wording of your comments below is bad:
> 
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index feb9bb7..d3f3c9f 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -3378,10 +3378,16 @@ static inline unsigned long rlimit_max(unsigned int 
>> limit)
>>  return task_rlimit_max(current, limit);
>>  }
>>  
>> +static inline void task_bump_rlimit(struct task_struct *tsk,
>> +unsigned int limit, unsigned long r)
>> +{
>> +if (READ_ONCE(tsk->signal->rlim_curmax[limit]) < r)
>> +tsk->signal->rlim_curmax[limit] = r;
>> +}
>> +
>>  static inline void bump_rlimit(unsigned int limit, unsigned long r)
>>  {
>> -if (READ_ONCE(current->signal->rlim_curmax[limit]) < r)
>> -current->signal->rlim_curmax[limit] = r;
>> +return task_bump_rlimit(current, limit, r);
>>  }
>>  
>>  #ifdef CONFIG_CPU_FREQ
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 46ecce4..192001e 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -76,6 +76,9 @@ static int bpf_map_charge_memlock(struct bpf_map *map)
>>  return -EPERM;
>>  }
>>  map->user = user;
>> +/* XXX resource limits apply per task, not per user */
>> +bump_rlimit(RLIMIT_MEMLOCK, atomic_long_read(&user->locked_vm) <<
>> +PAGE_SHIFT);
> 
> No, these resource limits do not apply per task.  They are per user.

The problem could be that the manual pages do not say that but more to
the opposite direction. For example, setrlimit(2) says that some limits
(RLIMIT_MEMLOCK only for SHML_LOCK and others like RLIMIT_MSGQUEUE)
apply indeed per user but others are per process. This note in mlock(2)
could be also easily read as specifying a per process limit:

"Since Linux 2.6.9, no limits are placed on the amount of memory that
   a privileged process can lock and the RLIMIT_MEMLOCK soft resource
   limit instead defines a limit on how much memory an unprivileged
   process may lock."

It's also confusing (to me, at least) that the limit values are stored
in per task structures, so the actual limits can be different for each
process for the same user.

The limits are also sometimes compared to per task
current->mm->pinned_vm, in other places to current->mm->locked_vm and in
still other places to per user user->locked_vm. How can the same limit
apply to all of them at the same time? I'd think the user can actually
lock many times the limit because of this.

Anyway, assuming that the actual implementation is always correct and
unchangeable due to ABI stability reasons, it's useless to add XXX
comments like I did.

> However, you are doing maximum  usage accounting on a per-task basis by
> adding a new counter to the signal struct of the task.  Fine, but your
> comments need to reflect that instead of the confusing comment above.
> In addition, your function name is horrible for what you are doing.  A
> person reading this function will think that you are bumping the actual
> rlimit on the task, which is not what you are doing.  You are performing
> per-task accounting of MEMLOCK memory.  The actual permission checks are
> per-user, and the primary accounting is per-user.  So, really, this is
> just a nice little feature that provides a more granular per-task usage
> (but not control) so a user can see where their overall memlock memory
> is being used.  Fine.  I would reword the comment something like this:
> 
> /* XXX resource is tracked and limit enforced on a per user basis,
>but we track it on a per-task basis as well so users can identify
>hogs of this resource, stats can be found in /proc//limits */
> 
> And I would rename bump_rlimit and task_bump_rlimit to something like
> account_rlimit and task_account_rlimit.  Calling it bump just gives the
> wrong idea entirely on first read.

Right, others have also proposed better names.

-Topi

> 
>>  return 0;
>>  }
>>  
>> @@ -601,6 +604,9 @@ static int bpf_prog_charge_memlock(struct bpf_prog *prog)
>>  return -EPERM;
>>  }
>>  prog->aux->user = user;
>> +/* XXX resource limits apply per task, not per user */
>> +bump_rlimit(RLIMIT_MEMLOCK, atomic_long_read(&user->locked_vm) <<
>> +PAGE_SHIFT);
>>  return 0;
>>  }
> 
>> @@ -798,6 +802,9 @@ int user_shm_lock(size_t size, struct user_struct *user)
>>  get_uid(user);
>>  user->locked_shm += locked;
>>  allowed = 1;
>> +
>> +/* XXX resource limits apply per ta

Re: [PATCH v13 01/16] PCI: Let pci_mmap_page_range() take resource address

2016-06-18 Thread Bjorn Helgaas
On Fri, Jun 17, 2016 at 07:24:46PM -0700, Yinghai Lu wrote:
> In 8c05cd08a7 ("PCI: fix offset check for sysfs mmapped files"), try
> to check exposed value with resource start/end in proc mmap path.
> 
> |start = vma->vm_pgoff;
> |size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
> |pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
> |pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
> |if (start >= pci_start && start < pci_start + size &&
> |start + nr <= pci_start + size)
> 
> That breaks sparc that exposed value is BAR value, and need to be offseted
> to resource address.

I asked this same question of the v12 patch, but I don't think you
answered it:

I'm not quite sure what you're saying here.  Are you saying that sparc
is currently broken, and this patch fixes it?  If so, what exactly is
broken?  Can you give a small example of an mmap that is currently
broken?
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc/mm: Don't do debug print before we do feature fixup

2016-06-18 Thread Aneesh Kumar K.V
The use feature fixup in segment and page fault path and we should
not call any function that can cause either of these before we finish
feature fixup.

Calling into early debug routine can result in segment fault as updated
in 
https://lkml.kernel.org/r/CAOJe8K2L8D1M_yZPmyiZ11JvJ+HAS0aFHOnsqaY=xlonqta...@mail.gmail.com

Avoid the segment fault by removing the debug print. Also remove the
matching closing debug print at the end of the function.

Even though the problem existed for a long time, this became more apparent
after commit caca285e5ab4 ("powerpc/mm/radix: Use STD_MMU_64 to properly
isolate hash related code"). Before that commit we used to use feature
fixup to finalize code path related to 1T and 256MB segment.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kernel/setup_64.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 8222950f820f..da98f532d160 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -466,7 +466,6 @@ static void __init initialize_cache_info(void)
  */
 void __init setup_system(void)
 {
-   DBG(" -> setup_system()\n");
 
/* Apply the CPUs-specific and firmware specific fixups to kernel
 * text (nop out sections not relevant to this CPU or this firmware)
@@ -584,7 +583,6 @@ void __init setup_system(void)
   (unsigned long long)PHYSICAL_START);
pr_info("-\n");
 
-   DBG(" <- setup_system()\n");
 }
 
 /* This returns the limit below which memory accesses to the linear
-- 
2.7.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: unrecoverable exception on G5 with CONFIG_PPC_EARLY_DEBUG enabled

2016-06-18 Thread Aneesh Kumar K.V
Michael Ellerman  writes:

> On Fri, 2016-06-17 at 08:33 +1000, Benjamin Herrenschmidt wrote:
>> On Thu, 2016-06-16 at 22:49 +0300, Denis Kirjanov wrote:
>> > -
>> > +BEGIN_MMU_FTR_SECTION
>> > +   b   2f
>> > +END_MMU_FTR_SECTION_IFSET(MMU_FTR_RADIX)
>> > andi.   r10,r12,MSR_RI  /* check for unrecoverable exception
>> > */
>> > beq-2f
>> 
>> Are we taking an SLB miss before we do the fixup maybe ?
>
> Yeah that's the only explanation that makes any sense.
>
> I think instead of patching down this low we should instead be redirecting SLB
> misses to unknown_exception() when radix is enabled. Aneesh?
>

The 2f branch ends up doing unrecoverable exception. Or are you
suggesting something else ?

-aneesh

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V2] powerpc/mm: Don't do debug print before we do feature fixup

2016-06-18 Thread Aneesh Kumar K.V
We use feature fixup in segment and page fault path and hence we should
not call any function that can cause either of these before we finish
feature fixup.

Calling into early debug routine can result in segment fault as updated
in 
https://lkml.kernel.org/r/CAOJe8K2L8D1M_yZPmyiZ11JvJ+HAS0aFHOnsqaY=xlonqta...@mail.gmail.com

Avoid the segment fault by removing the debug print. Also remove the
matching closing debug print at the end of the function.

Even though the problem existed for a long time, this became more apparent
after commit caca285e5ab4 ("powerpc/mm/radix: Use STD_MMU_64 to properly
isolate hash related code"). Before that commit we used to use feature
fixup to finalize code path related to 1T and 256MB segment.

Cc: Denis Kirjanov 
Signed-off-by: Aneesh Kumar K.V 
---
Changes from V1:
* update commit message
* update Cc: list with correct email address

 arch/powerpc/kernel/setup_64.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 8222950f820f..da98f532d160 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -466,7 +466,6 @@ static void __init initialize_cache_info(void)
  */
 void __init setup_system(void)
 {
-   DBG(" -> setup_system()\n");
 
/* Apply the CPUs-specific and firmware specific fixups to kernel
 * text (nop out sections not relevant to this CPU or this firmware)
@@ -584,7 +583,6 @@ void __init setup_system(void)
   (unsigned long long)PHYSICAL_START);
pr_info("-\n");
 
-   DBG(" <- setup_system()\n");
 }
 
 /* This returns the limit below which memory accesses to the linear
-- 
2.7.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH V2] powerpc/mm: Don't do debug print before we do feature fixup

2016-06-18 Thread Benjamin Herrenschmidt
On Sat, 2016-06-18 at 23:57 +0530, Aneesh Kumar K.V wrote:
> We use feature fixup in segment and page fault path and hence we should
> not call any function that can cause either of these before we finish
> feature fixup.
> 
> Calling into early debug routine can result in segment fault as updated
> in 
> https://lkml.kernel.org/r/CAOJe8K2L8D1M_yZPmyiZ11JvJ+HAS0aFHOnsqaY=xlonqta...@mail.gmail.com
> 
> Avoid the segment fault by removing the debug print. Also remove the
> matching closing debug print at the end of the function.

Please add a comment at the beginning of setup_system() explaining
the situation. IE, that until the fixups are done, not even debug
printfs can be used.

It used to work though, the fact that we used 256M segment in that
case wasn't an issue, in part because btext only really existed on
machines that didn't have 1T segments. It was only ever going to be
a segment fault, the rest was bolted (early ioremap).

Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev