Applications need the ability to associate an address range with some
key and later revert the range to its initial default key. Pkey-0 comes
close to providing this function but falls short, because the current
implementation does not allow applications to explicitly associate
pkey-0 with the address range.
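A minimal userspace sketch of this use case follows. It assumes glibc 2.27 or later, which provides pkey_alloc(), pkey_mprotect() and pkey_free() in <sys/mman.h>, and a kernel that includes this change. The helper name tag_then_revert() is hypothetical. The final pkey_mprotect(..., 0) is the call that do_mprotect_pkey() currently rejects with -EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: tag [addr, addr + len) with a newly allocated
 * protection key, then revert the range to the default key by passing
 * pkey 0 explicitly. Returns 0 on success, or -errno from the first
 * call that fails.
 */
static int tag_then_revert(void *addr, size_t len)
{
	const int prot = PROT_READ | PROT_WRITE;
	int err;
	int pkey = pkey_alloc(0, 0);	/* no flags, no access restrictions */

	if (pkey < 0)
		return -errno;	/* e.g. no free key, or no pkey support */

	if (pkey_mprotect(addr, len, prot, pkey) != 0)
		goto fail;

	/* ... work that relies on the range carrying 'pkey' ... */

	/* Revert to the default key: the call this patch makes legal. */
	if (pkey_mprotect(addr, len, prot, 0) != 0)
		goto fail;

	return pkey_free(pkey) == 0 ? 0 : -errno;

fail:
	err = -errno;		/* save before pkey_free() can overwrite errno */
	pkey_free(pkey);
	return err;
}
```

On a system without protection-key support, pkey_alloc() fails up front, so the helper reports that error instead of touching the range.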

Clarify the semantics of pkey-0 and provide the corresponding
implementation.

Pkey-0 is special and has the following semantics:
(a) it is implicitly allocated and can never be freed. It always exists.
(b) it is the default key assigned to any address-range.
(c) it can be explicitly associated with any address-range.
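The checks behind rules (a) and (c) can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code: struct pkey_model and the model_*() functions are invented names that only mimic the logic this patch adds to pkey_alloc(), pkey_free() and do_mprotect_pkey(). The model assumes a 16-bit allocation bitmap with bit 0 pre-set. Rule (b), the default assignment to new ranges, is not modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: 16 keys, as on x86; other architectures differ. */
#define MODEL_NR_PKEYS 16

struct pkey_model {
	uint16_t alloc_map;	/* bit N set => pkey N is allocated */
};

/* Rule (a): pkey-0 is implicitly allocated from the start. */
static void model_init(struct pkey_model *m)
{
	m->alloc_map = 1u << 0;
}

static bool model_is_allocated(const struct pkey_model *m, int pkey)
{
	if (pkey < 0 || pkey >= MODEL_NR_PKEYS)
		return false;
	return (m->alloc_map >> pkey) & 1u;
}

/*
 * Hand out the lowest free key, or -ENOSPC when all are taken.
 * Bit 0 is always set, so this never returns pkey-0.
 */
static int model_pkey_alloc(struct pkey_model *m)
{
	for (int pkey = 0; pkey < MODEL_NR_PKEYS; pkey++) {
		if (!model_is_allocated(m, pkey)) {
			m->alloc_map |= (uint16_t)(1u << pkey);
			return pkey;
		}
	}
	return -ENOSPC;
}

/* Rule (a): pkey-0 can never be freed. */
static int model_pkey_free(struct pkey_model *m, int pkey)
{
	if (pkey == 0 || !model_is_allocated(m, pkey))
		return -EINVAL;
	m->alloc_map &= (uint16_t)~(1u << pkey);
	return 0;
}

/*
 * Rule (c): the check made before a range's key is changed. pkey-0 is
 * always accepted, -1 means no key was given, and any other key must
 * have been allocated first.
 */
static int model_check_pkey(const struct pkey_model *m, int pkey)
{
	if (pkey && pkey != -1 && !model_is_allocated(m, pkey))
		return -EINVAL;
	return 0;
}
```

Because bit 0 is pre-set, model_pkey_alloc() can never hand out key 0. That is why the patch treats an allocation of pkey-0 inside pkey_alloc() as an internal error.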

Tested on powerpc only. Could not test on x86.

cc: Thomas Gleixner <t...@linutronix.de>
cc: Dave Hansen <dave.han...@intel.com>
cc: Michael Ellerman <m...@ellerman.id.au>
cc: Ingo Molnar <mi...@kernel.org>
cc: Andrew Morton <a...@linux-foundation.org>
Signed-off-by: Ram Pai <linux...@us.ibm.com>
---
 History:
     v4 : (1) moved the code entirely into an arch-independent location.
          (2) fixed comments -- suggested by Thomas Gleixner
     v3 : added clarification of the semantics of pkey-0.
               -- suggested by Dave Hansen
     v2 : split the patch into two, one for x86 and one for powerpc
               -- suggested by Michael Ellerman

 Documentation/x86/protection-keys.txt |    8 ++++++++
 mm/mprotect.c                         |   25 ++++++++++++++++++++++---
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/Documentation/x86/protection-keys.txt b/Documentation/x86/protection-keys.txt
index ecb0d2d..92802c4 100644
--- a/Documentation/x86/protection-keys.txt
+++ b/Documentation/x86/protection-keys.txt
@@ -88,3 +88,11 @@ with a read():
 The kernel will send a SIGSEGV in both cases, but si_code will be set
 to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
 the plain mprotect() permissions are violated.
+
+====================== pkey 0 ==================================
+
+Pkey-0 is special. It is implicitly allocated. Applications cannot allocate or
+free that key. It is the default key associated with every address range,
+and it can also be explicitly associated with any address range.
+
+================================================================
diff --git a/mm/mprotect.c b/mm/mprotect.c
index e3309fc..2c779fa 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -430,7 +430,13 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
         * them use it here.
         */
        error = -EINVAL;
-       if ((pkey != -1) && !mm_pkey_is_allocated(current->mm, pkey))
+
+       /*
+        * pkey-0 is special. It always exists, so there is no need to check
+        * whether it is allocated. Check the allocation status of all other
+        * keys. pkey=-1 is not really a key; it means no key was specified.
+        */
+       if (pkey && pkey != -1 && !mm_pkey_is_allocated(current->mm, pkey))
                goto out;
 
        vma = find_vma(current->mm, start);
@@ -549,6 +555,12 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
        if (pkey == -1)
                goto out;
 
+       if (!pkey) {
+               mm_pkey_free(current->mm, pkey);
+               pr_err("pkey_alloc: internal error, cannot explicitly allocate pkey-0\n");
+               goto out;
+       }
+
        ret = arch_set_user_pkey_access(current, pkey, init_val);
        if (ret) {
                mm_pkey_free(current->mm, pkey);
@@ -564,13 +576,20 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 {
        int ret;
 
+       /*
+        * pkey-0 is special. Userspace can never allocate or free it. It is
+        * allocated by default. It always exists.
+        */
+       if (!pkey)
+               return -EINVAL;
+
        down_write(&current->mm->mmap_sem);
        ret = mm_pkey_free(current->mm, pkey);
        up_write(&current->mm->mmap_sem);
 
        /*
-        * We could provie warnings or errors if any VMA still
-        * has the pkey set here.
+        * We could provide warnings or errors if any VMA still has the pkey
+        * set here.
         */
        return ret;
 }
-- 
1.7.1
