[tip:x86/mm] kexec: Allocate decrypted control pages for kdump if SME is enabled

2018-10-06 Thread tip-bot for Lianbo Jiang
Commit-ID:  9cf38d5559e813cccdba8b44c82cc46ba48d0896
Gitweb: https://git.kernel.org/tip/9cf38d5559e813cccdba8b44c82cc46ba48d0896
Author: Lianbo Jiang 
AuthorDate: Sun, 30 Sep 2018 11:10:31 +0800
Committer:  Borislav Petkov 
CommitDate: Sat, 6 Oct 2018 12:01:51 +0200

kexec: Allocate decrypted control pages for kdump if SME is enabled

When SME is enabled in the first kernel, the control pages for kdump must
be allocated decrypted, because when the kdump kernel boots, it accesses
these pages decrypted in its initial boot stage, before SME is enabled.
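
For reference, the x86 implementation of this hook at this point in the
series is roughly the following sketch (see
arch/x86/kernel/machine_kexec_64.c for the authoritative version):

  int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
  {
          /*
           * If SME is active, the kexec pages must not be encrypted
           * because the new kernel initially accesses them decrypted.
           */
          return set_memory_decrypted((unsigned long)vaddr, pages);
  }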

 [ bp: clean up text. ]

Signed-off-by: Lianbo Jiang 
Signed-off-by: Borislav Petkov 
Reviewed-by: Tom Lendacky 
Cc: ke...@lists.infradead.org
Cc: t...@linutronix.de
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@linux-foundation.org
Cc: dan.j.willi...@intel.com
Cc: bhelg...@google.com
Cc: baiyao...@cmss.chinamobile.com
Cc: ti...@suse.de
Cc: brijesh.si...@amd.com
Cc: dyo...@redhat.com
Cc: b...@redhat.com
Cc: jroe...@suse.de
Link: https://lkml.kernel.org/r/20180930031033.22110-3-liji...@redhat.com
---
 kernel/kexec_core.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 23a83a4da38a..86ef06d3dbe3 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -471,6 +471,10 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
}
}
 
+   /* Ensure that these pages are decrypted if SME is enabled. */
+   if (pages)
+   arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
+
return pages;
 }
 
@@ -867,6 +871,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result  = -ENOMEM;
goto out;
}
+   arch_kexec_post_alloc_pages(page_address(page), 1, 0);
ptr = kmap(page);
ptr += maddr & ~PAGE_MASK;
mchunk = min_t(size_t, mbytes,
@@ -884,6 +889,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result = copy_from_user(ptr, buf, uchunk);
kexec_flush_icache_page(page);
kunmap(page);
+   arch_kexec_pre_free_pages(page_address(page), 1);
if (result) {
result = -EFAULT;
goto out;


[tip:x86/mm] x86/ioremap: Add an ioremap_encrypted() helper

2018-10-06 Thread tip-bot for Lianbo Jiang
Commit-ID:  c3a7a61c192ec350330128edb13db33a9bc0ace1
Gitweb: https://git.kernel.org/tip/c3a7a61c192ec350330128edb13db33a9bc0ace1
Author: Lianbo Jiang 
AuthorDate: Thu, 27 Sep 2018 15:19:51 +0800
Committer:  Borislav Petkov 
CommitDate: Sat, 6 Oct 2018 11:57:51 +0200

x86/ioremap: Add an ioremap_encrypted() helper

When SME is enabled, the memory is encrypted in the first kernel. In
this case, SME also needs to be enabled in the kdump kernel, and we have
to remap the old memory with the memory encryption mask.

The case of concern here is when SME is active in the first kernel and
also active in the kdump kernel. There are four cases to consider:

a. dump vmcore
   It is encrypted in the first kernel, and needs to be read out in the
   kdump kernel.

b. crash notes
   When dumping the vmcore, people usually need to read useful
   information from the notes, which are also encrypted.

c. iommu device table
   It is encrypted in the first kernel; the kdump kernel needs to access
   its content to analyze it and extract the information it needs.

d. mmio of AMD iommu
   Not encrypted in either kernel.

Add a new bool parameter @encrypted to __ioremap_caller(). If set,
memory will be remapped with the SME mask.

Add a new function ioremap_encrypted() to explicitly pass in a true
value for @encrypted. Use ioremap_encrypted() for the above a, b, c
cases.
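
As a usage sketch, remapping one page of the old kernel's encrypted
memory in the kdump kernel then looks like this (error handling elided):

  void *vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT, PAGE_SIZE);

  memcpy(buf, vaddr + offset, csize);     /* reads the decrypted view */
  iounmap((void __iomem *)vaddr);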

 [ bp: cleanup commit message, extern defs in io.h and drop forgotten
   include. ]

Signed-off-by: Lianbo Jiang 
Signed-off-by: Borislav Petkov 
Reviewed-by: Tom Lendacky 
Cc: ke...@lists.infradead.org
Cc: t...@linutronix.de
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@linux-foundation.org
Cc: dan.j.willi...@intel.com
Cc: bhelg...@google.com
Cc: baiyao...@cmss.chinamobile.com
Cc: ti...@suse.de
Cc: brijesh.si...@amd.com
Cc: dyo...@redhat.com
Cc: b...@redhat.com
Cc: jroe...@suse.de
Link: https://lkml.kernel.org/r/20180927071954.29615-2-liji...@redhat.com
---
 arch/x86/include/asm/io.h |  3 ++-
 arch/x86/mm/ioremap.c | 24 
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 6de64840dd22..6df53efcecfd 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -187,11 +187,12 @@ extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size)
 #define ioremap_nocache ioremap_nocache
 extern void __iomem *ioremap_uc(resource_size_t offset, unsigned long size);
 #define ioremap_uc ioremap_uc
-
 extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size);
 #define ioremap_cache ioremap_cache
 extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, unsigned long prot_val);
 #define ioremap_prot ioremap_prot
+extern void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long size);
+#define ioremap_encrypted ioremap_encrypted
 
 /**
  * ioremap -   map bus memory into CPU space
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c63a545ec199..24e0920a9b25 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -131,7 +131,8 @@ static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
  * caller shouldn't need to know that small detail.
  */
 static void __iomem *__ioremap_caller(resource_size_t phys_addr,
-   unsigned long size, enum page_cache_mode pcm, void *caller)
+   unsigned long size, enum page_cache_mode pcm,
+   void *caller, bool encrypted)
 {
unsigned long offset, vaddr;
resource_size_t last_addr;
@@ -199,7 +200,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 * resulting mapping.
 */
prot = PAGE_KERNEL_IO;
-   if (sev_active() && mem_flags.desc_other)
+   if ((sev_active() && mem_flags.desc_other) || encrypted)
prot = pgprot_encrypted(prot);
 
switch (pcm) {
@@ -291,7 +292,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
@@ -324,7 +325,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL_GPL(ioremap_uc);
 
@@ -341,7 +342,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);

[tip:x86/mm] kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled

2018-10-06 Thread tip-bot for Lianbo Jiang
Commit-ID:  992b649a3f013465d8128da02e5449def662a4c3
Gitweb: https://git.kernel.org/tip/992b649a3f013465d8128da02e5449def662a4c3
Author: Lianbo Jiang 
AuthorDate: Sun, 30 Sep 2018 16:37:41 +0800
Committer:  Borislav Petkov 
CommitDate: Sat, 6 Oct 2018 12:09:26 +0200

kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled

In the kdump kernel, the memory of the first kernel needs to be dumped
into the vmcore file.

If SME is enabled in the first kernel, the old memory has to be remapped
with the memory encryption mask in order to access it properly.

Split copy_oldmem_page() functionality to handle encrypted memory
properly in either case.
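
The resulting dispatch in read_from_oldmem() (whose hunk is truncated
below) is, in sketch form:

  if (encrypted)
          tmp = copy_oldmem_page_encrypted(pfn, buf, nr_bytes, offset, userbuf);
  else
          tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);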

 [ bp: Heavily massage everything. ]

Signed-off-by: Lianbo Jiang 
Signed-off-by: Borislav Petkov 
Cc: ke...@lists.infradead.org
Cc: t...@linutronix.de
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@linux-foundation.org
Cc: dan.j.willi...@intel.com
Cc: bhelg...@google.com
Cc: baiyao...@cmss.chinamobile.com
Cc: ti...@suse.de
Cc: brijesh.si...@amd.com
Cc: dyo...@redhat.com
Cc: b...@redhat.com
Cc: jroe...@suse.de
Link: https://lkml.kernel.org/r/be7b47f9-6be6-e0d1-2c2a-9125bc74b...@redhat.com
---
 arch/x86/kernel/crash_dump_64.c | 60 -
 fs/proc/vmcore.c                | 24 -
 include/linux/crash_dump.h      |  4 +++
 3 files changed, 63 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
index 4f2e0778feac..eb8ab3915268 100644
--- a/arch/x86/kernel/crash_dump_64.c
+++ b/arch/x86/kernel/crash_dump_64.c
@@ -11,40 +11,62 @@
 #include 
 #include 
 
-/**
- * copy_oldmem_page - copy one page from "oldmem"
- * @pfn: page frame number to be copied
- * @buf: target memory address for the copy; this can be in kernel address
- * space or user address space (see @userbuf)
- * @csize: number of bytes to copy
- * @offset: offset in bytes into the page (based on pfn) to begin the copy
- * @userbuf: if set, @buf is in user address space, use copy_to_user(),
- * otherwise @buf is in kernel address space, use memcpy().
- *
- * Copy a page from "oldmem". For this page, there is no pte mapped
- * in the current kernel. We stitch up a pte, similar to kmap_atomic.
- */
-ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
-   size_t csize, unsigned long offset, int userbuf)
+static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
+ unsigned long offset, int userbuf,
+ bool encrypted)
 {
void  *vaddr;
 
if (!csize)
return 0;
 
-   vaddr = ioremap_cache(pfn << PAGE_SHIFT, PAGE_SIZE);
+   if (encrypted)
+   vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT, PAGE_SIZE);
+   else
+   vaddr = (__force void *)ioremap_cache(pfn << PAGE_SHIFT, PAGE_SIZE);
+
if (!vaddr)
return -ENOMEM;
 
if (userbuf) {
-   if (copy_to_user(buf, vaddr + offset, csize)) {
-   iounmap(vaddr);
+   if (copy_to_user((void __user *)buf, vaddr + offset, csize)) {
+   iounmap((void __iomem *)vaddr);
return -EFAULT;
}
} else
memcpy(buf, vaddr + offset, csize);
 
set_iounmap_nonlazy();
-   iounmap(vaddr);
+   iounmap((void __iomem *)vaddr);
return csize;
 }
+
+/**
+ * copy_oldmem_page - copy one page of memory
+ * @pfn: page frame number to be copied
+ * @buf: target memory address for the copy; this can be in kernel address
+ * space or user address space (see @userbuf)
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page (based on pfn) to begin the copy
+ * @userbuf: if set, @buf is in user address space, use copy_to_user(),
+ * otherwise @buf is in kernel address space, use memcpy().
+ *
+ * Copy a page from the old kernel's memory. For this page, there is no pte
+ * mapped in the current kernel. We stitch up a pte, similar to kmap_atomic.
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
+unsigned long offset, int userbuf)
+{
+   return __copy_oldmem_page(pfn, buf, csize, offset, userbuf, false);
+}
+
+/**
+ * copy_oldmem_page_encrypted - same as copy_oldmem_page() above but ioremap the
+ * memory with the encryption mask set to accommodate kdump on SME-enabled
+ * machines.
+ */
+ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t csize,
+  unsigned long offset, int userbuf)
+{
+   return __copy_oldmem_page(pfn, buf, csize, offset, userbuf, true);
+}
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index cbde728f8ac6..42c32d06f7da 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -24,6 +24,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include "internal.h"
 
@@ -98,7 +1

[tip:x86/mm] iommu/amd: Remap the IOMMU device table with the memory encryption mask for kdump

2018-10-06 Thread tip-bot for Lianbo Jiang
Commit-ID:  8780158cf977ea5f9912931a30b3d575b36dba22
Gitweb: https://git.kernel.org/tip/8780158cf977ea5f9912931a30b3d575b36dba22
Author: Lianbo Jiang 
AuthorDate: Sun, 30 Sep 2018 11:10:32 +0800
Committer:  Borislav Petkov 
CommitDate: Sat, 6 Oct 2018 12:08:24 +0200

iommu/amd: Remap the IOMMU device table with the memory encryption mask for kdump

The kdump kernel copies the IOMMU device table from the old device table
which is encrypted when SME is enabled in the first kernel. So remap the
old device table with the memory encryption mask in the kdump kernel.
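
For context, the __sme_clr() helper used in the hunk below simply masks
off the encryption bit(s); it is roughly equivalent to:

  #define __sme_clr(x)    ((x) & ~sme_me_mask)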

 [ bp: Massage commit message. ]

Signed-off-by: Lianbo Jiang 
Signed-off-by: Borislav Petkov 
Reviewed-by: Tom Lendacky 
Acked-by: Joerg Roedel 
Cc: ke...@lists.infradead.org
Cc: t...@linutronix.de
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@linux-foundation.org
Cc: dan.j.willi...@intel.com
Cc: bhelg...@google.com
Cc: baiyao...@cmss.chinamobile.com
Cc: ti...@suse.de
Cc: brijesh.si...@amd.com
Cc: dyo...@redhat.com
Cc: b...@redhat.com
Link: https://lkml.kernel.org/r/20180930031033.22110-4-liji...@redhat.com
---
 drivers/iommu/amd_iommu_init.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 84b3e4445d46..3931c7de7c69 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -902,12 +902,22 @@ static bool copy_device_table(void)
}
}
 
-   old_devtb_phys = entry & PAGE_MASK;
+   /*
+    * When SME is enabled in the first kernel, the entry includes the
+    * memory encryption mask (sme_me_mask), which must be removed in
+    * order to obtain the true physical address in the kdump kernel.
+    */
+   old_devtb_phys = __sme_clr(entry) & PAGE_MASK;
+
if (old_devtb_phys >= 0x1ULL) {
pr_err("The address of old device table is above 4G, not 
trustworthy!\n");
return false;
}
-   old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+   old_devtb = (sme_active() && is_kdump_kernel())
+   ? (__force void *)ioremap_encrypted(old_devtb_phys,
+   dev_table_size)
+   : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+
if (!old_devtb)
return false;
 


[tip:x86/kdump] kdump: Document kernel data exported in the vmcoreinfo note

2019-01-15 Thread tip-bot for Lianbo Jiang
Commit-ID:  f263245a0ce2c4e23b89a58fa5f7dfc048e11929
Gitweb: https://git.kernel.org/tip/f263245a0ce2c4e23b89a58fa5f7dfc048e11929
Author: Lianbo Jiang 
AuthorDate: Thu, 10 Jan 2019 20:19:43 +0800
Committer:  Borislav Petkov 
CommitDate: Tue, 15 Jan 2019 11:05:28 +0100

kdump: Document kernel data exported in the vmcoreinfo note

Document data exported in vmcoreinfo and briefly describe its use by
userspace tools.
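
As an illustration, the note consists of simple KEY=VALUE lines; a few
representative entries (the values here are made up):

  OSRELEASE=4.19.0
  PAGE_SIZE=4096
  SYMBOL(init_uts_ns)=ffffffff82e92ea0
  NUMBER(sme_mask)=9007199254740992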

 [ bp: heavily massage and redact the text. ]

Suggested-by: Borislav Petkov 
Signed-off-by: Lianbo Jiang 
Signed-off-by: Borislav Petkov 
Cc: Andrew Morton 
Cc: Baoquan He 
Cc: Dave Young 
Cc: Jonathan Corbet 
Cc: Thomas Gleixner 
Cc: Vivek Goyal 
Cc: ander...@redhat.com
Cc: k-ha...@ab.jp.nec.com
Cc: ke...@lists.infradead.org
Cc: linux-...@vger.kernel.org
Cc: mi...@redhat.com
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190110121944.6050-2-liji...@redhat.com
---
 Documentation/kdump/vmcoreinfo.txt | 495 +
 1 file changed, 495 insertions(+)

diff --git a/Documentation/kdump/vmcoreinfo.txt b/Documentation/kdump/vmcoreinfo.txt
new file mode 100644
index ..bb94a4bd597a
--- /dev/null
+++ b/Documentation/kdump/vmcoreinfo.txt
@@ -0,0 +1,495 @@
+================================================================
+                           VMCOREINFO
+================================================================
+
+===========
+What is it?
+===========
+
+VMCOREINFO is a special ELF note section. It contains various
+information from the kernel like structure size, page size, symbol
+values, field offsets, etc. These data are packed into an ELF note
+section and used by user-space tools like crash and makedumpfile to
+analyze a kernel's memory layout.
+
+================
+Common variables
+================
+
+init_uts_ns.name.release
+------------------------
+
+The version of the Linux kernel. Used to find the corresponding source
+code from which the kernel has been built. For example, crash uses it to
+find the corresponding vmlinux in order to process vmcore.
+
+PAGE_SIZE
+---------
+
+The size of a page. It is the smallest unit of data used by the memory
+management facilities. It is usually 4096 bytes in size, and a page is
+aligned on a 4096-byte boundary. Used for computing page addresses.
+
+init_uts_ns
+-----------
+
+The UTS namespace which is used to isolate two specific elements of the
+system that relate to the uname(2) system call. It is named after the
+data structure used to store information returned by the uname(2) system
+call.
+
+User-space tools can get the kernel name, host name, kernel release
+number, kernel version, architecture name and OS type from it.
+
+node_online_map
+---------------
+
+An array node_states[N_ONLINE] which represents the set of online nodes
+in a system, one bit position per node number. Used to keep track of
+which nodes are in the system and online.
+
+swapper_pg_dir
+--------------
+
+The global page directory pointer of the kernel. Used to translate
+virtual to physical addresses.
+
+_stext
+------
+
+Defines the beginning of the text section. In general, _stext indicates
+the kernel start address. Used to convert a virtual address from the
+direct kernel map to a physical address.
+
+vmap_area_list
+--------------
+
+Stores the virtual area list. makedumpfile gets the vmalloc start value
+from this variable; that value is necessary for vmalloc translation.
+
+mem_map
+-------
+
+Physical addresses are translated to struct pages by treating them as
+an index into the mem_map array. Right-shifting a physical address
+PAGE_SHIFT bits converts it into a page frame number which is an index
+into that mem_map array.
+
+Used to map an address to the corresponding struct page.
+
+contig_page_data
+----------------
+
+Makedumpfile gets the pglist_data structure from this symbol, which is
+used to describe the memory layout.
+
+User-space tools use this to exclude free pages when dumping memory.
+
+mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
+--------------------------------------------------------------------------
+
+The address of the mem_section array, its length, structure size, and
+the section_mem_map offset.
+
+It exists in the sparse memory mapping model, and it is somewhat similar
+to the mem_map variable; both are used to translate an address.
+
+page
+----
+
+The size of a page structure. struct page is an important data structure
+and it is widely used to compute contiguous memory.
+
+pglist_data
+-----------
+
+The size of a pglist_data structure. This value is used to check if the
+pglist_data structure is valid. It is also used for checking the memory
+type.
+
+zone
+----
+
+The size of a zone structure. This value is used to check if the zone
+structure has been found. It is also used for excluding free pages.
+
+free_area
+---------
+
+The size of a free_area structure. It indicates whether the free_area
+structure is valid or not. Useful when excluding free pages.

[tip:x86/kdump] x86/kdump: Export the SME mask to vmcoreinfo

2019-01-11 Thread tip-bot for Lianbo Jiang
Commit-ID:  65f750e5457aef9a8085a99d613fea0430303e93
Gitweb: https://git.kernel.org/tip/65f750e5457aef9a8085a99d613fea0430303e93
Author: Lianbo Jiang 
AuthorDate: Thu, 10 Jan 2019 20:19:44 +0800
Committer:  Borislav Petkov 
CommitDate: Fri, 11 Jan 2019 16:09:25 +0100

x86/kdump: Export the SME mask to vmcoreinfo

On AMD SME machines, the makedumpfile tool needs to know whether the
crashed kernel's memory was encrypted.

If SME is enabled in the first kernel, the crashed kernel's page table
entries (pgd/pud/pmd/pte) contain the memory encryption mask which
makedumpfile needs to remove in order to obtain the true physical
address.

Export that mask in a vmcoreinfo variable.
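
A consumer-side sketch (names hypothetical) of how a dump tool can use
the exported mask to recover a physical address from a page-table entry:

  /* sme_mask as read from the NUMBER(sme_mask)= line of the vmcoreinfo note */
  static unsigned long long pte_to_phys(unsigned long long pte,
                                        unsigned long long sme_mask)
  {
          return (pte & ~sme_mask) & 0x000ffffffffff000ULL; /* PFN bits */
  }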

 [ bp: Massage commit message and move define at the end of the
   function. ]

Signed-off-by: Lianbo Jiang 
Signed-off-by: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Andrew Morton 
Cc: Baoquan He 
Cc: Dave Young 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Tom Lendacky 
Cc: ander...@redhat.com
Cc: k-ha...@ab.jp.nec.com
Cc: ke...@lists.infradead.org
Cc: linux-...@vger.kernel.org
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190110121944.6050-3-liji...@redhat.com
---
 arch/x86/kernel/machine_kexec_64.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 4c8acdfdc5a7..ceba408ea982 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -352,6 +352,8 @@ void machine_kexec(struct kimage *image)
 
 void arch_crash_save_vmcoreinfo(void)
 {
+   u64 sme_mask = sme_me_mask;
+
VMCOREINFO_NUMBER(phys_base);
VMCOREINFO_SYMBOL(init_top_pgt);
vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
@@ -364,6 +366,7 @@ void arch_crash_save_vmcoreinfo(void)
vmcoreinfo_append_str("KERNELOFFSET=%lx\n",
  kaslr_offset());
VMCOREINFO_NUMBER(KERNEL_IMAGE_SIZE);
+   VMCOREINFO_NUMBER(sme_mask);
 }
 
 /* arch-dependent functionality related to kexec file-based syscall */


[tip:x86/kdump] x86/e820, ioport: Add a new I/O resource descriptor IORES_DESC_RESERVED

2019-06-20 Thread tip-bot for Lianbo Jiang
Commit-ID:  ae9e13d621d6795ec1ad6bf10bd2549c6c3feca4
Gitweb: https://git.kernel.org/tip/ae9e13d621d6795ec1ad6bf10bd2549c6c3feca4
Author: Lianbo Jiang 
AuthorDate: Tue, 23 Apr 2019 09:30:05 +0800
Committer:  Borislav Petkov 
CommitDate: Thu, 20 Jun 2019 09:54:31 +0200

x86/e820, ioport: Add a new I/O resource descriptor IORES_DESC_RESERVED

When executing the kexec_file_load() syscall, the first kernel needs to
pass the e820 reserved ranges to the second kernel because some devices
(PCI, for example) need them present in the kdump kernel for proper
initialization.

But the kernel can not exactly match the e820 reserved ranges when
walking through the iomem resources using the default IORES_DESC_NONE
descriptor, because there are several types of e820 ranges which are
marked IORES_DESC_NONE, see e820_type_to_iores_desc().

Therefore, add a new I/O resource descriptor called IORES_DESC_RESERVED
to mark exactly those ranges. It will be used to match the reserved
resource ranges when walking through iomem resources.
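
A later patch in this series then matches those ranges like this (sketch
of the crash memmap setup path):

  cmd.type = E820_TYPE_RESERVED;
  walk_iomem_res_desc(IORES_DESC_RESERVED, IORESOURCE_MEM, 0, -1, &cmd,
                      memmap_entry_callback);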

 [ bp: Massage commit message. ]

Suggested-by: Borislav Petkov 
Signed-off-by: Lianbo Jiang 
Signed-off-by: Borislav Petkov 
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: b...@redhat.com
Cc: dave.han...@linux.intel.com
Cc: dyo...@redhat.com
Cc: "H. Peter Anvin" 
Cc: Huang Zijiang 
Cc: Ingo Molnar 
Cc: Joe Perches 
Cc: Juergen Gross 
Cc: ke...@lists.infradead.org
Cc: Masayoshi Mizuma 
Cc: Michal Hocko 
Cc: Mike Rapoport 
Cc: Naoya Horiguchi 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tom Lendacky 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190423013007.17838-2-liji...@redhat.com
---
 arch/x86/kernel/e820.c | 2 +-
 include/linux/ioport.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 8f32e705a980..e69408bf664b 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1063,10 +1063,10 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY;
case E820_TYPE_PRAM:return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
+   case E820_TYPE_RESERVED:return IORES_DESC_RESERVED;
case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
case E820_TYPE_RAM: /* Fall-through: */
case E820_TYPE_UNUSABLE:/* Fall-through: */
-   case E820_TYPE_RESERVED:/* Fall-through: */
default:return IORES_DESC_NONE;
}
 }
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..6ed59de48bd5 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -133,6 +133,7 @@ enum {
IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5,
IORES_DESC_DEVICE_PRIVATE_MEMORY= 6,
IORES_DESC_DEVICE_PUBLIC_MEMORY = 7,
+   IORES_DESC_RESERVED = 8,
 };
 
 /* helpers to define resources */


[tip:x86/kdump] x86/mm: Rework ioremap resource mapping determination

2019-06-20 Thread tip-bot for Lianbo Jiang
Commit-ID:  5da04cc86d1215fd9fe0e5c88ead6e8428a75e56
Gitweb: https://git.kernel.org/tip/5da04cc86d1215fd9fe0e5c88ead6e8428a75e56
Author: Lianbo Jiang 
AuthorDate: Tue, 23 Apr 2019 09:30:06 +0800
Committer:  Borislav Petkov 
CommitDate: Thu, 20 Jun 2019 09:58:07 +0200

x86/mm: Rework ioremap resource mapping determination

On ioremap(), __ioremap_check_mem() does a couple of checks on the
supplied memory range to determine how the range should be mapped and in
particular what protection flags should be used.

Generalize the procedure by introducing IORES_MAP_* flags which control
different aspects of the ioremapping and use them in the respective
helpers which determine which descriptor flags should be set per range.
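
The IORES_MAP_* flags are added to include/linux/ioport.h (that hunk is
not visible in the truncated diff below); a sketch of their shape,
inferred from how the helpers below use them:

  /* Flags controlling ioremap() behavior; combined in struct ioremap_desc. */
  enum {
          IORES_MAP_SYSTEM_RAM    = BIT(0),
          IORES_MAP_ENCRYPTED     = BIT(1),
  };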

 [ bp:
   - Rewrite commit message.
   - Add/improve comments.
   - Reflow __ioremap_caller()'s args.
   - s/__ioremap_check_desc/__ioremap_check_encrypted/g;
   - s/__ioremap_res_check/__ioremap_collect_map_flags/g;
   - clarify __ioremap_check_ram()'s purpose. ]

Signed-off-by: Lianbo Jiang 
Co-developed-by: Borislav Petkov 
Signed-off-by: Borislav Petkov 
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: b...@redhat.com
Cc: Dave Hansen 
Cc: dyo...@redhat.com
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: ke...@lists.infradead.org
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tom Lendacky 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190423013007.17838-3-liji...@redhat.com
---
 arch/x86/mm/ioremap.c  | 71 --
 include/linux/ioport.h |  9 +++
 2 files changed, 54 insertions(+), 26 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 4b6423e7bd21..e500f1df1140 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -28,9 +28,11 @@
 
 #include "physaddr.h"
 
-struct ioremap_mem_flags {
-   bool system_ram;
-   bool desc_other;
+/*
+ * Descriptor controlling ioremap() behavior.
+ */
+struct ioremap_desc {
+   unsigned int flags;
 };
 
 /*
@@ -62,13 +64,14 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size,
return err;
 }
 
-static bool __ioremap_check_ram(struct resource *res)
+/* Does the range (or a subset of) contain normal RAM? */
+static unsigned int __ioremap_check_ram(struct resource *res)
 {
unsigned long start_pfn, stop_pfn;
unsigned long i;
 
if ((res->flags & IORESOURCE_SYSTEM_RAM) != IORESOURCE_SYSTEM_RAM)
-   return false;
+   return 0;
 
start_pfn = (res->start + PAGE_SIZE - 1) >> PAGE_SHIFT;
stop_pfn = (res->end + 1) >> PAGE_SHIFT;
@@ -76,28 +79,44 @@ static bool __ioremap_check_ram(struct resource *res)
for (i = 0; i < (stop_pfn - start_pfn); ++i)
if (pfn_valid(start_pfn + i) &&
!PageReserved(pfn_to_page(start_pfn + i)))
-   return true;
+   return IORES_MAP_SYSTEM_RAM;
}
 
-   return false;
+   return 0;
 }
 
-static int __ioremap_check_desc_other(struct resource *res)
+/*
+ * In a SEV guest, NONE and RESERVED should not be mapped encrypted because
+ * there the whole memory is already encrypted.
+ */
+static unsigned int __ioremap_check_encrypted(struct resource *res)
 {
-   return (res->desc != IORES_DESC_NONE);
+   if (!sev_active())
+   return 0;
+
+   switch (res->desc) {
+   case IORES_DESC_NONE:
+   case IORES_DESC_RESERVED:
+   break;
+   default:
+   return IORES_MAP_ENCRYPTED;
+   }
+
+   return 0;
 }
 
-static int __ioremap_res_check(struct resource *res, void *arg)
+static int __ioremap_collect_map_flags(struct resource *res, void *arg)
 {
-   struct ioremap_mem_flags *flags = arg;
+   struct ioremap_desc *desc = arg;
 
-   if (!flags->system_ram)
-   flags->system_ram = __ioremap_check_ram(res);
+   if (!(desc->flags & IORES_MAP_SYSTEM_RAM))
+   desc->flags |= __ioremap_check_ram(res);
 
-   if (!flags->desc_other)
-   flags->desc_other = __ioremap_check_desc_other(res);
+   if (!(desc->flags & IORES_MAP_ENCRYPTED))
+   desc->flags |= __ioremap_check_encrypted(res);
 
-   return flags->system_ram && flags->desc_other;
+   return ((desc->flags & (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED)) ==
+  (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED));
 }
 
 /*
@@ -106,15 +125,15 @@ static int __ioremap_res_check(struct resource *res, void *arg)
  * resource described not as IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
  */
 static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
-   struct ioremap_mem_flags *flags)
+   struct ioremap_desc *desc)
 {
u64 start, end;
 
start = (u64)addr;
end = start + size - 1;
-   memset(flags, 0, sizeof(*flags));
+   memset(desc, 0, sizeof(struct ioremap_desc));

[tip:x86/kdump] x86/crash: Add e820 reserved ranges to kdump kernel's e820 table

2019-06-20 Thread tip-bot for Lianbo Jiang
Commit-ID:  980621daf368f2b9aa69c7ea01baa654edb7577b
Gitweb: https://git.kernel.org/tip/980621daf368f2b9aa69c7ea01baa654edb7577b
Author: Lianbo Jiang 
AuthorDate: Tue, 23 Apr 2019 09:30:07 +0800
Committer:  Borislav Petkov 
CommitDate: Thu, 20 Jun 2019 10:05:06 +0200

x86/crash: Add e820 reserved ranges to kdump kernel's e820 table

At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs, for example:

  kexec -s -p xxx

the kernel does not pass the e820 reserved ranges to the second kernel,
which might cause two problems:

 1. MMCONFIG: A device in PCI segment 1 cannot be discovered by the
kernel PCI probing without all the e820 I/O reservations being present
in the e820 table. Which is the case currently, because the kdump kernel
does not have those reservations because the kexec command does not pass
the I/O reservation via the "memmap=xxx" command line option.

Further details courtesy of Bjorn Helgaas¹: I think you should regard
correct MCFG/ECAM usage in the kdump kernel as a requirement. MMCONFIG
(aka ECAM) space is described in the ACPI MCFG table. If you don't have
ECAM:

  (a) PCI devices won't work at all on non-x86 systems that use only
   ECAM for config access,

  (b) you won't be able to access devices on non-0 segments (granted,
  there aren't very many of these yet, but there will be more in the
  future), and

  (c) you won't be able to access extended config space (addresses
  0x100-0xfff), which means none of the Extended Capabilities will be
  available (AER, ACS, ATS, etc).

 2. The second issue is that the SME kdump kernel doesn't work without
the e820 reserved ranges. When SME is active in the kdump kernel, those
reserved regions are still decrypted, but because those reserved ranges
are not present at all in the kdump kernel's e820 table, they are
accessed as encrypted, which is obviously wrong.

 [1]: https://lkml.kernel.org/r/cabhmzuuscs3juzusm5y6eyjk6weo7mjj5-eakgvbw0qee%2b3...@mail.gmail.com

 [ bp: Heavily massage commit message. ]

Suggested-by: Dave Young 
Signed-off-by: Lianbo Jiang 
Signed-off-by: Borislav Petkov 
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: Baoquan He 
Cc: Bjorn Helgaas 
Cc: dave.han...@linux.intel.com
Cc: Dave Young 
Cc: "Gustavo A. R. Silva" 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: ke...@lists.infradead.org
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tom Lendacky 
Cc: x86-ml 
Cc: Yi Wang 
Link: https://lkml.kernel.org/r/20190423013007.17838-4-liji...@redhat.com
---
 arch/x86/kernel/crash.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 576b2e1bfc12..32c956705b8e 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -381,6 +381,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
memmap_entry_callback);
 
+   /* Add e820 reserved ranges */
+   cmd.type = E820_TYPE_RESERVED;
+   flags = IORESOURCE_MEM;
+   walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, &cmd,
+  memmap_entry_callback);
+
/* Add crashk_low_res region */
if (crashk_low_res.end) {
ei.addr = crashk_low_res.start;


[tip:x86/kdump] x86/kexec: Do not map kexec area as decrypted when SEV is active

2019-06-20 Thread tip-bot for Lianbo Jiang
Commit-ID:  1a79c1b8a04153c4c387518967ce851f89e22733
Gitweb: https://git.kernel.org/tip/1a79c1b8a04153c4c387518967ce851f89e22733
Author: Lianbo Jiang 
AuthorDate: Tue, 30 Apr 2019 15:44:19 +0800
Committer:  Borislav Petkov 
CommitDate: Thu, 20 Jun 2019 10:06:46 +0200

x86/kexec: Do not map kexec area as decrypted when SEV is active

When a virtual machine panics, its memory needs to be dumped for
analysis. With memory encryption in the picture, special care must be
taken when loading a kexec/kdump kernel in a SEV guest.

A SEV guest starts and runs fully encrypted. In order to load a kexec
kernel and initrd, arch_kexec_post_{alloc,free}_pages() need to not map
areas as decrypted unconditionally but differentiate whether the kernel
is running as a SEV guest and if so, leave kexec area encrypted.

 [ bp: Reduce commit message to the relevant information pertaining to
   this commit only. ]

Co-developed-by: Brijesh Singh 
Signed-off-by: Brijesh Singh 
Signed-off-by: Lianbo Jiang 
Signed-off-by: Borislav Petkov 
Cc: Andrew Morton 
Cc: b...@redhat.com
Cc: Brijesh Singh 
Cc: dyo...@redhat.com
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: ke...@lists.infradead.org
Cc: "Kirill A. Shutemov" 
Cc: Thomas Gleixner 
Cc: Tom Lendacky 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190430074421.7852-2-liji...@redhat.com
---
 arch/x86/kernel/machine_kexec_64.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index ceba408ea982..3b38449028e0 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -559,8 +559,20 @@ void arch_kexec_unprotect_crashkres(void)
kexec_mark_crashkres(false);
 }
 
+/*
+ * During a traditional boot under SME, SME will encrypt the kernel,
+ * so the SME kexec kernel also needs to be un-encrypted in order to
+ * replicate a normal SME boot.
+ *
+ * During a traditional boot under SEV, the kernel has already been
+ * loaded encrypted, so the SEV kexec kernel needs to be encrypted in
+ * order to replicate a normal SEV boot.
+ */
 int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
 {
+   if (sev_active())
+   return 0;
+
/*
 * If SME is active we need to be sure that kexec pages are
 * not encrypted because when we boot to the new kernel the
@@ -571,6 +583,9 @@ int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
 
 void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
 {
+   if (sev_active())
+   return;
+
/*
 * If SME is active we need to reset the pages back to being
 * an encrypted mapping before freeing them.


[tip:x86/kdump] x86/kexec: Set the C-bit in the identity map page table when SEV is active

2019-06-20 Thread tip-bot for Lianbo Jiang
Commit-ID:  85784d16c2cf172cf1ebaf2390d6b7c4045d659c
Gitweb: https://git.kernel.org/tip/85784d16c2cf172cf1ebaf2390d6b7c4045d659c
Author: Lianbo Jiang 
AuthorDate: Tue, 30 Apr 2019 15:44:20 +0800
Committer:  Borislav Petkov 
CommitDate: Thu, 20 Jun 2019 10:07:12 +0200

x86/kexec: Set the C-bit in the identity map page table when SEV is active

When SEV is active, the second kernel image is loaded into encrypted
memory. For that, make sure that when kexec builds the identity mapping
page table, the memory is encrypted (i.e., _PAGE_ENC is set).

 [ bp: Sort local args and OR in _PAGE_ENC for more clarity. ]

Co-developed-by: Brijesh Singh 
Signed-off-by: Brijesh Singh 
Signed-off-by: Lianbo Jiang 
Signed-off-by: Borislav Petkov 
Cc: Andrew Morton 
Cc: b...@redhat.com
Cc: dyo...@redhat.com
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: ke...@lists.infradead.org
Cc: "Kirill A. Shutemov" 
Cc: Thomas Gleixner 
Cc: Tom Lendacky 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190430074421.7852-3-liji...@redhat.com
---
 arch/x86/kernel/machine_kexec_64.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 3b38449028e0..16c37fe489bc 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -50,12 +50,13 @@ static void free_transition_pgtable(struct kimage *image)
 
 static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
 {
+   pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
+   unsigned long vaddr, paddr;
+   int result = -ENOMEM;
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
-   unsigned long vaddr, paddr;
-   int result = -ENOMEM;
 
vaddr = (unsigned long)relocate_kernel;
paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
@@ -92,7 +93,11 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
}
pte = pte_offset_kernel(pmd, vaddr);
-   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
+
+   if (sev_active())
+   prot = PAGE_KERNEL_EXEC;
+
+   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
return 0;
 err:
return result;
@@ -129,6 +134,11 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
level4p = (pgd_t *)__va(start_pgtable);
clear_page(level4p);
 
+   if (sev_active()) {
+   info.page_flag   |= _PAGE_ENC;
+   info.kernpg_flag |= _PAGE_ENC;
+   }
+
if (direct_gbpages)
info.direct_gbpages = true;
 


[tip:x86/kdump] fs/proc/vmcore: Enable dumping of encrypted memory when SEV was active

2019-06-20 Thread tip-bot for Lianbo Jiang
Commit-ID:  4eb5fec31e613105668a1472d5876f3d0558e5d8
Gitweb: https://git.kernel.org/tip/4eb5fec31e613105668a1472d5876f3d0558e5d8
Author: Lianbo Jiang 
AuthorDate: Tue, 30 Apr 2019 15:44:21 +0800
Committer:  Borislav Petkov 
CommitDate: Thu, 20 Jun 2019 10:07:49 +0200

fs/proc/vmcore: Enable dumping of encrypted memory when SEV was active

In the kdump kernel, the memory of the first kernel gets to be dumped
into a vmcore file.

Similarly to SME kdump, if SEV was enabled in the first kernel, the old
memory has to be remapped encrypted in order to access it properly.

Commit

  992b649a3f01 ("kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled")

took care of the SME case but it uses sme_active() which checks for SME
only. Use mem_encrypt_active() instead, which returns true when either
SME or SEV is active.

Unlike SME, when SEV is active the second kernel's images (kernel and
initrd) are loaded into encrypted memory; hence, the kernel ELF header
must be remapped encrypted in order to access it properly.
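
For context, mem_encrypt_active() at the time of this series is roughly:

  /* true when either SME or SEV has set up a non-zero encryption mask */
  static inline bool mem_encrypt_active(void)
  {
          return sme_me_mask;
  }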

 [ bp: Massage commit message. ]

Co-developed-by: Brijesh Singh 
Signed-off-by: Brijesh Singh 
Signed-off-by: Lianbo Jiang 
Signed-off-by: Borislav Petkov 
Cc: Alexey Dobriyan 
Cc: Andrew Morton 
Cc: Arnd Bergmann 
Cc: b...@redhat.com
Cc: dyo...@redhat.com
Cc: Ganesh Goudar 
Cc: H. Peter Anvin 
Cc: ke...@lists.infradead.org
Cc: linux-fsde...@vger.kernel.org
Cc: Matthew Wilcox 
Cc: Mike Rapoport 
Cc: mi...@redhat.com
Cc: Rahul Lakkireddy 
Cc: Souptick Joarder 
Cc: Thomas Gleixner 
Cc: Tom Lendacky 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190430074421.7852-4-liji...@redhat.com
---
 fs/proc/vmcore.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 7bb96fdd38ad..57957c91c6df 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -166,7 +166,7 @@ void __weak elfcorehdr_free(unsigned long long addr)
  */
 ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
 {
-   return read_from_oldmem(buf, count, ppos, 0, false);
+   return read_from_oldmem(buf, count, ppos, 0, sev_active());
 }
 
 /*
@@ -174,7 +174,7 @@ ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
  */
 ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
 {
-   return read_from_oldmem(buf, count, ppos, 0, sme_active());
+   return read_from_oldmem(buf, count, ppos, 0, mem_encrypt_active());
 }
 
 /*
@@ -374,7 +374,7 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, loff_t *fpos,
buflen);
start = m->paddr + *fpos - m->offset;
tmp = read_from_oldmem(buffer, tsz, &start,
-  userbuf, sme_active());
+  userbuf, mem_encrypt_active());
if (tmp < 0)
return tmp;
buflen -= tsz;