[tip:x86/mm] kexec: Allocate decrypted control pages for kdump if SME is enabled
Commit-ID: 9cf38d5559e813cccdba8b44c82cc46ba48d0896 Gitweb: https://git.kernel.org/tip/9cf38d5559e813cccdba8b44c82cc46ba48d0896 Author: Lianbo Jiang AuthorDate: Sun, 30 Sep 2018 11:10:31 +0800 Committer: Borislav Petkov CommitDate: Sat, 6 Oct 2018 12:01:51 +0200 kexec: Allocate decrypted control pages for kdump if SME is enabled When SME is enabled in the first kernel, it needs to allocate decrypted pages for kdump because when the kdump kernel boots, these pages need to be accessed decrypted in the initial boot stage, before SME is enabled. [ bp: clean up text. ] Signed-off-by: Lianbo Jiang Signed-off-by: Borislav Petkov Reviewed-by: Tom Lendacky Cc: ke...@lists.infradead.org Cc: t...@linutronix.de Cc: mi...@redhat.com Cc: h...@zytor.com Cc: a...@linux-foundation.org Cc: dan.j.willi...@intel.com Cc: bhelg...@google.com Cc: baiyao...@cmss.chinamobile.com Cc: ti...@suse.de Cc: brijesh.si...@amd.com Cc: dyo...@redhat.com Cc: b...@redhat.com Cc: jroe...@suse.de Link: https://lkml.kernel.org/r/20180930031033.22110-3-liji...@redhat.com --- kernel/kexec_core.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 23a83a4da38a..86ef06d3dbe3 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -471,6 +471,10 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image, } } + /* Ensure that these pages are decrypted if SME is enabled. */ + if (pages) + arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0); + return pages; } @@ -867,6 +871,7 @@ static int kimage_load_crash_segment(struct kimage *image, result = -ENOMEM; goto out; } + arch_kexec_post_alloc_pages(page_address(page), 1, 0); ptr = kmap(page); ptr += maddr & ~PAGE_MASK; mchunk = min_t(size_t, mbytes, @@ -884,6 +889,7 @@ static int kimage_load_crash_segment(struct kimage *image, result = copy_from_user(ptr, buf, uchunk); kexec_flush_icache_page(page); kunmap(page); + arch_kexec_pre_free_pages(page_address(page), 1); if (result) { result = -EFAULT; goto out;
[tip:x86/mm] x86/ioremap: Add an ioremap_encrypted() helper
Commit-ID: c3a7a61c192ec350330128edb13db33a9bc0ace1 Gitweb: https://git.kernel.org/tip/c3a7a61c192ec350330128edb13db33a9bc0ace1 Author: Lianbo Jiang AuthorDate: Thu, 27 Sep 2018 15:19:51 +0800 Committer: Borislav Petkov CommitDate: Sat, 6 Oct 2018 11:57:51 +0200 x86/ioremap: Add an ioremap_encrypted() helper When SME is enabled, the memory is encrypted in the first kernel. In this case, SME also needs to be enabled in the kdump kernel, and we have to remap the old memory with the memory encryption mask. The case of concern here is if SME is active in the first kernel, and it is active too in the kdump kernel. There are four cases to be considered: a. dump vmcore It is encrypted in the first kernel, and needs be read out in the kdump kernel. b. crash notes When dumping vmcore, the people usually need to read useful information from notes, and the notes is also encrypted. c. iommu device table It's encrypted in the first kernel, kdump kernel needs to access its content to analyze and get information it needs. d. mmio of AMD iommu not encrypted in both kernels Add a new bool parameter @encrypted to __ioremap_caller(). If set, memory will be remapped with the SME mask. Add a new function ioremap_encrypted() to explicitly pass in a true value for @encrypted. Use ioremap_encrypted() for the above a, b, c cases. [ bp: cleanup commit message, extern defs in io.h and drop forgotten include. ] Signed-off-by: Lianbo Jiang Signed-off-by: Borislav Petkov Reviewed-by: Tom Lendacky Cc: ke...@lists.infradead.org Cc: t...@linutronix.de Cc: mi...@redhat.com Cc: h...@zytor.com Cc: a...@linux-foundation.org Cc: dan.j.willi...@intel.com Cc: bhelg...@google.com Cc: baiyao...@cmss.chinamobile.com Cc: ti...@suse.de Cc: brijesh.si...@amd.com Cc: dyo...@redhat.com Cc: b...@redhat.com Cc: jroe...@suse.de Link: https://lkml.kernel.org/r/20180927071954.29615-2-liji...@redhat.com --- arch/x86/include/asm/io.h | 3 ++- arch/x86/mm/ioremap.c | 24 2 files changed, 18 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index 6de64840dd22..6df53efcecfd 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -187,11 +187,12 @@ extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size) #define ioremap_nocache ioremap_nocache extern void __iomem *ioremap_uc(resource_size_t offset, unsigned long size); #define ioremap_uc ioremap_uc - extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size); #define ioremap_cache ioremap_cache extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, unsigned long prot_val); #define ioremap_prot ioremap_prot +extern void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long size); +#define ioremap_encrypted ioremap_encrypted /** * ioremap - map bus memory into CPU space diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index c63a545ec199..24e0920a9b25 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -131,7 +131,8 @@ static void __ioremap_check_mem(resource_size_t addr, unsigned long size, * caller shouldn't need to know that small detail. */ static void __iomem *__ioremap_caller(resource_size_t phys_addr, - unsigned long size, enum page_cache_mode pcm, void *caller) + unsigned long size, enum page_cache_mode pcm, + void *caller, bool encrypted) { unsigned long offset, vaddr; resource_size_t last_addr; @@ -199,7 +200,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr, * resulting mapping. */ prot = PAGE_KERNEL_IO; - if (sev_active() && mem_flags.desc_other) + if ((sev_active() && mem_flags.desc_other) || encrypted) prot = pgprot_encrypted(prot); switch (pcm) { @@ -291,7 +292,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size) enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS; return __ioremap_caller(phys_addr, size, pcm, - __builtin_return_address(0)); + __builtin_return_address(0), false); } EXPORT_SYMBOL(ioremap_nocache); @@ -324,7 +325,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, unsigned long size) enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC; return __ioremap_caller(phys_addr, size, pcm, - __builtin_return_address(0)); + __builtin_return_address(0), false); } EXPORT_SYMBOL_GPL(ioremap_uc); @@ -341,7 +342,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc); void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size) { return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC, - __builtin_return_address(0)); +
[tip:x86/mm] kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled
Commit-ID: 992b649a3f013465d8128da02e5449def662a4c3 Gitweb: https://git.kernel.org/tip/992b649a3f013465d8128da02e5449def662a4c3 Author: Lianbo Jiang AuthorDate: Sun, 30 Sep 2018 16:37:41 +0800 Committer: Borislav Petkov CommitDate: Sat, 6 Oct 2018 12:09:26 +0200 kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled In the kdump kernel, the memory of the first kernel needs to be dumped into the vmcore file. If SME is enabled in the first kernel, the old memory has to be remapped with the memory encryption mask in order to access it properly. Split copy_oldmem_page() functionality to handle encrypted memory properly. [ bp: Heavily massage everything. ] Signed-off-by: Lianbo Jiang Signed-off-by: Borislav Petkov Cc: ke...@lists.infradead.org Cc: t...@linutronix.de Cc: mi...@redhat.com Cc: h...@zytor.com Cc: a...@linux-foundation.org Cc: dan.j.willi...@intel.com Cc: bhelg...@google.com Cc: baiyao...@cmss.chinamobile.com Cc: ti...@suse.de Cc: brijesh.si...@amd.com Cc: dyo...@redhat.com Cc: b...@redhat.com Cc: jroe...@suse.de Link: https://lkml.kernel.org/r/be7b47f9-6be6-e0d1-2c2a-9125bc74b...@redhat.com --- arch/x86/kernel/crash_dump_64.c | 60 - fs/proc/vmcore.c| 24 - include/linux/crash_dump.h | 4 +++ 3 files changed, 63 insertions(+), 25 deletions(-) diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c index 4f2e0778feac..eb8ab3915268 100644 --- a/arch/x86/kernel/crash_dump_64.c +++ b/arch/x86/kernel/crash_dump_64.c @@ -11,40 +11,62 @@ #include #include -/** - * copy_oldmem_page - copy one page from "oldmem" - * @pfn: page frame number to be copied - * @buf: target memory address for the copy; this can be in kernel address - * space or user address space (see @userbuf) - * @csize: number of bytes to copy - * @offset: offset in bytes into the page (based on pfn) to begin the copy - * @userbuf: if set, @buf is in user address space, use copy_to_user(), - * otherwise @buf is in kernel address space, use memcpy(). - * - * Copy a page from "oldmem". For this page, there is no pte mapped - * in the current kernel. We stitch up a pte, similar to kmap_atomic. - */ -ssize_t copy_oldmem_page(unsigned long pfn, char *buf, - size_t csize, unsigned long offset, int userbuf) +static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize, + unsigned long offset, int userbuf, + bool encrypted) { void *vaddr; if (!csize) return 0; - vaddr = ioremap_cache(pfn << PAGE_SHIFT, PAGE_SIZE); + if (encrypted) + vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT, PAGE_SIZE); + else + vaddr = (__force void *)ioremap_cache(pfn << PAGE_SHIFT, PAGE_SIZE); + if (!vaddr) return -ENOMEM; if (userbuf) { - if (copy_to_user(buf, vaddr + offset, csize)) { - iounmap(vaddr); + if (copy_to_user((void __user *)buf, vaddr + offset, csize)) { + iounmap((void __iomem *)vaddr); return -EFAULT; } } else memcpy(buf, vaddr + offset, csize); set_iounmap_nonlazy(); - iounmap(vaddr); + iounmap((void __iomem *)vaddr); return csize; } + +/** + * copy_oldmem_page - copy one page of memory + * @pfn: page frame number to be copied + * @buf: target memory address for the copy; this can be in kernel address + * space or user address space (see @userbuf) + * @csize: number of bytes to copy + * @offset: offset in bytes into the page (based on pfn) to begin the copy + * @userbuf: if set, @buf is in user address space, use copy_to_user(), + * otherwise @buf is in kernel address space, use memcpy(). + * + * Copy a page from the old kernel's memory. For this page, there is no pte + * mapped in the current kernel. We stitch up a pte, similar to kmap_atomic. + */ +ssize_t copy_oldmem_page(unsigned long pfn, char *buf, size_t csize, +unsigned long offset, int userbuf) +{ + return __copy_oldmem_page(pfn, buf, csize, offset, userbuf, false); +} + +/** + * copy_oldmem_page_encrypted - same as copy_oldmem_page() above but ioremap the + * memory with the encryption mask set to accomodate kdump on SME-enabled + * machines. + */ +ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t csize, + unsigned long offset, int userbuf) +{ + return __copy_oldmem_page(pfn, buf, csize, offset, userbuf, true); +} diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c index cbde728f8ac6..42c32d06f7da 100644 --- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -24,6 +24,8 @@ #include #include #include +#include +#include #include #include "internal.h" @@ -98,7 +1
[tip:x86/mm] iommu/amd: Remap the IOMMU device table with the memory encryption mask for kdump
Commit-ID: 8780158cf977ea5f9912931a30b3d575b36dba22 Gitweb: https://git.kernel.org/tip/8780158cf977ea5f9912931a30b3d575b36dba22 Author: Lianbo Jiang AuthorDate: Sun, 30 Sep 2018 11:10:32 +0800 Committer: Borislav Petkov CommitDate: Sat, 6 Oct 2018 12:08:24 +0200 iommu/amd: Remap the IOMMU device table with the memory encryption mask for kdump The kdump kernel copies the IOMMU device table from the old device table which is encrypted when SME is enabled in the first kernel. So remap the old device table with the memory encryption mask in the kdump kernel. [ bp: Massage commit message. ] Signed-off-by: Lianbo Jiang Signed-off-by: Borislav Petkov Reviewed-by: Tom Lendacky Acked-by: Joerg Roedel Cc: ke...@lists.infradead.org Cc: t...@linutronix.de Cc: mi...@redhat.com Cc: h...@zytor.com Cc: a...@linux-foundation.org Cc: dan.j.willi...@intel.com Cc: bhelg...@google.com Cc: baiyao...@cmss.chinamobile.com Cc: ti...@suse.de Cc: brijesh.si...@amd.com Cc: dyo...@redhat.com Cc: b...@redhat.com Link: https://lkml.kernel.org/r/20180930031033.22110-4-liji...@redhat.com --- drivers/iommu/amd_iommu_init.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 84b3e4445d46..3931c7de7c69 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -902,12 +902,22 @@ static bool copy_device_table(void) } } - old_devtb_phys = entry & PAGE_MASK; + /* +* When SME is enabled in the first kernel, the entry includes the +* memory encryption mask(sme_me_mask), we must remove the memory +* encryption mask to obtain the true physical address in kdump kernel. +*/ + old_devtb_phys = __sme_clr(entry) & PAGE_MASK; + if (old_devtb_phys >= 0x1ULL) { pr_err("The address of old device table is above 4G, not trustworthy!\n"); return false; } - old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB); + old_devtb = (sme_active() && is_kdump_kernel()) + ? (__force void *)ioremap_encrypted(old_devtb_phys, + dev_table_size) + : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB); + if (!old_devtb) return false;
[tip:x86/kdump] kdump: Document kernel data exported in the vmcoreinfo note
Commit-ID: f263245a0ce2c4e23b89a58fa5f7dfc048e11929 Gitweb: https://git.kernel.org/tip/f263245a0ce2c4e23b89a58fa5f7dfc048e11929 Author: Lianbo Jiang AuthorDate: Thu, 10 Jan 2019 20:19:43 +0800 Committer: Borislav Petkov CommitDate: Tue, 15 Jan 2019 11:05:28 +0100 kdump: Document kernel data exported in the vmcoreinfo note Document data exported in vmcoreinfo and briefly describe its use by userspace tools. [ bp: heavily massage and redact the text. ] Suggested-by: Borislav Petkov Signed-off-by: Lianbo Jiang Signed-off-by: Borislav Petkov Cc: Andrew Morton Cc: Baoquan He Cc: Dave Young Cc: Jonathan Corbet Cc: Thomas Gleixner Cc: Vivek Goyal Cc: ander...@redhat.com Cc: k-ha...@ab.jp.nec.com Cc: ke...@lists.infradead.org Cc: linux-...@vger.kernel.org Cc: mi...@redhat.com Cc: x86-ml Link: https://lkml.kernel.org/r/20190110121944.6050-2-liji...@redhat.com --- Documentation/kdump/vmcoreinfo.txt | 495 + 1 file changed, 495 insertions(+) diff --git a/Documentation/kdump/vmcoreinfo.txt b/Documentation/kdump/vmcoreinfo.txt new file mode 100644 index ..bb94a4bd597a --- /dev/null +++ b/Documentation/kdump/vmcoreinfo.txt @@ -0,0 +1,495 @@ + + VMCOREINFO + + +=== +What is it? +=== + +VMCOREINFO is a special ELF note section. It contains various +information from the kernel like structure size, page size, symbol +values, field offsets, etc. These data are packed into an ELF note +section and used by user-space tools like crash and makedumpfile to +analyze a kernel's memory layout. + + +Common variables + + +init_uts_ns.name.release + + +The version of the Linux kernel. Used to find the corresponding source +code from which the kernel has been built. For example, crash uses it to +find the corresponding vmlinux in order to process vmcore. + +PAGE_SIZE +- + +The size of a page. It is the smallest unit of data used by the memory +management facilities. It is usually 4096 bytes of size and a page is +aligned on 4096 bytes. Used for computing page addresses. + +init_uts_ns +--- + +The UTS namespace which is used to isolate two specific elements of the +system that relate to the uname(2) system call. It is named after the +data structure used to store information returned by the uname(2) system +call. + +User-space tools can get the kernel name, host name, kernel release +number, kernel version, architecture name and OS type from it. + +node_online_map +--- + +An array node_states[N_ONLINE] which represents the set of online nodes +in a system, one bit position per node number. Used to keep track of +which nodes are in the system and online. + +swapper_pg_dir +- + +The global page directory pointer of the kernel. Used to translate +virtual to physical addresses. + +_stext +-- + +Defines the beginning of the text section. In general, _stext indicates +the kernel start address. Used to convert a virtual address from the +direct kernel map to a physical address. + +vmap_area_list +-- + +Stores the virtual area list. makedumpfile gets the vmalloc start value +from this variable and its value is necessary for vmalloc translation. + +mem_map +--- + +Physical addresses are translated to struct pages by treating them as +an index into the mem_map array. Right-shifting a physical address +PAGE_SHIFT bits converts it into a page frame number which is an index +into that mem_map array. + +Used to map an address to the corresponding struct page. + +contig_page_data + + +Makedumpfile gets the pglist_data structure from this symbol, which is +used to describe the memory layout. + +User-space tools use this to exclude free pages when dumping memory. + +mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map) +-- + +The address of the mem_section array, its length, structure size, and +the section_mem_map offset. + +It exists in the sparse memory mapping model, and it is also somewhat +similar to the mem_map variable, both of them are used to translate an +address. + +page + + +The size of a page structure. struct page is an important data structure +and it is widely used to compute contiguous memory. + +pglist_data +--- + +The size of a pglist_data structure. This value is used to check if the +pglist_data structure is valid. It is also used for checking the memory +type. + +zone + + +The size of a zone structure. This value is used to check if the zone +structure has been found. It is also used for excluding free pages. + +free_area +- + +The size of a free_area structure. It indicates whether the free_area +structure is valid or not. Useful when excluding fr
[tip:x86/kdump] x86/kdump: Export the SME mask to vmcoreinfo
Commit-ID: 65f750e5457aef9a8085a99d613fea0430303e93 Gitweb: https://git.kernel.org/tip/65f750e5457aef9a8085a99d613fea0430303e93 Author: Lianbo Jiang AuthorDate: Thu, 10 Jan 2019 20:19:44 +0800 Committer: Borislav Petkov CommitDate: Fri, 11 Jan 2019 16:09:25 +0100 x86/kdump: Export the SME mask to vmcoreinfo On AMD SME machines, makedumpfile tools need to know whether the crashed kernel was encrypted. If SME is enabled in the first kernel, the crashed kernel's page table entries (pgd/pud/pmd/pte) contain the memory encryption mask which makedumpfile needs to remove in order to obtain the true physical address. Export that mask in a vmcoreinfo variable. [ bp: Massage commit message and move define at the end of the function. ] Signed-off-by: Lianbo Jiang Signed-off-by: Borislav Petkov Cc: "H. Peter Anvin" Cc: Andrew Morton Cc: Baoquan He Cc: Dave Young Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Tom Lendacky Cc: ander...@redhat.com Cc: k-ha...@ab.jp.nec.com Cc: ke...@lists.infradead.org Cc: linux-...@vger.kernel.org Cc: x86-ml Link: https://lkml.kernel.org/r/20190110121944.6050-3-liji...@redhat.com --- arch/x86/kernel/machine_kexec_64.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index 4c8acdfdc5a7..ceba408ea982 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -352,6 +352,8 @@ void machine_kexec(struct kimage *image) void arch_crash_save_vmcoreinfo(void) { + u64 sme_mask = sme_me_mask; + VMCOREINFO_NUMBER(phys_base); VMCOREINFO_SYMBOL(init_top_pgt); vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n", @@ -364,6 +366,7 @@ void arch_crash_save_vmcoreinfo(void) vmcoreinfo_append_str("KERNELOFFSET=%lx\n", kaslr_offset()); VMCOREINFO_NUMBER(KERNEL_IMAGE_SIZE); + VMCOREINFO_NUMBER(sme_mask); } /* arch-dependent functionality related to kexec file-based syscall */
[tip:x86/kdump] x86/e820, ioport: Add a new I/O resource descriptor IORES_DESC_RESERVED
Commit-ID: ae9e13d621d6795ec1ad6bf10bd2549c6c3feca4 Gitweb: https://git.kernel.org/tip/ae9e13d621d6795ec1ad6bf10bd2549c6c3feca4 Author: Lianbo Jiang AuthorDate: Tue, 23 Apr 2019 09:30:05 +0800 Committer: Borislav Petkov CommitDate: Thu, 20 Jun 2019 09:54:31 +0200 x86/e820, ioport: Add a new I/O resource descriptor IORES_DESC_RESERVED When executing the kexec_file_load() syscall, the first kernel needs to pass the e820 reserved ranges to the second kernel because some devices (PCI, for example) need them present in the kdump kernel for proper initialization. But the kernel can not exactly match the e820 reserved ranges when walking through the iomem resources using the default IORES_DESC_NONE descriptor, because there are several types of e820 ranges which are marked IORES_DESC_NONE, see e820_type_to_iores_desc(). Therefore, add a new I/O resource descriptor called IORES_DESC_RESERVED to mark exactly those ranges. It will be used to match the reserved resource ranges when walking through iomem resources. [ bp: Massage commit message. ] Suggested-by: Borislav Petkov Signed-off-by: Lianbo Jiang Signed-off-by: Borislav Petkov Cc: Andrew Morton Cc: Andy Lutomirski Cc: b...@redhat.com Cc: dave.han...@linux.intel.com Cc: dyo...@redhat.com Cc: "H. Peter Anvin" Cc: Huang Zijiang Cc: Ingo Molnar Cc: Joe Perches Cc: Juergen Gross Cc: ke...@lists.infradead.org Cc: Masayoshi Mizuma Cc: Michal Hocko Cc: Mike Rapoport Cc: Naoya Horiguchi Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Tom Lendacky Cc: x86-ml Link: https://lkml.kernel.org/r/20190423013007.17838-2-liji...@redhat.com --- arch/x86/kernel/e820.c | 2 +- include/linux/ioport.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index 8f32e705a980..e69408bf664b 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -1063,10 +1063,10 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry) case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE; case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY; case E820_TYPE_PRAM:return IORES_DESC_PERSISTENT_MEMORY_LEGACY; + case E820_TYPE_RESERVED:return IORES_DESC_RESERVED; case E820_TYPE_RESERVED_KERN: /* Fall-through: */ case E820_TYPE_RAM: /* Fall-through: */ case E820_TYPE_UNUSABLE:/* Fall-through: */ - case E820_TYPE_RESERVED:/* Fall-through: */ default:return IORES_DESC_NONE; } } diff --git a/include/linux/ioport.h b/include/linux/ioport.h index da0ebaec25f0..6ed59de48bd5 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -133,6 +133,7 @@ enum { IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5, IORES_DESC_DEVICE_PRIVATE_MEMORY= 6, IORES_DESC_DEVICE_PUBLIC_MEMORY = 7, + IORES_DESC_RESERVED = 8, }; /* helpers to define resources */
[tip:x86/kdump] x86/mm: Rework ioremap resource mapping determination
Commit-ID: 5da04cc86d1215fd9fe0e5c88ead6e8428a75e56 Gitweb: https://git.kernel.org/tip/5da04cc86d1215fd9fe0e5c88ead6e8428a75e56 Author: Lianbo Jiang AuthorDate: Tue, 23 Apr 2019 09:30:06 +0800 Committer: Borislav Petkov CommitDate: Thu, 20 Jun 2019 09:58:07 +0200 x86/mm: Rework ioremap resource mapping determination On ioremap(), __ioremap_check_mem() does a couple of checks on the supplied memory range to determine how the range should be mapped and in particular what protection flags should be used. Generalize the procedure by introducing IORES_MAP_* flags which control different aspects of the ioremapping and use them in the respective helpers which determine which descriptor flags should be set per range. [ bp: - Rewrite commit message. - Add/improve comments. - Reflow __ioremap_caller()'s args. - s/__ioremap_check_desc/__ioremap_check_encrypted/g; - s/__ioremap_res_check/__ioremap_collect_map_flags/g; - clarify __ioremap_check_ram()'s purpose. ] Signed-off-by: Lianbo Jiang Co-developed-by: Borislav Petkov Signed-off-by: Borislav Petkov Cc: Andrew Morton Cc: Andy Lutomirski Cc: b...@redhat.com Cc: Dave Hansen Cc: dyo...@redhat.com Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: ke...@lists.infradead.org Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Tom Lendacky Cc: x86-ml Link: https://lkml.kernel.org/r/20190423013007.17838-3-liji...@redhat.com --- arch/x86/mm/ioremap.c | 71 -- include/linux/ioport.h | 9 +++ 2 files changed, 54 insertions(+), 26 deletions(-) diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 4b6423e7bd21..e500f1df1140 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -28,9 +28,11 @@ #include "physaddr.h" -struct ioremap_mem_flags { - bool system_ram; - bool desc_other; +/* + * Descriptor controlling ioremap() behavior. + */ +struct ioremap_desc { + unsigned int flags; }; /* @@ -62,13 +64,14 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size, return err; } -static bool __ioremap_check_ram(struct resource *res) +/* Does the range (or a subset of) contain normal RAM? */ +static unsigned int __ioremap_check_ram(struct resource *res) { unsigned long start_pfn, stop_pfn; unsigned long i; if ((res->flags & IORESOURCE_SYSTEM_RAM) != IORESOURCE_SYSTEM_RAM) - return false; + return 0; start_pfn = (res->start + PAGE_SIZE - 1) >> PAGE_SHIFT; stop_pfn = (res->end + 1) >> PAGE_SHIFT; @@ -76,28 +79,44 @@ static bool __ioremap_check_ram(struct resource *res) for (i = 0; i < (stop_pfn - start_pfn); ++i) if (pfn_valid(start_pfn + i) && !PageReserved(pfn_to_page(start_pfn + i))) - return true; + return IORES_MAP_SYSTEM_RAM; } - return false; + return 0; } -static int __ioremap_check_desc_other(struct resource *res) +/* + * In a SEV guest, NONE and RESERVED should not be mapped encrypted because + * there the whole memory is already encrypted. + */ +static unsigned int __ioremap_check_encrypted(struct resource *res) { - return (res->desc != IORES_DESC_NONE); + if (!sev_active()) + return 0; + + switch (res->desc) { + case IORES_DESC_NONE: + case IORES_DESC_RESERVED: + break; + default: + return IORES_MAP_ENCRYPTED; + } + + return 0; } -static int __ioremap_res_check(struct resource *res, void *arg) +static int __ioremap_collect_map_flags(struct resource *res, void *arg) { - struct ioremap_mem_flags *flags = arg; + struct ioremap_desc *desc = arg; - if (!flags->system_ram) - flags->system_ram = __ioremap_check_ram(res); + if (!(desc->flags & IORES_MAP_SYSTEM_RAM)) + desc->flags |= __ioremap_check_ram(res); - if (!flags->desc_other) - flags->desc_other = __ioremap_check_desc_other(res); + if (!(desc->flags & IORES_MAP_ENCRYPTED)) + desc->flags |= __ioremap_check_encrypted(res); - return flags->system_ram && flags->desc_other; + return ((desc->flags & (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED)) == + (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED)); } /* @@ -106,15 +125,15 @@ static int __ioremap_res_check(struct resource *res, void *arg) * resource described not as IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES). */ static void __ioremap_check_mem(resource_size_t addr, unsigned long size, - struct ioremap_mem_flags *flags) + struct ioremap_desc *desc) { u64 start, end; start = (u64)addr; end = start + size - 1; - memset(flags, 0, sizeof(*flags)); + memset(desc, 0, sizeof(stru
[tip:x86/kdump] x86/crash: Add e820 reserved ranges to kdump kernel's e820 table
Commit-ID: 980621daf368f2b9aa69c7ea01baa654edb7577b Gitweb: https://git.kernel.org/tip/980621daf368f2b9aa69c7ea01baa654edb7577b Author: Lianbo Jiang AuthorDate: Tue, 23 Apr 2019 09:30:07 +0800 Committer: Borislav Petkov CommitDate: Thu, 20 Jun 2019 10:05:06 +0200 x86/crash: Add e820 reserved ranges to kdump kernel's e820 table At present, when using the kexec_file_load() syscall to load the kernel image and initramfs, for example: kexec -s -p xxx the kernel does not pass the e820 reserved ranges to the second kernel, which might cause two problems: 1. MMCONFIG: A device in PCI segment 1 cannot be discovered by the kernel PCI probing without all the e820 I/O reservations being present in the e820 table. Which is the case currently, because the kdump kernel does not have those reservations because the kexec command does not pass the I/O reservation via the "memmap=xxx" command line option. Further details courtesy of Bjorn Helgaas¹: I think you should regard correct MCFG/ECAM usage in the kdump kernel as a requirement. MMCONFIG (aka ECAM) space is described in the ACPI MCFG table. If you don't have ECAM: (a) PCI devices won't work at all on non-x86 systems that use only ECAM for config access, (b) you won't be able to access devices on non-0 segments (granted, there aren't very many of these yet, but there will be more in the future), and (c) you won't be able to access extended config space (addresses 0x100-0xfff), which means none of the Extended Capabilities will be available (AER, ACS, ATS, etc). 2. The second issue is that the SME kdump kernel doesn't work without the e820 reserved ranges. When SME is active in the kdump kernel, those reserved regions are still decrypted, but because those reserved ranges are not present at all in kdump kernel's e820 table, they are accessed as encrypted. Which is obviously wrong. [1]: https://lkml.kernel.org/r/cabhmzuuscs3juzusm5y6eyjk6weo7mjj5-eakgvbw0qee%2b3...@mail.gmail.com [ bp: Heavily massage commit message. ] Suggested-by: Dave Young Signed-off-by: Lianbo Jiang Signed-off-by: Borislav Petkov Cc: Andrew Morton Cc: Andy Lutomirski Cc: Baoquan He Cc: Bjorn Helgaas Cc: dave.han...@linux.intel.com Cc: Dave Young Cc: "Gustavo A. R. Silva" Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: ke...@lists.infradead.org Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Tom Lendacky Cc: x86-ml Cc: Yi Wang Link: https://lkml.kernel.org/r/20190423013007.17838-4-liji...@redhat.com --- arch/x86/kernel/crash.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 576b2e1bfc12..32c956705b8e 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -381,6 +381,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params) walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd, memmap_entry_callback); + /* Add e820 reserved ranges */ + cmd.type = E820_TYPE_RESERVED; + flags = IORESOURCE_MEM; + walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, &cmd, + memmap_entry_callback); + /* Add crashk_low_res region */ if (crashk_low_res.end) { ei.addr = crashk_low_res.start;
[tip:x86/kdump] x86/kexec: Do not map kexec area as decrypted when SEV is active
Commit-ID: 1a79c1b8a04153c4c387518967ce851f89e22733 Gitweb: https://git.kernel.org/tip/1a79c1b8a04153c4c387518967ce851f89e22733 Author: Lianbo Jiang AuthorDate: Tue, 30 Apr 2019 15:44:19 +0800 Committer: Borislav Petkov CommitDate: Thu, 20 Jun 2019 10:06:46 +0200 x86/kexec: Do not map kexec area as decrypted when SEV is active When a virtual machine panics, its memory needs to be dumped for analysis. With memory encryption in the picture, special care must be taken when loading a kexec/kdump kernel in a SEV guest. A SEV guest starts and runs fully encrypted. In order to load a kexec kernel and initrd, arch_kexec_post_{alloc,free}_pages() need to not map areas as decrypted unconditionally but differentiate whether the kernel is running as a SEV guest and if so, leave kexec area encrypted. [ bp: Reduce commit message to the relevant information pertaining to this commit only. ] Co-developed-by: Brijesh Singh Signed-off-by: Brijesh Singh Signed-off-by: Lianbo Jiang Signed-off-by: Borislav Petkov Cc: Andrew Morton Cc: b...@redhat.com Cc: Brijesh Singh Cc: dyo...@redhat.com Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: ke...@lists.infradead.org Cc: "Kirill A. Shutemov" Cc: Thomas Gleixner Cc: Tom Lendacky Cc: x86-ml Link: https://lkml.kernel.org/r/20190430074421.7852-2-liji...@redhat.com --- arch/x86/kernel/machine_kexec_64.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index ceba408ea982..3b38449028e0 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -559,8 +559,20 @@ void arch_kexec_unprotect_crashkres(void) kexec_mark_crashkres(false); } +/* + * During a traditional boot under SME, SME will encrypt the kernel, + * so the SME kexec kernel also needs to be un-encrypted in order to + * replicate a normal SME boot. + * + * During a traditional boot under SEV, the kernel has already been + * loaded encrypted, so the SEV kexec kernel needs to be encrypted in + * order to replicate a normal SEV boot. + */ int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp) { + if (sev_active()) + return 0; + /* * If SME is active we need to be sure that kexec pages are * not encrypted because when we boot to the new kernel the @@ -571,6 +583,9 @@ int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp) void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { + if (sev_active()) + return; + /* * If SME is active we need to reset the pages back to being * an encrypted mapping before freeing them.
[tip:x86/kdump] x86/kexec: Set the C-bit in the identity map page table when SEV is active
Commit-ID: 85784d16c2cf172cf1ebaf2390d6b7c4045d659c Gitweb: https://git.kernel.org/tip/85784d16c2cf172cf1ebaf2390d6b7c4045d659c Author: Lianbo Jiang AuthorDate: Tue, 30 Apr 2019 15:44:20 +0800 Committer: Borislav Petkov CommitDate: Thu, 20 Jun 2019 10:07:12 +0200 x86/kexec: Set the C-bit in the identity map page table when SEV is active When SEV is active, the second kernel image is loaded into encrypted memory. For that, make sure that when kexec builds the identity mapping page table, the memory is encrypted (i.e., _PAGE_ENC is set). [ bp: Sort local args and OR in _PAGE_ENC for more clarity. ] Co-developed-by: Brijesh Singh Signed-off-by: Brijesh Singh Signed-off-by: Lianbo Jiang Signed-off-by: Borislav Petkov Cc: Andrew Morton Cc: b...@redhat.com Cc: dyo...@redhat.com Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: ke...@lists.infradead.org Cc: "Kirill A. Shutemov" Cc: Thomas Gleixner Cc: Tom Lendacky Cc: x86-ml Link: https://lkml.kernel.org/r/20190430074421.7852-3-liji...@redhat.com --- arch/x86/kernel/machine_kexec_64.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index 3b38449028e0..16c37fe489bc 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -50,12 +50,13 @@ static void free_transition_pgtable(struct kimage *image) static int init_transition_pgtable(struct kimage *image, pgd_t *pgd) { + pgprot_t prot = PAGE_KERNEL_EXEC_NOENC; + unsigned long vaddr, paddr; + int result = -ENOMEM; p4d_t *p4d; pud_t *pud; pmd_t *pmd; pte_t *pte; - unsigned long vaddr, paddr; - int result = -ENOMEM; vaddr = (unsigned long)relocate_kernel; paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE); @@ -92,7 +93,11 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd) set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE)); } pte = pte_offset_kernel(pmd, vaddr); - set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC)); + + if (sev_active()) + prot = PAGE_KERNEL_EXEC; + + set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot)); return 0; err: return result; @@ -129,6 +134,11 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable) level4p = (pgd_t *)__va(start_pgtable); clear_page(level4p); + if (sev_active()) { + info.page_flag |= _PAGE_ENC; + info.kernpg_flag |= _PAGE_ENC; + } + if (direct_gbpages) info.direct_gbpages = true;
[tip:x86/kdump] fs/proc/vmcore: Enable dumping of encrypted memory when SEV was active
Commit-ID: 4eb5fec31e613105668a1472d5876f3d0558e5d8 Gitweb: https://git.kernel.org/tip/4eb5fec31e613105668a1472d5876f3d0558e5d8 Author: Lianbo Jiang AuthorDate: Tue, 30 Apr 2019 15:44:21 +0800 Committer: Borislav Petkov CommitDate: Thu, 20 Jun 2019 10:07:49 +0200 fs/proc/vmcore: Enable dumping of encrypted memory when SEV was active In the kdump kernel, the memory of the first kernel gets to be dumped into a vmcore file. Similarly to SME kdump, if SEV was enabled in the first kernel, the old memory has to be remapped encrypted in order to access it properly. Commit 992b649a3f01 ("kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled") took care of the SME case but it uses sme_active() which checks for SME only. Use mem_encrypt_active() instead, which returns true when either SME or SEV is active. Unlike SME, the second kernel images (kernel and initrd) are loaded into encrypted memory when SEV is active, hence the kernel elf header must be remapped as encrypted in order to access it properly. [ bp: Massage commit message. ] Co-developed-by: Brijesh Singh Signed-off-by: Brijesh Singh Signed-off-by: Lianbo Jiang Signed-off-by: Borislav Petkov Cc: Alexey Dobriyan Cc: Andrew Morton Cc: Arnd Bergmann Cc: b...@redhat.com Cc: dyo...@redhat.com Cc: Ganesh Goudar Cc: H. Peter Anvin Cc: ke...@lists.infradead.org Cc: linux-fsde...@vger.kernel.org Cc: Matthew Wilcox Cc: Mike Rapoport Cc: mi...@redhat.com Cc: Rahul Lakkireddy Cc: Souptick Joarder Cc: Thomas Gleixner Cc: Tom Lendacky Cc: x86-ml Link: https://lkml.kernel.org/r/20190430074421.7852-4-liji...@redhat.com --- fs/proc/vmcore.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c index 7bb96fdd38ad..57957c91c6df 100644 --- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -166,7 +166,7 @@ void __weak elfcorehdr_free(unsigned long long addr) */ ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos) { - return read_from_oldmem(buf, count, ppos, 0, false); + return read_from_oldmem(buf, count, ppos, 0, sev_active()); } /* @@ -174,7 +174,7 @@ ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos) */ ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos) { - return read_from_oldmem(buf, count, ppos, 0, sme_active()); + return read_from_oldmem(buf, count, ppos, 0, mem_encrypt_active()); } /* @@ -374,7 +374,7 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, loff_t *fpos, buflen); start = m->paddr + *fpos - m->offset; tmp = read_from_oldmem(buffer, tsz, &start, - userbuf, sme_active()); + userbuf, mem_encrypt_active()); if (tmp < 0) return tmp; buflen -= tsz;