Re: [PATCH] net/ibmvnic: Fix missing { in __ibmvnic_reset
From: Michal Suchanek
Date: Mon, 9 Sep 2019 22:44:51 +0200

> Commit 1c2977c09499 ("net/ibmvnic: free reset work of removed device from
> queue") adds a } without corresponding { causing build break.
>
> Fixes: 1c2977c09499 ("net/ibmvnic: free reset work of removed device from queue")
> Signed-off-by: Michal Suchanek

Applied.
Re: [PATCH 1/2] libnvdimm/altmap: Track namespace boundaries in altmap
On Mon, Sep 9, 2019 at 11:29 PM Aneesh Kumar K.V wrote:
>
> With PFN_MODE_PMEM namespace, the memmap area is allocated from the device
> area. Some architectures map the memmap area with large page size. On
> architectures like ppc64, a 16MB page for memmap mapping can map 262144 pfns.
> This maps a namespace size of 16G.
>
> When populating the memmap region with 16MB pages from the device area,
> make sure the allocated space is not used to map resources outside this
> namespace. Such usage of device area will prevent a namespace destroy.
>
> Add the resource end pfn in altmap and use that to check if the memmap area
> allocation can map pfns outside the namespace. On ppc64 in such a case we
> fall back to allocation from memory.

Shouldn't this instead be comprehended by nd_pfn_init() to increase the
reservation size so that it fits in the alignment? It may not always be
possible to fall back to allocation from memory for extremely large pmem
devices. I.e. at 64GB of memmap per 1TB of pmem there may not be enough
DRAM to store the memmap.
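The memmap overhead Dan refers to scales linearly with namespace size: one
struct page per page frame. A minimal sketch of that arithmetic, assuming a
64-byte struct page and a parameterised page size (both values are
illustrative assumptions, not figures taken from this thread):

#include <stdio.h>

/*
 * Illustrative userspace calculation of the memmap (struct page array)
 * overhead for a pmem namespace; not kernel code.
 */
int main(void)
{
	unsigned long long pmem_bytes = 1ULL << 40;	/* 1 TB namespace */
	unsigned long page_size = 64 * 1024;		/* assumed 64K pages (ppc64) */
	unsigned long sizeof_struct_page = 64;		/* assumed sizeof(struct page) */

	unsigned long long nr_pages = pmem_bytes / page_size;
	unsigned long long memmap_bytes = nr_pages * sizeof_struct_page;

	printf("memmap for 1 TB of pmem: %llu MB\n", memmap_bytes >> 20);
	return 0;
}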
Re: [PATCH 1/2] libnvdimm/altmap: Track namespace boundaries in altmap
"Aneesh Kumar K.V" writes: > With PFN_MODE_PMEM namespace, the memmap area is allocated from the device > area. Some architectures map the memmap area with large page size. On > architectures like ppc64, 16MB page for memap mapping can map 262144 pfns. > This maps a namespace size of 16G. > > When populating memmap region with 16MB page from the device area, > make sure the allocated space is not used to map resources outside this > namespace. Such usage of device area will prevent a namespace destroy. > > Add resource end pnf in altmap and use that to check if the memmap area > allocation can map pfn outside the namespace. On ppc64 in such case we > fallback > to allocation from memory. > > This fix kernel crash reported below: > > [ 132.034989] WARNING: CPU: 13 PID: 13719 at mm/memremap.c:133 > devm_memremap_pages_release+0x2d8/0x2e0 > [ 133.464754] BUG: Unable to handle kernel data access at 0xc00c00010b204000 > [ 133.464760] Faulting instruction address: 0xc007580c > [ 133.464766] Oops: Kernel access of bad area, sig: 11 [#1] > [ 133.464771] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries > . > [ 133.464901] NIP [c007580c] vmemmap_free+0x2ac/0x3d0 > [ 133.464906] LR [c00757f8] vmemmap_free+0x298/0x3d0 > [ 133.464910] Call Trace: > [ 133.464914] [c07cbfd0f7b0] [c00757f8] vmemmap_free+0x298/0x3d0 > (unreliable) > [ 133.464921] [c07cbfd0f8d0] [c0370a44] > section_deactivate+0x1a4/0x240 > [ 133.464928] [c07cbfd0f980] [c0386270] > __remove_pages+0x3a0/0x590 > [ 133.464935] [c07cbfd0fa50] [c0074158] > arch_remove_memory+0x88/0x160 > [ 133.464942] [c07cbfd0fae0] [c03be8c0] > devm_memremap_pages_release+0x150/0x2e0 > [ 133.464949] [c07cbfd0fb70] [c0738ea0] > devm_action_release+0x30/0x50 > [ 133.464955] [c07cbfd0fb90] [c073a5a4] release_nodes+0x344/0x400 > [ 133.464961] [c07cbfd0fc40] [c073378c] > device_release_driver_internal+0x15c/0x250 > [ 133.464968] [c07cbfd0fc80] [c072fd14] unbind_store+0x104/0x110 > [ 133.464973] [c07cbfd0fcd0] [c072ee24] drv_attr_store+0x44/0x70 > [ 133.464981] [c07cbfd0fcf0] [c04a32bc] sysfs_kf_write+0x6c/0xa0 > [ 133.464987] [c07cbfd0fd10] [c04a1dfc] > kernfs_fop_write+0x17c/0x250 > [ 133.464993] [c07cbfd0fd60] [c03c348c] __vfs_write+0x3c/0x70 > [ 133.464999] [c07cbfd0fd80] [c03c75d0] vfs_write+0xd0/0x250 > > Reported-by: Sachin Sant > Signed-off-by: Aneesh Kumar K.V > --- > arch/powerpc/mm/init_64.c | 17 - > drivers/nvdimm/pfn_devs.c | 2 ++ > include/linux/memremap.h | 1 + > 3 files changed, 19 insertions(+), 1 deletion(-) Tested-by: Santosh Sivaraj > > diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c > index a44f6281ca3a..4e08246acd79 100644 > --- a/arch/powerpc/mm/init_64.c > +++ b/arch/powerpc/mm/init_64.c > @@ -172,6 +172,21 @@ static __meminit void vmemmap_list_populate(unsigned > long phys, > vmemmap_list = vmem_back; > } > > +static bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long > start, > + unsigned long page_size) > +{ > + unsigned long nr_pfn = page_size / sizeof(struct page); > + unsigned long start_pfn = page_to_pfn((struct page *)start); > + > + if ((start_pfn + nr_pfn) > altmap->end_pfn) > + return true; > + > + if (start_pfn < altmap->base_pfn) > + return true; > + > + return false; > +} > + > int __meminit vmemmap_populate(unsigned long start, unsigned long end, int > node, > struct vmem_altmap *altmap) > { > @@ -194,7 +209,7 @@ int __meminit vmemmap_populate(unsigned long start, > unsigned long end, int node, >* fail due to alignment issues when using 16MB hugepages, so >* fall back to system memory 
if the altmap allocation fail. >*/ > - if (altmap) { > + if (altmap && !altmap_cross_boundary(altmap, start, page_size)) > { > p = altmap_alloc_block_buf(page_size, altmap); > if (!p) > pr_debug("altmap block allocation failed, > falling back to system memory"); > diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c > index 3e7b11cf1aae..a616d69c8224 100644 > --- a/drivers/nvdimm/pfn_devs.c > +++ b/drivers/nvdimm/pfn_devs.c > @@ -618,9 +618,11 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, > struct dev_pagemap *pgmap) > struct nd_namespace_common *ndns = nd_pfn->ndns; > struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev); > resource_size_t base = nsio->res.start + start_pad; > + resource_size_t end = nsio->res.end - end_trunc; > struct vmem_altmap __altmap = { > .base_pfn = init_altmap_base(base), > .reserve = init_altmap_reserve(base),
[PATCH v8 0/8] kvmppc: Driver to manage pages of secure guest
Hi,

A pseries guest can be run as a secure guest on Ultravisor-enabled POWER
platforms. On such platforms, this driver will be used to manage the movement
of guest pages between the normal memory managed by the hypervisor (HV) and
secure memory managed by the Ultravisor (UV).

Private ZONE_DEVICE memory equal to the amount of secure memory available in
the platform for running secure guests is created. Whenever a page belonging
to the guest becomes secure, a page from this private device memory is used
to represent and track that secure page on the HV side. The movement of pages
between normal and secure memory is done via migrate_vma_pages(). The reverse
movement is driven via pagemap_ops.migrate_to_ram(). The page-in or page-out
requests from UV will come to HV as hcalls and HV will call back into UV via
uvcalls to satisfy these page requests.

These patches are against hmm.git
(https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=hmm)
plus Claudio Carvalho's base ultravisor enablement patches that are present
in Michael Ellerman's tree
(https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/log/?h=topic/ppc-kvm)

These patches along with Claudio's above patches are required to run secure
pseries guests on KVM. This patchset is based on hmm.git because hmm.git has
the migrate_vma cleanup and not-device memremap_pages patchsets that are
required by this patchset.

Changes in v8
=============
- s/kvmppc_devm/kvmppc_uvmem
- Carrying Suraj's patch that defines bit positions for different rmap
  functions from Paul's kvm-next branch. Added KVMPPC_RMAP_UVMEM_PFN to this
  patch.
- No need to use irqsave version of spinlock to protect pfn bitmap
- mmap_sem and srcu_lock reversal in page-in/page-out so that we have uniform
  locking semantics in page-in, page-out, fault and reset paths. This also
  matches with other usages of the same two locks in powerpc code.
- kvmppc_uvmem_free_memslot_pfns() needs kvm srcu read lock.
- Addressed all the review feedback from Christoph and Sukadev.
- Dropped kvmppc_rmap_is_devm_pfn() and introduced kvmppc_rmap_type()
- Bail out early if page-in request comes for an already paged-in page
- kvmppc_uvmem_pfn_lock re-arrangement
- Check for failure from gfn_to_memslot in kvmppc_h_svm_page_in
- Consolidate migrate_vma setup and related code into two helpers
  kvmppc_svm_page_in/out.
- Use NUMA_NO_NODE in memremap_pages() instead of -1 - Removed externs in declarations - Ensure *rmap assignment gets cleared in the error case in kvmppc_uvmem_get_page() - A few other code cleanups v7: https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-August/195631.html Anshuman Khandual (1): KVM: PPC: Ultravisor: Add PPC_UV config option Bharata B Rao (6): kvmppc: Movement of pages between normal and secure memory kvmppc: Shared pages support for secure guests kvmppc: H_SVM_INIT_START and H_SVM_INIT_DONE hcalls kvmppc: Handle memory plug/unplug to secure VM kvmppc: Radix changes for secure guest kvmppc: Support reset of secure guest Suraj Jitindar Singh (1): KVM: PPC: Book3S HV: Define usage types for rmap array in guest memslot Documentation/virt/kvm/api.txt | 19 + arch/powerpc/Kconfig| 17 + arch/powerpc/include/asm/hvcall.h | 9 + arch/powerpc/include/asm/kvm_book3s_uvmem.h | 48 ++ arch/powerpc/include/asm/kvm_host.h | 56 +- arch/powerpc/include/asm/kvm_ppc.h | 2 + arch/powerpc/include/asm/ultravisor-api.h | 6 + arch/powerpc/include/asm/ultravisor.h | 36 ++ arch/powerpc/kvm/Makefile | 3 + arch/powerpc/kvm/book3s_64_mmu_radix.c | 22 + arch/powerpc/kvm/book3s_hv.c| 121 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +- arch/powerpc/kvm/book3s_hv_uvmem.c | 604 arch/powerpc/kvm/powerpc.c | 12 + include/uapi/linux/kvm.h| 1 + 15 files changed, 953 insertions(+), 5 deletions(-) create mode 100644 arch/powerpc/include/asm/kvm_book3s_uvmem.h create mode 100644 arch/powerpc/kvm/book3s_hv_uvmem.c -- 2.21.0
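To make the ZONE_DEVICE setup described in the cover letter easier to follow,
here is a minimal sketch of how a chunk of device-private memory can be
created with memremap_pages(), assuming the memremap API as it stood in
hmm.git at the time. The names (uvmem_sketch_*, secure_mem_size) are
illustrative, not the driver's actual symbols:

#include <linux/memremap.h>
#include <linux/ioport.h>
#include <linux/err.h>
#include <linux/mm.h>

/*
 * Sketch only: stubbed dev_pagemap_ops.  The real driver does the
 * page-out work in migrate_to_ram() and PFN bookkeeping in page_free().
 */
static vm_fault_t uvmem_sketch_migrate_to_ram(struct vm_fault *vmf)
{
	return VM_FAULT_SIGBUS;
}

static void uvmem_sketch_page_free(struct page *page)
{
}

static const struct dev_pagemap_ops uvmem_sketch_pgmap_ops = {
	.page_free	= uvmem_sketch_page_free,
	.migrate_to_ram	= uvmem_sketch_migrate_to_ram,
};

static struct dev_pagemap uvmem_sketch_pgmap;

static int uvmem_sketch_init(unsigned long secure_mem_size)
{
	struct resource *res;
	void *addr;

	/* Claim an unused physical address range to back the device pages */
	res = request_free_mem_region(&iomem_resource, secure_mem_size,
				      "kvmppc_uvmem");
	if (IS_ERR(res))
		return PTR_ERR(res);

	uvmem_sketch_pgmap.type = MEMORY_DEVICE_PRIVATE;
	uvmem_sketch_pgmap.res = *res;
	uvmem_sketch_pgmap.ops = &uvmem_sketch_pgmap_ops;

	/* Creates struct pages for the range; one per secure guest page */
	addr = memremap_pages(&uvmem_sketch_pgmap, NUMA_NO_NODE);
	if (IS_ERR(addr))
		return PTR_ERR(addr);

	return 0;
}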
[PATCH v8 1/8] KVM: PPC: Book3S HV: Define usage types for rmap array in guest memslot
From: Suraj Jitindar Singh The rmap array in the guest memslot is an array of size number of guest pages, allocated at memslot creation time. Each rmap entry in this array is used to store information about the guest page to which it corresponds. For example for a hpt guest it is used to store a lock bit, rc bits, a present bit and the index of a hpt entry in the guest hpt which maps this page. For a radix guest which is running nested guests it is used to store a pointer to a linked list of nested rmap entries which store the nested guest physical address which maps this guest address and for which there is a pte in the shadow page table. As there are currently two uses for the rmap array, and the potential for this to expand to more in the future, define a type field (being the top 8 bits of the rmap entry) to be used to define the type of the rmap entry which is currently present and define two values for this field for the two current uses of the rmap array. Since the nested case uses the rmap entry to store a pointer, define this type as having the two high bits set as is expected for a pointer. Define the hpt entry type as having bit 56 set (bit 7 IBM bit ordering). Signed-off-by: Suraj Jitindar Singh Signed-off-by: Paul Mackerras Signed-off-by: Bharata B Rao [Added rmap type KVMPPC_RMAP_UVMEM_PFN] --- arch/powerpc/include/asm/kvm_host.h | 28 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +- 2 files changed, 25 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 4bb552d639b8..81cd221ccc04 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -232,11 +232,31 @@ struct revmap_entry { }; /* - * We use the top bit of each memslot->arch.rmap entry as a lock bit, - * and bit 32 as a present flag. The bottom 32 bits are the - * index in the guest HPT of a HPTE that points to the page. + * The rmap array of size number of guest pages is allocated for each memslot. + * This array is used to store usage specific information about the guest page. + * Below are the encodings of the various possible usage types. */ -#define KVMPPC_RMAP_LOCK_BIT 63 +/* Free bits which can be used to define a new usage */ +#define KVMPPC_RMAP_TYPE_MASK 0xff00 +#define KVMPPC_RMAP_NESTED 0xc000 /* Nested rmap array */ +#define KVMPPC_RMAP_HPT0x0100 /* HPT guest */ +#define KVMPPC_RMAP_UVMEM_PFN 0x0200 /* Secure GPA */ + +static inline unsigned long kvmppc_rmap_type(unsigned long *rmap) +{ + return (*rmap & KVMPPC_RMAP_TYPE_MASK); +} + +/* + * rmap usage definition for a hash page table (hpt) guest: + * 0x0800 Lock bit + * 0x0180 RC bits + * 0x0001 Present bit + * 0x HPT index bits + * The bottom 32 bits are the index in the guest HPT of a HPTE that points to + * the page. + */ +#define KVMPPC_RMAP_LOCK_BIT 43 #define KVMPPC_RMAP_RC_SHIFT 32 #define KVMPPC_RMAP_REFERENCED (HPTE_R_R << KVMPPC_RMAP_RC_SHIFT) #define KVMPPC_RMAP_PRESENT0x1ul diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 63e0ce91e29d..7186c65c61c9 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -99,7 +99,7 @@ void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev, } else { rev->forw = rev->back = pte_index; *rmap = (*rmap & ~KVMPPC_RMAP_INDEX) | - pte_index | KVMPPC_RMAP_PRESENT; + pte_index | KVMPPC_RMAP_PRESENT | KVMPPC_RMAP_HPT; } unlock_rmap(rmap); } -- 2.21.0
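As a usage illustration of the type field defined above, a later patch in
this series stores a device PFN tagged with KVMPPC_RMAP_UVMEM_PFN in the rmap
entry and recovers it by masking the type bit off again. A minimal sketch of
that pattern, with made-up helper names:

/* Sketch: tag an rmap entry as a secure-GPA (UVMEM) entry holding a PFN */
static void uvmem_rmap_set_pfn(unsigned long *rmap, unsigned long pfn)
{
	*rmap = KVMPPC_RMAP_UVMEM_PFN | pfn;
}

/* Sketch: recover the PFN only if the entry carries the UVMEM usage type */
static unsigned long uvmem_rmap_get_pfn(unsigned long *rmap)
{
	if (kvmppc_rmap_type(rmap) != KVMPPC_RMAP_UVMEM_PFN)
		return 0;	/* not a secure-GPA entry */

	return *rmap & ~KVMPPC_RMAP_UVMEM_PFN;
}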
[PATCH v8 3/8] kvmppc: Shared pages support for secure guests
A secure guest will share some of its pages with hypervisor (Eg. virtio bounce buffers etc). Support sharing of pages between hypervisor and ultravisor. Once a secure page is converted to shared page, the device page is unmapped from the HV side page tables. Signed-off-by: Bharata B Rao --- arch/powerpc/include/asm/hvcall.h | 3 ++ arch/powerpc/kvm/book3s_hv_uvmem.c | 65 -- 2 files changed, 65 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 2595d0144958..4e98dd992bd1 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -342,6 +342,9 @@ #define H_TLB_INVALIDATE 0xF808 #define H_COPY_TOFROM_GUEST0xF80C +/* Flags for H_SVM_PAGE_IN */ +#define H_PAGE_IN_SHARED0x1 + /* Platform-specific hcalls used by the Ultravisor */ #define H_SVM_PAGE_IN 0xEF00 #define H_SVM_PAGE_OUT 0xEF04 diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c index a1eccb065ba9..bcecb643a730 100644 --- a/arch/powerpc/kvm/book3s_hv_uvmem.c +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c @@ -46,6 +46,7 @@ struct kvmppc_uvmem_page_pvt { unsigned long *rmap; unsigned int lpid; unsigned long gpa; + bool skip_page_out; }; /* @@ -159,6 +160,53 @@ kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start, return ret; } +/* + * Shares the page with HV, thus making it a normal page. + * + * - If the page is already secure, then provision a new page and share + * - If the page is a normal page, share the existing page + * + * In the former case, uses the dev_pagemap_ops migrate_to_ram handler + * to unmap the device page from QEMU's page tables. + */ +static unsigned long +kvmppc_share_page(struct kvm *kvm, unsigned long gpa, unsigned long page_shift) +{ + + int ret = H_PARAMETER; + struct page *uvmem_page; + struct kvmppc_uvmem_page_pvt *pvt; + unsigned long pfn; + unsigned long *rmap; + struct kvm_memory_slot *slot; + unsigned long gfn = gpa >> page_shift; + int srcu_idx; + + srcu_idx = srcu_read_lock(&kvm->srcu); + slot = gfn_to_memslot(kvm, gfn); + if (!slot) + goto out; + + rmap = &slot->arch.rmap[gfn - slot->base_gfn]; + if (kvmppc_rmap_type(rmap) == KVMPPC_RMAP_UVMEM_PFN) { + uvmem_page = pfn_to_page(*rmap & ~KVMPPC_RMAP_UVMEM_PFN); + pvt = (struct kvmppc_uvmem_page_pvt *) + uvmem_page->zone_device_data; + pvt->skip_page_out = true; + } + + pfn = gfn_to_pfn(kvm, gfn); + if (is_error_noslot_pfn(pfn)) + goto out; + + if (!uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0, page_shift)) + ret = H_SUCCESS; + kvm_release_pfn_clean(pfn); +out: + srcu_read_unlock(&kvm->srcu, srcu_idx); + return ret; +} + /* * H_SVM_PAGE_IN: Move page from normal memory to secure memory. */ @@ -177,9 +225,12 @@ kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa, if (page_shift != PAGE_SHIFT) return H_P3; - if (flags) + if (flags & ~H_PAGE_IN_SHARED) return H_P2; + if (flags & H_PAGE_IN_SHARED) + return kvmppc_share_page(kvm, gpa, page_shift); + ret = H_PARAMETER; srcu_idx = srcu_read_lock(&kvm->srcu); down_read(&kvm->mm->mmap_sem); @@ -252,8 +303,16 @@ kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start, pvt = spage->zone_device_data; pfn = page_to_pfn(dpage); - ret = uv_page_out(pvt->lpid, pfn << page_shift, pvt->gpa, 0, - page_shift); + /* +* This function is used in two cases: +* - When HV touches a secure page, for which we do UV_PAGE_OUT +* - When a secure page is converted to shared page, we touch +* the page to essentially unmap the device page. In this +* case we skip page-out. 
+*/ + if (!pvt->skip_page_out) + ret = uv_page_out(pvt->lpid, pfn << page_shift, pvt->gpa, 0, + page_shift); if (ret == U_SUCCESS) *mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED; -- 2.21.0
[PATCH v8 4/8] kvmppc: H_SVM_INIT_START and H_SVM_INIT_DONE hcalls
H_SVM_INIT_START: Initiate securing a VM H_SVM_INIT_DONE: Conclude securing a VM As part of H_SVM_INIT_START, register all existing memslots with the UV. H_SVM_INIT_DONE call by UV informs HV that transition of the guest to secure mode is complete. These two states (transition to secure mode STARTED and transition to secure mode COMPLETED) are recorded in kvm->arch.secure_guest. Setting these states will cause the assembly code that enters the guest to call the UV_RETURN ucall instead of trying to enter the guest directly. Signed-off-by: Bharata B Rao Acked-by: Paul Mackerras --- arch/powerpc/include/asm/hvcall.h | 2 ++ arch/powerpc/include/asm/kvm_book3s_uvmem.h | 12 arch/powerpc/include/asm/kvm_host.h | 4 +++ arch/powerpc/include/asm/ultravisor-api.h | 1 + arch/powerpc/include/asm/ultravisor.h | 7 + arch/powerpc/kvm/book3s_hv.c| 7 + arch/powerpc/kvm/book3s_hv_uvmem.c | 34 + 7 files changed, 67 insertions(+) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 4e98dd992bd1..13bd870609c3 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -348,6 +348,8 @@ /* Platform-specific hcalls used by the Ultravisor */ #define H_SVM_PAGE_IN 0xEF00 #define H_SVM_PAGE_OUT 0xEF04 +#define H_SVM_INIT_START 0xEF08 +#define H_SVM_INIT_DONE0xEF0C /* Values for 2nd argument to H_SET_MODE */ #define H_SET_MODE_RESOURCE_SET_CIABR 1 diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h index 9603c2b48d67..fc924ef00b91 100644 --- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h @@ -11,6 +11,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gra, unsigned long flags, unsigned long page_shift); +unsigned long kvmppc_h_svm_init_start(struct kvm *kvm); +unsigned long kvmppc_h_svm_init_done(struct kvm *kvm); #else static inline unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gra, @@ -25,5 +27,15 @@ kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gra, { return H_UNSUPPORTED; } + +static inline unsigned long kvmppc_h_svm_init_start(struct kvm *kvm) +{ + return H_UNSUPPORTED; +} + +static inline unsigned long kvmppc_h_svm_init_done(struct kvm *kvm) +{ + return H_UNSUPPORTED; +} #endif /* CONFIG_PPC_UV */ #endif /* __POWERPC_KVM_PPC_HMM_H__ */ diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 16633ad3be45..cab3099db8d4 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -281,6 +281,10 @@ struct kvm_hpt_info { struct kvm_resize_hpt; +/* Flag values for kvm_arch.secure_guest */ +#define KVMPPC_SECURE_INIT_START 0x1 /* H_SVM_INIT_START has been called */ +#define KVMPPC_SECURE_INIT_DONE 0x2 /* H_SVM_INIT_DONE completed */ + struct kvm_arch { unsigned int lpid; unsigned int smt_mode; /* # vcpus per virtual core */ diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h index 1cd1f595fd81..c578d9b13a56 100644 --- a/arch/powerpc/include/asm/ultravisor-api.h +++ b/arch/powerpc/include/asm/ultravisor-api.h @@ -25,6 +25,7 @@ /* opcodes */ #define UV_WRITE_PATE 0xF104 #define UV_RETURN 0xF11C +#define UV_REGISTER_MEM_SLOT 0xF120 #define UV_PAGE_IN 0xF128 #define UV_PAGE_OUT0xF12C diff --git a/arch/powerpc/include/asm/ultravisor.h b/arch/powerpc/include/asm/ultravisor.h index 0fc4a974b2e8..58ccf5e2d6bb 100644 --- a/arch/powerpc/include/asm/ultravisor.h +++ b/arch/powerpc/include/asm/ultravisor.h 
@@ -45,4 +45,11 @@ static inline int uv_page_out(u64 lpid, u64 dst_ra, u64 src_gpa, u64 flags, page_shift); } +static inline int uv_register_mem_slot(u64 lpid, u64 start_gpa, u64 size, + u64 flags, u64 slotid) +{ + return ucall_norets(UV_REGISTER_MEM_SLOT, lpid, start_gpa, + size, flags, slotid); +} + #endif /* _ASM_POWERPC_ULTRAVISOR_H */ diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index c5404db8f0cd..2527f1676b59 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1089,6 +1089,13 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) kvmppc_get_gpr(vcpu, 5), kvmppc_get_gpr(vcpu, 6)); break; + case H_SVM_INIT_START: + ret = kvmppc_h_svm_init_start(vcpu->kvm); + break; + case H_SVM_INIT_DONE: +
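The two transition states described in the changelog are individual bits in
kvm->arch.secure_guest, so callers can distinguish a transition that has
merely started from one that has completed. A purely illustrative check (not
part of the patch):

/* Sketch: has the guest fully transitioned to secure mode? */
static bool uvmem_sketch_init_done(struct kvm *kvm)
{
	return !!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE);
}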
[PATCH v8 2/8] kvmppc: Movement of pages between normal and secure memory
Manage migration of pages betwen normal and secure memory of secure guest by implementing H_SVM_PAGE_IN and H_SVM_PAGE_OUT hcalls. H_SVM_PAGE_IN: Move the content of a normal page to secure page H_SVM_PAGE_OUT: Move the content of a secure page to normal page Private ZONE_DEVICE memory equal to the amount of secure memory available in the platform for running secure guests is created. Whenever a page belonging to the guest becomes secure, a page from this private device memory is used to represent and track that secure page on the HV side. The movement of pages between normal and secure memory is done via migrate_vma_pages() using UV_PAGE_IN and UV_PAGE_OUT ucalls. Signed-off-by: Bharata B Rao --- arch/powerpc/include/asm/hvcall.h | 4 + arch/powerpc/include/asm/kvm_book3s_uvmem.h | 29 ++ arch/powerpc/include/asm/kvm_host.h | 12 + arch/powerpc/include/asm/ultravisor-api.h | 2 + arch/powerpc/include/asm/ultravisor.h | 14 + arch/powerpc/kvm/Makefile | 3 + arch/powerpc/kvm/book3s_hv.c| 19 + arch/powerpc/kvm/book3s_hv_uvmem.c | 431 8 files changed, 514 insertions(+) create mode 100644 arch/powerpc/include/asm/kvm_book3s_uvmem.h create mode 100644 arch/powerpc/kvm/book3s_hv_uvmem.c diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 2023e327..2595d0144958 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -342,6 +342,10 @@ #define H_TLB_INVALIDATE 0xF808 #define H_COPY_TOFROM_GUEST0xF80C +/* Platform-specific hcalls used by the Ultravisor */ +#define H_SVM_PAGE_IN 0xEF00 +#define H_SVM_PAGE_OUT 0xEF04 + /* Values for 2nd argument to H_SET_MODE */ #define H_SET_MODE_RESOURCE_SET_CIABR 1 #define H_SET_MODE_RESOURCE_SET_DAWR 2 diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h new file mode 100644 index ..9603c2b48d67 --- /dev/null +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __POWERPC_KVM_PPC_HMM_H__ +#define __POWERPC_KVM_PPC_HMM_H__ + +#ifdef CONFIG_PPC_UV +unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, + unsigned long gra, + unsigned long flags, + unsigned long page_shift); +unsigned long kvmppc_h_svm_page_out(struct kvm *kvm, + unsigned long gra, + unsigned long flags, + unsigned long page_shift); +#else +static inline unsigned long +kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gra, +unsigned long flags, unsigned long page_shift) +{ + return H_UNSUPPORTED; +} + +static inline unsigned long +kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gra, + unsigned long flags, unsigned long page_shift) +{ + return H_UNSUPPORTED; +} +#endif /* CONFIG_PPC_UV */ +#endif /* __POWERPC_KVM_PPC_HMM_H__ */ diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 81cd221ccc04..16633ad3be45 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -869,4 +869,16 @@ static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {} static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {} static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {} +#ifdef CONFIG_PPC_UV +int kvmppc_uvmem_init(void); +void kvmppc_uvmem_free(void); +#else +static inline int kvmppc_uvmem_init(void) +{ + return 0; +} + +static inline void kvmppc_uvmem_free(void) {} +#endif /* CONFIG_PPC_UV */ + #endif /* __POWERPC_KVM_HOST_H__ */ diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h 
index 6a0f9c74f959..1cd1f595fd81 100644 --- a/arch/powerpc/include/asm/ultravisor-api.h +++ b/arch/powerpc/include/asm/ultravisor-api.h @@ -25,5 +25,7 @@ /* opcodes */ #define UV_WRITE_PATE 0xF104 #define UV_RETURN 0xF11C +#define UV_PAGE_IN 0xF128 +#define UV_PAGE_OUT0xF12C #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */ diff --git a/arch/powerpc/include/asm/ultravisor.h b/arch/powerpc/include/asm/ultravisor.h index d7aa97aa7834..0fc4a974b2e8 100644 --- a/arch/powerpc/include/asm/ultravisor.h +++ b/arch/powerpc/include/asm/ultravisor.h @@ -31,4 +31,18 @@ static inline int uv_register_pate(u64 lpid, u64 dw0, u64 dw1) return ucall_norets(UV_WRITE_PATE, lpid, dw0, dw1); } +static inline int uv_page_in(u64 lpid, u64 src_ra, u64 dst_gpa, u64 flags, +u64 page_shift) +{ + return ucall_norets(UV_PAGE_IN, lpid, src_ra, dst_gpa, flags, + p
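The hcall handlers added by this patch drive the page movement through the
migrate_vma API from hmm.git. A minimal sketch of that sequence for the
page-in direction, assuming the migrate_vma_setup()/migrate_vma_pages()/
migrate_vma_finalize() interface of that tree; the device-page allocation and
the UV_PAGE_IN ucall are elided, and the function name is made up:

#include <linux/migrate.h>

/*
 * Sketch of the migrate_vma sequence used on page-in; not the patch's
 * actual helper.  Error handling and the UV ucall are omitted.
 */
static int uvmem_sketch_page_in(struct vm_area_struct *vma,
				unsigned long start, unsigned long end)
{
	unsigned long src_pfn = 0, dst_pfn = 0;
	struct migrate_vma mig = {
		.vma	= vma,
		.start	= start,
		.end	= end,
		.src	= &src_pfn,
		.dst	= &dst_pfn,
	};

	if (migrate_vma_setup(&mig))
		return -EFAULT;

	/*
	 * Here the real code allocates a device-private page, copies the
	 * guest page into secure memory via UV_PAGE_IN, and publishes the
	 * destination in *mig.dst with MIGRATE_PFN_LOCKED set.
	 */

	migrate_vma_pages(&mig);
	migrate_vma_finalize(&mig);
	return 0;
}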
[PATCH v8 5/8] kvmppc: Handle memory plug/unplug to secure VM
Register the new memslot with UV during plug and unregister the memslot during unplug. Signed-off-by: Bharata B Rao Acked-by: Paul Mackerras --- arch/powerpc/include/asm/ultravisor-api.h | 1 + arch/powerpc/include/asm/ultravisor.h | 5 + arch/powerpc/kvm/book3s_hv.c | 21 + 3 files changed, 27 insertions(+) diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h index c578d9b13a56..46b1ee381695 100644 --- a/arch/powerpc/include/asm/ultravisor-api.h +++ b/arch/powerpc/include/asm/ultravisor-api.h @@ -26,6 +26,7 @@ #define UV_WRITE_PATE 0xF104 #define UV_RETURN 0xF11C #define UV_REGISTER_MEM_SLOT 0xF120 +#define UV_UNREGISTER_MEM_SLOT 0xF124 #define UV_PAGE_IN 0xF128 #define UV_PAGE_OUT0xF12C diff --git a/arch/powerpc/include/asm/ultravisor.h b/arch/powerpc/include/asm/ultravisor.h index 58ccf5e2d6bb..719c0c3930b9 100644 --- a/arch/powerpc/include/asm/ultravisor.h +++ b/arch/powerpc/include/asm/ultravisor.h @@ -52,4 +52,9 @@ static inline int uv_register_mem_slot(u64 lpid, u64 start_gpa, u64 size, size, flags, slotid); } +static inline int uv_unregister_mem_slot(u64 lpid, u64 slotid) +{ + return ucall_norets(UV_UNREGISTER_MEM_SLOT, lpid, slotid); +} + #endif /* _ASM_POWERPC_ULTRAVISOR_H */ diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 2527f1676b59..fc93e5ba5683 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -74,6 +74,7 @@ #include #include #include +#include #include "book3s.h" @@ -4517,6 +4518,26 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm, if (change == KVM_MR_FLAGS_ONLY && kvm_is_radix(kvm) && ((new->flags ^ old->flags) & KVM_MEM_LOG_DIRTY_PAGES)) kvmppc_radix_flush_memslot(kvm, old); + /* +* If UV hasn't yet called H_SVM_INIT_START, don't register memslots. +*/ + if (!kvm->arch.secure_guest) + return; + + switch (change) { + case KVM_MR_CREATE: + uv_register_mem_slot(kvm->arch.lpid, +new->base_gfn << PAGE_SHIFT, +new->npages * PAGE_SIZE, +0, new->id); + break; + case KVM_MR_DELETE: + uv_unregister_mem_slot(kvm->arch.lpid, old->id); + break; + default: + /* TODO: Handle KVM_MR_MOVE */ + break; + } } /* -- 2.21.0
[PATCH v8 6/8] kvmppc: Radix changes for secure guest
- After the guest becomes secure, when we handle a page fault of a page belonging to SVM in HV, send that page to UV via UV_PAGE_IN. - Whenever a page is unmapped on the HV side, inform UV via UV_PAGE_INVAL. - Ensure all those routines that walk the secondary page tables of the guest don't do so in case of secure VM. For secure guest, the active secondary page tables are in secure memory and the secondary page tables in HV are freed when guest becomes secure. Signed-off-by: Bharata B Rao --- arch/powerpc/include/asm/kvm_host.h | 12 arch/powerpc/include/asm/ultravisor-api.h | 1 + arch/powerpc/include/asm/ultravisor.h | 5 + arch/powerpc/kvm/book3s_64_mmu_radix.c| 22 ++ arch/powerpc/kvm/book3s_hv_uvmem.c| 20 5 files changed, 60 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index cab3099db8d4..17780c82c1b4 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -876,6 +876,8 @@ static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {} #ifdef CONFIG_PPC_UV int kvmppc_uvmem_init(void); void kvmppc_uvmem_free(void); +bool kvmppc_is_guest_secure(struct kvm *kvm); +int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gpa); #else static inline int kvmppc_uvmem_init(void) { @@ -883,6 +885,16 @@ static inline int kvmppc_uvmem_init(void) } static inline void kvmppc_uvmem_free(void) {} + +static inline bool kvmppc_is_guest_secure(struct kvm *kvm) +{ + return false; +} + +static inline int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gpa) +{ + return -EFAULT; +} #endif /* CONFIG_PPC_UV */ #endif /* __POWERPC_KVM_HOST_H__ */ diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h index 46b1ee381695..cf200d4ce703 100644 --- a/arch/powerpc/include/asm/ultravisor-api.h +++ b/arch/powerpc/include/asm/ultravisor-api.h @@ -29,5 +29,6 @@ #define UV_UNREGISTER_MEM_SLOT 0xF124 #define UV_PAGE_IN 0xF128 #define UV_PAGE_OUT0xF12C +#define UV_PAGE_INVAL 0xF138 #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */ diff --git a/arch/powerpc/include/asm/ultravisor.h b/arch/powerpc/include/asm/ultravisor.h index 719c0c3930b9..b333241bbe4c 100644 --- a/arch/powerpc/include/asm/ultravisor.h +++ b/arch/powerpc/include/asm/ultravisor.h @@ -57,4 +57,9 @@ static inline int uv_unregister_mem_slot(u64 lpid, u64 slotid) return ucall_norets(UV_UNREGISTER_MEM_SLOT, lpid, slotid); } +static inline int uv_page_inval(u64 lpid, u64 gpa, u64 page_shift) +{ + return ucall_norets(UV_PAGE_INVAL, lpid, gpa, page_shift); +} + #endif /* _ASM_POWERPC_ULTRAVISOR_H */ diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index 2d415c36a61d..93ad34e63045 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -19,6 +19,8 @@ #include #include #include +#include +#include /* * Supported radix tree geometry. 
@@ -915,6 +917,9 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, if (!(dsisr & DSISR_PRTABLE_FAULT)) gpa |= ea & 0xfff; + if (kvmppc_is_guest_secure(kvm)) + return kvmppc_send_page_to_uv(kvm, gpa & PAGE_MASK); + /* Get the corresponding memslot */ memslot = gfn_to_memslot(kvm, gfn); @@ -972,6 +977,11 @@ int kvm_unmap_radix(struct kvm *kvm, struct kvm_memory_slot *memslot, unsigned long gpa = gfn << PAGE_SHIFT; unsigned int shift; + if (kvmppc_is_guest_secure(kvm)) { + uv_page_inval(kvm->arch.lpid, gpa, PAGE_SIZE); + return 0; + } + ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift); if (ptep && pte_present(*ptep)) kvmppc_unmap_pte(kvm, ptep, gpa, shift, memslot, @@ -989,6 +999,9 @@ int kvm_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot, int ref = 0; unsigned long old, *rmapp; + if (kvmppc_is_guest_secure(kvm)) + return ref; + ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift); if (ptep && pte_present(*ptep) && pte_young(*ptep)) { old = kvmppc_radix_update_pte(kvm, ptep, _PAGE_ACCESSED, 0, @@ -1013,6 +1026,9 @@ int kvm_test_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot, unsigned int shift; int ref = 0; + if (kvmppc_is_guest_secure(kvm)) + return ref; + ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift); if (ptep && pte_present(*ptep) && pte_young(*ptep)) ref = 1; @@ -1030,6 +1046,9 @@ static int kvm_radix_test_clear_dirty(struct kvm *kvm, int ret = 0; unsigned long old, *rmapp; + if (kvmppc_is_gues
[PATCH v8 7/8] kvmppc: Support reset of secure guest
Add support for reset of secure guest via a new ioctl KVM_PPC_SVM_OFF. This ioctl will be issued by QEMU during reset and includes the the following steps: - Ask UV to terminate the guest via UV_SVM_TERMINATE ucall - Unpin the VPA pages so that they can be migrated back to secure side when guest becomes secure again. This is required because pinned pages can't be migrated. - Reinitialize guest's partitioned scoped page tables. These are freed when guest becomes secure (H_SVM_INIT_DONE) - Release all device pages of the secure guest. After these steps, guest is ready to issue UV_ESM call once again to switch to secure mode. Signed-off-by: Bharata B Rao Signed-off-by: Sukadev Bhattiprolu [Implementation of uv_svm_terminate() and its call from guest shutdown path] Signed-off-by: Ram Pai [Unpinning of VPA pages] --- Documentation/virt/kvm/api.txt | 19 ++ arch/powerpc/include/asm/kvm_book3s_uvmem.h | 7 ++ arch/powerpc/include/asm/kvm_ppc.h | 2 + arch/powerpc/include/asm/ultravisor-api.h | 1 + arch/powerpc/include/asm/ultravisor.h | 5 ++ arch/powerpc/kvm/book3s_hv.c| 74 + arch/powerpc/kvm/book3s_hv_uvmem.c | 62 - arch/powerpc/kvm/powerpc.c | 12 include/uapi/linux/kvm.h| 1 + 9 files changed, 182 insertions(+), 1 deletion(-) diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt index 2d067767b617..8e7a02e547e9 100644 --- a/Documentation/virt/kvm/api.txt +++ b/Documentation/virt/kvm/api.txt @@ -4111,6 +4111,25 @@ Valid values for 'action': #define KVM_PMU_EVENT_ALLOW 0 #define KVM_PMU_EVENT_DENY 1 +4.121 KVM_PPC_SVM_OFF + +Capability: basic +Architectures: powerpc +Type: vm ioctl +Parameters: none +Returns: 0 on successful completion, +Errors: + EINVAL:if ultravisor failed to terminate the secure guest + ENOMEM:if hypervisor failed to allocate new radix page tables for guest + +This ioctl is used to turn off the secure mode of the guest or transition +the guest from secure mode to normal mode. This is invoked when the guest +is reset. This has no effect if called for a normal guest. + +This ioctl issues an ultravisor call to terminate the secure guest, +unpins the VPA pages, reinitializes guest's partition scoped page +tables and releases all the device pages that are used to track the +secure pages by hypervisor. 5. 
The kvm_run structure diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h index fc924ef00b91..6b8cc8edd0ab 100644 --- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h @@ -13,6 +13,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long page_shift); unsigned long kvmppc_h_svm_init_start(struct kvm *kvm); unsigned long kvmppc_h_svm_init_done(struct kvm *kvm); +void kvmppc_uvmem_free_memslot_pfns(struct kvm *kvm, + struct kvm_memslots *slots); #else static inline unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gra, @@ -37,5 +39,10 @@ static inline unsigned long kvmppc_h_svm_init_done(struct kvm *kvm) { return H_UNSUPPORTED; } + +static inline void kvmppc_uvmem_free_memslot_pfns(struct kvm *kvm, + struct kvm_memslots *slots) +{ +} #endif /* CONFIG_PPC_UV */ #endif /* __POWERPC_KVM_PPC_HMM_H__ */ diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 2484e6a8f5ca..e4093d067354 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -177,6 +177,7 @@ extern void kvm_spapr_tce_release_iommu_group(struct kvm *kvm, extern int kvmppc_switch_mmu_to_hpt(struct kvm *kvm); extern int kvmppc_switch_mmu_to_radix(struct kvm *kvm); extern void kvmppc_setup_partition_table(struct kvm *kvm); +extern int kvmppc_reinit_partition_table(struct kvm *kvm); extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, struct kvm_create_spapr_tce_64 *args); @@ -321,6 +322,7 @@ struct kvmppc_ops { int size); int (*store_to_eaddr)(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, int size); + int (*svm_off)(struct kvm *kvm); }; extern struct kvmppc_ops *kvmppc_hv_ops; diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h index cf200d4ce703..3a27a0c0be05 100644 --- a/arch/powerpc/include/asm/ultravisor-api.h +++ b/arch/powerpc/include/asm/ultravisor-api.h @@ -30,5 +30,6 @@ #define UV_PAGE_IN 0xF128 #define UV_PAGE_OUT0xF12C #define UV_PAGE_INVAL 0xF138 +#define UV_SVM_TERMINATE 0xF13C #endif /* _ASM_POWERPC_ULTRAVISO
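Since the documentation hunk above describes KVM_PPC_SVM_OFF as a
parameterless vm ioctl, a userspace caller (for instance QEMU's reset path)
would issue it roughly as in the following sketch. It assumes a uapi header
that exports KVM_PPC_SVM_OFF as added by this patch; the function name and fd
variable are illustrative:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: ask KVM to move a secure guest back to normal mode at reset */
static int svm_off_sketch(int vm_fd)
{
	/* No parameters; the documentation says this is harmless for a
	 * normal (non-secure) guest. */
	return ioctl(vm_fd, KVM_PPC_SVM_OFF, 0);
}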
[PATCH v8 8/8] KVM: PPC: Ultravisor: Add PPC_UV config option
From: Anshuman Khandual CONFIG_PPC_UV adds support for ultravisor. Signed-off-by: Anshuman Khandual Signed-off-by: Bharata B Rao Signed-off-by: Ram Pai [ Update config help and commit message ] Signed-off-by: Claudio Carvalho --- arch/powerpc/Kconfig | 17 + 1 file changed, 17 insertions(+) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index d8dcd8820369..044838794112 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -448,6 +448,23 @@ config PPC_TRANSACTIONAL_MEM help Support user-mode Transactional Memory on POWERPC. +config PPC_UV + bool "Ultravisor support" + depends on KVM_BOOK3S_HV_POSSIBLE + select ZONE_DEVICE + select DEV_PAGEMAP_OPS + select DEVICE_PRIVATE + select MEMORY_HOTPLUG + select MEMORY_HOTREMOVE + default n + help + This option paravirtualizes the kernel to run in POWER platforms that + supports the Protected Execution Facility (PEF). On such platforms, + the ultravisor firmware runs at a privilege level above the + hypervisor. + + If unsure, say "N". + config LD_HEAD_STUB_CATCH bool "Reserve 256 bytes to cope with linker stubs in HEAD text" if EXPERT depends on PPC64 -- 2.21.0
Re: [PATCH 1/2] libnvdimm/altmap: Track namespace boundaries in altmap
On 9/10/19 1:40 PM, Dan Williams wrote:
> On Mon, Sep 9, 2019 at 11:29 PM Aneesh Kumar K.V wrote:
>>
>> With PFN_MODE_PMEM namespace, the memmap area is allocated from the device
>> area. Some architectures map the memmap area with large page size. On
>> architectures like ppc64, a 16MB page for memmap mapping can map 262144 pfns.
>> This maps a namespace size of 16G.
>>
>> When populating the memmap region with 16MB pages from the device area,
>> make sure the allocated space is not used to map resources outside this
>> namespace. Such usage of device area will prevent a namespace destroy.
>>
>> Add the resource end pfn in altmap and use that to check if the memmap area
>> allocation can map pfns outside the namespace. On ppc64 in such a case we
>> fall back to allocation from memory.
>
> Shouldn't this instead be comprehended by nd_pfn_init() to increase
> the reservation size so that it fits in the alignment? It may not
> always be possible to fall back to allocation from memory for
> extremely large pmem devices. I.e. at 64GB of memmap per 1TB of pmem
> there may not be enough DRAM to store the memmap.

We do switch between DRAM and device for memmap allocation. ppc64
vmemmap_populate does:

	if (altmap && !altmap_cross_boundary(altmap, start, page_size)) {
		p = altmap_alloc_block_buf(page_size, altmap);
		if (!p)
			pr_debug("altmap block allocation failed, falling back to system memory");
	}
	if (!p)
		p = vmemmap_alloc_block_buf(page_size, node);

With that we should be using DRAM for the first and the last mapping, and the
rest of the memmap should be backed by the device.

-aneesh
Re: [PATCH v5 21/31] powernv/fadump: process architected register state data provided by firmware
On 09/09/19 9:03 PM, Oliver O'Halloran wrote:
> On Mon, Sep 9, 2019 at 11:23 PM Hari Bathini wrote:
>>
>> On 04/09/19 5:50 PM, Michael Ellerman wrote:
>>> Hari Bathini writes:
>>>
>>
>> [...]
>>
>>>> +/*
>>>> + * CPU state data is provided by f/w. Below are the definitions
>>>> + * provided in HDAT spec. Refer to latest HDAT specification for
>>>> + * any update to this format.
>>>> + */
>>>
>>> How is this meant to work? If HDAT ever changes the format they will
>>> break all existing kernels in the field.
>>>
>>>> +#define HDAT_FADUMP_CPU_DATA_VERSION	1
>>
>> Changes are not expected here. But this is just to cover for such a
>> scenario, if that ever happens.
>
> The HDAT spec doesn't define the SPR numbers for NIA, MSR and the CR.
> As far as I can tell the values you've assumed here are chip-specific,
> non-architected SPR numbers that come from an array buried somewhere
> in the SBE codebase. I don't believe you for a second when you say
> that this will never change.

At least, the understanding is that these numbers do not change across
processor generations. If something changes, it is supposed to be handled in
the SBE. Also, I am told these numbers would be listed in the HDAT spec. Not
sure if that has happened yet though. Vasant, do you have anything to add?

>> Also, I think it is a bit far-fetched to error out if versions mismatch.
>> Warning and proceeding sounds preferable because the changes are usually
>> backward compatible, if and when there are any. Will update accordingly...
>
> Literally the only reason I didn't drop the CPU DATA parts of the OPAL
> MPIPL series was because I assumed the kernel would do the sensible
> thing and reject or ignore the structure if it did not know how to
> parse the data.

I think the changes, if any, would have to be backward compatible for the
sake of sanity. Even if they are not, we are better off exporting the
/proc/vmcore with a warning and some crazy CPU register data (if parsing goes
alright) than no dump at all?

- Hari
[PATCH 0/2] powerpc/xmon: Improve output of XIVE commands
Hello,

This series extends the interrupt command output with the PQ bit value
and reworks the CPU command output to check that a CPU is started.

Thanks,

C.

Cédric Le Goater (2):
  powerpc/xmon: Improve output of XIVE interrupts
  powerpc/xmon: Fix output of XIVE IPI

 arch/powerpc/include/asm/xive.h   |  3 +-
 arch/powerpc/sysdev/xive/common.c | 56 +++
 arch/powerpc/xmon/xmon.c          | 15 +++--
 3 files changed, 47 insertions(+), 27 deletions(-)

-- 
2.21.0
[PATCH 2/2] powerpc/xmon: Fix output of XIVE IPI
When dumping the XIVE state of an CPU IPI, xmon does not check if the CPU is started or not which can cause an error. Add a check for that and change the output to be on one line just as the XIVE interrupts of the machine. Signed-off-by: Cédric Le Goater --- arch/powerpc/sysdev/xive/common.c | 27 --- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c index 85a27ec49d34..20f45b8a52ab 100644 --- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -237,25 +237,30 @@ static notrace void xive_dump_eq(const char *name, struct xive_q *q) i0 = be32_to_cpup(q->qpage + idx); idx = (idx + 1) & q->msk; i1 = be32_to_cpup(q->qpage + idx); - xmon_printf(" %s Q T=%d %08x %08x ...\n", name, - q->toggle, i0, i1); + xmon_printf("%s idx=%d T=%d %08x %08x ...", name, +q->idx, q->toggle, i0, i1); } notrace void xmon_xive_do_dump(int cpu) { struct xive_cpu *xc = per_cpu(xive_cpu, cpu); - xmon_printf("XIVE state for CPU %d:\n", cpu); - xmon_printf(" pp=%02x cppr=%02x\n", xc->pending_prio, xc->cppr); - xive_dump_eq("IRQ", &xc->queue[xive_irq_priority]); + xmon_printf("CPU %d:", cpu); + if (xc) { + xmon_printf("pp=%02x CPPR=%02x ", xc->pending_prio, xc->cppr); + #ifdef CONFIG_SMP - { - u64 val = xive_esb_read(&xc->ipi_data, XIVE_ESB_GET); - xmon_printf(" IPI state: %x:%c%c\n", xc->hw_ipi, - val & XIVE_ESB_VAL_P ? 'P' : 'p', - val & XIVE_ESB_VAL_Q ? 'Q' : 'q'); - } + { + u64 val = xive_esb_read(&xc->ipi_data, XIVE_ESB_GET); + + xmon_printf("IPI=0x%08x PQ=%c%c ", xc->hw_ipi, + val & XIVE_ESB_VAL_P ? 'P' : '-', + val & XIVE_ESB_VAL_Q ? 'Q' : '-'); + } #endif + xive_dump_eq("EQ", &xc->queue[xive_irq_priority]); + } + xmon_printf("\n"); } int xmon_xive_get_irq_config(u32 hw_irq, struct irq_data *d) -- 2.21.0
[PATCH 1/2] powerpc/xmon: Improve output of XIVE interrupts
When looping on the list of interrupts, add the current value of the PQ bits with a load on the ESB page. This has the side effect of faulting the ESB page of all interrupts. Signed-off-by: Cédric Le Goater --- arch/powerpc/include/asm/xive.h | 3 +-- arch/powerpc/sysdev/xive/common.c | 29 ++--- arch/powerpc/xmon/xmon.c | 15 --- 3 files changed, 31 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h index 967d6ab3c977..71f52f22c36b 100644 --- a/arch/powerpc/include/asm/xive.h +++ b/arch/powerpc/include/asm/xive.h @@ -99,8 +99,7 @@ extern void xive_flush_interrupt(void); /* xmon hook */ extern void xmon_xive_do_dump(int cpu); -extern int xmon_xive_get_irq_config(u32 irq, u32 *target, u8 *prio, - u32 *sw_irq); +extern int xmon_xive_get_irq_config(u32 hw_irq, struct irq_data *d); /* APIs used by KVM */ extern u32 xive_native_default_eq_shift(void); diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c index ed4561e71951..85a27ec49d34 100644 --- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -258,10 +258,33 @@ notrace void xmon_xive_do_dump(int cpu) #endif } -int xmon_xive_get_irq_config(u32 irq, u32 *target, u8 *prio, -u32 *sw_irq) +int xmon_xive_get_irq_config(u32 hw_irq, struct irq_data *d) { - return xive_ops->get_irq_config(irq, target, prio, sw_irq); + int rc; + u32 target; + u8 prio; + u32 lirq; + + rc = xive_ops->get_irq_config(hw_irq, &target, &prio, &lirq); + if (rc) { + xmon_printf("IRQ 0x%08x : no config rc=%d\n", hw_irq, rc); + return rc; + } + + xmon_printf("IRQ 0x%08x : target=0x%x prio=%02x lirq=0x%x ", + hw_irq, target, prio, lirq); + + if (d) { + struct xive_irq_data *xd = irq_data_get_irq_handler_data(d); + u64 val = xive_esb_read(xd, XIVE_ESB_GET); + + xmon_printf("PQ=%c%c", + val & XIVE_ESB_VAL_P ? 'P' : '-', + val & XIVE_ESB_VAL_Q ? 'Q' : '-'); + } + + xmon_printf("\n"); + return 0; } #endif /* CONFIG_XMON */ diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index dc9832e06256..d83364ebc5c5 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -2572,16 +2572,9 @@ static void dump_all_xives(void) dump_one_xive(cpu); } -static void dump_one_xive_irq(u32 num) +static void dump_one_xive_irq(u32 num, struct irq_data *d) { - int rc; - u32 target; - u8 prio; - u32 lirq; - - rc = xmon_xive_get_irq_config(num, &target, &prio, &lirq); - xmon_printf("IRQ 0x%08x : target=0x%x prio=%d lirq=0x%x (rc=%d)\n", - num, target, prio, lirq, rc); + xmon_xive_get_irq_config(num, d); } static void dump_all_xive_irq(void) @@ -2599,7 +2592,7 @@ static void dump_all_xive_irq(void) hwirq = (unsigned int)irqd_to_hwirq(d); /* IPIs are special (HW number 0) */ if (hwirq) - dump_one_xive_irq(hwirq); + dump_one_xive_irq(hwirq, d); } } @@ -2619,7 +2612,7 @@ static void dump_xives(void) return; } else if (c == 'i') { if (scanhex(&num)) - dump_one_xive_irq(num); + dump_one_xive_irq(num, NULL); else dump_all_xive_irq(); return; -- 2.21.0
Re: [PATCH 1/2] libnvdimm/altmap: Track namespace boundaries in altmap
On Tue, Sep 10, 2019 at 1:31 AM Aneesh Kumar K.V wrote:
>
> On 9/10/19 1:40 PM, Dan Williams wrote:
> > On Mon, Sep 9, 2019 at 11:29 PM Aneesh Kumar K.V
> > wrote:
> >>
> >> With PFN_MODE_PMEM namespace, the memmap area is allocated from the device
> >> area. Some architectures map the memmap area with large page size. On
> >> architectures like ppc64, a 16MB page for memmap mapping can map 262144 pfns.
> >> This maps a namespace size of 16G.
> >>
> >> When populating the memmap region with 16MB pages from the device area,
> >> make sure the allocated space is not used to map resources outside this
> >> namespace. Such usage of device area will prevent a namespace destroy.
> >>
> >> Add the resource end pfn in altmap and use that to check if the memmap area
> >> allocation can map pfns outside the namespace. On ppc64 in such a case we
> >> fall back to allocation from memory.
> >
> > Shouldn't this instead be comprehended by nd_pfn_init() to increase
> > the reservation size so that it fits in the alignment? It may not
> > always be possible to fall back to allocation from memory for
> > extremely large pmem devices. I.e. at 64GB of memmap per 1TB of pmem
> > there may not be enough DRAM to store the memmap.
>
> We do switch between DRAM and device for memmap allocation. ppc64
> vmemmap_populate does:
>
> 	if (altmap && !altmap_cross_boundary(altmap, start, page_size)) {
> 		p = altmap_alloc_block_buf(page_size, altmap);
> 		if (!p)
> 			pr_debug("altmap block allocation failed, falling back to system memory");
> 	}
> 	if (!p)
> 		p = vmemmap_alloc_block_buf(page_size, node);
>
> With that we should be using DRAM for the first and the last mapping, and the
> rest of the memmap should be backed by the device.

Ah, ok, makes sense.
[PATCH v3 02/15] powerpc/32: Add EXCEPTION_PROLOG_0 in head_32.h
This patch creates a macro for the very first part of exception prolog, this will help when implementing CONFIG_VMAP_STACK Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_32.S | 4 +--- arch/powerpc/kernel/head_32.h | 9 ++--- arch/powerpc/kernel/head_8xx.S | 9 ++--- 3 files changed, 9 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index 4a24f8f026c7..9e868567b716 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -272,9 +272,7 @@ __secondary_hold_acknowledge: */ . = 0x200 DO_KVM 0x200 - mtspr SPRN_SPRG_SCRATCH0,r10 - mtspr SPRN_SPRG_SCRATCH1,r11 - mfcrr10 + EXCEPTION_PROLOG_0 #ifdef CONFIG_PPC_CHRP mfspr r11, SPRN_SPRG_THREAD lwz r11, RTAS_SP(r11) diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h index b2ca8c9ffd8b..8e345f8d4b0e 100644 --- a/arch/powerpc/kernel/head_32.h +++ b/arch/powerpc/kernel/head_32.h @@ -10,13 +10,16 @@ * We assume sprg3 has the physical address of the current * task's thread_struct. */ - .macro EXCEPTION_PROLOG + EXCEPTION_PROLOG_0 + EXCEPTION_PROLOG_1 + EXCEPTION_PROLOG_2 +.endm + +.macro EXCEPTION_PROLOG_0 mtspr SPRN_SPRG_SCRATCH0,r10 mtspr SPRN_SPRG_SCRATCH1,r11 mfcrr10 - EXCEPTION_PROLOG_1 - EXCEPTION_PROLOG_2 .endm .macro EXCEPTION_PROLOG_1 diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 19f583e18402..dac7c0a34eea 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -494,10 +494,7 @@ InstructionTLBError: */ . = 0x1400 DataTLBError: - mtspr SPRN_SPRG_SCRATCH0, r10 - mtspr SPRN_SPRG_SCRATCH1, r11 - mfcrr10 - + EXCEPTION_PROLOG_0 mfspr r11, SPRN_DAR cmpwi cr0, r11, RPN_PATTERN beq-FixupDAR/* must be a buggy dcbX, icbi insn. */ @@ -530,9 +527,7 @@ DARFixed:/* Return from dcbx instruction bug workaround */ */ . = 0x1c00 DataBreakpoint: - mtspr SPRN_SPRG_SCRATCH0, r10 - mtspr SPRN_SPRG_SCRATCH1, r11 - mfcrr10 + EXCEPTION_PROLOG_0 mfspr r11, SPRN_SRR0 cmplwi cr0, r11, (.Ldtlbie - PAGE_OFFSET)@l cmplwi cr7, r11, (.Litlbie - PAGE_OFFSET)@l -- 2.13.3
[PATCH v3 00/15] Enable CONFIG_VMAP_STACK on PPC32
The purpose of this series is to enable CONFIG_VMAP_STACK on PPC32.

rfc v1: initial support on 8xx
rfc v2: added stack overflow detection
v3:
- Stack overflow detection works, tested with LKDTM STACK_EXHAUST test
- Support for book3s32 added

Christophe Leroy (15):
  powerpc/32: replace MTMSRD() by mtmsr
  powerpc/32: Add EXCEPTION_PROLOG_0 in head_32.h
  powerpc/32: save DEAR/DAR before calling handle_page_fault
  powerpc/32: move MSR_PR test into EXCEPTION_PROLOG_0
  powerpc/32: add a macro to get and/or save DAR and DSISR on stack.
  powerpc/32: prepare for CONFIG_VMAP_STACK
  powerpc: align stack to 2 * THREAD_SIZE with VMAP_STACK
  powerpc/32: Add early stack overflow detection with VMAP stack.
  powerpc/8xx: Use alternative scratch registers in DTLB miss handler
  powerpc/8xx: drop exception entries for non-existing exceptions
  powerpc/8xx: move DataStoreTLBMiss perf handler
  powerpc/8xx: split breakpoint exception
  powerpc/8xx: Enable CONFIG_VMAP_STACK
  powerpc/32s: reorganise DSI handler.
  powerpc/32s: Activate CONFIG_VMAP_STACK

 arch/powerpc/include/asm/irq.h         |   1 +
 arch/powerpc/include/asm/processor.h   |   6 ++
 arch/powerpc/include/asm/thread_info.h |  18 
 arch/powerpc/kernel/asm-offsets.c      |   6 ++
 arch/powerpc/kernel/entry_32.S         |  55 --
 arch/powerpc/kernel/head_32.S          |  57 ++
 arch/powerpc/kernel/head_32.h          | 129 ---
 arch/powerpc/kernel/head_40x.S         |   2 +
 arch/powerpc/kernel/head_8xx.S         | 186 +++--
 arch/powerpc/kernel/head_booke.h       |   2 +
 arch/powerpc/kernel/head_fsl_booke.S   |   1 +
 arch/powerpc/kernel/irq.c              |   1 +
 arch/powerpc/kernel/setup_32.c         |   3 +-
 arch/powerpc/kernel/setup_64.c         |   2 +-
 arch/powerpc/kernel/traps.c            |  15 ++-
 arch/powerpc/kernel/vmlinux.lds.S      |   2 +-
 arch/powerpc/mm/book3s32/hash_low.S    |  46 +---
 arch/powerpc/mm/book3s32/mmu.c         |   9 +-
 arch/powerpc/perf/8xx-pmu.c            |  12 ++-
 arch/powerpc/platforms/Kconfig.cputype |   3 +
 20 files changed, 379 insertions(+), 177 deletions(-)

-- 
2.13.3
[PATCH v3 04/15] powerpc/32: move MSR_PR test into EXCEPTION_PROLOG_0
In order to simplify VMAP stack implementation, move MSR_PR test into EXCEPTION_PROLOG_0. This requires to not modify cr0 between EXCEPTION_PROLOG_0 and EXCEPTION_PROLOG_1. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_32.h | 4 ++-- arch/powerpc/kernel/head_8xx.S | 39 --- 2 files changed, 22 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h index 8e345f8d4b0e..436ffd862d2a 100644 --- a/arch/powerpc/kernel/head_32.h +++ b/arch/powerpc/kernel/head_32.h @@ -19,12 +19,12 @@ .macro EXCEPTION_PROLOG_0 mtspr SPRN_SPRG_SCRATCH0,r10 mtspr SPRN_SPRG_SCRATCH1,r11 + mfspr r11, SPRN_SRR1 /* check whether user or kernel */ mfcrr10 + andi. r11, r11, MSR_PR .endm .macro EXCEPTION_PROLOG_1 - mfspr r11,SPRN_SRR1 /* check whether user or kernel */ - andi. r11,r11,MSR_PR tophys(r11,r1) /* use tophys(r1) if kernel */ beq 1f mfspr r11,SPRN_SPRG_THREAD diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index fb284d95c76a..175c3cfc8014 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -497,8 +497,8 @@ InstructionTLBError: DataTLBError: EXCEPTION_PROLOG_0 mfspr r11, SPRN_DAR - cmpwi cr0, r11, RPN_PATTERN - beq-FixupDAR/* must be a buggy dcbX, icbi insn. */ + cmpwi cr1, r11, RPN_PATTERN + beq-cr1, FixupDAR /* must be a buggy dcbX, icbi insn. */ DARFixed:/* Return from dcbx instruction bug workaround */ EXCEPTION_PROLOG_1 EXCEPTION_PROLOG_2 @@ -531,9 +531,9 @@ DARFixed:/* Return from dcbx instruction bug workaround */ DataBreakpoint: EXCEPTION_PROLOG_0 mfspr r11, SPRN_SRR0 - cmplwi cr0, r11, (.Ldtlbie - PAGE_OFFSET)@l + cmplwi cr1, r11, (.Ldtlbie - PAGE_OFFSET)@l cmplwi cr7, r11, (.Litlbie - PAGE_OFFSET)@l - beq-cr0, 11f + beq-cr1, 11f beq-cr7, 11f EXCEPTION_PROLOG_1 EXCEPTION_PROLOG_2 @@ -578,9 +578,9 @@ FixupDAR:/* Entry point for dcbx workaround. */ mfspr r10, SPRN_SRR0 mtspr SPRN_MD_EPN, r10 rlwinm r11, r10, 16, 0xfff8 - cmpli cr0, r11, PAGE_OFFSET@h + cmpli cr1, r11, PAGE_OFFSET@h mfspr r11, SPRN_M_TWB /* Get level 1 table */ - blt+3f + blt+cr1, 3f rlwinm r11, r10, 16, 0xfff8 0: cmpli cr7, r11, (PAGE_OFFSET + 0x180)@h @@ -595,7 +595,7 @@ FixupDAR:/* Entry point for dcbx workaround. */ 3: lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)/* Get the level 1 entry */ mtspr SPRN_MD_TWC, r11 - mtcrr11 + mtcrf 0x01, r11 mfspr r11, SPRN_MD_TWC lwz r11, 0(r11) /* Get the pte */ bt 28,200f /* bit 28 = Large page (8M) */ @@ -608,16 +608,16 @@ FixupDAR:/* Entry point for dcbx workaround. */ * no need to include them here */ xoris r10, r11, 0x7c00/* check if major OP code is 31 */ rlwinm r10, r10, 0, 21, 5 - cmpwi cr0, r10, 2028 /* Is dcbz? */ - beq+142f - cmpwi cr0, r10, 940 /* Is dcbi? */ - beq+142f - cmpwi cr0, r10, 108 /* Is dcbst? */ - beq+144f/* Fix up store bit! */ - cmpwi cr0, r10, 172 /* Is dcbf? */ - beq+142f - cmpwi cr0, r10, 1964 /* Is icbi? */ - beq+142f + cmpwi cr1, r10, 2028 /* Is dcbz? */ + beq+cr1, 142f + cmpwi cr1, r10, 940 /* Is dcbi? */ + beq+cr1, 142f + cmpwi cr1, r10, 108 /* Is dcbst? */ + beq+cr1, 144f /* Fix up store bit! */ + cmpwi cr1, r10, 172 /* Is dcbf? */ + beq+cr1, 142f + cmpwi cr1, r10, 1964 /* Is icbi? */ + beq+cr1, 142f 141: mfspr r10,SPRN_M_TW b DARFixed/* Nope, go back to normal TLB processing */ @@ -676,8 +676,9 @@ FixupDAR:/* Entry point for dcbx workaround. */ add r10, r10, r30 ;b 151f add r10, r10, r31 151: - rlwinm. 
r11,r11,19,24,28/* offset into jump table for reg RA */ - beq 152f/* if reg RA is zero, don't add it */ + rlwinm r11,r11,19,24,28/* offset into jump table for reg RA */ + cmpwi cr1, r11, 0 + beq cr1, 152f /* if reg RA is zero, don't add it */ addir11, r11, 150b@l/* add start of table */ mtctr r11 /* load ctr with jump address */ rlwinm r11,r11,0,16,10 /* make sure we don't execute this more than once */ -- 2.13.3
[PATCH v3 01/15] powerpc/32: replace MTMSRD() by mtmsr
On PPC32, MTMSRD() is simply defined as mtmsr. Replace MTMSRD(reg) by mtmsr reg in files dedicated to PPC32, this makes the code less obscure. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/entry_32.S | 18 +- arch/powerpc/kernel/head_32.h | 4 ++-- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index d60908ea37fb..6273b4862482 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -397,7 +397,7 @@ ret_from_syscall: LOAD_REG_IMMEDIATE(r10,MSR_KERNEL) /* doesn't include MSR_EE */ /* Note: We don't bother telling lockdep about it */ SYNC - MTMSRD(r10) + mtmsr r10 lwz r9,TI_FLAGS(r2) li r8,-MAX_ERRNO andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK) @@ -554,7 +554,7 @@ syscall_exit_work: */ ori r10,r10,MSR_EE SYNC - MTMSRD(r10) + mtmsr r10 /* Save NVGPRS if they're not saved already */ lwz r4,_TRAP(r1) @@ -697,7 +697,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_SPE) and.r0,r0,r11 /* FP or altivec or SPE enabled? */ beq+1f andcr11,r11,r0 - MTMSRD(r11) + mtmsr r11 isync 1: stw r11,_MSR(r1) mfcrr10 @@ -831,7 +831,7 @@ ret_from_except: /* Note: We don't bother telling lockdep about it */ LOAD_REG_IMMEDIATE(r10,MSR_KERNEL) SYNC/* Some chip revs have problems here... */ - MTMSRD(r10) /* disable interrupts */ + mtmsr r10 /* disable interrupts */ lwz r3,_MSR(r1) /* Returning to user mode? */ andi. r0,r3,MSR_PR @@ -998,7 +998,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX) */ LOAD_REG_IMMEDIATE(r10,MSR_KERNEL & ~MSR_RI) SYNC - MTMSRD(r10) /* clear the RI bit */ + mtmsr r10 /* clear the RI bit */ .globl exc_exit_restart exc_exit_restart: lwz r12,_NIP(r1) @@ -1234,7 +1234,7 @@ do_resched: /* r10 contains MSR_KERNEL here */ #endif ori r10,r10,MSR_EE SYNC - MTMSRD(r10) /* hard-enable interrupts */ + mtmsr r10 /* hard-enable interrupts */ bl schedule recheck: /* Note: And we don't tell it we are disabling them again @@ -1243,7 +1243,7 @@ recheck: */ LOAD_REG_IMMEDIATE(r10,MSR_KERNEL) SYNC - MTMSRD(r10) /* disable interrupts */ + mtmsr r10 /* disable interrupts */ lwz r9,TI_FLAGS(r2) andi. r0,r9,_TIF_NEED_RESCHED bne-do_resched @@ -1252,7 +1252,7 @@ recheck: do_user_signal:/* r10 contains MSR_KERNEL here */ ori r10,r10,MSR_EE SYNC - MTMSRD(r10) /* hard-enable interrupts */ + mtmsr r10 /* hard-enable interrupts */ /* save r13-r31 in the exception frame, if not already done */ lwz r3,_TRAP(r1) andi. r0,r3,1 @@ -1341,7 +1341,7 @@ _GLOBAL(enter_rtas) stw r9,8(r1) LOAD_REG_IMMEDIATE(r0,MSR_KERNEL) SYNC/* disable interrupts so SRR0/1 */ - MTMSRD(r0) /* don't get trashed */ + mtmsr r0 /* don't get trashed */ li r9,MSR_KERNEL & ~(MSR_IR|MSR_DR) mtlrr6 stw r7, THREAD + RTAS_SP(r2) diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h index 8abc7783dbe5..b2ca8c9ffd8b 100644 --- a/arch/powerpc/kernel/head_32.h +++ b/arch/powerpc/kernel/head_32.h @@ -50,7 +50,7 @@ rlwinm r9,r9,0,14,12 /* clear MSR_WE (necessary?) */ #else li r10,MSR_KERNEL & ~(MSR_IR|MSR_DR) /* can take exceptions */ - MTMSRD(r10) /* (except for mach check in rtas) */ + mtmsr r10 /* (except for mach check in rtas) */ #endif stw r0,GPR0(r11) lis r10,STACK_FRAME_REGS_MARKER@ha /* exception frame marker */ @@ -80,7 +80,7 @@ rlwinm r9,r9,0,14,12 /* clear MSR_WE (necessary?) 
*/ #else LOAD_REG_IMMEDIATE(r10, MSR_KERNEL & ~(MSR_IR|MSR_DR)) /* can take exceptions */ - MTMSRD(r10) /* (except for mach check in rtas) */ + mtmsr r10 /* (except for mach check in rtas) */ #endif lis r10,STACK_FRAME_REGS_MARKER@ha /* exception frame marker */ stw r2,GPR2(r11) -- 2.13.3
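For readers who don't have ppc_asm.h in front of them, this is roughly what the macro being removed expands to (a sketch from memory of arch/powerpc/include/asm/ppc_asm.h, so the exact #ifdef guard may differ; it is not part of this patch):

	#ifdef CONFIG_PPC_BOOK3S_64
	#define MTMSRD(r)	mtmsrd	r
	#else
	#define MTMSRD(r)	mtmsr	r
	#endif

In files that are only ever built for 32-bit, the wrapper adds indirection without changing the generated instruction, which is what the commit message means by making the code less obscure.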
[PATCH v3 03/15] powerpc/32: save DEAR/DAR before calling handle_page_fault
handle_page_fault() is the only function that save DAR/DEAR itself. Save DAR/DEAR before calling handle_page_fault() to prepare for VMAP stack which will require to save even before. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/entry_32.S | 1 - arch/powerpc/kernel/head_32.S| 2 ++ arch/powerpc/kernel/head_40x.S | 2 ++ arch/powerpc/kernel/head_8xx.S | 2 ++ arch/powerpc/kernel/head_booke.h | 2 ++ arch/powerpc/kernel/head_fsl_booke.S | 1 + 6 files changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 6273b4862482..317ad9df8ba8 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -621,7 +621,6 @@ ppc_swapcontext: */ .globl handle_page_fault handle_page_fault: - stw r4,_DAR(r1) addir3,r1,STACK_FRAME_OVERHEAD #ifdef CONFIG_PPC_BOOK3S_32 andis. r0,r5,DSISR_DABRMATCH@h diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index 9e868567b716..bebb49d877f2 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -310,6 +310,7 @@ BEGIN_MMU_FTR_SECTION END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE) 1: lwz r5,_DSISR(r11) /* get DSISR value */ mfspr r4,SPRN_DAR + stw r4, _DAR(r11) EXC_XFER_LITE(0x300, handle_page_fault) @@ -327,6 +328,7 @@ BEGIN_MMU_FTR_SECTION END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE) 1: mr r4,r12 andis. r5,r9,DSISR_SRR1_MATCH_32S@h /* Filter relevant SRR1 bits */ + stw r4, _DAR(r11) EXC_XFER_LITE(0x400, handle_page_fault) /* External interrupt */ diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S index 585ea1976550..9bb663977e84 100644 --- a/arch/powerpc/kernel/head_40x.S +++ b/arch/powerpc/kernel/head_40x.S @@ -313,6 +313,7 @@ _ENTRY(saved_ksp_limit) START_EXCEPTION(0x0400, InstructionAccess) EXCEPTION_PROLOG mr r4,r12 /* Pass SRR0 as arg2 */ + stw r4, _DEAR(r11) li r5,0/* Pass zero as arg3 */ EXC_XFER_LITE(0x400, handle_page_fault) @@ -676,6 +677,7 @@ DataAccess: mfspr r5,SPRN_ESR /* Grab the ESR, save it, pass arg3 */ stw r5,_ESR(r11) mfspr r4,SPRN_DEAR/* Grab the DEAR, save it, pass arg2 */ + stw r4, _DEAR(r11) EXC_XFER_LITE(0x300, handle_page_fault) /* Other PowerPC processors, namely those derived from the 6xx-series diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index dac7c0a34eea..fb284d95c76a 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -486,6 +486,7 @@ InstructionTLBError: tlbie r4 /* 0x400 is InstructionAccess exception, needed by bad_page_fault() */ .Litlbie: + stw r4, _DAR(r11) EXC_XFER_LITE(0x400, handle_page_fault) /* This is the data TLB error on the MPC8xx. This could be due to @@ -504,6 +505,7 @@ DARFixed:/* Return from dcbx instruction bug workaround */ mfspr r5,SPRN_DSISR stw r5,_DSISR(r11) mfspr r4,SPRN_DAR + stw r4, _DAR(r11) andis. 
r10,r5,DSISR_NOHPTE@h beq+.Ldtlbie tlbie r4 diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h index 2ae635df9026..37fc84ed90e3 100644 --- a/arch/powerpc/kernel/head_booke.h +++ b/arch/powerpc/kernel/head_booke.h @@ -467,6 +467,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV) mfspr r5,SPRN_ESR;/* Grab the ESR and save it */\ stw r5,_ESR(r11); \ mfspr r4,SPRN_DEAR; /* Grab the DEAR */ \ + stw r4, _DEAR(r11); \ EXC_XFER_LITE(0x0300, handle_page_fault) #define INSTRUCTION_STORAGE_EXCEPTION\ @@ -475,6 +476,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV) mfspr r5,SPRN_ESR;/* Grab the ESR and save it */\ stw r5,_ESR(r11); \ mr r4,r12; /* Pass SRR0 as arg2 */ \ + stw r4, _DEAR(r11); \ li r5,0; /* Pass zero as arg3 */ \ EXC_XFER_LITE(0x0400, handle_page_fault) diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S index adf0505dbe02..442aaac292b0 100644 --- a/arch/powerpc/kernel/head_fsl_booke.S +++ b/arch/powerpc/kernel/head_fsl_booke.S @@ -376,6 +376,7 @@ interrupt_base: mfspr r4,SPRN_DEAR/* Grab the DEAR, save it, pass arg2 */ andis. r10,r5,(ESR_ILK|ESR_DLK)@h bne 1f + s
[PATCH v3 12/15] powerpc/8xx: split breakpoint exception
Breakpoint exception is big. Split it to support future growth on exception prolog. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_8xx.S | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 1e718e47fe3c..225e242ce1c5 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -490,14 +490,7 @@ DARFixed:/* Return from dcbx instruction bug workaround */ * support of breakpoints and such. Someday I will get around to * using them. */ - . = 0x1c00 -DataBreakpoint: - EXCEPTION_PROLOG_0 - mfspr r11, SPRN_SRR0 - cmplwi cr1, r11, (.Ldtlbie - PAGE_OFFSET)@l - cmplwi cr7, r11, (.Litlbie - PAGE_OFFSET)@l - beq-cr1, 11f - beq-cr7, 11f +do_databreakpoint: EXCEPTION_PROLOG_1 EXCEPTION_PROLOG_2 addir3,r1,STACK_FRAME_OVERHEAD @@ -505,7 +498,15 @@ DataBreakpoint: stw r4,_DAR(r11) mfspr r5,SPRN_DSISR EXC_XFER_STD(0x1c00, do_break) -11: + + . = 0x1c00 +DataBreakpoint: + EXCEPTION_PROLOG_0 + mfspr r11, SPRN_SRR0 + cmplwi cr1, r11, (.Ldtlbie - PAGE_OFFSET)@l + cmplwi cr7, r11, (.Litlbie - PAGE_OFFSET)@l + cror4*cr1+eq, 4*cr1+eq, 4*cr7+eq + bne cr1, do_databreakpoint mtcrr10 mfspr r10, SPRN_SPRG_SCRATCH0 mfspr r11, SPRN_SPRG_SCRATCH1 -- 2.13.3
[PATCH v3 08/15] powerpc/32: Add early stack overflow detection with VMAP stack.
To avoid recursive faults, stack overflow detection has to be performed before writing in the stack in exception prologs. Do it by checking the alignment. If the stack pointer alignment is wrong, it means it is pointing to the following or preceding page. Without VMAP stack, a stack overflow is catastrophic. With VMAP stack, a stack overflow isn't destructive, so don't panic. Kill the task with SIGSEGV instead. A dedicated overflow stack is set up for each CPU. lkdtm: Performing direct entry EXHAUST_STACK lkdtm: Calling function with 512 frame size to depth 32 ... lkdtm: loop 32/32 ... lkdtm: loop 31/32 ... lkdtm: loop 30/32 ... lkdtm: loop 29/32 ... lkdtm: loop 28/32 ... lkdtm: loop 27/32 ... lkdtm: loop 26/32 ... lkdtm: loop 25/32 ... lkdtm: loop 24/32 ... lkdtm: loop 23/32 ... lkdtm: loop 22/32 ... lkdtm: loop 21/32 ... lkdtm: loop 20/32 ... Kernel stack overflow in process test[359], r1=c900c008 Oops: Kernel stack overflow, sig: 6 [#1] BE PAGE_SIZE=4K MMU=Hash PowerMac Modules linked in: CPU: 0 PID: 359 Comm: test Not tainted 5.3.0-rc7+ #2225 NIP: c0622060 LR: c0626710 CTR: REGS: c0895f48 TRAP: Not tainted (5.3.0-rc7+) MSR: 1032 CR: 28004224 XER: GPR00: c0626ca4 c900c008 c783c000 c07335cc c900c010 c07335cc c900c0f0 c07335cc GPR08: c900c0f0 0001 28008222 GPR16: 10010128 1001 b799c245 10010158 c07335cc 0025 GPR24: c069 c08b91d4 c068f688 0020 c900c0f0 c068f668 c08b95b4 c08b91d4 NIP [c0622060] format_decode+0x0/0x4d4 LR [c0626710] vsnprintf+0x80/0x5fc Call Trace: [c900c068] [c0626ca4] vscnprintf+0x18/0x48 [c900c078] [c007b944] vprintk_store+0x40/0x214 [c900c0b8] [c007bf50] vprintk_emit+0x90/0x1dc [c900c0e8] [c007c5cc] printk+0x50/0x60 [c900c128] [c03da5b0] recursive_loop+0x44/0x6c [c900c338] [c03da5c4] recursive_loop+0x58/0x6c [c900c548] [c03da5c4] recursive_loop+0x58/0x6c [c900c758] [c03da5c4] recursive_loop+0x58/0x6c [c900c968] [c03da5c4] recursive_loop+0x58/0x6c [c900cb78] [c03da5c4] recursive_loop+0x58/0x6c [c900cd88] [c03da5c4] recursive_loop+0x58/0x6c [c900cf98] [c03da5c4] recursive_loop+0x58/0x6c [c900d1a8] [c03da5c4] recursive_loop+0x58/0x6c [c900d3b8] [c03da5c4] recursive_loop+0x58/0x6c [c900d5c8] [c03da5c4] recursive_loop+0x58/0x6c [c900d7d8] [c03da5c4] recursive_loop+0x58/0x6c [c900d9e8] [c03da5c4] recursive_loop+0x58/0x6c [c900dbf8] [c03da5c4] recursive_loop+0x58/0x6c [c900de08] [c03da67c] lkdtm_EXHAUST_STACK+0x30/0x4c [c900de18] [c03da3e8] direct_entry+0xc8/0x140 [c900de48] [c029fb40] full_proxy_write+0x64/0xcc [c900de68] [c01500f8] __vfs_write+0x30/0x1d0 [c900dee8] [c0152cb8] vfs_write+0xb8/0x1d4 [c900df08] [c0152f7c] ksys_write+0x58/0xe8 [c900df38] [c0014208] ret_from_syscall+0x0/0x34 --- interrupt: c01 at 0xf806664 LR = 0x1000c868 Instruction dump: 4b91 80010014 7c832378 7c0803a6 38210010 4e800020 3d20c08a 3ca0c089 8089a0cc 38a58f0c 3861 4ba2d494 <9421ffe0> 7c0802a6 bfc10018 7c9f2378 Signed-off-by: Christophe Leroy Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/irq.h | 1 + arch/powerpc/kernel/entry_32.S | 25 + arch/powerpc/kernel/head_32.h | 4 arch/powerpc/kernel/irq.c | 1 + arch/powerpc/kernel/setup_32.c | 1 + arch/powerpc/kernel/traps.c| 15 --- 6 files changed, 44 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h index 814dfab7e392..ec74ced2437d 100644 --- a/arch/powerpc/include/asm/irq.h +++ b/arch/powerpc/include/asm/irq.h @@ -55,6 +55,7 @@ extern void *mcheckirq_ctx[NR_CPUS]; */ extern void *hardirq_ctx[NR_CPUS]; extern void *softirq_ctx[NR_CPUS]; +extern void *stackovf_ctx[NR_CPUS]; void 
call_do_softirq(void *sp); void call_do_irq(struct pt_regs *regs, void *sp); diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 2a26fe19f0b1..00fcf954e742 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -184,9 +184,11 @@ transfer_to_handler: */ kuap_save_and_lock r11, r12, r9, r2, r0 addir2, r12, -THREAD +#ifndef CONFIG_VMAP_STACK lwz r9,KSP_LIMIT(r12) cmplw r1,r9 /* if r1 <= ksp_limit */ ble-stack_ovf /* then the kernel stack overflowed */ +#endif 5: #if defined(CONFIG_PPC_BOOK3S_32) || defined(CONFIG_E500) lwz r12,TI_LOCAL_FLAGS(r2) @@ -298,6 +300,28 @@ reenable_mmu: * On kernel stack overflow, load up an initial stack pointer * and call StackOverflow(regs), which should not return. */ +#ifdef CONFIG_VMAP_STACK +_GLOBAL(stack_ovf) + li r11, 0 +#ifdef CONFIG_SMP + mfspr r11, SPRN_SPRG_THREAD + tovirt(r11, r11) + lwz r11, TASK_CPU - THREAD(r11) + slwir11, r11, 3 +#endif + addis r11, r11, stackovf_ctx@ha + addir11, r11, stackovf_ctx@l + lwz r11, 0(r11) + cmpwi cr1, r1
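Usage note added for context (not part of the patch): the LKDTM output quoted above comes from the direct-entry interface, which, assuming CONFIG_LKDTM is enabled and debugfs is mounted in the usual place, is triggered with something like `echo EXHAUST_STACK > /sys/kernel/debug/provoke-crash/DIRECT`.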
[PATCH v3 10/15] powerpc/8xx: drop exception entries for non-existing exceptions
head_8xx.S has entries for all exceptions from 0x100 to 0x1f00. Several of them do not exist and are never generated by the 8xx in accordance with the documentation. Remove those entry points to make some room for future growing exception code. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_8xx.S | 29 - 1 file changed, 29 deletions(-) diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 3de9c5f1746c..5aa63693f790 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -134,18 +134,6 @@ MachineCheck: addi r3,r1,STACK_FRAME_OVERHEAD EXC_XFER_STD(0x200, machine_check_exception) -/* Data access exception. - * This is "never generated" by the MPC8xx. - */ - . = 0x300 -DataAccess: - -/* Instruction access exception. - * This is "never generated" by the MPC8xx. - */ - . = 0x400 -InstructionAccess: - /* External interrupt */ EXCEPTION(0x500, HardwareInterrupt, do_IRQ, EXC_XFER_LITE) @@ -162,16 +150,9 @@ Alignment: /* Program check exception */ EXCEPTION(0x700, ProgramCheck, program_check_exception, EXC_XFER_STD) -/* No FPU on MPC8xx. This exception is not supposed to happen. -*/ - EXCEPTION(0x800, FPUnavailable, unknown_exception, EXC_XFER_STD) - /* Decrementer */ EXCEPTION(0x900, Decrementer, timer_interrupt, EXC_XFER_LITE) - EXCEPTION(0xa00, Trap_0a, unknown_exception, EXC_XFER_STD) - EXCEPTION(0xb00, Trap_0b, unknown_exception, EXC_XFER_STD) - /* System call */ . = 0xc00 SystemCall: @@ -179,8 +160,6 @@ SystemCall: /* Single step - not used on 601 */ EXCEPTION(0xd00, SingleStep, single_step_exception, EXC_XFER_STD) - EXCEPTION(0xe00, Trap_0e, unknown_exception, EXC_XFER_STD) - EXCEPTION(0xf00, Trap_0f, unknown_exception, EXC_XFER_STD) /* On the MPC8xx, this is a software emulation interrupt. It occurs * for all unimplemented and illegal instructions. @@ -507,14 +486,6 @@ DARFixed:/* Return from dcbx instruction bug workaround */ /* 0x300 is DataAccess exception, needed by bad_page_fault() */ EXC_XFER_LITE(0x300, handle_page_fault) - EXCEPTION(0x1500, Trap_15, unknown_exception, EXC_XFER_STD) - EXCEPTION(0x1600, Trap_16, unknown_exception, EXC_XFER_STD) - EXCEPTION(0x1700, Trap_17, unknown_exception, EXC_XFER_STD) - EXCEPTION(0x1800, Trap_18, unknown_exception, EXC_XFER_STD) - EXCEPTION(0x1900, Trap_19, unknown_exception, EXC_XFER_STD) - EXCEPTION(0x1a00, Trap_1a, unknown_exception, EXC_XFER_STD) - EXCEPTION(0x1b00, Trap_1b, unknown_exception, EXC_XFER_STD) - /* On the MPC8xx, these next four traps are used for development * support of breakpoints and such. Someday I will get around to * using them. -- 2.13.3
[PATCH v3 13/15] powerpc/8xx: Enable CONFIG_VMAP_STACK
This patch enables CONFIG_VMAP_STACK. For that, a few changes are done in head_8xx.S. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_8xx.S | 34 -- arch/powerpc/platforms/Kconfig.cputype | 1 + 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 225e242ce1c5..fc6d4d10e298 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -127,7 +127,7 @@ instruction_counter: /* Machine check */ . = 0x200 MachineCheck: - EXCEPTION_PROLOG + EXCEPTION_PROLOG dar save_dar_dsisr_on_stack r4, r5, r11 li r6, RPN_PATTERN mtspr SPRN_DAR, r6/* Tag DAR, to be used in DTLB Error */ @@ -140,7 +140,7 @@ MachineCheck: /* Alignment exception */ . = 0x600 Alignment: - EXCEPTION_PROLOG + EXCEPTION_PROLOG dar save_dar_dsisr_on_stack r4, r5, r11 li r6, RPN_PATTERN mtspr SPRN_DAR, r6/* Tag DAR, to be used in DTLB Error */ @@ -457,20 +457,26 @@ InstructionTLBError: */ . = 0x1400 DataTLBError: - EXCEPTION_PROLOG_0 + EXCEPTION_PROLOG_0 dar mfspr r11, SPRN_DAR cmpwi cr1, r11, RPN_PATTERN beq-cr1, FixupDAR /* must be a buggy dcbX, icbi insn. */ DARFixed:/* Return from dcbx instruction bug workaround */ +#ifdef CONFIG_VMAP_STACK + li r11, RPN_PATTERN + mtspr SPRN_DAR, r11 /* Tag DAR, to be used in DTLB Error */ +#endif EXCEPTION_PROLOG_1 - EXCEPTION_PROLOG_2 + EXCEPTION_PROLOG_2 dar get_and_save_dar_dsisr_on_stack r4, r5, r11 andis. r10,r5,DSISR_NOHPTE@h beq+.Ldtlbie tlbie r4 .Ldtlbie: +#ifndef CONFIG_VMAP_STACK li r10,RPN_PATTERN mtspr SPRN_DAR,r10/* Tag DAR, to be used in DTLB Error */ +#endif /* 0x300 is DataAccess exception, needed by bad_page_fault() */ EXC_XFER_LITE(0x300, handle_page_fault) @@ -492,16 +498,20 @@ DARFixed:/* Return from dcbx instruction bug workaround */ */ do_databreakpoint: EXCEPTION_PROLOG_1 - EXCEPTION_PROLOG_2 + EXCEPTION_PROLOG_2 dar addir3,r1,STACK_FRAME_OVERHEAD mfspr r4,SPRN_BAR stw r4,_DAR(r11) +#ifdef CONFIG_VMAP_STACK + lwz r5,_DSISR(r11) +#else mfspr r5,SPRN_DSISR +#endif EXC_XFER_STD(0x1c00, do_break) . = 0x1c00 DataBreakpoint: - EXCEPTION_PROLOG_0 + EXCEPTION_PROLOG_0 dar mfspr r11, SPRN_SRR0 cmplwi cr1, r11, (.Ldtlbie - PAGE_OFFSET)@l cmplwi cr7, r11, (.Litlbie - PAGE_OFFSET)@l @@ -530,6 +540,11 @@ InstructionBreakpoint: EXCEPTION(0x1e00, Trap_1e, unknown_exception, EXC_XFER_STD) EXCEPTION(0x1f00, Trap_1f, unknown_exception, EXC_XFER_STD) +#ifdef CONFIG_VMAP_STACK +stack_ovf_trampoline: + b stack_ovf +#endif + . = 0x2000 /* This is the procedure to calculate the data EA for buggy dcbx,dcbi instructions @@ -650,7 +665,14 @@ FixupDAR:/* Entry point for dcbx workaround. */ 152: mfdar r11 mtctr r11 /* restore ctr reg from DAR */ +#ifdef CONFIG_VMAP_STACK + mfspr r11, SPRN_SPRG_THREAD + stw r10, DAR(r11) + mfspr r10, SPRN_DSISR + stw r10, DSISR(r11) +#else mtdar r10 /* save fault EA to DAR */ +#endif mfspr r10,SPRN_M_TW b DARFixed/* Go back to normal TLB handling */ diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 12543e53fa96..3c42569b75cc 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -49,6 +49,7 @@ config PPC_8xx select PPC_HAVE_KUEP select PPC_HAVE_KUAP select PPC_MM_SLICES if HUGETLB_PAGE + select HAVE_ARCH_VMAP_STACK config 40x bool "AMCC 40x" -- 2.13.3
[PATCH v3 05/15] powerpc/32: add a macro to get and/or save DAR and DSISR on stack.
Refactor reading and saving of DAR and DSISR in exception vectors. This will ease the implementation of VMAP stack. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_32.S | 5 + arch/powerpc/kernel/head_32.h | 11 +++ arch/powerpc/kernel/head_8xx.S | 23 +++ 3 files changed, 19 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index bebb49d877f2..449625b4ff03 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -339,10 +339,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE) DO_KVM 0x600 Alignment: EXCEPTION_PROLOG - mfspr r4,SPRN_DAR - stw r4,_DAR(r11) - mfspr r5,SPRN_DSISR - stw r5,_DSISR(r11) + save_dar_dsisr_on_stack r4, r5, r11 addir3,r1,STACK_FRAME_OVERHEAD EXC_XFER_STD(0x600, alignment_exception) diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h index 436ffd862d2a..f19a1ab91fb5 100644 --- a/arch/powerpc/kernel/head_32.h +++ b/arch/powerpc/kernel/head_32.h @@ -144,6 +144,17 @@ RFI /* jump to handler, enable MMU */ .endm +.macro save_dar_dsisr_on_stack reg1, reg2, sp + mfspr \reg1, SPRN_DAR + mfspr \reg2, SPRN_DSISR + stw \reg1, _DAR(\sp) + stw \reg2, _DSISR(\sp) +.endm + +.macro get_and_save_dar_dsisr_on_stack reg1, reg2, sp + save_dar_dsisr_on_stack \reg1, \reg2, \sp +.endm + /* * Note: code which follows this uses cr0.eq (set if from kernel), * r11, r12 (SRR0), and r9 (SRR1). diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 175c3cfc8014..25e19af49705 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -128,12 +128,9 @@ instruction_counter: . = 0x200 MachineCheck: EXCEPTION_PROLOG - mfspr r4,SPRN_DAR - stw r4,_DAR(r11) - li r5,RPN_PATTERN - mtspr SPRN_DAR,r5 /* Tag DAR, to be used in DTLB Error */ - mfspr r5,SPRN_DSISR - stw r5,_DSISR(r11) + save_dar_dsisr_on_stack r4, r5, r11 + li r6, RPN_PATTERN + mtspr SPRN_DAR, r6/* Tag DAR, to be used in DTLB Error */ addi r3,r1,STACK_FRAME_OVERHEAD EXC_XFER_STD(0x200, machine_check_exception) @@ -156,12 +153,9 @@ InstructionAccess: . = 0x600 Alignment: EXCEPTION_PROLOG - mfspr r4,SPRN_DAR - stw r4,_DAR(r11) - li r5,RPN_PATTERN - mtspr SPRN_DAR,r5 /* Tag DAR, to be used in DTLB Error */ - mfspr r5,SPRN_DSISR - stw r5,_DSISR(r11) + save_dar_dsisr_on_stack r4, r5, r11 + li r6, RPN_PATTERN + mtspr SPRN_DAR, r6/* Tag DAR, to be used in DTLB Error */ addir3,r1,STACK_FRAME_OVERHEAD EXC_XFER_STD(0x600, alignment_exception) @@ -502,10 +496,7 @@ DataTLBError: DARFixed:/* Return from dcbx instruction bug workaround */ EXCEPTION_PROLOG_1 EXCEPTION_PROLOG_2 - mfspr r5,SPRN_DSISR - stw r5,_DSISR(r11) - mfspr r4,SPRN_DAR - stw r4, _DAR(r11) + get_and_save_dar_dsisr_on_stack r4, r5, r11 andis. r10,r5,DSISR_NOHPTE@h beq+.Ldtlbie tlbie r4 -- 2.13.3
[PATCH v3 07/15] powerpc: align stack to 2 * THREAD_SIZE with VMAP_STACK
In order to ease stack overflow detection, align stack to 2 * THREAD_SIZE when using VMAP_STACK. This allows overflow detection using a single bit check. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/thread_info.h | 13 + arch/powerpc/kernel/setup_32.c | 2 +- arch/powerpc/kernel/setup_64.c | 2 +- arch/powerpc/kernel/vmlinux.lds.S | 2 +- 4 files changed, 16 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index 488d5c4670ff..a2270749b282 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -22,6 +22,19 @@ #define THREAD_SIZE(1 << THREAD_SHIFT) +/* + * By aligning VMAP'd stacks to 2 * THREAD_SIZE, we can detect overflow by + * checking sp & (1 << THREAD_SHIFT), which we can do cheaply in the entry + * assembly. + */ +#ifdef CONFIG_VMAP_STACK +#define THREAD_ALIGN_SHIFT (THREAD_SHIFT + 1) +#else +#define THREAD_ALIGN_SHIFT THREAD_SHIFT +#endif + +#define THREAD_ALIGN (1 << THREAD_ALIGN_SHIFT) + #ifndef __ASSEMBLY__ #include #include diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c index a7541edf0cdb..180e658c1a6b 100644 --- a/arch/powerpc/kernel/setup_32.c +++ b/arch/powerpc/kernel/setup_32.c @@ -137,7 +137,7 @@ arch_initcall(ppc_init); static void *__init alloc_stack(void) { - void *ptr = memblock_alloc(THREAD_SIZE, THREAD_SIZE); + void *ptr = memblock_alloc(THREAD_SIZE, THREAD_ALIGN); if (!ptr) panic("cannot allocate %d bytes for stack at %pS\n", diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 44b4c432a273..f630fe4d36a8 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -644,7 +644,7 @@ static void *__init alloc_stack(unsigned long limit, int cpu) BUILD_BUG_ON(STACK_INT_FRAME_SIZE % 16); - ptr = memblock_alloc_try_nid(THREAD_SIZE, THREAD_SIZE, + ptr = memblock_alloc_try_nid(THREAD_SIZE, THREAD_ALIGN, MEMBLOCK_LOW_LIMIT, limit, early_cpu_to_node(cpu)); if (!ptr) diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 060a1acd7c6d..d38335129c06 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -346,7 +346,7 @@ SECTIONS #endif /* The initial task and kernel stack */ - INIT_TASK_DATA_SECTION(THREAD_SIZE) + INIT_TASK_DATA_SECTION(THREAD_ALIGN) .data..page_aligned : AT(ADDR(.data..page_aligned) - LOAD_OFFSET) { PAGE_ALIGNED_DATA(PAGE_SIZE) -- 2.13.3
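As an aside, here is a minimal C sketch of the single bit check this alignment enables (illustrative only, not the series' assembly; the THREAD_SHIFT value and the helper name are invented for the example). Because the stack is THREAD_SIZE bytes and its base is aligned to 2 * THREAD_SIZE, bit THREAD_SHIFT of every valid stack pointer is zero, so one bit test detects a pointer that has strayed into the adjacent region:

	/* Illustrative sketch only, not taken from the patch. */
	#define THREAD_SHIFT	13			/* example value */
	#define THREAD_SIZE	(1UL << THREAD_SHIFT)

	static inline int stack_out_of_bounds(unsigned long sp)
	{
		/* valid stack addresses have bit THREAD_SHIFT clear */
		return (sp & THREAD_SIZE) != 0;
	}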
[PATCH v3 06/15] powerpc/32: prepare for CONFIG_VMAP_STACK
To support CONFIG_VMAP_STACK, the kernel has to activate Data MMU Translation for accessing the stack. Before doing that it must save SRR0, SRR1 and also DAR and DSISR when relevant, in order to not loose them in case there is a Data TLB Miss once the translation is reactivated. This patch adds fields in thread struct for saving those registers. It prepares entry_32.S to handle exception entry with Data MMU Translation enabled and alters EXCEPTION_PROLOG macros to save SRR0, SRR1, DAR and DSISR then reenables Data MMU. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/processor.h | 6 ++ arch/powerpc/include/asm/thread_info.h | 5 ++ arch/powerpc/kernel/asm-offsets.c | 6 ++ arch/powerpc/kernel/entry_32.S | 7 +++ arch/powerpc/kernel/head_32.h | 101 + 5 files changed, 115 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index a9993e7a443b..92c02d15f117 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -163,6 +163,12 @@ struct thread_struct { #if defined(CONFIG_PPC_BOOK3S_32) && defined(CONFIG_PPC_KUAP) unsigned long kuap; /* opened segments for user access */ #endif +#ifdef CONFIG_VMAP_STACK + unsigned long srr0; + unsigned long srr1; + unsigned long dar; + unsigned long dsisr; +#endif /* Debug Registers */ struct debug_reg debug; struct thread_fp_state fp_state; diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index 8e1d0195ac36..488d5c4670ff 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -10,10 +10,15 @@ #define _ASM_POWERPC_THREAD_INFO_H #include +#include #ifdef __KERNEL__ +#if defined(CONFIG_VMAP_STACK) && CONFIG_THREAD_SHIFT < PAGE_SHIFT +#define THREAD_SHIFT PAGE_SHIFT +#else #define THREAD_SHIFT CONFIG_THREAD_SHIFT +#endif #define THREAD_SIZE(1 << THREAD_SHIFT) diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 484f54dab247..782cbf489ab0 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -127,6 +127,12 @@ int main(void) OFFSET(KSP_VSID, thread_struct, ksp_vsid); #else /* CONFIG_PPC64 */ OFFSET(PGDIR, thread_struct, pgdir); +#ifdef CONFIG_VMAP_STACK + OFFSET(SRR0, thread_struct, srr0); + OFFSET(SRR1, thread_struct, srr1); + OFFSET(DAR, thread_struct, dar); + OFFSET(DSISR, thread_struct, dsisr); +#endif #ifdef CONFIG_SPE OFFSET(THREAD_EVR0, thread_struct, evr[0]); OFFSET(THREAD_ACC, thread_struct, acc); diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 317ad9df8ba8..2a26fe19f0b1 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -140,6 +140,9 @@ transfer_to_handler: stw r12,_CTR(r11) stw r2,_XER(r11) mfspr r12,SPRN_SPRG_THREAD +#ifdef CONFIG_VMAP_STACK + tovirt(r12, r12) +#endif beq 2f /* if from user, fix up THREAD.regs */ addir2, r12, -THREAD addir11,r1,STACK_FRAME_OVERHEAD @@ -195,7 +198,11 @@ transfer_to_handler: transfer_to_handler_cont: 3: mflrr9 +#ifdef CONFIG_VMAP_STACK + tovirt(r9, r9) +#else tovirt(r2, r2) /* set r2 to current */ +#endif lwz r11,0(r9) /* virtual address of handler */ lwz r9,4(r9)/* where to go when done */ #if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS) diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h index f19a1ab91fb5..59e775930be8 100644 --- a/arch/powerpc/kernel/head_32.h +++ b/arch/powerpc/kernel/head_32.h @@ -10,31 +10,57 @@ * We assume sprg3 has the 
physical address of the current * task's thread_struct. */ -.macro EXCEPTION_PROLOG - EXCEPTION_PROLOG_0 +.macro EXCEPTION_PROLOG ext + EXCEPTION_PROLOG_0 \ext EXCEPTION_PROLOG_1 - EXCEPTION_PROLOG_2 + EXCEPTION_PROLOG_2 \ext .endm -.macro EXCEPTION_PROLOG_0 +.macro EXCEPTION_PROLOG_0 ext mtspr SPRN_SPRG_SCRATCH0,r10 mtspr SPRN_SPRG_SCRATCH1,r11 +#ifdef CONFIG_VMAP_STACK + mfspr r10, SPRN_SPRG_THREAD + .ifnb \ext + mfspr r11, SPRN_DAR + stw r11, DAR(r10) + mfspr r11, SPRN_DSISR + stw r11, DSISR(r10) + .endif + mfspr r11, SPRN_SRR0 + stw r11, SRR0(r10) +#endif mfspr r11, SPRN_SRR1 /* check whether user or kernel */ +#ifdef CONFIG_VMAP_STACK + stw r11, SRR1(r10) +#endif mfcrr10 andi. r11, r11, MSR_PR .endm .macro EXCEPTION_PROLOG_1 +#ifdef CONFIG_VMAP_STACK + li r11, MSR
[PATCH v3 11/15] powerpc/8xx: move DataStoreTLBMiss perf handler
Move DataStoreTLBMiss perf handler in order to cope with future growing exception prolog. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_8xx.S | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 5aa63693f790..1e718e47fe3c 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -166,18 +166,6 @@ SystemCall: */ EXCEPTION(0x1000, SoftEmu, program_check_exception, EXC_XFER_STD) -/* Called from DataStoreTLBMiss when perf TLB misses events are activated */ -#ifdef CONFIG_PERF_EVENTS - patch_site 0f, patch__dtlbmiss_perf -0: lwz r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0) - addir10, r10, 1 - stw r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0) - mfspr r10, SPRN_DAR - mtspr SPRN_DAR, r11 /* Tag DAR */ - mfspr r11, SPRN_M_TW - rfi -#endif - . = 0x1100 /* * For the MPC8xx, this is a software tablewalk to load the instruction @@ -486,6 +474,18 @@ DARFixed:/* Return from dcbx instruction bug workaround */ /* 0x300 is DataAccess exception, needed by bad_page_fault() */ EXC_XFER_LITE(0x300, handle_page_fault) +/* Called from DataStoreTLBMiss when perf TLB misses events are activated */ +#ifdef CONFIG_PERF_EVENTS + patch_site 0f, patch__dtlbmiss_perf +0: lwz r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0) + addir10, r10, 1 + stw r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0) + mfspr r10, SPRN_DAR + mtspr SPRN_DAR, r11 /* Tag DAR */ + mfspr r11, SPRN_M_TW + rfi +#endif + /* On the MPC8xx, these next four traps are used for development * support of breakpoints and such. Someday I will get around to * using them. -- 2.13.3
[PATCH v3 09/15] powerpc/8xx: Use alternative scratch registers in DTLB miss handler
In preparation of handling CONFIG_VMAP_STACK, DTLB miss handler need to use different scratch registers than other exception handlers in order to not jeopardise exception entry on stack DTLB misses. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_8xx.S | 27 ++- arch/powerpc/perf/8xx-pmu.c| 12 2 files changed, 22 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 25e19af49705..3de9c5f1746c 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -193,8 +193,9 @@ SystemCall: 0: lwz r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0) addir10, r10, 1 stw r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0) - mfspr r10, SPRN_SPRG_SCRATCH0 - mfspr r11, SPRN_SPRG_SCRATCH1 + mfspr r10, SPRN_DAR + mtspr SPRN_DAR, r11 /* Tag DAR */ + mfspr r11, SPRN_M_TW rfi #endif @@ -337,8 +338,8 @@ ITLBMissLinear: . = 0x1200 DataStoreTLBMiss: - mtspr SPRN_SPRG_SCRATCH0, r10 - mtspr SPRN_SPRG_SCRATCH1, r11 + mtspr SPRN_DAR, r10 + mtspr SPRN_M_TW, r11 mfcrr11 /* If we are faulting a kernel address, we have to use the @@ -403,10 +404,10 @@ DataStoreTLBMiss: mtspr SPRN_MD_RPN, r10/* Update TLB entry */ /* Restore registers */ - mtspr SPRN_DAR, r11 /* Tag DAR */ -0: mfspr r10, SPRN_SPRG_SCRATCH0 - mfspr r11, SPRN_SPRG_SCRATCH1 +0: mfspr r10, SPRN_DAR + mtspr SPRN_DAR, r11 /* Tag DAR */ + mfspr r11, SPRN_M_TW rfi patch_site 0b, patch__dtlbmiss_exit_1 @@ -422,10 +423,10 @@ DTLBMissIMMR: mtspr SPRN_MD_RPN, r10/* Update TLB entry */ li r11, RPN_PATTERN - mtspr SPRN_DAR, r11 /* Tag DAR */ -0: mfspr r10, SPRN_SPRG_SCRATCH0 - mfspr r11, SPRN_SPRG_SCRATCH1 +0: mfspr r10, SPRN_DAR + mtspr SPRN_DAR, r11 /* Tag DAR */ + mfspr r11, SPRN_M_TW rfi patch_site 0b, patch__dtlbmiss_exit_2 @@ -459,10 +460,10 @@ DTLBMissLinear: mtspr SPRN_MD_RPN, r10/* Update TLB entry */ li r11, RPN_PATTERN - mtspr SPRN_DAR, r11 /* Tag DAR */ -0: mfspr r10, SPRN_SPRG_SCRATCH0 - mfspr r11, SPRN_SPRG_SCRATCH1 +0: mfspr r10, SPRN_DAR + mtspr SPRN_DAR, r11 /* Tag DAR */ + mfspr r11, SPRN_M_TW rfi patch_site 0b, patch__dtlbmiss_exit_3 diff --git a/arch/powerpc/perf/8xx-pmu.c b/arch/powerpc/perf/8xx-pmu.c index 19124b0b171a..1ad03c55c88c 100644 --- a/arch/powerpc/perf/8xx-pmu.c +++ b/arch/powerpc/perf/8xx-pmu.c @@ -157,10 +157,6 @@ static void mpc8xx_pmu_read(struct perf_event *event) static void mpc8xx_pmu_del(struct perf_event *event, int flags) { - /* mfspr r10, SPRN_SPRG_SCRATCH0 */ - unsigned int insn = PPC_INST_MFSPR | __PPC_RS(R10) | - __PPC_SPR(SPRN_SPRG_SCRATCH0); - mpc8xx_pmu_read(event); /* If it was the last user, stop counting to avoid useles overhead */ @@ -173,6 +169,10 @@ static void mpc8xx_pmu_del(struct perf_event *event, int flags) break; case PERF_8xx_ID_ITLB_LOAD_MISS: if (atomic_dec_return(&itlb_miss_ref) == 0) { + /* mfspr r10, SPRN_SPRG_SCRATCH0 */ + unsigned int insn = PPC_INST_MFSPR | __PPC_RS(R10) | + __PPC_SPR(SPRN_SPRG_SCRATCH0); + patch_instruction_site(&patch__itlbmiss_exit_1, insn); #ifndef CONFIG_PIN_TLB_TEXT patch_instruction_site(&patch__itlbmiss_exit_2, insn); @@ -181,6 +181,10 @@ static void mpc8xx_pmu_del(struct perf_event *event, int flags) break; case PERF_8xx_ID_DTLB_LOAD_MISS: if (atomic_dec_return(&dtlb_miss_ref) == 0) { + /* mfspr r10, SPRN_DAR */ + unsigned int insn = PPC_INST_MFSPR | __PPC_RS(R10) | + __PPC_SPR(SPRN_DAR); + patch_instruction_site(&patch__dtlbmiss_exit_1, insn); patch_instruction_site(&patch__dtlbmiss_exit_2, insn); patch_instruction_site(&patch__dtlbmiss_exit_3, insn); -- 2.13.3
[PATCH v3 14/15] powerpc/32s: reorganise DSI handler.
The part decidated to handling hash_page() is fully unneeded for processors not having real hash pages like the 603. Lets enlarge the content of the feature fixup, and provide an alternative which jumps directly instead of getting NIPs. Also, in preparation of VMAP stacks, the end of DSI handler has moved to later in the code as it won't fit anymore once VMAP stacks are there. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_32.S | 29 +++-- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index 449625b4ff03..5bda6a092673 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -295,24 +295,22 @@ __secondary_hold_acknowledge: DO_KVM 0x300 DataAccess: EXCEPTION_PROLOG - mfspr r10,SPRN_DSISR - stw r10,_DSISR(r11) + get_and_save_dar_dsisr_on_stack r4, r5, r11 +BEGIN_MMU_FTR_SECTION #ifdef CONFIG_PPC_KUAP - andis. r0,r10,(DSISR_BAD_FAULT_32S | DSISR_DABRMATCH | DSISR_PROTFAULT)@h + andis. r0, r5, (DSISR_BAD_FAULT_32S | DSISR_DABRMATCH | DSISR_PROTFAULT)@h #else - andis. r0,r10,(DSISR_BAD_FAULT_32S|DSISR_DABRMATCH)@h + andis. r0, r5, (DSISR_BAD_FAULT_32S | DSISR_DABRMATCH)@h #endif - bne 1f /* if not, try to put a PTE */ - mfspr r4,SPRN_DAR /* into the hash table */ - rlwinm r3,r10,32-15,21,21 /* DSISR_STORE -> _PAGE_RW */ -BEGIN_MMU_FTR_SECTION + bne handle_page_fault_tramp /* if not, try to put a PTE */ + rlwinm r3, r5, 32 - 15, 21, 21 /* DSISR_STORE -> _PAGE_RW */ bl hash_page -END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE) -1: lwz r5,_DSISR(r11) /* get DSISR value */ - mfspr r4,SPRN_DAR - stw r4, _DAR(r11) - EXC_XFER_LITE(0x300, handle_page_fault) - + lwz r5, _DSISR(r11) /* get DSISR value */ + lwz r4, _DAR(r11) + b handle_page_fault_tramp +FTR_SECTION_ELSE + b handle_page_fault_tramp +ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_HPTE_TABLE) /* Instruction access exception. */ . = 0x400 @@ -642,6 +640,9 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_DTLB_SW_LRU) . = 0x3000 +handle_page_fault_tramp: + EXC_XFER_LITE(0x300, handle_page_fault) + AltiVecUnavailable: EXCEPTION_PROLOG #ifdef CONFIG_ALTIVEC -- 2.13.3
[PATCH v3 15/15] powerpc/32s: Activate CONFIG_VMAP_STACK
A few changes to retrieve DAR and DSISR from struct regs instead of retrieving them directly, as they may have changed due to a TLB miss. Also modifies hash_page() and friends to work with virtual data addresses instead of physical ones. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/entry_32.S | 4 +++ arch/powerpc/kernel/head_32.S | 19 +++--- arch/powerpc/kernel/head_32.h | 4 ++- arch/powerpc/mm/book3s32/hash_low.S| 46 +- arch/powerpc/mm/book3s32/mmu.c | 9 +-- arch/powerpc/platforms/Kconfig.cputype | 2 ++ 6 files changed, 61 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 00fcf954e742..1d3b152ee54f 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -1365,7 +1365,11 @@ _GLOBAL(enter_rtas) lis r6,1f@ha/* physical return address for rtas */ addir6,r6,1f@l tophys(r6,r6) +#ifdef CONFIG_VMAP_STACK + mr r7, r1 +#else tophys(r7,r1) +#endif lwz r8,RTASENTRY(r4) lwz r4,RTASBASE(r4) mfmsr r9 diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index 5bda6a092673..97bc02306a34 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -272,14 +272,22 @@ __secondary_hold_acknowledge: */ . = 0x200 DO_KVM 0x200 +MachineCheck: EXCEPTION_PROLOG_0 +#ifdef CONFIG_VMAP_STACK + li r11, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */ + mtmsr r11 +#endif #ifdef CONFIG_PPC_CHRP mfspr r11, SPRN_SPRG_THREAD +#ifdef CONFIG_VMAP_STACK + tovirt(r11, r11) +#endif lwz r11, RTAS_SP(r11) cmpwi cr1, r11, 0 bne cr1, 7f #endif /* CONFIG_PPC_CHRP */ - EXCEPTION_PROLOG_1 + EXCEPTION_PROLOG_1 rtas 7: EXCEPTION_PROLOG_2 addir3,r1,STACK_FRAME_OVERHEAD #ifdef CONFIG_PPC_CHRP @@ -294,7 +302,7 @@ __secondary_hold_acknowledge: . = 0x300 DO_KVM 0x300 DataAccess: - EXCEPTION_PROLOG + EXCEPTION_PROLOG dar get_and_save_dar_dsisr_on_stack r4, r5, r11 BEGIN_MMU_FTR_SECTION #ifdef CONFIG_PPC_KUAP @@ -336,7 +344,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE) . = 0x600 DO_KVM 0x600 Alignment: - EXCEPTION_PROLOG + EXCEPTION_PROLOG dar save_dar_dsisr_on_stack r4, r5, r11 addir3,r1,STACK_FRAME_OVERHEAD EXC_XFER_STD(0x600, alignment_exception) @@ -643,6 +651,11 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_DTLB_SW_LRU) handle_page_fault_tramp: EXC_XFER_LITE(0x300, handle_page_fault) +#ifdef CONFIG_VMAP_STACK +stack_ovf_trampoline: + b stack_ovf +#endif + AltiVecUnavailable: EXCEPTION_PROLOG #ifdef CONFIG_ALTIVEC diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h index 283d4298d555..ae2c8e07e1d5 100644 --- a/arch/powerpc/kernel/head_32.h +++ b/arch/powerpc/kernel/head_32.h @@ -38,10 +38,12 @@ andi. 
r11, r11, MSR_PR .endm -.macro EXCEPTION_PROLOG_1 +.macro EXCEPTION_PROLOG_1 rtas #ifdef CONFIG_VMAP_STACK + .ifb\rtas li r11, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */ mtmsr r11 + .endif subir11, r1, INT_FRAME_SIZE /* use r1 if kernel */ #else tophys(r11,r1) /* use tophys(r1) if kernel */ diff --git a/arch/powerpc/mm/book3s32/hash_low.S b/arch/powerpc/mm/book3s32/hash_low.S index 8bbbd9775c8a..c11b0a005196 100644 --- a/arch/powerpc/mm/book3s32/hash_low.S +++ b/arch/powerpc/mm/book3s32/hash_low.S @@ -25,6 +25,12 @@ #include #include +#ifdef CONFIG_VMAP_STACK +#define ADDR_OFFSET0 +#else +#define ADDR_OFFSETPAGE_OFFSET +#endif + #ifdef CONFIG_SMP .section .bss .align 2 @@ -47,8 +53,8 @@ mmu_hash_lock: .text _GLOBAL(hash_page) #ifdef CONFIG_SMP - lis r8, (mmu_hash_lock - PAGE_OFFSET)@h - ori r8, r8, (mmu_hash_lock - PAGE_OFFSET)@l + lis r8, (mmu_hash_lock - ADDR_OFFSET)@h + ori r8, r8, (mmu_hash_lock - ADDR_OFFSET)@l lis r0,0x0fff b 10f 11:lwz r6,0(r8) @@ -66,9 +72,12 @@ _GLOBAL(hash_page) cmplw 0,r4,r0 ori r3,r3,_PAGE_USER|_PAGE_PRESENT /* test low addresses as user */ mfspr r5, SPRN_SPRG_PGDIR /* phys page-table root */ +#ifdef CONFIG_VMAP_STACK + tovirt(r5, r5) +#endif blt+112f/* assume user more likely */ - lis r5, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */ - addir5 ,r5 ,(swapper_pg_dir - PAGE_OFFSET)@l/* kernel page table */ + lis r5, (swapper_pg_dir - ADDR_OFFSET)@ha /* if kernel address, use */ + addir5 ,r5 ,(swapper_pg_dir - ADDR_OFFSET)@l/* kernel page tabl
Re: [PATCH 1/2] libnvdimm/altmap: Track namespace boundaries in altmap
> > With PFN_MODE_PMEM namespace, the memmap area is allocated from the device > area. Some architectures map the memmap area with large page size. On > architectures like ppc64, 16MB page for memap mapping can map 262144 pfns. > This maps a namespace size of 16G. > > When populating memmap region with 16MB page from the device area, > make sure the allocated space is not used to map resources outside this > namespace. Such usage of device area will prevent a namespace destroy. > > Add resource end pnf in altmap and use that to check if the memmap area > allocation can map pfn outside the namespace. On ppc64 in such case we > fallback > to allocation from memory. > > This fix kernel crash reported below: > > [ 132.034989] WARNING: CPU: 13 PID: 13719 at mm/memremap.c:133 > devm_memremap_pages_release+0x2d8/0x2e0 > [ 133.464754] BUG: Unable to handle kernel data access at 0xc00c00010b204000 > [ 133.464760] Faulting instruction address: 0xc007580c > [ 133.464766] Oops: Kernel access of bad area, sig: 11 [#1] > [ 133.464771] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries > . > [ 133.464901] NIP [c007580c] vmemmap_free+0x2ac/0x3d0 > [ 133.464906] LR [c00757f8] vmemmap_free+0x298/0x3d0 > [ 133.464910] Call Trace: > [ 133.464914] [c07cbfd0f7b0] [c00757f8] vmemmap_free+0x298/0x3d0 > (unreliable) > [ 133.464921] [c07cbfd0f8d0] [c0370a44] > section_deactivate+0x1a4/0x240 > [ 133.464928] [c07cbfd0f980] [c0386270] > __remove_pages+0x3a0/0x590 > [ 133.464935] [c07cbfd0fa50] [c0074158] > arch_remove_memory+0x88/0x160 > [ 133.464942] [c07cbfd0fae0] [c03be8c0] > devm_memremap_pages_release+0x150/0x2e0 > [ 133.464949] [c07cbfd0fb70] [c0738ea0] > devm_action_release+0x30/0x50 > [ 133.464955] [c07cbfd0fb90] [c073a5a4] > release_nodes+0x344/0x400 > [ 133.464961] [c07cbfd0fc40] [c073378c] > device_release_driver_internal+0x15c/0x250 > [ 133.464968] [c07cbfd0fc80] [c072fd14] unbind_store+0x104/0x110 > [ 133.464973] [c07cbfd0fcd0] [c072ee24] drv_attr_store+0x44/0x70 > [ 133.464981] [c07cbfd0fcf0] [c04a32bc] sysfs_kf_write+0x6c/0xa0 > [ 133.464987] [c07cbfd0fd10] [c04a1dfc] > kernfs_fop_write+0x17c/0x250 > [ 133.464993] [c07cbfd0fd60] [c03c348c] __vfs_write+0x3c/0x70 > [ 133.464999] [c07cbfd0fd80] [c03c75d0] vfs_write+0xd0/0x250 > > Reported-by: Sachin Sant > Signed-off-by: Aneesh Kumar K.V > --- > arch/powerpc/mm/init_64.c | 17 - > drivers/nvdimm/pfn_devs.c | 2 ++ > include/linux/memremap.h | 1 + > 3 files changed, 19 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c > index a44f6281ca3a..4e08246acd79 100644 > --- a/arch/powerpc/mm/init_64.c > +++ b/arch/powerpc/mm/init_64.c > @@ -172,6 +172,21 @@ static __meminit void vmemmap_list_populate(unsigned > long phys, > vmemmap_list = vmem_back; > } > > +static bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long > start, > + unsigned long page_size) > +{ > + unsigned long nr_pfn = page_size / sizeof(struct page); > + unsigned long start_pfn = page_to_pfn((struct page *)start); > + > + if ((start_pfn + nr_pfn) > altmap->end_pfn) > + return true; > + > + if (start_pfn < altmap->base_pfn) > + return true; > + > + return false; > +} > + > int __meminit vmemmap_populate(unsigned long start, unsigned long end, int > node, > struct vmem_altmap *altmap) > { > @@ -194,7 +209,7 @@ int __meminit vmemmap_populate(unsigned long start, > unsigned long end, int node, >* fail due to alignment issues when using 16MB hugepages, so >* fall back to system memory if the altmap allocation fail. 
>*/ > - if (altmap) { > + if (altmap && !altmap_cross_boundary(altmap, start, page_size)) > { > p = altmap_alloc_block_buf(page_size, altmap); > if (!p) > pr_debug("altmap block allocation failed, > falling back to system > memory"); > diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c > index 3e7b11cf1aae..a616d69c8224 100644 > --- a/drivers/nvdimm/pfn_devs.c > +++ b/drivers/nvdimm/pfn_devs.c > @@ -618,9 +618,11 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, > struct dev_pagemap *pgmap) > struct nd_namespace_common *ndns = nd_pfn->ndns; > struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev); > resource_size_t base = nsio->res.start + start_pad; > + resource_size_t end = nsio->res.end - end_trunc; > struct vmem_altmap __altmap = { > .base_pfn = init_altmap_base(base), > .reserve = init_altmap_reserve(base), > + .end_pfn = P
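To make the numbers in the quoted commit message concrete (a back-of-the-envelope check, assuming sizeof(struct page) is 64 bytes and a 64K base page size):

	16 MB / 64 bytes per struct page = 262144 struct pages per mapping block
	262144 pfns x 64 KB per pfn      = 16 GB of namespace described

which is why a single 16MB memmap mapping can reach past the end of a smaller or unaligned namespace, and why the boundary check above is needed.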
Re: [PATCH v8 2/8] kvmppc: Movement of pages between normal and secure memory
On Tue, Sep 10, 2019 at 01:59:40PM +0530, Bharata B Rao wrote: > +static struct page *kvmppc_uvmem_get_page(unsigned long *rmap, > + unsigned long gpa, unsigned int lpid) > +{ > + struct page *dpage = NULL; > + unsigned long bit, uvmem_pfn; > + struct kvmppc_uvmem_page_pvt *pvt; > + unsigned long pfn_last, pfn_first; > + > + pfn_first = kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT; > + pfn_last = pfn_first + > +(resource_size(&kvmppc_uvmem_pgmap.res) >> PAGE_SHIFT); > + > + spin_lock(&kvmppc_uvmem_pfn_lock); > + bit = find_first_zero_bit(kvmppc_uvmem_pfn_bitmap, > + pfn_last - pfn_first); > + if (bit >= (pfn_last - pfn_first)) > + goto out; > + bitmap_set(kvmppc_uvmem_pfn_bitmap, bit, 1); > + > + uvmem_pfn = bit + pfn_first; > + dpage = pfn_to_page(uvmem_pfn); > + if (!trylock_page(dpage)) > + goto out_clear; > + > + pvt = kzalloc(sizeof(*pvt), GFP_KERNEL); While re-arraging the code, I had moved this allocation to outside of spinlock and changed from GFP_ATOMIC to GFP_KERNEL. But later realized that error path exit would be cleaner with allocation under the lock. So moved it back but missed changing it back to ATOMIC. Here is the updated patch. >From a97e34cdb7e9bc411627690602c6fa484aa16c56 Mon Sep 17 00:00:00 2001 From: Bharata B Rao Date: Wed, 22 May 2019 10:13:19 +0530 Subject: [PATCH v8 2/8] kvmppc: Movement of pages between normal and secure memory Manage migration of pages betwen normal and secure memory of secure guest by implementing H_SVM_PAGE_IN and H_SVM_PAGE_OUT hcalls. H_SVM_PAGE_IN: Move the content of a normal page to secure page H_SVM_PAGE_OUT: Move the content of a secure page to normal page Private ZONE_DEVICE memory equal to the amount of secure memory available in the platform for running secure guests is created. Whenever a page belonging to the guest becomes secure, a page from this private device memory is used to represent and track that secure page on the HV side. The movement of pages between normal and secure memory is done via migrate_vma_pages() using UV_PAGE_IN and UV_PAGE_OUT ucalls. 
Signed-off-by: Bharata B Rao --- arch/powerpc/include/asm/hvcall.h | 4 + arch/powerpc/include/asm/kvm_book3s_uvmem.h | 29 ++ arch/powerpc/include/asm/kvm_host.h | 12 + arch/powerpc/include/asm/ultravisor-api.h | 2 + arch/powerpc/include/asm/ultravisor.h | 14 + arch/powerpc/kvm/Makefile | 3 + arch/powerpc/kvm/book3s_hv.c| 19 + arch/powerpc/kvm/book3s_hv_uvmem.c | 431 8 files changed, 514 insertions(+) create mode 100644 arch/powerpc/include/asm/kvm_book3s_uvmem.h create mode 100644 arch/powerpc/kvm/book3s_hv_uvmem.c diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 2023e327..2595d0144958 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -342,6 +342,10 @@ #define H_TLB_INVALIDATE 0xF808 #define H_COPY_TOFROM_GUEST0xF80C +/* Platform-specific hcalls used by the Ultravisor */ +#define H_SVM_PAGE_IN 0xEF00 +#define H_SVM_PAGE_OUT 0xEF04 + /* Values for 2nd argument to H_SET_MODE */ #define H_SET_MODE_RESOURCE_SET_CIABR 1 #define H_SET_MODE_RESOURCE_SET_DAWR 2 diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h new file mode 100644 index ..9603c2b48d67 --- /dev/null +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __POWERPC_KVM_PPC_HMM_H__ +#define __POWERPC_KVM_PPC_HMM_H__ + +#ifdef CONFIG_PPC_UV +unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, + unsigned long gra, + unsigned long flags, + unsigned long page_shift); +unsigned long kvmppc_h_svm_page_out(struct kvm *kvm, + unsigned long gra, + unsigned long flags, + unsigned long page_shift); +#else +static inline unsigned long +kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gra, +unsigned long flags, unsigned long page_shift) +{ + return H_UNSUPPORTED; +} + +static inline unsigned long +kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gra, + unsigned long flags, unsigned long page_shift) +{ + return H_UNSUPPORTED; +} +#endif /* CONFIG_PPC_UV */ +#endif /* __POWERPC_KVM_PPC_HMM_H__ */ diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 81cd221ccc04..16633ad3be45 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -869,4 +869,16 @@ static inline void kvm_arch_vcpu_bloc
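For context on the GFP_KERNEL vs GFP_ATOMIC point made above, a minimal sketch of the constraint (illustrative, not the updated patch itself; the identifiers are taken from the quoted code): memory allocated while a spinlock is held must not sleep, so GFP_ATOMIC is required there:

	spin_lock(&kvmppc_uvmem_pfn_lock);
	/* ... find and set a free bit in kvmppc_uvmem_pfn_bitmap ... */

	/* sleeping is not allowed under a spinlock, so no GFP_KERNEL here */
	pvt = kzalloc(sizeof(*pvt), GFP_ATOMIC);
	if (!pvt)
		goto out_clear;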
[PATCH 0/2] powerpc/watchpoint: Disable watchpoint hit by larx/stcx instruction
I've prepared my patch on top of Christophe's patch as it's easy to
change stepping_handler() rather than hw_breakpoint_handler().

2nd patch is the actual fix.

Christophe Leroy (1):
  powerpc/hw_breakpoint: move instruction stepping out of
    hw_breakpoint_handler()

Ravi Bangoria (1):
  powerpc/watchpoint: Disable watchpoint hit by larx/stcx instructions

 arch/powerpc/kernel/hw_breakpoint.c | 77 +++--
 1 file changed, 50 insertions(+), 27 deletions(-)

-- 
2.21.0
[PATCH 1/2] powerpc/hw_breakpoint: move instruction stepping out of hw_breakpoint_handler()
From: Christophe Leroy On 8xx, breakpoints stop after executing the instruction, so stepping/emulation is not needed. Move it into a sub-function and remove the #ifdefs. Signed-off-by: Christophe Leroy Reviewed-by: Ravi Bangoria --- arch/powerpc/kernel/hw_breakpoint.c | 60 - 1 file changed, 33 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index c8d1fa2e9d53..28ad3171bb82 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -198,15 +198,43 @@ void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) /* * Handle debug exception notifications. */ +static bool stepping_handler(struct pt_regs *regs, struct perf_event *bp, +unsigned long addr) +{ + int stepped; + unsigned int instr; + + /* Do not emulate user-space instructions, instead single-step them */ + if (user_mode(regs)) { + current->thread.last_hit_ubp = bp; + regs->msr |= MSR_SE; + return false; + } + + stepped = 0; + instr = 0; + if (!__get_user_inatomic(instr, (unsigned int *)regs->nip)) + stepped = emulate_step(regs, instr); + + /* +* emulate_step() could not execute it. We've failed in reliably +* handling the hw-breakpoint. Unregister it and throw a warning +* message to let the user know about it. +*/ + if (!stepped) { + WARN(1, "Unable to handle hardware breakpoint. Breakpoint at " + "0x%lx will be disabled.", addr); + perf_event_disable_inatomic(bp); + return false; + } + return true; +} + int hw_breakpoint_handler(struct die_args *args) { int rc = NOTIFY_STOP; struct perf_event *bp; struct pt_regs *regs = args->regs; -#ifndef CONFIG_PPC_8xx - int stepped = 1; - unsigned int instr; -#endif struct arch_hw_breakpoint *info; unsigned long dar = regs->dar; @@ -251,31 +279,9 @@ int hw_breakpoint_handler(struct die_args *args) (dar - bp->attr.bp_addr < bp->attr.bp_len))) info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ; -#ifndef CONFIG_PPC_8xx - /* Do not emulate user-space instructions, instead single-step them */ - if (user_mode(regs)) { - current->thread.last_hit_ubp = bp; - regs->msr |= MSR_SE; + if (!IS_ENABLED(CONFIG_PPC_8xx) && !stepping_handler(regs, bp, info->address)) goto out; - } - - stepped = 0; - instr = 0; - if (!__get_user_inatomic(instr, (unsigned int *) regs->nip)) - stepped = emulate_step(regs, instr); - /* -* emulate_step() could not execute it. We've failed in reliably -* handling the hw-breakpoint. Unregister it and throw a warning -* message to let the user know about it. -*/ - if (!stepped) { - WARN(1, "Unable to handle hardware breakpoint. Breakpoint at " - "0x%lx will be disabled.", info->address); - perf_event_disable_inatomic(bp); - goto out; - } -#endif /* * As a policy, the callback is invoked in a 'trigger-after-execute' * fashion -- 2.21.0
[PATCH 2/2] powerpc/watchpoint: Disable watchpoint hit by larx/stcx instructions
If watchpoint exception is generated by larx/stcx instructions, the reservation created by larx gets lost while handling exception, and thus stcx instruction always fails. Generally these instructions are used in a while(1) loop, for example spinlocks. And because stcx never succeeds, it loops forever and ultimately hangs the system. Note that ptrace anyway works in one-shot mode and thus for ptrace we don't change the behaviour. It's up to ptrace user to take care of this. Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/hw_breakpoint.c | 49 +++-- 1 file changed, 33 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 28ad3171bb82..9fa496a598ce 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -195,14 +195,32 @@ void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) tsk->thread.last_hit_ubp = NULL; } +static bool is_larx_stcx_instr(struct pt_regs *regs, unsigned int instr) +{ + int ret, type; + struct instruction_op op; + + ret = analyse_instr(&op, regs, instr); + type = GETTYPE(op.type); + return (!ret && (type == LARX || type == STCX)); +} + /* * Handle debug exception notifications. */ static bool stepping_handler(struct pt_regs *regs, struct perf_event *bp, unsigned long addr) { - int stepped; - unsigned int instr; + unsigned int instr = 0; + + if (__get_user_inatomic(instr, (unsigned int *)regs->nip)) + goto fail; + + if (is_larx_stcx_instr(regs, instr)) { + printk_ratelimited("Watchpoint: Can't emulate/single-step larx/" + "stcx instructions. Disabling watchpoint.\n"); + goto disable; + } /* Do not emulate user-space instructions, instead single-step them */ if (user_mode(regs)) { @@ -211,23 +229,22 @@ static bool stepping_handler(struct pt_regs *regs, struct perf_event *bp, return false; } - stepped = 0; - instr = 0; - if (!__get_user_inatomic(instr, (unsigned int *)regs->nip)) - stepped = emulate_step(regs, instr); + if (!emulate_step(regs, instr)) + goto fail; + return true; + +fail: /* -* emulate_step() could not execute it. We've failed in reliably -* handling the hw-breakpoint. Unregister it and throw a warning -* message to let the user know about it. +* We've failed in reliably handling the hw-breakpoint. Unregister +* it and throw a warning message to let the user know about it. */ - if (!stepped) { - WARN(1, "Unable to handle hardware breakpoint. Breakpoint at " - "0x%lx will be disabled.", addr); - perf_event_disable_inatomic(bp); - return false; - } - return true; + WARN(1, "Unable to handle hardware breakpoint. Breakpoint at " + "0x%lx will be disabled.", addr); + +disable: + perf_event_disable_inatomic(bp); + return false; } int hw_breakpoint_handler(struct die_args *args) -- 2.21.0
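To illustrate why a lost reservation hangs the system, here is a sketch of the classic powerpc atomic update loop (illustrative only, not code from this patch; the function name is invented). Any exception taken between the lwarx and the stwcx., for instance a watchpoint hit that is then emulated or single-stepped, clears the reservation, so the stwcx. keeps failing and the loop never exits:

	/* Illustrative only: the usual larx/stcx. retry loop. */
	static inline void atomic_inc_sketch(int *p)
	{
		int tmp;

		asm volatile(
	"1:	lwarx	%0,0,%2\n"	/* load word and create reservation */
	"	addi	%0,%0,1\n"
	"	stwcx.	%0,0,%2\n"	/* store only if reservation still held */
	"	bne-	1b\n"		/* store failed: reservation lost, retry */
		: "=&r" (tmp), "+m" (*p)
		: "r" (p)
		: "cc", "memory");
	}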
Re: [PATCH 0/2] powerpc/watchpoint: Disable watchpoint hit by larx/stcx instruction
On 10/09/2019 at 12:24, Ravi Bangoria wrote: I've prepared my patch on top of Christophe's patch as it's easy to change stepping_handler() rather than hw_breakpoint_handler(). The 2nd patch is the actual fix. Anyway, my patch is already committed on powerpc/next https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20190904&id=658d029df0bce6472c94b724ca54d74bc6659c2e Christophe Christophe Leroy (1): powerpc/hw_breakpoint: move instruction stepping out of hw_breakpoint_handler() Ravi Bangoria (1): powerpc/watchpoint: Disable watchpoint hit by larx/stcx instructions arch/powerpc/kernel/hw_breakpoint.c | 77 +++-- 1 file changed, 50 insertions(+), 27 deletions(-)
Re: [PATCH 2/2] powerpc/watchpoint: Disable watchpoint hit by larx/stcx instructions
Ravi Bangoria wrote: If watchpoint exception is generated by larx/stcx instructions, the reservation created by larx gets lost while handling exception, and thus stcx instruction always fails. Generally these instructions are used in a while(1) loop, for example spinlocks. And because stcx never succeeds, it loops forever and ultimately hangs the system. Note that ptrace anyway works in one-shot mode and thus for ptrace we don't change the behaviour. It's up to ptrace user to take care of this. Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/hw_breakpoint.c | 49 +++-- 1 file changed, 33 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 28ad3171bb82..9fa496a598ce 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -195,14 +195,32 @@ void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) tsk->thread.last_hit_ubp = NULL; } +static bool is_larx_stcx_instr(struct pt_regs *regs, unsigned int instr) +{ + int ret, type; + struct instruction_op op; + + ret = analyse_instr(&op, regs, instr); + type = GETTYPE(op.type); + return (!ret && (type == LARX || type == STCX)); +} + /* * Handle debug exception notifications. */ static bool stepping_handler(struct pt_regs *regs, struct perf_event *bp, unsigned long addr) { - int stepped; - unsigned int instr; + unsigned int instr = 0; + + if (__get_user_inatomic(instr, (unsigned int *)regs->nip)) + goto fail; + + if (is_larx_stcx_instr(regs, instr)) { + printk_ratelimited("Watchpoint: Can't emulate/single-step larx/" + "stcx instructions. Disabling watchpoint.\n"); The below WARN() uses the term 'breakpoint'. Better to use consistent terminology. I would rewrite the above as: printk_ratelimited("Breakpoint hit on instruction that can't be emulated. " "Breakpoint at 0x%lx will be disabled.\n", addr); Otherwise: Acked-by: Naveen N. Rao - Naveen + goto disable; + } /* Do not emulate user-space instructions, instead single-step them */ if (user_mode(regs)) { @@ -211,23 +229,22 @@ static bool stepping_handler(struct pt_regs *regs, struct perf_event *bp, return false; } - stepped = 0; - instr = 0; - if (!__get_user_inatomic(instr, (unsigned int *)regs->nip)) - stepped = emulate_step(regs, instr); + if (!emulate_step(regs, instr)) + goto fail; + return true; + +fail: /* -* emulate_step() could not execute it. We've failed in reliably -* handling the hw-breakpoint. Unregister it and throw a warning -* message to let the user know about it. +* We've failed in reliably handling the hw-breakpoint. Unregister +* it and throw a warning message to let the user know about it. */ - if (!stepped) { - WARN(1, "Unable to handle hardware breakpoint. Breakpoint at " - "0x%lx will be disabled.", addr); - perf_event_disable_inatomic(bp); - return false; - } - return true; + WARN(1, "Unable to handle hardware breakpoint. Breakpoint at " + "0x%lx will be disabled.", addr); + +disable: + perf_event_disable_inatomic(bp); + return false; } int hw_breakpoint_handler(struct die_args *args) -- 2.21.0
[PATCH 1/2] powerpc/32: Split kexec low level code out of misc_32.S
Almost half of misc_32.S is dedicated to kexec. Drop it into a dedicated kexec_32.S Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/kexec_32.S | 500 + arch/powerpc/kernel/misc_32.S | 491 3 files changed, 501 insertions(+), 491 deletions(-) create mode 100644 arch/powerpc/kernel/kexec_32.S diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index c9cc4b689e60..df708de6f866 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -81,6 +81,7 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_FA_DUMP) += fadump.o ifdef CONFIG_PPC32 obj-$(CONFIG_E500) += idle_e500.o +obj-$(CONFIG_KEXEC_CORE) += kexec_32.o endif obj-$(CONFIG_PPC_BOOK3S_32)+= idle_6xx.o l2cr_6xx.o cpu_setup_6xx.o obj-$(CONFIG_TAU) += tau_6xx.o diff --git a/arch/powerpc/kernel/kexec_32.S b/arch/powerpc/kernel/kexec_32.S new file mode 100644 index ..3f8ca6a566fb --- /dev/null +++ b/arch/powerpc/kernel/kexec_32.S @@ -0,0 +1,500 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * This file contains kexec low-level functions. + * + * Copyright (C) 2002-2003 Eric Biederman + * GameCube/ppc32 port Copyright (C) 2004 Albert Herranz + * PPC44x port. Copyright (C) 2011, IBM Corporation + * Author: Suzuki Poulose + */ + +#include +#include +#include +#include +#include + + .text + + /* +* Must be relocatable PIC code callable as a C function. +*/ + .globl relocate_new_kernel +relocate_new_kernel: + /* r3 = page_list */ + /* r4 = reboot_code_buffer */ + /* r5 = start_address */ + +#ifdef CONFIG_FSL_BOOKE + + mr r29, r3 + mr r30, r4 + mr r31, r5 + +#define ENTRY_MAPPING_KEXEC_SETUP +#include "fsl_booke_entry_mapping.S" +#undef ENTRY_MAPPING_KEXEC_SETUP + + mr r3, r29 + mr r4, r30 + mr r5, r31 + + li r0, 0 +#elif defined(CONFIG_44x) + + /* Save our parameters */ + mr r29, r3 + mr r30, r4 + mr r31, r5 + +#ifdef CONFIG_PPC_47x + /* Check for 47x cores */ + mfspr r3,SPRN_PVR + srwir3,r3,16 + cmplwi cr0,r3,PVR_476FPE@h + beq setup_map_47x + cmplwi cr0,r3,PVR_476@h + beq setup_map_47x + cmplwi cr0,r3,PVR_476_ISS@h + beq setup_map_47x +#endif /* CONFIG_PPC_47x */ + +/* + * Code for setting up 1:1 mapping for PPC440x for KEXEC + * + * We cannot switch off the MMU on PPC44x. + * So we: + * 1) Invalidate all the mappings except the one we are running from. + * 2) Create a tmp mapping for our code in the other address space(TS) and + *jump to it. Invalidate the entry we started in. + * 3) Create a 1:1 mapping for 0-2GiB in chunks of 256M in original TS. + * 4) Jump to the 1:1 mapping in original TS. + * 5) Invalidate the tmp mapping. + * + * - Based on the kexec support code for FSL BookE + * + */ + + /* +* Load the PID with kernel PID (0). +* Also load our MSR_IS and TID to MMUCR for TLB search. +*/ + li r3, 0 + mtspr SPRN_PID, r3 + mfmsr r4 + andi. r4,r4,MSR_IS@l + beq wmmucr + orisr3,r3,PPC44x_MMUCR_STS@h +wmmucr: + mtspr SPRN_MMUCR,r3 + sync + + /* +* Invalidate all the TLB entries except the current entry +* where we are running from +*/ + bl 0f /* Find our address */ +0: mflrr5 /* Make it accessible */ + tlbsx r23,0,r5/* Find entry we are in */ + li r4,0/* Start at TLB entry 0 */ + li r3,0/* Set PAGEID inval value */ +1: cmpwr23,r4 /* Is this our entry? */ + beq skip/* If so, skip the inval */ + tlbwe r3,r4,PPC44x_TLB_PAGEID /* If not, inval the entry */ +skip: + addir4,r4,1 /* Increment */ + cmpwi r4,64 /* Are we done? */ + bne 1b /* If not, repeat */ + isync + + /* Create a temp mapping and jump to it */ + andi. 
r6, r23, 1 /* Find the index to use */ + addir24, r6, 1 /* r24 will contain 1 or 2 */ + + mfmsr r9 /* get the MSR */ + rlwinm r5, r9, 27, 31, 31 /* Extract the MSR[IS] */ + xorir7, r5, 1 /* Use the other address space */ + + /* Read the current mapping entries */ + tlbre r3, r23, PPC44x_TLB_PAGEID + tlbre r4, r23, PPC44x_TLB_XLAT + tlbre r5, r23, PPC44x_TLB_ATTRIB + +
[PATCH 2/2] powerpc/kexec: move kexec files into a dedicated subdir.
arch/powerpc/kernel/ contains 7 files dedicated to kexec. Move them into a dedicated subdirectory. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/Makefile | 16 +--- arch/powerpc/kernel/kexec/Makefile | 22 ++ arch/powerpc/kernel/{ => kexec}/ima_kexec.c| 0 arch/powerpc/kernel/{ => kexec}/kexec_32.S | 2 +- arch/powerpc/kernel/{ => kexec}/kexec_elf_64.c | 0 arch/powerpc/kernel/{ => kexec}/machine_kexec.c| 0 arch/powerpc/kernel/{ => kexec}/machine_kexec_32.c | 0 arch/powerpc/kernel/{ => kexec}/machine_kexec_64.c | 0 .../kernel/{ => kexec}/machine_kexec_file_64.c | 0 9 files changed, 24 insertions(+), 16 deletions(-) create mode 100644 arch/powerpc/kernel/kexec/Makefile rename arch/powerpc/kernel/{ => kexec}/ima_kexec.c (100%) rename arch/powerpc/kernel/{ => kexec}/kexec_32.S (99%) rename arch/powerpc/kernel/{ => kexec}/kexec_elf_64.c (100%) rename arch/powerpc/kernel/{ => kexec}/machine_kexec.c (100%) rename arch/powerpc/kernel/{ => kexec}/machine_kexec_32.c (100%) rename arch/powerpc/kernel/{ => kexec}/machine_kexec_64.c (100%) rename arch/powerpc/kernel/{ => kexec}/machine_kexec_file_64.c (100%) diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index df708de6f866..b65c44d47157 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -81,7 +81,6 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_FA_DUMP) += fadump.o ifdef CONFIG_PPC32 obj-$(CONFIG_E500) += idle_e500.o -obj-$(CONFIG_KEXEC_CORE) += kexec_32.o endif obj-$(CONFIG_PPC_BOOK3S_32)+= idle_6xx.o l2cr_6xx.o cpu_setup_6xx.o obj-$(CONFIG_TAU) += tau_6xx.o @@ -125,14 +124,7 @@ pci64-$(CONFIG_PPC64) += pci_dn.o pci-hotplug.o isa-bridge.o obj-$(CONFIG_PCI) += pci_$(BITS).o $(pci64-y) \ pci-common.o pci_of_scan.o obj-$(CONFIG_PCI_MSI) += msi.o -obj-$(CONFIG_KEXEC_CORE) += machine_kexec.o crash.o \ - machine_kexec_$(BITS).o -obj-$(CONFIG_KEXEC_FILE) += machine_kexec_file_$(BITS).o kexec_elf_$(BITS).o -ifdef CONFIG_HAVE_IMA_KEXEC -ifdef CONFIG_IMA -obj-y += ima_kexec.o -endif -endif +obj-$(CONFIG_KEXEC_CORE) += kexec/ obj-$(CONFIG_AUDIT)+= audit.o obj64-$(CONFIG_AUDIT) += compat_audit.o @@ -164,12 +156,6 @@ endif GCOV_PROFILE_prom_init.o := n KCOV_INSTRUMENT_prom_init.o := n UBSAN_SANITIZE_prom_init.o := n -GCOV_PROFILE_machine_kexec_64.o := n -KCOV_INSTRUMENT_machine_kexec_64.o := n -UBSAN_SANITIZE_machine_kexec_64.o := n -GCOV_PROFILE_machine_kexec_32.o := n -KCOV_INSTRUMENT_machine_kexec_32.o := n -UBSAN_SANITIZE_machine_kexec_32.o := n GCOV_PROFILE_kprobes.o := n KCOV_INSTRUMENT_kprobes.o := n UBSAN_SANITIZE_kprobes.o := n diff --git a/arch/powerpc/kernel/kexec/Makefile b/arch/powerpc/kernel/kexec/Makefile new file mode 100644 index ..d96ee5660572 --- /dev/null +++ b/arch/powerpc/kernel/kexec/Makefile @@ -0,0 +1,22 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Makefile for the linux kernel. 
+# + +obj-y += machine_kexec.o crash.o machine_kexec_$(BITS).o + +obj-$(CONFIG_PPC32)+= kexec_32.o + +obj-$(CONFIG_KEXEC_FILE) += machine_kexec_file_$(BITS).o kexec_elf_$(BITS).o + +ifdef CONFIG_HAVE_IMA_KEXEC +ifdef CONFIG_IMA +obj-y += ima_kexec.o +endif +endif + + +# Disable GCOV, KCOV & sanitizers in odd or sensitive code +GCOV_PROFILE_machine_kexec_$(BITS).o := n +KCOV_INSTRUMENT_machine_kexec_$(BITS).o := n +UBSAN_SANITIZE_machine_kexec_$(BITS).o := n diff --git a/arch/powerpc/kernel/ima_kexec.c b/arch/powerpc/kernel/kexec/ima_kexec.c similarity index 100% rename from arch/powerpc/kernel/ima_kexec.c rename to arch/powerpc/kernel/kexec/ima_kexec.c diff --git a/arch/powerpc/kernel/kexec_32.S b/arch/powerpc/kernel/kexec/kexec_32.S similarity index 99% rename from arch/powerpc/kernel/kexec_32.S rename to arch/powerpc/kernel/kexec/kexec_32.S index 3f8ca6a566fb..b9355e0d5c85 100644 --- a/arch/powerpc/kernel/kexec_32.S +++ b/arch/powerpc/kernel/kexec/kexec_32.S @@ -32,7 +32,7 @@ relocate_new_kernel: mr r31, r5 #define ENTRY_MAPPING_KEXEC_SETUP -#include "fsl_booke_entry_mapping.S" +#include #undef ENTRY_MAPPING_KEXEC_SETUP mr r3, r29 diff --git a/arch/powerpc/kernel/kexec_elf_64.c b/arch/powerpc/kernel/kexec/kexec_elf_64.c similarity index 100% rename from arch/powerpc/kernel/kexec_elf_64.c rename to arch/powerpc/kernel/kexec/kexec_elf_64.c diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/kexec/machine_kexec.c similarity index 100% rename from arch/powerpc/kernel/machine_kexec.c rename to arch/powerpc/kernel/kexec/machine_kexec.c diff --git a/arch/powerpc/kernel/machine_kexec_32.c b/arch/powerpc/kernel/kexec/machine_
[PATCH v2] powerpc/watchpoint: Disable watchpoint hit by larx/stcx instructions
If watchpoint exception is generated by larx/stcx instructions, the reservation created by larx gets lost while handling exception, and thus stcx instruction always fails. Generally these instructions are used in a while(1) loop, for example spinlocks. And because stcx never succeeds, it loops forever and ultimately hangs the system. Note that ptrace anyway works in one-shot mode and thus for ptrace we don't change the behaviour. It's up to ptrace user to take care of this. Signed-off-by: Ravi Bangoria Acked-by: Naveen N. Rao --- v1->v2: - v1: https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-September/196818.html - Christophe's patch is already merged. Don't include it. - Rewrite warning message arch/powerpc/kernel/hw_breakpoint.c | 49 +++-- 1 file changed, 33 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 28ad3171bb82..1007ec36b4cb 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -195,14 +195,32 @@ void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) tsk->thread.last_hit_ubp = NULL; } +static bool is_larx_stcx_instr(struct pt_regs *regs, unsigned int instr) +{ + int ret, type; + struct instruction_op op; + + ret = analyse_instr(&op, regs, instr); + type = GETTYPE(op.type); + return (!ret && (type == LARX || type == STCX)); +} + /* * Handle debug exception notifications. */ static bool stepping_handler(struct pt_regs *regs, struct perf_event *bp, unsigned long addr) { - int stepped; - unsigned int instr; + unsigned int instr = 0; + + if (__get_user_inatomic(instr, (unsigned int *)regs->nip)) + goto fail; + + if (is_larx_stcx_instr(regs, instr)) { + printk_ratelimited("Breakpoint hit on instruction that can't be emulated." + " Breakpoint at 0x%lx will be disabled.\n", addr); + goto disable; + } /* Do not emulate user-space instructions, instead single-step them */ if (user_mode(regs)) { @@ -211,23 +229,22 @@ static bool stepping_handler(struct pt_regs *regs, struct perf_event *bp, return false; } - stepped = 0; - instr = 0; - if (!__get_user_inatomic(instr, (unsigned int *)regs->nip)) - stepped = emulate_step(regs, instr); + if (!emulate_step(regs, instr)) + goto fail; + return true; + +fail: /* -* emulate_step() could not execute it. We've failed in reliably -* handling the hw-breakpoint. Unregister it and throw a warning -* message to let the user know about it. +* We've failed in reliably handling the hw-breakpoint. Unregister +* it and throw a warning message to let the user know about it. */ - if (!stepped) { - WARN(1, "Unable to handle hardware breakpoint. Breakpoint at " - "0x%lx will be disabled.", addr); - perf_event_disable_inatomic(bp); - return false; - } - return true; + WARN(1, "Unable to handle hardware breakpoint. Breakpoint at " + "0x%lx will be disabled.", addr); + +disable: + perf_event_disable_inatomic(bp); + return false; } int hw_breakpoint_handler(struct die_args *args) -- 2.21.0
CVE-2019-15030: Linux kernel: powerpc: data leak with FP/VMX triggerable by unavailable exception in transaction
The Linux kernel for powerpc since v4.12 has a bug in its TM handling where any user can read the FP/VMX registers of a different user's process. Users of TM + FP/VMX can also experience corruption of their FP/VMX state. To trigger the bug, a process starts a transaction and reads an FP/VMX register. This transaction can then fail, which causes a rollback to the checkpointed state. Due to the kernel taking an FP/VMX unavailable exception inside a transaction and the kernel's incorrect handling of this, the checkpointed state can be set to the FP/VMX registers of another process. This checkpointed state can then be read by the process, hence leaking data from one process to another. The trigger for this bug is an FP/VMX unavailable exception inside a transaction, hence the process needs FP/VMX off when starting the transaction. FP/VMX availability is under the control of the kernel and is transparent to the user, hence the user has to retry the transaction many times to trigger this bug. All 64-bit machines where TM is present are affected. This includes all POWER8 variants and POWER9 VMs under KVM or LPARs under PowerVM. POWER9 bare metal doesn't support TM and hence is not affected. The bug was introduced in commit: f48e91e87e67 ("powerpc/tm: Fix FP and VMX register corruption") which was originally merged in v4.12. The upstream fix is here: https://git.kernel.org/torvalds/c/8205d5d98ef7f155de211f5e2eb6ca03d95a5a60 The fix can be verified by running the tm-poison test from the kernel selftests. This test is in a patch here: https://patchwork.ozlabs.org/patch/1157467/ which should eventually end up here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/powerpc/tm/tm-poison.c cheers Mikey
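To make the described trigger concrete, here is a conceptual sketch. It is not the tm-poison selftest linked above, and it assumes a TM-capable POWER8/POWER9 guest plus GCC's HTM builtins (compile with -mhtm).

/*
 * Conceptual sketch of the reported trigger, NOT the actual selftest.
 * Assumptions: TM-capable hardware, GCC HTM builtins available (-mhtm).
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	double fr1;

	for (;;) {
		/* Sleep so the kernel has likely switched FP off for us by
		 * the time the transaction starts (FP availability is lazy). */
		usleep(1000);

		if (__builtin_tbegin(0)) {
			/* First FP access inside the transaction: this is what
			 * can raise the FP unavailable exception that buggy
			 * kernels mishandle. */
			asm volatile("fmr 1, 1");
			__builtin_tend(0);
		} else {
			/* Transaction failed: registers roll back to the
			 * checkpointed state.  With the bug present, fr1 may
			 * now hold another process's data. */
			asm volatile("stfd 1, %0" : "=m" (fr1));
			printf("checkpointed fr1 = %a\n", fr1);
		}
	}
	return 0;
}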
CVE-2019-15031: Linux kernel: powerpc: data leak with FP/VMX triggerable by interrupt in transaction
The Linux kernel for powerpc since v4.15 has a bug in its TM handling during interrupts where any user can read the FP/VMX registers of a different user's process. Users of TM + FP/VMX can also experience corruption of their FP/VMX state. To trigger the bug, a process starts a transaction with FP/VMX off and then takes an interrupt. Due to the kernel's incorrect handling of the interrupt, FP/VMX is turned on but the checkpointed state is not updated. If this transaction then rolls back, the checkpointed state may contain the state of a different process. This checkpointed state can then be read by the process, hence leaking data from one process to another. The trigger for this bug is an interrupt inside a transaction where FP/VMX is off, hence the process needs FP/VMX off when starting the transaction. FP/VMX availability is under the control of the kernel and is transparent to the user, hence the user has to retry the transaction many times to trigger this bug. High interrupt loads also help trigger this bug. All 64-bit machines where TM is present are affected. This includes all POWER8 variants and POWER9 VMs under KVM or LPARs under PowerVM. POWER9 bare metal doesn't support TM and hence is not affected. The bug was introduced in commit: fa7771176b439 ("powerpc: Don't enable FP/Altivec if not checkpointed") which was originally merged in v4.15. The upstream fix is here: https://git.kernel.org/torvalds/c/a8318c13e79badb92bc6640704a64cc022a6eb97 The fix can be verified by running the tm-poison test from the kernel selftests. This test is in a patch here: https://patchwork.ozlabs.org/patch/1157467/ which should eventually end up here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/powerpc/tm/tm-poison.c cheers Mikey
[PATCH] powerpc/xive: Fix bogus error code returned by OPAL
There's a bug in skiboot that causes the OPAL_XIVE_ALLOCATE_IRQ call to return the 32-bit value 0xffffffff when OPAL has run out of IRQs. Unfortunately, OPAL return values are signed 64-bit entities and errors are supposed to be negative. If that happens, the Linux code confusingly treats 0xffffffff as a valid IRQ number and panics at some point. A fix was recently merged in skiboot: e97391ae2bb5 ("xive: fix return value of opal_xive_allocate_irq()") but we need a workaround anyway to support older skiboots already in the field. Internally convert 0xffffffff to OPAL_RESOURCE, which is the usual error returned upon resource exhaustion. Signed-off-by: Greg Kurz --- arch/powerpc/sysdev/xive/native.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/sysdev/xive/native.c b/arch/powerpc/sysdev/xive/native.c index 37987c815913..c35583f84f9f 100644 --- a/arch/powerpc/sysdev/xive/native.c +++ b/arch/powerpc/sysdev/xive/native.c @@ -231,6 +231,15 @@ static bool xive_native_match(struct device_node *node) return of_device_is_compatible(node, "ibm,opal-xive-vc"); } +static int64_t opal_xive_allocate_irq_fixup(uint32_t chip_id) +{ + s64 irq = opal_xive_allocate_irq(chip_id); + +#define XIVE_ALLOC_NO_SPACE 0xffffffff /* No possible space */ + return irq == XIVE_ALLOC_NO_SPACE ? OPAL_RESOURCE : irq; +} + #ifdef CONFIG_SMP static int xive_native_get_ipi(unsigned int cpu, struct xive_cpu *xc) { @@ -238,7 +247,7 @@ static int xive_native_get_ipi(unsigned int cpu, struct xive_cpu *xc) /* Allocate an IPI and populate info about it */ for (;;) { - irq = opal_xive_allocate_irq(xc->chip_id); + irq = opal_xive_allocate_irq_fixup(xc->chip_id); if (irq == OPAL_BUSY) { msleep(OPAL_BUSY_DELAY_MS); continue; @@ -259,7 +268,7 @@ u32 xive_native_alloc_irq(void) s64 rc; for (;;) { - rc = opal_xive_allocate_irq(OPAL_XIVE_ANY_CHIP); + rc = opal_xive_allocate_irq_fixup(OPAL_XIVE_ANY_CHIP); if (rc != OPAL_BUSY) break; msleep(OPAL_BUSY_DELAY_MS);
Re: [PATCH v5 21/31] powernv/fadump: process architected register state data provided by firmware
Hari Bathini writes: > On 09/09/19 9:03 PM, Oliver O'Halloran wrote: >> On Mon, Sep 9, 2019 at 11:23 PM Hari Bathini wrote: >>> On 04/09/19 5:50 PM, Michael Ellerman wrote: Hari Bathini writes: >>> [...] >>> > +/* > + * CPU state data is provided by f/w. Below are the definitions > + * provided in HDAT spec. Refer to latest HDAT specification for > + * any update to this format. > + */ How is this meant to work? If HDAT ever changes the format they will break all existing kernels in the field. > +#define HDAT_FADUMP_CPU_DATA_VERSION1 >>> >>> Changes are not expected here. But this is just to cover for such scenario, >>> if that ever happens. >> >> The HDAT spec doesn't define the SPR numbers for NIA, MSR and the CR. >> As far as I can tell the values you've assumed here are chip-specific, >> non-architected SPR numbers that come from an array buried somewhere >> in the SBE codebase. I don't believe you for a second when you say >> that this will never change. > > At least, the understanding is that this numbers not change across processor > generations. If something changes, it is supposed to be handled in SBE. Also, > I am told this numbers would be listed in the HDAT Spec. Not sure if that > happened yet though. Vasant, you have anything to add? That doesn't help much because the HDAT spec is not public. The point is with the code written the way it is, these values *must not* change, or else all existing kernels will be broken, which is not acceptable. >>> Also, I think it is a bit far-fetched to error out if versions mismatch. >>> Warning and proceeding sounds worthier because the changes are usually >>> backward compatible, if and when there are any. Will update accordingly... >> >> Literally the only reason I didn't drop the CPU DATA parts of the OPAL >> MPIPL series was because I assumed the kernel would do the sensible >> thing and reject or ignore the structure if it did not know how to >> parse the data. > > I think, the changes if any, would have to be backward compatible for the sake > of sanity. People need to understand that this is an ABI between firmware and in-the-field distribution kernels which are only updated at customer discretion, or possibly never. Any changes *must be* backward compatible. Looking at the header struct: +struct hdat_fadump_thread_hdr { + __be32 pir; + /* 0x00 - 0x0F - The corresponding stop state of the core */ + u8 core_state; + u8 reserved[3]; You have those 3 reserved bytes, so a future revision could repurpose one of those as a flag to indicate a new format. And/or the hdr could be made bigger and new kernels could be taught to look for new things in the space after the hdr but before the reg entries. So I think there is a reasonable mechanism for extending the format in future, but my point is people must understand that this is an ABI and changes must be made accordingly. > Even if they are not, we are better off exporting the /proc/vmcore > with a warning and some crazy CPU register data (if parsing goes alright) than > no dump at all? If it's just a case of reg entries that we don't recognise then yes I think it would be OK to just skip them and continue exporting. But if there's any deeper misunderstanding of the format then we should bail out. I notice now that you don't do anything in opal_fadump_set_regval_regnum() if you are passed a register we don't understand, so that probably needs fixing. cheers
Re: [PATCH v3] powerpc/lockdep: fix a false positive warning
Hi Qian, Sorry I haven't replied sooner, I've been travelling. Qian Cai writes: > The commit 108c14858b9e ("locking/lockdep: Add support for dynamic > keys") introduced a boot warning on powerpc below, because commit > 2d4f567103ff ("KVM: PPC: Introduce kvm_tmp framework") adds > kvm_tmp[] into the .bss section and then frees the rest of the unused space > back to the page allocator. Thanks for debugging this, but I'd like to fix it differently. kvm_tmp has caused trouble before, with kmemleak, and it can also cause trouble with STRICT_KERNEL_RWX, so I'd like to change how it's done, rather than doing more hacks for it. It should just be a page in text that we use if needed, and don't free, which should avoid all these problems. I'll try and get that done and posted soon. cheers
[PATCH v2 1/2] NFS: Fix inode fileid checks in attribute revalidation code
We want to throw out the attribute if it refers to the mounted-on fileid, and not the real fileid. However, we do not want to block cache consistency updates from NFSv4 writes. Reported-by: Murphy Zhou Fixes: 7e10cc25bfa0 ("NFS: Don't refresh attributes with mounted-on-file...") Signed-off-by: Trond Myklebust Signed-off-by: Christophe Leroy --- fs/nfs/inode.c | 18 ++ 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index c764cfe456e5..2a03bfeec10a 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -1403,11 +1403,12 @@ static int nfs_check_inode_attributes(struct inode *inode, struct nfs_fattr *fat if (NFS_PROTO(inode)->have_delegation(inode, FMODE_READ)) return 0; - /* No fileid? Just exit */ - if (!(fattr->valid & NFS_ATTR_FATTR_FILEID)) - return 0; + if (!(fattr->valid & NFS_ATTR_FATTR_FILEID)) { + /* Only a mounted-on-fileid? Just exit */ + if (fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) + return 0; /* Has the inode gone and changed behind our back? */ - if (nfsi->fileid != fattr->fileid) { + } else if (nfsi->fileid != fattr->fileid) { /* Is this perhaps the mounted-on fileid? */ if ((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) && nfsi->fileid == fattr->mounted_on_fileid) @@ -1807,11 +1808,12 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr) nfs_display_fhandle_hash(NFS_FH(inode)), atomic_read(&inode->i_count), fattr->valid); - /* No fileid? Just exit */ - if (!(fattr->valid & NFS_ATTR_FATTR_FILEID)) - return 0; + if (!(fattr->valid & NFS_ATTR_FATTR_FILEID)) { + /* Only a mounted-on-fileid? Just exit */ + if (fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) + return 0; /* Has the inode gone and changed behind our back? */ - if (nfsi->fileid != fattr->fileid) { + } else if (nfsi->fileid != fattr->fileid) { /* Is this perhaps the mounted-on fileid? */ if ((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) && nfsi->fileid == fattr->mounted_on_fileid) -- 2.13.3
[PATCH v2 2/2] powerpc/32: Split kexec low level code out of misc_32.S
Almost half of misc_32.S is dedicated to kexec. Drop it into a dedicated kexec_32.S Signed-off-by: Christophe Leroy --- v2: no change --- arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/kexec_32.S | 500 + arch/powerpc/kernel/misc_32.S | 491 3 files changed, 501 insertions(+), 491 deletions(-) create mode 100644 arch/powerpc/kernel/kexec_32.S diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index c9cc4b689e60..df708de6f866 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -81,6 +81,7 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_FA_DUMP) += fadump.o ifdef CONFIG_PPC32 obj-$(CONFIG_E500) += idle_e500.o +obj-$(CONFIG_KEXEC_CORE) += kexec_32.o endif obj-$(CONFIG_PPC_BOOK3S_32)+= idle_6xx.o l2cr_6xx.o cpu_setup_6xx.o obj-$(CONFIG_TAU) += tau_6xx.o diff --git a/arch/powerpc/kernel/kexec_32.S b/arch/powerpc/kernel/kexec_32.S new file mode 100644 index ..3f8ca6a566fb --- /dev/null +++ b/arch/powerpc/kernel/kexec_32.S @@ -0,0 +1,500 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * This file contains kexec low-level functions. + * + * Copyright (C) 2002-2003 Eric Biederman + * GameCube/ppc32 port Copyright (C) 2004 Albert Herranz + * PPC44x port. Copyright (C) 2011, IBM Corporation + * Author: Suzuki Poulose + */ + +#include +#include +#include +#include +#include + + .text + + /* +* Must be relocatable PIC code callable as a C function. +*/ + .globl relocate_new_kernel +relocate_new_kernel: + /* r3 = page_list */ + /* r4 = reboot_code_buffer */ + /* r5 = start_address */ + +#ifdef CONFIG_FSL_BOOKE + + mr r29, r3 + mr r30, r4 + mr r31, r5 + +#define ENTRY_MAPPING_KEXEC_SETUP +#include "fsl_booke_entry_mapping.S" +#undef ENTRY_MAPPING_KEXEC_SETUP + + mr r3, r29 + mr r4, r30 + mr r5, r31 + + li r0, 0 +#elif defined(CONFIG_44x) + + /* Save our parameters */ + mr r29, r3 + mr r30, r4 + mr r31, r5 + +#ifdef CONFIG_PPC_47x + /* Check for 47x cores */ + mfspr r3,SPRN_PVR + srwir3,r3,16 + cmplwi cr0,r3,PVR_476FPE@h + beq setup_map_47x + cmplwi cr0,r3,PVR_476@h + beq setup_map_47x + cmplwi cr0,r3,PVR_476_ISS@h + beq setup_map_47x +#endif /* CONFIG_PPC_47x */ + +/* + * Code for setting up 1:1 mapping for PPC440x for KEXEC + * + * We cannot switch off the MMU on PPC44x. + * So we: + * 1) Invalidate all the mappings except the one we are running from. + * 2) Create a tmp mapping for our code in the other address space(TS) and + *jump to it. Invalidate the entry we started in. + * 3) Create a 1:1 mapping for 0-2GiB in chunks of 256M in original TS. + * 4) Jump to the 1:1 mapping in original TS. + * 5) Invalidate the tmp mapping. + * + * - Based on the kexec support code for FSL BookE + * + */ + + /* +* Load the PID with kernel PID (0). +* Also load our MSR_IS and TID to MMUCR for TLB search. +*/ + li r3, 0 + mtspr SPRN_PID, r3 + mfmsr r4 + andi. r4,r4,MSR_IS@l + beq wmmucr + orisr3,r3,PPC44x_MMUCR_STS@h +wmmucr: + mtspr SPRN_MMUCR,r3 + sync + + /* +* Invalidate all the TLB entries except the current entry +* where we are running from +*/ + bl 0f /* Find our address */ +0: mflrr5 /* Make it accessible */ + tlbsx r23,0,r5/* Find entry we are in */ + li r4,0/* Start at TLB entry 0 */ + li r3,0/* Set PAGEID inval value */ +1: cmpwr23,r4 /* Is this our entry? */ + beq skip/* If so, skip the inval */ + tlbwe r3,r4,PPC44x_TLB_PAGEID /* If not, inval the entry */ +skip: + addir4,r4,1 /* Increment */ + cmpwi r4,64 /* Are we done? */ + bne 1b /* If not, repeat */ + isync + + /* Create a temp mapping and jump to it */ + andi. 
r6, r23, 1 /* Find the index to use */ + addir24, r6, 1 /* r24 will contain 1 or 2 */ + + mfmsr r9 /* get the MSR */ + rlwinm r5, r9, 27, 31, 31 /* Extract the MSR[IS] */ + xorir7, r5, 1 /* Use the other address space */ + + /* Read the current mapping entries */ + tlbre r3, r23, PPC44x_TLB_PAGEID + tlbre r4, r23, PPC44x_TLB_XLAT + tlbre r5, r23, PPC4
[PATCH v2 1/2] powerpc/32: Split kexec low level code out of misc_32.S
Almost half of misc_32.S is dedicated to kexec. Drop it into a dedicated kexec_32.S Signed-off-by: Christophe Leroy --- v2: no change --- arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/kexec_32.S | 500 + arch/powerpc/kernel/misc_32.S | 491 3 files changed, 501 insertions(+), 491 deletions(-) create mode 100644 arch/powerpc/kernel/kexec_32.S diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index c9cc4b689e60..df708de6f866 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -81,6 +81,7 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_FA_DUMP) += fadump.o ifdef CONFIG_PPC32 obj-$(CONFIG_E500) += idle_e500.o +obj-$(CONFIG_KEXEC_CORE) += kexec_32.o endif obj-$(CONFIG_PPC_BOOK3S_32)+= idle_6xx.o l2cr_6xx.o cpu_setup_6xx.o obj-$(CONFIG_TAU) += tau_6xx.o diff --git a/arch/powerpc/kernel/kexec_32.S b/arch/powerpc/kernel/kexec_32.S new file mode 100644 index ..3f8ca6a566fb --- /dev/null +++ b/arch/powerpc/kernel/kexec_32.S @@ -0,0 +1,500 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * This file contains kexec low-level functions. + * + * Copyright (C) 2002-2003 Eric Biederman + * GameCube/ppc32 port Copyright (C) 2004 Albert Herranz + * PPC44x port. Copyright (C) 2011, IBM Corporation + * Author: Suzuki Poulose + */ + +#include +#include +#include +#include +#include + + .text + + /* +* Must be relocatable PIC code callable as a C function. +*/ + .globl relocate_new_kernel +relocate_new_kernel: + /* r3 = page_list */ + /* r4 = reboot_code_buffer */ + /* r5 = start_address */ + +#ifdef CONFIG_FSL_BOOKE + + mr r29, r3 + mr r30, r4 + mr r31, r5 + +#define ENTRY_MAPPING_KEXEC_SETUP +#include "fsl_booke_entry_mapping.S" +#undef ENTRY_MAPPING_KEXEC_SETUP + + mr r3, r29 + mr r4, r30 + mr r5, r31 + + li r0, 0 +#elif defined(CONFIG_44x) + + /* Save our parameters */ + mr r29, r3 + mr r30, r4 + mr r31, r5 + +#ifdef CONFIG_PPC_47x + /* Check for 47x cores */ + mfspr r3,SPRN_PVR + srwir3,r3,16 + cmplwi cr0,r3,PVR_476FPE@h + beq setup_map_47x + cmplwi cr0,r3,PVR_476@h + beq setup_map_47x + cmplwi cr0,r3,PVR_476_ISS@h + beq setup_map_47x +#endif /* CONFIG_PPC_47x */ + +/* + * Code for setting up 1:1 mapping for PPC440x for KEXEC + * + * We cannot switch off the MMU on PPC44x. + * So we: + * 1) Invalidate all the mappings except the one we are running from. + * 2) Create a tmp mapping for our code in the other address space(TS) and + *jump to it. Invalidate the entry we started in. + * 3) Create a 1:1 mapping for 0-2GiB in chunks of 256M in original TS. + * 4) Jump to the 1:1 mapping in original TS. + * 5) Invalidate the tmp mapping. + * + * - Based on the kexec support code for FSL BookE + * + */ + + /* +* Load the PID with kernel PID (0). +* Also load our MSR_IS and TID to MMUCR for TLB search. +*/ + li r3, 0 + mtspr SPRN_PID, r3 + mfmsr r4 + andi. r4,r4,MSR_IS@l + beq wmmucr + orisr3,r3,PPC44x_MMUCR_STS@h +wmmucr: + mtspr SPRN_MMUCR,r3 + sync + + /* +* Invalidate all the TLB entries except the current entry +* where we are running from +*/ + bl 0f /* Find our address */ +0: mflrr5 /* Make it accessible */ + tlbsx r23,0,r5/* Find entry we are in */ + li r4,0/* Start at TLB entry 0 */ + li r3,0/* Set PAGEID inval value */ +1: cmpwr23,r4 /* Is this our entry? */ + beq skip/* If so, skip the inval */ + tlbwe r3,r4,PPC44x_TLB_PAGEID /* If not, inval the entry */ +skip: + addir4,r4,1 /* Increment */ + cmpwi r4,64 /* Are we done? */ + bne 1b /* If not, repeat */ + isync + + /* Create a temp mapping and jump to it */ + andi. 
r6, r23, 1 /* Find the index to use */ + addir24, r6, 1 /* r24 will contain 1 or 2 */ + + mfmsr r9 /* get the MSR */ + rlwinm r5, r9, 27, 31, 31 /* Extract the MSR[IS] */ + xorir7, r5, 1 /* Use the other address space */ + + /* Read the current mapping entries */ + tlbre r3, r23, PPC44x_TLB_PAGEID + tlbre r4, r23, PPC44x_TLB_XLAT + tlbre r5, r23, PPC4
[PATCH v2 2/2] powerpc/kexec: move kexec files into a dedicated subdir.
arch/powerpc/kernel/ contains 7 files dedicated to kexec. Move them into a dedicated subdirectory. Signed-off-by: Christophe Leroy --- v2: moved crash.c as well as it's part of kexec suite. --- arch/powerpc/kernel/Makefile | 19 +--- arch/powerpc/kernel/kexec/Makefile | 25 ++ arch/powerpc/kernel/{ => kexec}/crash.c| 0 arch/powerpc/kernel/{ => kexec}/ima_kexec.c| 0 arch/powerpc/kernel/{ => kexec}/kexec_32.S | 2 +- arch/powerpc/kernel/{ => kexec}/kexec_elf_64.c | 0 arch/powerpc/kernel/{ => kexec}/machine_kexec.c| 0 arch/powerpc/kernel/{ => kexec}/machine_kexec_32.c | 0 arch/powerpc/kernel/{ => kexec}/machine_kexec_64.c | 0 .../kernel/{ => kexec}/machine_kexec_file_64.c | 0 10 files changed, 27 insertions(+), 19 deletions(-) create mode 100644 arch/powerpc/kernel/kexec/Makefile rename arch/powerpc/kernel/{ => kexec}/crash.c (100%) rename arch/powerpc/kernel/{ => kexec}/ima_kexec.c (100%) rename arch/powerpc/kernel/{ => kexec}/kexec_32.S (99%) rename arch/powerpc/kernel/{ => kexec}/kexec_elf_64.c (100%) rename arch/powerpc/kernel/{ => kexec}/machine_kexec.c (100%) rename arch/powerpc/kernel/{ => kexec}/machine_kexec_32.c (100%) rename arch/powerpc/kernel/{ => kexec}/machine_kexec_64.c (100%) rename arch/powerpc/kernel/{ => kexec}/machine_kexec_file_64.c (100%) diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index df708de6f866..42e150e6e663 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -5,9 +5,6 @@ CFLAGS_ptrace.o+= -DUTS_MACHINE='"$(UTS_MACHINE)"' -# Disable clang warning for using setjmp without setjmp.h header -CFLAGS_crash.o += $(call cc-disable-warning, builtin-requires-header) - ifdef CONFIG_PPC64 CFLAGS_prom_init.o += $(NO_MINIMAL_TOC) endif @@ -81,7 +78,6 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_FA_DUMP) += fadump.o ifdef CONFIG_PPC32 obj-$(CONFIG_E500) += idle_e500.o -obj-$(CONFIG_KEXEC_CORE) += kexec_32.o endif obj-$(CONFIG_PPC_BOOK3S_32)+= idle_6xx.o l2cr_6xx.o cpu_setup_6xx.o obj-$(CONFIG_TAU) += tau_6xx.o @@ -125,14 +121,7 @@ pci64-$(CONFIG_PPC64) += pci_dn.o pci-hotplug.o isa-bridge.o obj-$(CONFIG_PCI) += pci_$(BITS).o $(pci64-y) \ pci-common.o pci_of_scan.o obj-$(CONFIG_PCI_MSI) += msi.o -obj-$(CONFIG_KEXEC_CORE) += machine_kexec.o crash.o \ - machine_kexec_$(BITS).o -obj-$(CONFIG_KEXEC_FILE) += machine_kexec_file_$(BITS).o kexec_elf_$(BITS).o -ifdef CONFIG_HAVE_IMA_KEXEC -ifdef CONFIG_IMA -obj-y += ima_kexec.o -endif -endif +obj-$(CONFIG_KEXEC_CORE) += kexec/ obj-$(CONFIG_AUDIT)+= audit.o obj64-$(CONFIG_AUDIT) += compat_audit.o @@ -164,12 +153,6 @@ endif GCOV_PROFILE_prom_init.o := n KCOV_INSTRUMENT_prom_init.o := n UBSAN_SANITIZE_prom_init.o := n -GCOV_PROFILE_machine_kexec_64.o := n -KCOV_INSTRUMENT_machine_kexec_64.o := n -UBSAN_SANITIZE_machine_kexec_64.o := n -GCOV_PROFILE_machine_kexec_32.o := n -KCOV_INSTRUMENT_machine_kexec_32.o := n -UBSAN_SANITIZE_machine_kexec_32.o := n GCOV_PROFILE_kprobes.o := n KCOV_INSTRUMENT_kprobes.o := n UBSAN_SANITIZE_kprobes.o := n diff --git a/arch/powerpc/kernel/kexec/Makefile b/arch/powerpc/kernel/kexec/Makefile new file mode 100644 index ..aa765037f0c0 --- /dev/null +++ b/arch/powerpc/kernel/kexec/Makefile @@ -0,0 +1,25 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Makefile for the linux kernel. 
+# + +# Disable clang warning for using setjmp without setjmp.h header +CFLAGS_crash.o += $(call cc-disable-warning, builtin-requires-header) + +obj-y += machine_kexec.o crash.o machine_kexec_$(BITS).o + +obj-$(CONFIG_PPC32)+= kexec_32.o + +obj-$(CONFIG_KEXEC_FILE) += machine_kexec_file_$(BITS).o kexec_elf_$(BITS).o + +ifdef CONFIG_HAVE_IMA_KEXEC +ifdef CONFIG_IMA +obj-y += ima_kexec.o +endif +endif + + +# Disable GCOV, KCOV & sanitizers in odd or sensitive code +GCOV_PROFILE_machine_kexec_$(BITS).o := n +KCOV_INSTRUMENT_machine_kexec_$(BITS).o := n +UBSAN_SANITIZE_machine_kexec_$(BITS).o := n diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/kexec/crash.c similarity index 100% rename from arch/powerpc/kernel/crash.c rename to arch/powerpc/kernel/kexec/crash.c diff --git a/arch/powerpc/kernel/ima_kexec.c b/arch/powerpc/kernel/kexec/ima_kexec.c similarity index 100% rename from arch/powerpc/kernel/ima_kexec.c rename to arch/powerpc/kernel/kexec/ima_kexec.c diff --git a/arch/powerpc/kernel/kexec_32.S b/arch/powerpc/kernel/kexec/kexec_32.S similarity index 99% rename from arch/powerpc/kernel/kexec_32.S rename to arch/powerpc/kernel/kexec/kexec_32.S index 3f8ca6a566fb..b9355e0d5c85 100
Re: [PATCH v2 2/2] powerpc/kexec: move kexec files into a dedicated subdir.
On Tue, Sep 10, 2019 at 02:55:27PM +, Christophe Leroy wrote: > arch/powerpc/kernel/ contains 7 files dedicated to kexec. > > Move them into a dedicated subdirectory. > arch/powerpc/kernel/{ => kexec}/ima_kexec.c| 0 > arch/powerpc/kernel/{ => kexec}/kexec_32.S | 2 +- > arch/powerpc/kernel/{ => kexec}/kexec_elf_64.c | 0 > arch/powerpc/kernel/{ => kexec}/machine_kexec.c| 0 > arch/powerpc/kernel/{ => kexec}/machine_kexec_32.c | 0 > arch/powerpc/kernel/{ => kexec}/machine_kexec_64.c | 0 > .../kernel/{ => kexec}/machine_kexec_file_64.c | 0 The filenames do not really need "kexec" in there anymore then? Segher
Re: [PATCH v5 21/31] powernv/fadump: process architected register state data provided by firmware
On 10/09/19 7:35 PM, Michael Ellerman wrote: > Hari Bathini writes: >> On 09/09/19 9:03 PM, Oliver O'Halloran wrote: >>> On Mon, Sep 9, 2019 at 11:23 PM Hari Bathini wrote: On 04/09/19 5:50 PM, Michael Ellerman wrote: > Hari Bathini writes: [...] >> +/* >> + * CPU state data is provided by f/w. Below are the definitions >> + * provided in HDAT spec. Refer to latest HDAT specification for >> + * any update to this format. >> + */ > > How is this meant to work? If HDAT ever changes the format they will > break all existing kernels in the field. > >> +#define HDAT_FADUMP_CPU_DATA_VERSION1 Changes are not expected here. But this is just to cover for such scenario, if that ever happens. >>> >>> The HDAT spec doesn't define the SPR numbers for NIA, MSR and the CR. >>> As far as I can tell the values you've assumed here are chip-specific, >>> non-architected SPR numbers that come from an array buried somewhere >>> in the SBE codebase. I don't believe you for a second when you say >>> that this will never change. >> >> At least, the understanding is that this numbers not change across processor >> generations. If something changes, it is supposed to be handled in SBE. Also, >> I am told this numbers would be listed in the HDAT Spec. Not sure if that >> happened yet though. Vasant, you have anything to add? > > That doesn't help much because the HDAT spec is not public. > > The point is with the code written the way it is, these values *must > not* change, or else all existing kernels will be broken, which is not > acceptable. Yeah. It is absurd to error out just by looking at version number... > Also, I think it is a bit far-fetched to error out if versions mismatch. Warning and proceeding sounds worthier because the changes are usually backward compatible, if and when there are any. Will update accordingly... >>> >>> Literally the only reason I didn't drop the CPU DATA parts of the OPAL >>> MPIPL series was because I assumed the kernel would do the sensible >>> thing and reject or ignore the structure if it did not know how to >>> parse the data. >> >> I think, the changes if any, would have to be backward compatible for the >> sake >> of sanity. > > People need to understand that this is an ABI between firmware and > in-the-field distribution kernels which are only updated at customer > discretion, or possibly never. > > Any changes *must be* backward compatible. > > Looking at the header struct: > > +struct hdat_fadump_thread_hdr { > + __be32 pir; > + /* 0x00 - 0x0F - The corresponding stop state of the core */ > + u8 core_state; > + u8 reserved[3]; > > You have those 3 reserved bytes, so a future revision could repurpose > one of those as a flag to indicate a new format. And/or the hdr could be > made bigger and new kernels could be taught to look for new things in > the space after the hdr but before the reg entries. > > So I think there is a reasonable mechanism for extending the format in > future, but my point is people must understand that this is an ABI and > changes must be made accordingly. True. The folks who make the changes to this format should be aware that breaking kernel ABI is not going to be pretty and I think they are :) > >> Even if they are not, we are better off exporting the /proc/vmcore >> with a warning and some crazy CPU register data (if parsing goes alright) >> than >> no dump at all? > > If it's just a case of reg entries that we don't recognise then yes I > think it would be OK to just skip them and continue exporting. 
But if > there's any deeper misunderstanding of the format then we should bail > out. Sure. Will try to fix that by first doing a sanity check on the fields that are needed for parsing the data, proceeding with a warning if nothing weird is detected, and falling back to just appending the crashing CPU (as done in patch 16/31) if anything weird is observed. That should hopefully take care of all cases in the best possible way. > > I notice now that you don't do anything in opal_fadump_set_regval_regnum() > if you are passed a register we don't understand, so that probably needs > fixing. f/w provides about 100-odd registers in the CPU state data. Most of them pt_regs doesn't care about. So, opal_fadump_set_regval_regnum is happy as long as it finds the registers listed in it. Unless pt_regs changes, we can stick to this and ignore the rest of them? - Hari
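For illustration, this is roughly what the "skip what we don't recognise" behaviour discussed above could look like. It is a hypothetical sketch, not the actual fadump code: the entry layout loosely mirrors the register entries the series passes around, the register IDs are invented, and the helper that reports success is an invented variant of opal_fadump_set_regval_regnum().

/*
 * Hypothetical sketch only -- not the real fadump code.  Unknown register
 * entries are counted and skipped instead of aborting the whole parse.
 */
struct reg_entry_sketch {
	__be32 reg_type;
	__be32 reg_num;
	__be64 reg_val;
};

/* Invented helper: returns false for registers pt_regs doesn't track. */
static bool set_regval_sketch(struct pt_regs *regs, u32 reg_num, u64 val)
{
	switch (reg_num) {
	case 2000:		/* hypothetical firmware ID for NIP */
		regs->nip = val;
		return true;
	case 2001:		/* hypothetical firmware ID for MSR */
		regs->msr = val;
		return true;
	default:
		return false;
	}
}

static void read_regs_sketch(struct reg_entry_sketch *ent, u32 nr,
			     struct pt_regs *regs)
{
	u32 i, unknown = 0;

	for (i = 0; i < nr; i++, ent++) {
		if (!set_regval_sketch(regs, be32_to_cpu(ent->reg_num),
				       be64_to_cpu(ent->reg_val)))
			unknown++;	/* ignore, but keep count */
	}

	if (unknown)
		pr_warn_once("fadump: skipped %u unrecognised register entries\n",
			     unknown);
}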
[PATCH v3 1/2] powerpc/32: Split kexec low level code out of misc_32.S
Almost half of misc_32.S is dedicated to kexec. That's the relocation function for kexec. Drop it into a dedicated kexec_relocate_32.S Signed-off-by: Christophe Leroy --- v2: no change v3: renamed kexec_32.S to kexec_relocate_32.S --- arch/powerpc/kernel/Makefile| 1 + arch/powerpc/kernel/kexec_relocate_32.S | 500 arch/powerpc/kernel/misc_32.S | 491 --- 3 files changed, 501 insertions(+), 491 deletions(-) create mode 100644 arch/powerpc/kernel/kexec_relocate_32.S diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index c9cc4b689e60..f6c80f31502a 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -81,6 +81,7 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_FA_DUMP) += fadump.o ifdef CONFIG_PPC32 obj-$(CONFIG_E500) += idle_e500.o +obj-$(CONFIG_KEXEC_CORE) += kexec_relocate_32.o endif obj-$(CONFIG_PPC_BOOK3S_32)+= idle_6xx.o l2cr_6xx.o cpu_setup_6xx.o obj-$(CONFIG_TAU) += tau_6xx.o diff --git a/arch/powerpc/kernel/kexec_relocate_32.S b/arch/powerpc/kernel/kexec_relocate_32.S new file mode 100644 index ..3f8ca6a566fb --- /dev/null +++ b/arch/powerpc/kernel/kexec_relocate_32.S @@ -0,0 +1,500 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * This file contains kexec low-level functions. + * + * Copyright (C) 2002-2003 Eric Biederman + * GameCube/ppc32 port Copyright (C) 2004 Albert Herranz + * PPC44x port. Copyright (C) 2011, IBM Corporation + * Author: Suzuki Poulose + */ + +#include +#include +#include +#include +#include + + .text + + /* +* Must be relocatable PIC code callable as a C function. +*/ + .globl relocate_new_kernel +relocate_new_kernel: + /* r3 = page_list */ + /* r4 = reboot_code_buffer */ + /* r5 = start_address */ + +#ifdef CONFIG_FSL_BOOKE + + mr r29, r3 + mr r30, r4 + mr r31, r5 + +#define ENTRY_MAPPING_KEXEC_SETUP +#include "fsl_booke_entry_mapping.S" +#undef ENTRY_MAPPING_KEXEC_SETUP + + mr r3, r29 + mr r4, r30 + mr r5, r31 + + li r0, 0 +#elif defined(CONFIG_44x) + + /* Save our parameters */ + mr r29, r3 + mr r30, r4 + mr r31, r5 + +#ifdef CONFIG_PPC_47x + /* Check for 47x cores */ + mfspr r3,SPRN_PVR + srwir3,r3,16 + cmplwi cr0,r3,PVR_476FPE@h + beq setup_map_47x + cmplwi cr0,r3,PVR_476@h + beq setup_map_47x + cmplwi cr0,r3,PVR_476_ISS@h + beq setup_map_47x +#endif /* CONFIG_PPC_47x */ + +/* + * Code for setting up 1:1 mapping for PPC440x for KEXEC + * + * We cannot switch off the MMU on PPC44x. + * So we: + * 1) Invalidate all the mappings except the one we are running from. + * 2) Create a tmp mapping for our code in the other address space(TS) and + *jump to it. Invalidate the entry we started in. + * 3) Create a 1:1 mapping for 0-2GiB in chunks of 256M in original TS. + * 4) Jump to the 1:1 mapping in original TS. + * 5) Invalidate the tmp mapping. + * + * - Based on the kexec support code for FSL BookE + * + */ + + /* +* Load the PID with kernel PID (0). +* Also load our MSR_IS and TID to MMUCR for TLB search. +*/ + li r3, 0 + mtspr SPRN_PID, r3 + mfmsr r4 + andi. r4,r4,MSR_IS@l + beq wmmucr + orisr3,r3,PPC44x_MMUCR_STS@h +wmmucr: + mtspr SPRN_MMUCR,r3 + sync + + /* +* Invalidate all the TLB entries except the current entry +* where we are running from +*/ + bl 0f /* Find our address */ +0: mflrr5 /* Make it accessible */ + tlbsx r23,0,r5/* Find entry we are in */ + li r4,0/* Start at TLB entry 0 */ + li r3,0/* Set PAGEID inval value */ +1: cmpwr23,r4 /* Is this our entry? 
*/ + beq skip/* If so, skip the inval */ + tlbwe r3,r4,PPC44x_TLB_PAGEID /* If not, inval the entry */ +skip: + addir4,r4,1 /* Increment */ + cmpwi r4,64 /* Are we done? */ + bne 1b /* If not, repeat */ + isync + + /* Create a temp mapping and jump to it */ + andi. r6, r23, 1 /* Find the index to use */ + addir24, r6, 1 /* r24 will contain 1 or 2 */ + + mfmsr r9 /* get the MSR */ + rlwinm r5, r9, 27, 31, 31 /* Extract the MSR[IS] */ + xorir7, r5, 1 /* Use the other address space */ + +
[PATCH v3 2/2] powerpc/kexec: move kexec files into a dedicated subdir.
arch/powerpc/kernel/ contains 8 files dedicated to kexec. Move them into a dedicated subdirectory. Signed-off-by: Christophe Leroy --- v2: moved crash.c as well as it's part of kexec suite. v3: renamed files to remove 'kexec' keyword from names. --- arch/powerpc/kernel/Makefile | 19 +--- arch/powerpc/kernel/kexec/Makefile | 25 ++ arch/powerpc/kernel/{ => kexec}/crash.c| 0 .../kernel/{kexec_elf_64.c => kexec/elf_64.c} | 0 arch/powerpc/kernel/{ima_kexec.c => kexec/ima.c} | 0 .../kernel/{machine_kexec.c => kexec/machine.c}| 0 .../{machine_kexec_32.c => kexec/machine_32.c} | 0 .../{machine_kexec_64.c => kexec/machine_64.c} | 0 .../machine_file_64.c} | 0 .../{kexec_relocate_32.S => kexec/relocate_32.S} | 2 +- 10 files changed, 27 insertions(+), 19 deletions(-) create mode 100644 arch/powerpc/kernel/kexec/Makefile rename arch/powerpc/kernel/{ => kexec}/crash.c (100%) rename arch/powerpc/kernel/{kexec_elf_64.c => kexec/elf_64.c} (100%) rename arch/powerpc/kernel/{ima_kexec.c => kexec/ima.c} (100%) rename arch/powerpc/kernel/{machine_kexec.c => kexec/machine.c} (100%) rename arch/powerpc/kernel/{machine_kexec_32.c => kexec/machine_32.c} (100%) rename arch/powerpc/kernel/{machine_kexec_64.c => kexec/machine_64.c} (100%) rename arch/powerpc/kernel/{machine_kexec_file_64.c => kexec/machine_file_64.c} (100%) rename arch/powerpc/kernel/{kexec_relocate_32.S => kexec/relocate_32.S} (99%) diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index f6c80f31502a..42e150e6e663 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -5,9 +5,6 @@ CFLAGS_ptrace.o+= -DUTS_MACHINE='"$(UTS_MACHINE)"' -# Disable clang warning for using setjmp without setjmp.h header -CFLAGS_crash.o += $(call cc-disable-warning, builtin-requires-header) - ifdef CONFIG_PPC64 CFLAGS_prom_init.o += $(NO_MINIMAL_TOC) endif @@ -81,7 +78,6 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_FA_DUMP) += fadump.o ifdef CONFIG_PPC32 obj-$(CONFIG_E500) += idle_e500.o -obj-$(CONFIG_KEXEC_CORE) += kexec_relocate_32.o endif obj-$(CONFIG_PPC_BOOK3S_32)+= idle_6xx.o l2cr_6xx.o cpu_setup_6xx.o obj-$(CONFIG_TAU) += tau_6xx.o @@ -125,14 +121,7 @@ pci64-$(CONFIG_PPC64) += pci_dn.o pci-hotplug.o isa-bridge.o obj-$(CONFIG_PCI) += pci_$(BITS).o $(pci64-y) \ pci-common.o pci_of_scan.o obj-$(CONFIG_PCI_MSI) += msi.o -obj-$(CONFIG_KEXEC_CORE) += machine_kexec.o crash.o \ - machine_kexec_$(BITS).o -obj-$(CONFIG_KEXEC_FILE) += machine_kexec_file_$(BITS).o kexec_elf_$(BITS).o -ifdef CONFIG_HAVE_IMA_KEXEC -ifdef CONFIG_IMA -obj-y += ima_kexec.o -endif -endif +obj-$(CONFIG_KEXEC_CORE) += kexec/ obj-$(CONFIG_AUDIT)+= audit.o obj64-$(CONFIG_AUDIT) += compat_audit.o @@ -164,12 +153,6 @@ endif GCOV_PROFILE_prom_init.o := n KCOV_INSTRUMENT_prom_init.o := n UBSAN_SANITIZE_prom_init.o := n -GCOV_PROFILE_machine_kexec_64.o := n -KCOV_INSTRUMENT_machine_kexec_64.o := n -UBSAN_SANITIZE_machine_kexec_64.o := n -GCOV_PROFILE_machine_kexec_32.o := n -KCOV_INSTRUMENT_machine_kexec_32.o := n -UBSAN_SANITIZE_machine_kexec_32.o := n GCOV_PROFILE_kprobes.o := n KCOV_INSTRUMENT_kprobes.o := n UBSAN_SANITIZE_kprobes.o := n diff --git a/arch/powerpc/kernel/kexec/Makefile b/arch/powerpc/kernel/kexec/Makefile new file mode 100644 index ..46e52ee95322 --- /dev/null +++ b/arch/powerpc/kernel/kexec/Makefile @@ -0,0 +1,25 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Makefile for the linux kernel. 
+# + +# Disable clang warning for using setjmp without setjmp.h header +CFLAGS_crash.o += $(call cc-disable-warning, builtin-requires-header) + +obj-y += machine.o crash.o machine_$(BITS).o + +obj-$(CONFIG_PPC32)+= relocate_32.o + +obj-$(CONFIG_KEXEC_FILE) += machine_file_$(BITS).o elf_$(BITS).o + +ifdef CONFIG_HAVE_IMA_KEXEC +ifdef CONFIG_IMA +obj-y += ima.o +endif +endif + + +# Disable GCOV, KCOV & sanitizers in odd or sensitive code +GCOV_PROFILE_machine_$(BITS).o := n +KCOV_INSTRUMENT_machine_$(BITS).o := n +UBSAN_SANITIZE_machine_$(BITS).o := n diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/kexec/crash.c similarity index 100% rename from arch/powerpc/kernel/crash.c rename to arch/powerpc/kernel/kexec/crash.c diff --git a/arch/powerpc/kernel/kexec_elf_64.c b/arch/powerpc/kernel/kexec/elf_64.c similarity index 100% rename from arch/powerpc/kernel/kexec_elf_64.c rename to arch/powerpc/kernel/kexec/elf_64.c diff --git a/arch/powerpc/kernel/ima_kexec.c b/arch/powerpc/kernel/kexec/ima.c similarity index 100% rename from arch/powerpc/k
[PATCH v1] powerpc/pseries: CMM: Drop page array
We can simply store the pages in a list (page->lru), no need for a separate data structure (+ complicated handling). This is how most other balloon drivers store allocated pages without additional tracking data. For the notifiers, use page_to_pfn() to check if a page is in the applicable range. plpar_page_set_loaned()/plpar_page_set_active() were called with __pa(page_address()) for now, I assume we can simply switch to page_to_phys() here. The pfn_to_kaddr() handling is now mostly gone. Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Arun KS Cc: Pavel Tatashin Cc: Thomas Gleixner Cc: Andrew Morton Cc: Vlastimil Babka Signed-off-by: David Hildenbrand --- Only compile-tested. I hope the page_to_phys() thingy is correct and I didn't mess up something else / ignoring something important why the array is needed. I stumbled over this while looking at how the memory isolation notifier is used - and wondered why the additional array is necessary. Also, I think by switching to the generic balloon compaction mechanism, we could get rid of the memory hotplug notifier and the memory isolation notifier in this code, as the migration capability of the inflated pages is the real requirement: commit 14b8a76b9d53346f2871bf419da2aaf219940c50 Author: Robert Jennings Date: Thu Dec 17 14:44:52 2009 + powerpc: Make the CMM memory hotplug aware The Collaborative Memory Manager (CMM) module allocates individual pages over time that are not migratable. On a long running system this can severely impact the ability to find enough pages to support a hotplug memory remove operation. [...] Thoughts? --- arch/powerpc/platforms/pseries/cmm.c | 155 ++- 1 file changed, 31 insertions(+), 124 deletions(-) diff --git a/arch/powerpc/platforms/pseries/cmm.c b/arch/powerpc/platforms/pseries/cmm.c index b33251d75927..9cab34a667bf 100644 --- a/arch/powerpc/platforms/pseries/cmm.c +++ b/arch/powerpc/platforms/pseries/cmm.c @@ -75,21 +75,13 @@ module_param_named(debug, cmm_debug, uint, 0644); MODULE_PARM_DESC(debug, "Enable module debugging logging. Set to 1 to enable. " "[Default=" __stringify(CMM_DEBUG) "]"); -#define CMM_NR_PAGES ((PAGE_SIZE - sizeof(void *) - sizeof(unsigned long)) / sizeof(unsigned long)) - #define cmm_dbg(...) if (cmm_debug) { printk(KERN_INFO "cmm: "__VA_ARGS__); } -struct cmm_page_array { - struct cmm_page_array *next; - unsigned long index; - unsigned long page[CMM_NR_PAGES]; -}; - static unsigned long loaned_pages; static unsigned long loaned_pages_target; static unsigned long oom_freed_pages; -static struct cmm_page_array *cmm_page_list; +static LIST_HEAD(cmm_page_list); static DEFINE_SPINLOCK(cmm_lock); static DEFINE_MUTEX(hotplug_mutex); @@ -138,8 +130,7 @@ static long plpar_page_set_active(unsigned long vpa) **/ static long cmm_alloc_pages(long nr) { - struct cmm_page_array *pa, *npa; - unsigned long addr; + struct page *page; long rc; cmm_dbg("Begin request for %ld pages\n", nr); @@ -156,43 +147,20 @@ static long cmm_alloc_pages(long nr) break; } - addr = __get_free_page(GFP_NOIO | __GFP_NOWARN | - __GFP_NORETRY | __GFP_NOMEMALLOC); - if (!addr) + page = alloc_page(GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY | + __GFP_NOMEMALLOC); + if (!page) break; spin_lock(&cmm_lock); - pa = cmm_page_list; - if (!pa || pa->index >= CMM_NR_PAGES) { - /* Need a new page for the page list. 
*/ - spin_unlock(&cmm_lock); - npa = (struct cmm_page_array *)__get_free_page( - GFP_NOIO | __GFP_NOWARN | - __GFP_NORETRY | __GFP_NOMEMALLOC); - if (!npa) { - pr_info("%s: Can not allocate new page list\n", __func__); - free_page(addr); - break; - } - spin_lock(&cmm_lock); - pa = cmm_page_list; - - if (!pa || pa->index >= CMM_NR_PAGES) { - npa->next = pa; - npa->index = 0; - pa = npa; - cmm_page_list = pa; - } else - free_page((unsigned long) npa); - } - - if ((rc = plpar_page_set_loaned(__pa(addr)))) { + rc = plpar_page_set_loaned(page_to_phys(page)); + if (rc) {
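For readers skimming the diff above, a minimal sketch of the allocation path the changelog describes, with pages threaded on page->lru instead of the old cmm_page_array; cmm_alloc_one() is a hypothetical helper condensing the loop body, not code from the patch, and error accounting is elided:

/* sketch only: list-based tracking, no separate page array */
static long cmm_alloc_one(void)
{
	struct page *page;
	long rc;

	page = alloc_page(GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY |
			  __GFP_NOMEMALLOC);
	if (!page)
		return -ENOMEM;

	rc = plpar_page_set_loaned(page_to_phys(page));
	if (rc) {
		__free_page(page);
		return rc;
	}

	spin_lock(&cmm_lock);
	list_add(&page->lru, &cmm_page_list);	/* page->lru is the only bookkeeping */
	spin_unlock(&cmm_lock);
	return 0;
}

The notifiers can then walk cmm_page_list and use page_to_pfn() to decide whether a page falls in the affected range.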
[PATCH] KVM: PPC: Book3S HV: Tunable to configure maximum # of vCPUs per VM
Each vCPU of a VM allocates a XIVE VP in OPAL which is associated with 8 event queue (EQ) descriptors, one for each priority. A POWER9 socket can handle a maximum of 1M event queues. The powernv platform allocates NR_CPUS (== 2048) VPs for the hypervisor, and each XIVE KVM device allocates KVM_MAX_VCPUS (== 2048) VPs. This means that on a bi-socket system, we can create at most: (2 * 1M) / (8 * 2048) - 1 == 127 XIVE or XICS-on-XIVE KVM devices ie, start at most 127 VMs benefiting from an in-kernel interrupt controller. Subsequent VMs need to rely on much slower userspace emulated XIVE device in QEMU. This is problematic as one can legitimately expect to start the same number of mono-CPU VMs as the number of HW threads available on the system (eg, 144 on Witherspoon). I'm not aware of any userspace supporting more that 1024 vCPUs. It thus seem overkill to consume that many VPs per VM. Ideally we would even want userspace to be able to tell KVM about the maximum number of vCPUs when creating the VM. For now, provide a module parameter to configure the maximum number of vCPUs per VM. While here, reduce the default value to 1024 to match the current limit in QEMU. This number is only used by the XIVE KVM devices, but some more users of KVM_MAX_VCPUS could possibly be converted. With this change, I could successfully run 230 mono-CPU VMs on a Witherspoon system using the official skiboot-6.3. I could even run more VMs by using upstream skiboot containing this fix, that allows to better spread interrupts between sockets: e97391ae2bb5 ("xive: fix return value of opal_xive_allocate_irq()") MAX VPCUS | MAX VMS --+- 1024 | 255 512 | 511 256 |1023 (*) (*) the system was barely usable because of the extreme load and memory exhaustion but the VMs did start. Signed-off-by: Greg Kurz --- arch/powerpc/include/asm/kvm_host.h |1 + arch/powerpc/kvm/book3s_hv.c | 32 arch/powerpc/kvm/book3s_xive.c|2 +- arch/powerpc/kvm/book3s_xive_native.c |2 +- 4 files changed, 35 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 6fb5fb4779e0..17582ce38788 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -335,6 +335,7 @@ struct kvm_arch { struct kvm_nested_guest *nested_guests[KVM_MAX_NESTED_GUESTS]; /* This array can grow quite large, keep it at the end */ struct kvmppc_vcore *vcores[KVM_MAX_VCORES]; + unsigned int max_vcpus; #endif }; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index f8975c620f41..393d8a1ce9d8 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -125,6 +125,36 @@ static bool nested = true; module_param(nested, bool, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(nested, "Enable nested virtualization (only on POWER9)"); +#define MIN(x, y) (((x) < (y)) ? 
(x) : (y)) + +static unsigned int max_vcpus = MIN(KVM_MAX_VCPUS, 1024); + +static int set_max_vcpus(const char *val, const struct kernel_param *kp) +{ + unsigned int new_max_vcpus; + int ret; + + ret = kstrtouint(val, 0, &new_max_vcpus); + if (ret) + return ret; + + if (new_max_vcpus > KVM_MAX_VCPUS) + return -EINVAL; + + max_vcpus = new_max_vcpus; + + return 0; +} + +static struct kernel_param_ops max_vcpus_ops = { + .set = set_max_vcpus, + .get = param_get_uint, +}; + +module_param_cb(max_vcpus, &max_vcpus_ops, &max_vcpus, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(max_vcpus, "Maximum number of vCPUS per VM (max = " +__stringify(KVM_MAX_VCPUS) ")"); + static inline bool nesting_enabled(struct kvm *kvm) { return kvm->arch.nested_enable && kvm_is_radix(kvm); @@ -4918,6 +4948,8 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm) if (radix_enabled()) kvmhv_radix_debugfs_init(kvm); + kvm->arch.max_vcpus = max_vcpus; + return 0; } diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c index 2ef43d037a4f..0fea31b64564 100644 --- a/arch/powerpc/kvm/book3s_xive.c +++ b/arch/powerpc/kvm/book3s_xive.c @@ -2026,7 +2026,7 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type) xive->q_page_order = xive->q_order - PAGE_SHIFT; /* Allocate a bunch of VPs */ - xive->vp_base = xive_native_alloc_vp_block(KVM_MAX_VCPUS); + xive->vp_base = xive_native_alloc_vp_block(kvm->arch.max_vcpus); pr_devel("VP_Base=%x\n", xive->vp_base); if (xive->vp_base == XIVE_INVALID_VP) diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c index 84a354b90f60..20314010da56 100644 --- a/arch/powerpc/kvm/book3s_xive_native.c +++ b/arch/powerpc/kvm/book3s_xive_native.c @@ -1095,7 +1095,7 @@ static int kvmppc_xive_nativ
Re: [PATCH] powerpc/xive: Fix bogus error code returned by OPAL
On 10/09/2019 15:53, Greg Kurz wrote: > There's a bug in skiboot that causes the OPAL_XIVE_ALLOCATE_IRQ call > to return the 32-bit value 0xffffffff when OPAL has run out of IRQs. > Unfortunately, OPAL return values are signed 64-bit entities and > errors are supposed to be negative. If that happens, the Linux code > confusingly treats 0xffffffff as a valid IRQ number and panics at some > point. > > A fix was recently merged in skiboot: > > e97391ae2bb5 ("xive: fix return value of opal_xive_allocate_irq()") > > but we need a workaround anyway to support older skiboots already > in the field. > > Internally convert 0xffffffff to OPAL_RESOURCE which is the usual error > returned upon resource exhaustion. > > Signed-off-by: Greg Kurz Reviewed-by: Cédric Le Goater Thanks, C. > --- > arch/powerpc/sysdev/xive/native.c | 13 +++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/sysdev/xive/native.c > b/arch/powerpc/sysdev/xive/native.c > index 37987c815913..c35583f84f9f 100644 > --- a/arch/powerpc/sysdev/xive/native.c > +++ b/arch/powerpc/sysdev/xive/native.c > @@ -231,6 +231,15 @@ static bool xive_native_match(struct device_node *node) > return of_device_is_compatible(node, "ibm,opal-xive-vc"); > } > > +static int64_t opal_xive_allocate_irq_fixup(uint32_t chip_id) > +{ > + s64 irq = opal_xive_allocate_irq(chip_id); > + > +#define XIVE_ALLOC_NO_SPACE 0xffffffff /* No possible space */ > + return > + irq == XIVE_ALLOC_NO_SPACE ? OPAL_RESOURCE : irq; > +} > + > #ifdef CONFIG_SMP > static int xive_native_get_ipi(unsigned int cpu, struct xive_cpu *xc) > { > @@ -238,7 +247,7 @@ static int xive_native_get_ipi(unsigned int cpu, struct > xive_cpu *xc) > > /* Allocate an IPI and populate info about it */ > for (;;) { > - irq = opal_xive_allocate_irq(xc->chip_id); > + irq = opal_xive_allocate_irq_fixup(xc->chip_id); > if (irq == OPAL_BUSY) { > msleep(OPAL_BUSY_DELAY_MS); > continue; > @@ -259,7 +268,7 @@ u32 xive_native_alloc_irq(void) > s64 rc; > > for (;;) { > - rc = opal_xive_allocate_irq(OPAL_XIVE_ANY_CHIP); > + rc = opal_xive_allocate_irq_fixup(OPAL_XIVE_ANY_CHIP); > if (rc != OPAL_BUSY) > break; > msleep(OPAL_BUSY_DELAY_MS); >
Re: [PATCH] powerpc: Avoid clang warnings around setjmp and longjmp
Nathan Chancellor writes: > On Wed, Sep 04, 2019 at 08:01:35AM -0500, Segher Boessenkool wrote: >> On Wed, Sep 04, 2019 at 08:16:45AM +, David Laight wrote: >> > From: Nathan Chancellor [mailto:natechancel...@gmail.com] >> > > Fair enough so I guess we are back to just outright disabling the >> > > warning. >> > >> > Just disabling the warning won't stop the compiler generating code >> > that breaks a 'user' implementation of setjmp(). >> >> Yeah. I have a patch (will send in an hour or so) that enables the >> "returns_twice" attribute for setjmp (in ). In testing >> (with GCC trunk) it showed no difference in code generation, but >> better save than sorry. >> >> It also sets "noreturn" on longjmp, and that *does* help, it saves a >> hundred insns or so (all in xmon, no surprise there). >> >> I don't think this will make LLVM shut up about this though. And >> technically it is right: the C standard does say that in hosted mode >> setjmp is a reserved name and you need to include to access >> it (not ). > > It does not fix the warning, I tested your patch. > >> So why is the kernel compiled as hosted? Does adding -ffreestanding >> hurt anything? Is that actually supported on LLVM, on all relevant >> versions of it? Does it shut up the warning there (if not, that would >> be an LLVM bug)? > > It does fix this warning because -ffreestanding implies -fno-builtin, > which also solves the warning. LLVM has supported -ffreestanding since > at least 3.0.0. There are some parts of the kernel that are compiled > with this and it probably should be used in more places but it sounds > like there might be some good codegen improvements that are disabled > with it: > > https://lore.kernel.org/lkml/CAHk-=wi-epJZfBHDbKKDZ64us7WkF=lpufhvybmzsteo8q0...@mail.gmail.com/ For xmon.c and crash.c I think using -ffreestanding would be fine. They're both crash/debug code, so we don't care about minor optimisation differences. If anything we don't want the compiler being too clever when generating that code. cheers
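For reference, the attribute annotations Segher describes would look roughly like this in arch/powerpc/include/asm/setjmp.h (a sketch of the idea, not his actual patch):

#define JMP_BUF_LEN	23

/* setjmp() can return a second time via longjmp(); tell the compiler. */
extern long setjmp(long *) __attribute__((returns_twice));
/* longjmp() never returns to its caller. */
extern void longjmp(long *, long) __attribute__((noreturn));

As noted above, this helps GCC's code generation (especially for longjmp callers in xmon) but does not by itself silence the clang -Wbuiltin-requires-header warning.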
Re: [PATCH] powerpc: Avoid clang warnings around setjmp and longjmp
On Wed, Sep 11, 2019 at 04:30:38AM +1000, Michael Ellerman wrote: > Nathan Chancellor writes: > > On Wed, Sep 04, 2019 at 08:01:35AM -0500, Segher Boessenkool wrote: > >> On Wed, Sep 04, 2019 at 08:16:45AM +, David Laight wrote: > >> > From: Nathan Chancellor [mailto:natechancel...@gmail.com] > >> > > Fair enough so I guess we are back to just outright disabling the > >> > > warning. > >> > > >> > Just disabling the warning won't stop the compiler generating code > >> > that breaks a 'user' implementation of setjmp(). > >> > >> Yeah. I have a patch (will send in an hour or so) that enables the > >> "returns_twice" attribute for setjmp (in ). In testing > >> (with GCC trunk) it showed no difference in code generation, but > >> better save than sorry. > >> > >> It also sets "noreturn" on longjmp, and that *does* help, it saves a > >> hundred insns or so (all in xmon, no surprise there). > >> > >> I don't think this will make LLVM shut up about this though. And > >> technically it is right: the C standard does say that in hosted mode > >> setjmp is a reserved name and you need to include to access > >> it (not ). > > > > It does not fix the warning, I tested your patch. > > > >> So why is the kernel compiled as hosted? Does adding -ffreestanding > >> hurt anything? Is that actually supported on LLVM, on all relevant > >> versions of it? Does it shut up the warning there (if not, that would > >> be an LLVM bug)? > > > > It does fix this warning because -ffreestanding implies -fno-builtin, > > which also solves the warning. LLVM has supported -ffreestanding since > > at least 3.0.0. There are some parts of the kernel that are compiled > > with this and it probably should be used in more places but it sounds > > like there might be some good codegen improvements that are disabled > > with it: > > > > https://lore.kernel.org/lkml/CAHk-=wi-epJZfBHDbKKDZ64us7WkF=lpufhvybmzsteo8q0...@mail.gmail.com/ > > For xmon.c and crash.c I think using -ffreestanding would be fine. > They're both crash/debug code, so we don't care about minor optimisation > differences. If anything we don't want the compiler being too clever > when generating that code. > > cheers I will send a v2 later today along with another patch to fix this warning and another build error. Cheers, Nathan
Re: [PATCH] KVM: PPC: Book3S HV: add smp_mb() in kvmppc_set_host_ipi()
Quoting Michael Roth (2019-09-05 18:21:22) > Quoting Michael Ellerman (2019-09-04 22:04:48) > > That raises the question of whether this needs to be a full barrier or > > just a write barrier, and where is the matching barrier on the reading > > side? > > For this particular case I think the same barrier orders it on the > read-side via kvmppc_set_host_ipi(42, 0) above, but I'm not sure that > work as a general solution, unless maybe we make that sort of usage > (clear-before-processing) part of the protocol of using > kvmppc_set_host_ipi()... it makes sense given we already need to take > care to not miss clearing them else we get issues like what was fixed > in 755563bc79c7, which introduced the clear in doorbell_exception(). So > then it's a matter of additionally making sure we do it prior to > processing host_ipi state. I haven't looked too closely at the other > users of kvmppc_set_host_ipi() yet though. > As far as using rw barriers, I can't think of any reason we couldn't, but > I wouldn't say I'm at all confident in declaring that safe atm... I think we need a full barrier after all. The following seems possible otherwise: CPU X: smp_mb() X: ipi_message[RESCHEDULE] = 1 X: kvmppc_set_host_ipi(42, 1) X: smp_mb() X: doorbell/msgsnd -> 42 42: doorbell_exception() (from CPU X) 42: msgsync 42: kvmppc_set_host_ipi(42, 0) // STORE DEFERRED DUE TO RE-ORDERING 42: smp_ipi_demux_relaxed() 105: smb_mb() 105: ipi_message[CALL_FUNCTION] = 1 105: smp_mb() 105: kvmppc_set_host_ipi(42, 1) 42: kvmppc_set_host_ipi(42, 0) // RE-ORDERED STORE COMPLETES 42: // returns to executing guest 105: doorbell/msgsnd -> 42 42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored 105: // hangs waiting on 42 to process messages/call_single_queue However that also means the current patch is insufficient, since the barrier for preventing this scenario needs to come *after* setting paca_ptrs[cpu]->kvm_hstate.host_ipi to 0. So I think the right interface is for this is to split kvmppc_set_host_ipi out into: static inline void kvmppc_set_host_ipi(int cpu) { smp_mb(); paca_ptrs[cpu]->kvm_hstate.host_ipi = 1; } static inline void kvmppc_clear_host_ipi(int cpu) { paca_ptrs[cpu]->kvm_hstate.host_ipi = 0; smp_mb(); }
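A sketch of how the proposed split helpers would pair up at the call sites (simplified; ipi_message[] stands in for the real per-cpu message word, and the choice of doorbell_global_ipi() vs doorbell_core_ipi() depends on the platform):

/* sender side, e.g. smp_muxed_ipi_message_pass() plus the doorbell */
static void send_ipi_sketch(int cpu, int msg)
{
	set_bit(msg, &ipi_message[cpu]);	/* make the request visible */
	kvmppc_set_host_ipi(cpu);		/* smp_mb(), then host_ipi = 1 */
	doorbell_global_ipi(cpu);		/* or doorbell_core_ipi() */
}

/* receiver side, doorbell_exception() */
static void recv_ipi_sketch(void)
{
	kvmppc_clear_host_ipi(smp_processor_id());	/* host_ipi = 0, then smp_mb() */
	smp_ipi_demux_relaxed();			/* safe to read the messages now */
}

The trailing barrier in kvmppc_clear_host_ipi() is what prevents the deferred store in the scenario above from leaking past the message processing.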
[PATCH] powerpc/pseries: correctly track irq state in default idle
prep_irq_for_idle() is intended to be called before entering H_CEDE (and it is used by the pseries cpuidle driver). However the default pseries idle routine does not call it, leading to mismanaged lazy irq state when the cpuidle driver isn't in use. Manifestations of this include: * Dropped IPIs in the time immediately after a cpu comes online (before it has installed the cpuidle handler), making the online operation block indefinitely waiting for the new cpu to respond. * Hitting this WARN_ON in arch_local_irq_restore(): /* * We should already be hard disabled here. We had bugs * where that wasn't the case so let's dbl check it and * warn if we are wrong. Only do that when IRQ tracing * is enabled as mfmsr() can be costly. */ if (WARN_ON_ONCE(mfmsr() & MSR_EE)) __hard_irq_disable(); Call prep_irq_for_idle() from pseries_lpar_idle() and honor its result. Fixes: 363edbe2614a ("powerpc: Default arch idle could cede processor on pseries") Signed-off-by: Nathan Lynch --- arch/powerpc/platforms/pseries/setup.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index b955d54628ff..f8adcd0e4589 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -321,6 +321,9 @@ static void pseries_lpar_idle(void) * low power mode by ceding processor to hypervisor */ + if (!prep_irq_for_idle()) + return; + /* Indicate to hypervisor that we are idle. */ get_lppaca()->idle = 1; -- 2.20.1
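With the patch applied, the default idle routine ends up looking roughly like this (condensed from arch/powerpc/platforms/pseries/setup.c; surrounding comments and the H_CEDE wrapper details are elided):

static void pseries_lpar_idle(void)
{
	/* Bail out if the lazy irq state cannot be prepared for H_CEDE. */
	if (!prep_irq_for_idle())
		return;

	get_lppaca()->idle = 1;		/* indicate to the hypervisor that we are idle */
	cede_processor();		/* H_CEDE until the next interrupt */
	get_lppaca()->idle = 0;
}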
Re: missing doorbell interrupt when onlining cpu
Nathan Lynch writes: > Nathan Lynch writes: > >> I'm hoping for some help investigating a behavior I see when doing cpu >> hotplug under load on P9 and P8 LPARs. Occasionally, while coming online >> a cpu will seem to get "stuck" in idle, with a pending doorbell >> interrupt unserviced (cpu 12 here): >> >> cpuhp/12-70[012] 46133.602202: cpuhp_enter: cpu: 0012 target: >> 205 step: 174 (0xc0028920s) >> load.sh-8201 [014] 46133.602248: sched_waking: comm=cpuhp/12 >> pid=70 prio=120 target_cpu=012 >> load.sh-8201 [014] 46133.602251: smp_send_reschedule: (c0052868) >> cpu=12 >> -0 [012] 46133.602252: do_idle: (c0162e08) >> load.sh-8201 [014] 46133.602252: smp_muxed_ipi_message_pass: >> (c00527e8) cpu=12 msg=1 >> load.sh-8201 [014] 46133.602253: doorbell_core_ipi:(c004d3e8) >> cpu=12 >> -0 [012] 46133.602257: arch_cpu_idle:(c0022d08) >> -0 [012] 46133.602259: pseries_lpar_idle:(c00d43c8) > > I should be more explicit that given my tracing configuration I would > expect to see doorbell events etc here e.g. > > -0 [012] 46133.602086: doorbell_entry: > pt_regs=0xc00200e7fb50 > -0 [012] 46133.602087: smp_ipi_demux_relaxed: > (c00530f8) > -0 [012] 46133.602088: scheduler_ipi: > (c015e4f8) > -0 [012] 46133.602091: sched_wakeup: cpuhp/12:70 > [120] success=1 CPU:012 > -0 [012] 46133.602092: sched_wakeup: > migration/12:71 [0] success=1 CPU:012 > -0 [012] 46133.602093: doorbell_exit: > pt_regs=0xc00200e7fb50 > > but instead cpu 12 goes to idle. Another clue is that I've occasionaly provoked this warning: WARNING: CPU: 7 PID: 9045 at arch/powerpc/kernel/irq.c:282 arch_local_irq_restore+0xdc/0x150 Modules linked in: CPU: 7 PID: 9045 Comm: offliner Not tainted 5.3.0-rc2-00190-g9b123d1ea237-dirty #45 NIP: c001d91c LR: c1988210 CTR: 00334ee8 REGS: ce19f390 TRAP: 0700 Not tainted (5.3.0-rc2-00190-g9b123d1ea237-dirty) MSR: 80010282b033 CR: 4424 XER: 2004 CFAR: c001d884 IRQMASK: 0 GPR00: c1988210 ce19f620 c32f6200 GPR04: ce589f10 0006 ce19f664 c395f260 GPR08: 003b 8000 0009 GPR12: 0001 c0001eca7780 005c 0100106c7de0 GPR16: 100c0a48 0001 GPR20: 100c5748 0001fc71 0078 c3345c78 GPR24: c003ffd99a00 c3349de0 c003fb086c10 GPR28: 000f c003fb086c10 NIP [c001d91c] arch_local_irq_restore+0xdc/0x150 LR [c1988210] _raw_spin_unlock_irqrestore+0xa0/0xd0 Call Trace: [ce19f6a0] [c1988210] _raw_spin_unlock_irqrestore+0xa0/0xd0 [ce19f6d0] [c01be920] try_to_wake_up+0x330/0xf30 [ce19f7a0] [c01bf5b0] wake_up_q+0x70/0xc0 [ce19f7e0] [c02b5a08] cpu_stop_queue_work+0xc8/0x140 [ce19f850] [c02b5bac] queue_stop_cpus_work+0xdc/0x160 [ce19f8b0] [c02b5c98] __stop_cpus+0x68/0xc0 [ce19f950] [c02b65ec] stop_cpus+0x5c/0x90 [ce19f9a0] [c02b6924] stop_machine_cpuslocked+0x194/0x1f0 [ce19fa10] [c016c768] takedown_cpu+0x98/0x260 [ce19fad0] [c016cea4] cpuhp_invoke_callback+0x114/0xf40 [ce19fb60] [c017194c] _cpu_down+0x19c/0x320 [ce19fbd0] [c016ff58] do_cpu_down+0x68/0xb0 [ce19fc10] [c0d4] cpu_subsys_offline+0x24/0x40 [ce19fc30] [c0cc2860] device_offline+0x100/0x140 [ce19fc70] [c0cc2a00] online_store+0x70/0xf0 [ce19fcb0] [c0cbcee8] dev_attr_store+0x38/0x60 [ce19fcd0] [c059c970] sysfs_kf_write+0x70/0xb0 [ce19fd10] [c059afa8] kernfs_fop_write+0xf8/0x280 [ce19fd60] [c04b436c] __vfs_write+0x3c/0x70 [ce19fd80] [c04b8700] vfs_write+0xd0/0x220 [ce19fdd0] [c04b8abc] ksys_write+0x7c/0x140 [ce19fe20] [c000bbd8] system_call+0x5c/0x68 i.e. in arch_local_irq_restore(): /* * We should already be hard disabled here. We had bugs * where that wasn't the case so let's dbl check it and * warn if we are wrong. 
Only do that when IRQ tracing * is enabled as mfmsr() can be costly. */ if (WARN_ON_ONCE(mfmsr() & MSR_EE)) __hard_irq_disable(); Anyway, I've proposed a fix: https://patchwork.ozlabs.org/patch/1160572/
[PATCH] powerpc/ptrace: Do not return ENOSYS if invalid syscall
If a tracer sets the syscall number to an invalid one, allow the return value set by the tracer to be returned to the tracee. The test for NR_syscalls is already at entry_64.S, and it's at do_syscall_trace_enter only to skip audit and trace. After this, seccomp_bpf selftests complete just fine, as the failing test was using ptrace to change the syscall to return an error or a fake value, but was failing as it was always returning -ENOSYS. Signed-off-by: Thadeu Lima de Souza Cascardo --- arch/powerpc/kernel/ptrace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c index 8c92febf5f44..87315335f66a 100644 --- a/arch/powerpc/kernel/ptrace.c +++ b/arch/powerpc/kernel/ptrace.c @@ -3316,7 +3316,7 @@ long do_syscall_trace_enter(struct pt_regs *regs) /* Avoid trace and audit when syscall is invalid. */ if (regs->gpr[0] >= NR_syscalls) - goto skip; + return regs->gpr[0]; if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT))) trace_sys_enter(regs, regs->gpr[0]); -- 2.20.1
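For context, the tracer-side pattern the seccomp_bpf selftest relies on looks roughly like this (a sketch, not the selftest's code; fake_syscall_result() is a hypothetical helper, error checking is omitted, and the CR0.SO error-flag handling the real test performs is left out):

#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <asm/ptrace.h>		/* struct pt_regs, powerpc layout */

static void fake_syscall_result(pid_t tracee, long fake_ret)
{
	struct pt_regs regs;

	/* at the syscall-entry stop: make the syscall number invalid */
	ptrace(PTRACE_GETREGS, tracee, NULL, &regs);
	regs.gpr[0] = -1;			/* r0 holds the syscall number */
	ptrace(PTRACE_SETREGS, tracee, NULL, &regs);

	/* run to the syscall-exit stop */
	ptrace(PTRACE_SYSCALL, tracee, NULL, NULL);
	waitpid(tracee, NULL, 0);

	/* inject the return value the test expects to observe in the tracee */
	ptrace(PTRACE_GETREGS, tracee, NULL, &regs);
	regs.gpr[3] = fake_ret;			/* r3 carries the return value */
	ptrace(PTRACE_SETREGS, tracee, NULL, &regs);
}

Before the patch, the kernel overrode the injected r3 with -ENOSYS for any invalid syscall number; afterwards the tracer's value reaches the tracee.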
[PATCH 1/2] ASoC: fsl_mqs: add DT binding documentation
Add the DT binding documentation for NXP MQS driver Signed-off-by: Shengjiu Wang --- .../devicetree/bindings/sound/fsl,mqs.txt | 20 +++ 1 file changed, 20 insertions(+) create mode 100644 Documentation/devicetree/bindings/sound/fsl,mqs.txt diff --git a/Documentation/devicetree/bindings/sound/fsl,mqs.txt b/Documentation/devicetree/bindings/sound/fsl,mqs.txt new file mode 100644 index ..a1dbe181204a --- /dev/null +++ b/Documentation/devicetree/bindings/sound/fsl,mqs.txt @@ -0,0 +1,20 @@ +fsl,mqs audio CODEC + +Required properties: + + - compatible : Must contain one of "fsl,imx6sx-mqs", "fsl,codec-mqs" + "fsl,imx8qm-mqs", "fsl,imx8qxp-mqs". + - clocks : A list of phandles + clock-specifiers, one for each entry in +clock-names + - clock-names : Must contain "mclk" + - gpr : The gpr node. + +Example: + +mqs: mqs { + compatible = "fsl,imx6sx-mqs"; + gpr = <&gpr>; + clocks = <&clks IMX6SX_CLK_SAI1>; + clock-names = "mclk"; + status = "disabled"; +}; -- 2.21.0
[PATCH 2/2] ASoC: fsl_mqs: Add MQS component driver
MQS (medium quality sound), is used to generate medium quality audio via a standard digital output pin. It can be used to connect stereo speakers or headphones simply via power amplifier stages without an additional DAC chip. It only accepts 2-channel, LSB-valid 16bit, MSB shift-out first, frame sync asserting with the first bit of the frame, data shifted with the posedge of bit clock, 44.1 kHz or 48 kHz signals from SAI1 in left justified format; and it provides the SNR target as no more than 20dB for the signals below 10 kHz. The signals above 10 kHz will have worse THD+N values. MQS provides only simple audio reproduction. No internal pop, click or distortion artifact reduction methods are provided. The MQS receives the audio data from the SAI1 Tx section. Signed-off-by: Shengjiu Wang --- sound/soc/fsl/Kconfig | 10 ++ sound/soc/fsl/Makefile | 2 + sound/soc/fsl/fsl_mqs.c | 336 3 files changed, 348 insertions(+) create mode 100644 sound/soc/fsl/fsl_mqs.c diff --git a/sound/soc/fsl/Kconfig b/sound/soc/fsl/Kconfig index aa99c008a925..65e8cd4be930 100644 --- a/sound/soc/fsl/Kconfig +++ b/sound/soc/fsl/Kconfig @@ -25,6 +25,16 @@ config SND_SOC_FSL_SAI This option is only useful for out-of-tree drivers since in-tree drivers select it automatically. +config SND_SOC_FSL_MQS + tristate "Medium Quality Sound (MQS) module support" + depends on SND_SOC_FSL_SAI + select REGMAP_MMIO + help + Say Y if you want to add Medium Quality Sound (MQS) + support for the Freescale CPUs. + This option is only useful for out-of-tree drivers since + in-tree drivers select it automatically. + config SND_SOC_FSL_AUDMIX tristate "Audio Mixer (AUDMIX) module support" select REGMAP_MMIO diff --git a/sound/soc/fsl/Makefile b/sound/soc/fsl/Makefile index c0dd04422fe9..8cde88c72d93 100644 --- a/sound/soc/fsl/Makefile +++ b/sound/soc/fsl/Makefile @@ -23,6 +23,7 @@ snd-soc-fsl-esai-objs := fsl_esai.o snd-soc-fsl-micfil-objs := fsl_micfil.o snd-soc-fsl-utils-objs := fsl_utils.o snd-soc-fsl-dma-objs := fsl_dma.o +snd-soc-fsl-mqs-objs := fsl_mqs.o obj-$(CONFIG_SND_SOC_FSL_AUDMIX) += snd-soc-fsl-audmix.o obj-$(CONFIG_SND_SOC_FSL_ASOC_CARD) += snd-soc-fsl-asoc-card.o @@ -33,6 +34,7 @@ obj-$(CONFIG_SND_SOC_FSL_SPDIF) += snd-soc-fsl-spdif.o obj-$(CONFIG_SND_SOC_FSL_ESAI) += snd-soc-fsl-esai.o obj-$(CONFIG_SND_SOC_FSL_MICFIL) += snd-soc-fsl-micfil.o obj-$(CONFIG_SND_SOC_FSL_UTILS) += snd-soc-fsl-utils.o +obj-$(CONFIG_SND_SOC_FSL_MQS) += snd-soc-fsl-mqs.o obj-$(CONFIG_SND_SOC_POWERPC_DMA) += snd-soc-fsl-dma.o # MPC5200 Platform Support diff --git a/sound/soc/fsl/fsl_mqs.c b/sound/soc/fsl/fsl_mqs.c new file mode 100644 index ..d164f5da3460 --- /dev/null +++ b/sound/soc/fsl/fsl_mqs.c @@ -0,0 +1,336 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * ALSA SoC IMX MQS driver + * + * Copyright (C) 2014-2019 Freescale Semiconductor, Inc. 
+ * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define REG_MQS_CTRL 0x00 + +#define MQS_EN_MASK(0x1 << 28) +#define MQS_EN_SHIFT (28) +#define MQS_SW_RST_MASK(0x1 << 24) +#define MQS_SW_RST_SHIFT (24) +#define MQS_OVERSAMPLE_MASK(0x1 << 20) +#define MQS_OVERSAMPLE_SHIFT (20) +#define MQS_CLK_DIV_MASK (0xFF << 0) +#define MQS_CLK_DIV_SHIFT (0) + +/* codec private data */ +struct fsl_mqs { + struct regmap *regmap; + struct clk *mclk; + struct clk *ipg; + + unsigned int reg_iomuxc_gpr2; + unsigned int reg_mqs_ctrl; + bool use_gpr; +}; + +#define FSL_MQS_RATES (SNDRV_PCM_RATE_44100 | SNDRV_PCM_RATE_48000) +#define FSL_MQS_FORMATSSNDRV_PCM_FMTBIT_S16_LE + +static int fsl_mqs_hw_params(struct snd_pcm_substream *substream, +struct snd_pcm_hw_params *params, +struct snd_soc_dai *dai) +{ + struct snd_soc_component *component = dai->component; + struct fsl_mqs *mqs_priv = snd_soc_component_get_drvdata(component); + unsigned long mclk_rate; + int div, res; + int bclk, lrclk; + + mclk_rate = clk_get_rate(mqs_priv->mclk); + bclk = snd_soc_params_to_bclk(params); + lrclk = params_rate(params); + + /* +* mclk_rate / (oversample(32,64) * FS * 2 * divider ) = repeat_rate; +* if repeat_rate is 8, mqs can achieve better quality. +* oversample rate is fix to 32 currently. +*/ + div = mclk_rate / (32 * lrclk * 2 * 8); + res = mclk_rate % (32 * lrclk * 2 * 8); + + if (res == 0 && div > 0 && div <= 256) { + if (mqs_priv->use_gpr) { + regmap_update_bits(mqs_priv->regmap, IOMUXC_GPR2, +
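A worked example of the divider check above, using illustrative numbers rather than values from the patch: the common 24.576 MHz audio master clock and a 48 kHz stream divide out exactly.

unsigned long mclk_rate = 24576000;		/* assumed mclk */
int lrclk = 48000;				/* frame rate */
int div = mclk_rate / (32 * lrclk * 2 * 8);	/* = 1 */
int res = mclk_rate % (32 * lrclk * 2 * 8);	/* = 0 */
/*
 * res == 0 and 0 < div <= 256, so the rate is accepted; the 8-bit
 * MQS_CLK_DIV field presumably gets programmed with div - 1.
 */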
Re: [PATCH] KVM: PPC: Book3S HV: Tunable to configure maximum # of vCPUs per VM
On Tue, Sep 10, 2019 at 06:49:34PM +0200, Greg Kurz wrote: > Each vCPU of a VM allocates a XIVE VP in OPAL which is associated with > 8 event queue (EQ) descriptors, one for each priority. A POWER9 socket > can handle a maximum of 1M event queues. > > The powernv platform allocates NR_CPUS (== 2048) VPs for the hypervisor, > and each XIVE KVM device allocates KVM_MAX_VCPUS (== 2048) VPs. This means > that on a bi-socket system, we can create at most: > > (2 * 1M) / (8 * 2048) - 1 == 127 XIVE or XICS-on-XIVE KVM devices > > ie, start at most 127 VMs benefiting from an in-kernel interrupt controller. > Subsequent VMs need to rely on much slower userspace emulated XIVE device in > QEMU. > > This is problematic as one can legitimately expect to start the same > number of mono-CPU VMs as the number of HW threads available on the > system (eg, 144 on Witherspoon). > > I'm not aware of any userspace supporting more that 1024 vCPUs. It thus > seem overkill to consume that many VPs per VM. Ideally we would even > want userspace to be able to tell KVM about the maximum number of vCPUs > when creating the VM. > > For now, provide a module parameter to configure the maximum number of > vCPUs per VM. While here, reduce the default value to 1024 to match the > current limit in QEMU. This number is only used by the XIVE KVM devices, > but some more users of KVM_MAX_VCPUS could possibly be converted. > > With this change, I could successfully run 230 mono-CPU VMs on a > Witherspoon system using the official skiboot-6.3. > > I could even run more VMs by using upstream skiboot containing this > fix, that allows to better spread interrupts between sockets: > > e97391ae2bb5 ("xive: fix return value of opal_xive_allocate_irq()") > > MAX VPCUS | MAX VMS > --+- > 1024 | 255 > 512 | 511 > 256 |1023 (*) > > (*) the system was barely usable because of the extreme load and > memory exhaustion but the VMs did start. Hrm. I don't love the idea of using a global tunable for this, although I guess it could have some use. It's another global system property that admins have to worry about. A better approach would seem to be a way for userspace to be able to hint the maximum number of cpus for a specific VM to the kernel. > > Signed-off-by: Greg Kurz > --- > arch/powerpc/include/asm/kvm_host.h |1 + > arch/powerpc/kvm/book3s_hv.c | 32 > arch/powerpc/kvm/book3s_xive.c|2 +- > arch/powerpc/kvm/book3s_xive_native.c |2 +- > 4 files changed, 35 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_host.h > b/arch/powerpc/include/asm/kvm_host.h > index 6fb5fb4779e0..17582ce38788 100644 > --- a/arch/powerpc/include/asm/kvm_host.h > +++ b/arch/powerpc/include/asm/kvm_host.h > @@ -335,6 +335,7 @@ struct kvm_arch { > struct kvm_nested_guest *nested_guests[KVM_MAX_NESTED_GUESTS]; > /* This array can grow quite large, keep it at the end */ > struct kvmppc_vcore *vcores[KVM_MAX_VCORES]; > + unsigned int max_vcpus; > #endif > }; > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index f8975c620f41..393d8a1ce9d8 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -125,6 +125,36 @@ static bool nested = true; > module_param(nested, bool, S_IRUGO | S_IWUSR); > MODULE_PARM_DESC(nested, "Enable nested virtualization (only on POWER9)"); > > +#define MIN(x, y) (((x) < (y)) ? 
(x) : (y)) > + > +static unsigned int max_vcpus = MIN(KVM_MAX_VCPUS, 1024); > + > +static int set_max_vcpus(const char *val, const struct kernel_param *kp) > +{ > + unsigned int new_max_vcpus; > + int ret; > + > + ret = kstrtouint(val, 0, &new_max_vcpus); > + if (ret) > + return ret; > + > + if (new_max_vcpus > KVM_MAX_VCPUS) > + return -EINVAL; > + > + max_vcpus = new_max_vcpus; > + > + return 0; > +} > + > +static struct kernel_param_ops max_vcpus_ops = { > + .set = set_max_vcpus, > + .get = param_get_uint, > +}; > + > +module_param_cb(max_vcpus, &max_vcpus_ops, &max_vcpus, S_IRUGO | S_IWUSR); > +MODULE_PARM_DESC(max_vcpus, "Maximum number of vCPUS per VM (max = " > + __stringify(KVM_MAX_VCPUS) ")"); > + > static inline bool nesting_enabled(struct kvm *kvm) > { > return kvm->arch.nested_enable && kvm_is_radix(kvm); > @@ -4918,6 +4948,8 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm) > if (radix_enabled()) > kvmhv_radix_debugfs_init(kvm); > > + kvm->arch.max_vcpus = max_vcpus; > + > return 0; > } > > diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c > index 2ef43d037a4f..0fea31b64564 100644 > --- a/arch/powerpc/kvm/book3s_xive.c > +++ b/arch/powerpc/kvm/book3s_xive.c > @@ -2026,7 +2026,7 @@ static int kvmppc_xive_create(struct kvm_device *dev, > u32 type) > xive->q_page_order = xi
Re: [PATCH v7 0/5] kasan: support backing vmalloc space with real shadow memory
Hi Daniel, Are any other patches required prior to this series ? I have tried to apply it on later powerpc/merge branch without success: [root@localhost linux-powerpc]# git am /root/Downloads/kasan-support-backing-vmalloc-space-with-real-shadow-memory\(1\).patch Applying: kasan: support backing vmalloc space with real shadow memory .git/rebase-apply/patch:389: trailing whitespace. * (1) (2) (3) error: patch failed: lib/Kconfig.kasan:142 error: lib/Kconfig.kasan: patch does not apply Patch failed at 0001 kasan: support backing vmalloc space with real shadow memory The copy of the patch that failed is found in: .git/rebase-apply/patch When you have resolved this problem, run "git am --continue". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort". [root@localhost linux-powerpc]# git am -3 /root/Downloads/kasan-support-backing-vmalloc-space-with-real-shadow-memory\(1\).patch Applying: kasan: support backing vmalloc space with real shadow memory error: sha1 information is lacking or useless (include/linux/vmalloc.h). error: could not build fake ancestor Patch failed at 0001 kasan: support backing vmalloc space with real shadow memory The copy of the patch that failed is found in: .git/rebase-apply/patch When you have resolved this problem, run "git am --continue". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort". Christophe On 09/03/2019 02:55 PM, Daniel Axtens wrote: Currently, vmalloc space is backed by the early shadow page. This means that kasan is incompatible with VMAP_STACK. This series provides a mechanism to back vmalloc space with real, dynamically allocated memory. I have only wired up x86, because that's the only currently supported arch I can work with easily, but it's very easy to wire up other architectures, and it appears that there is some work-in-progress code to do this on arm64 and s390. This has been discussed before in the context of VMAP_STACK: - https://bugzilla.kernel.org/show_bug.cgi?id=202009 - https://lkml.org/lkml/2018/7/22/198 - https://lkml.org/lkml/2019/7/19/822 In terms of implementation details: Most mappings in vmalloc space are small, requiring less than a full page of shadow space. Allocating a full shadow page per mapping would therefore be wasteful. Furthermore, to ensure that different mappings use different shadow pages, mappings would have to be aligned to KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE. Instead, share backing space across multiple mappings. Allocate a backing page when a mapping in vmalloc space uses a particular page of the shadow region. This page can be shared by other vmalloc mappings later on. We hook in to the vmap infrastructure to lazily clean up unused shadow memory. 
v1: https://lore.kernel.org/linux-mm/20190725055503.19507-1-...@axtens.net/ v2: https://lore.kernel.org/linux-mm/20190729142108.23343-1-...@axtens.net/ Address review comments: - Patch 1: use kasan_unpoison_shadow's built-in handling of ranges that do not align to a full shadow byte - Patch 3: prepopulate pgds rather than faulting things in v3: https://lore.kernel.org/linux-mm/20190731071550.31814-1-...@axtens.net/ Address comments from Mark Rutland: - kasan_populate_vmalloc is a better name - handle concurrency correctly - various nits and cleanups - relax module alignment in KASAN_VMALLOC case v4: https://lore.kernel.org/linux-mm/20190815001636.12235-1-...@axtens.net/ Changes to patch 1 only: - Integrate Mark's rework, thanks Mark! - handle the case where kasan_populate_shadow might fail - poision shadow on free, allowing the alloc path to just unpoision memory that it uses v5: https://lore.kernel.org/linux-mm/20190830003821.10737-1-...@axtens.net/ Address comments from Christophe Leroy: - Fix some issues with my descriptions in commit messages and docs - Dynamically free unused shadow pages by hooking into the vmap book-keeping - Split out the test into a separate patch - Optional patch to track the number of pages allocated - minor checkpatch cleanups v6: https://lore.kernel.org/linux-mm/20190902112028.23773-1-...@axtens.net/ Properly guard freeing pages in patch 1, drop debugging code. v7: Add a TLB flush on freeing, thanks Mark Rutland. Explain more clearly how I think freeing is concurrency-safe. Daniel Axtens (5): kasan: support backing vmalloc space with real shadow memory kasan: add test for vmalloc fork: support VMAP_STACK with KASAN_VMALLOC x86/kasan: support KASAN_VMALLOC kasan debug: track pages allocated for vmalloc shadow Documentation/dev-tools/kasan.rst | 63 arch/Kconfig | 9 +- arch/x86/Kconfig | 1 + arch/x86/mm/kasan_init_64.c | 60 include/linux/kasan.h | 31 include/linux
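For anyone following the cover letter while the patches fail to apply, the core mechanism is roughly the following sketch (not the series' actual code; locking against concurrent populators and the shadow-range rounding are deliberately omitted):

static int shadow_pte_populate(pte_t *ptep, unsigned long addr, void *unused)
{
	unsigned long page;

	if (!pte_none(*ptep))
		return 0;	/* already backed, possibly shared with a neighbour */

	page = __get_free_page(GFP_KERNEL);
	if (!page)
		return -ENOMEM;

	/* 0xF8 is the poison byte the series names KASAN_VMALLOC_INVALID */
	memset((void *)page, 0xF8, PAGE_SIZE);
	set_pte_at(&init_mm, addr, ptep,
		   pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL));
	return 0;
}

static int populate_vmalloc_shadow_sketch(unsigned long addr, unsigned long size)
{
	unsigned long shadow = (unsigned long)kasan_mem_to_shadow((void *)addr);

	return apply_to_page_range(&init_mm, shadow,
				   size >> KASAN_SHADOW_SCALE_SHIFT,
				   shadow_pte_populate, NULL);
}

The shadow pages stay poisoned until kasan_unpoison_shadow() runs for the actual vmalloc allocation, and the vmap teardown path frees unused shadow pages again.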
Re: [PATCH 1/2] ASoC: fsl_mqs: add DT binding documentation
Hi Shengjiu, Your mail is dated in the future, its time is 16:42 (GMT+2) whereas it is still the morning. Please fix your clock or timezone for future mails. Thanks Christophe Le 11/09/2019 à 16:42, Shengjiu Wang a écrit : Add the DT binding documentation for NXP MQS driver Signed-off-by: Shengjiu Wang --- .../devicetree/bindings/sound/fsl,mqs.txt | 20 +++ 1 file changed, 20 insertions(+) create mode 100644 Documentation/devicetree/bindings/sound/fsl,mqs.txt diff --git a/Documentation/devicetree/bindings/sound/fsl,mqs.txt b/Documentation/devicetree/bindings/sound/fsl,mqs.txt new file mode 100644 index ..a1dbe181204a --- /dev/null +++ b/Documentation/devicetree/bindings/sound/fsl,mqs.txt @@ -0,0 +1,20 @@ +fsl,mqs audio CODEC + +Required properties: + + - compatible : Must contain one of "fsl,imx6sx-mqs", "fsl,codec-mqs" + "fsl,imx8qm-mqs", "fsl,imx8qxp-mqs". + - clocks : A list of phandles + clock-specifiers, one for each entry in +clock-names + - clock-names : Must contain "mclk" + - gpr : The gpr node. + +Example: + +mqs: mqs { + compatible = "fsl,imx6sx-mqs"; + gpr = <&gpr>; + clocks = <&clks IMX6SX_CLK_SAI1>; + clock-names = "mclk"; + status = "disabled"; +};