[PATCH v2] powerpc/powernv: Use darn instr for random_seed on p9
Currently ppc_md.get_random_seed uses the powernv_get_random_long function.
A guest calling this function would have to go through the hypervisor. The
'darn' instruction, introduced in POWER9, allows us to bypass this by
directly obtaining a value from the mmio region.

This patch adds a function for ppc_md.get_random_seed on p9, utilising the
darn instruction.

Signed-off-by: Matt Brown
---
v2:	- remove repeat darn attempts
	- move hook to rng_init
---
 arch/powerpc/include/asm/ppc-opcode.h |  4 ++++
 arch/powerpc/platforms/powernv/rng.c  | 22 ++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index c4ced1d..d5f7082 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -134,6 +134,7 @@
 #define PPC_INST_COPY			0x7c00060c
 #define PPC_INST_COPY_FIRST		0x7c20060c
 #define PPC_INST_CP_ABORT		0x7c00068c
+#define PPC_INST_DARN			0x7c0005e6
 #define PPC_INST_DCBA			0x7c0005ec
 #define PPC_INST_DCBA_MASK		0xfc0007fe
 #define PPC_INST_DCBAL			0x7c2005ec
@@ -325,6 +326,9 @@

 /* Deal with instructions that older assemblers aren't aware of */
 #define	PPC_CP_ABORT		stringify_in_c(.long PPC_INST_CP_ABORT)
+#define PPC_DARN(t, l)		stringify_in_c(.long PPC_INST_DARN |	\
+						___PPC_RT(t)	   |	\
+						___PPC_RA(l))
 #define	PPC_DCBAL(a, b)		stringify_in_c(.long PPC_INST_DCBAL |	\
					__PPC_RA(a) | __PPC_RB(b))
 #define	PPC_DCBZL(a, b)		stringify_in_c(.long PPC_INST_DCBZL |	\
diff --git a/arch/powerpc/platforms/powernv/rng.c b/arch/powerpc/platforms/powernv/rng.c
index 5dcbdea..ab6f411 100644
--- a/arch/powerpc/platforms/powernv/rng.c
+++ b/arch/powerpc/platforms/powernv/rng.c
@@ -8,6 +8,7 @@
  */

 #define pr_fmt(fmt)	"powernv-rng: " fmt
+#define DARN_ERR	0xFFFFFFFFFFFFFFFFul

 #include
 #include
@@ -16,6 +17,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -67,6 +69,21 @@ int powernv_get_random_real_mode(unsigned long *v)
 	return 1;
 }

+int powernv_get_random_darn(unsigned long *v)
+{
+	unsigned long val;
+
+	/* Using DARN with L=1 - conditioned random number */
+	asm (PPC_DARN(%0, 1)"\n" : "=r"(val) :);
+
+	if (val == DARN_ERR)
+		return 0;
+
+	*v = val;
+
+	return 1;
+}
+
 int powernv_get_random_long(unsigned long *v)
 {
 	struct powernv_rng *rng;
@@ -136,6 +153,7 @@ static __init int rng_create(struct device_node *dn)
 static __init int rng_init(void)
 {
 	struct device_node *dn;
+	unsigned long drn_test;
 	int rc;

 	for_each_compatible_node(dn, NULL, "ibm,power-rng") {
@@ -150,6 +168,10 @@ static __init int rng_init(void)
 		of_platform_device_create(dn, NULL, NULL);
 	}

+	if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+	    powernv_get_random_darn(&drn_test))
+		ppc_md.get_random_seed = powernv_get_random_darn;
+
 	return 0;
 }
 machine_subsys_initcall(powernv, rng_init);
--
2.9.3
Re: [PATCH] powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9
Nicholas Piggin writes:
> There are two cases outside the normal address space management
> where a CPU's local TLB is to be flushed:
>
> 1. Host boot; in case something has left stale entries in the
>    TLB (e.g., kexec).
>
> 2. Machine check; to clean corrupted TLB entries.
>
> CPU state restore from deep idle states also flushes the TLB.
> However this seems to be a side effect of reusing the boot code to set
> CPU state, rather than a requirement itself.
>
> The current flushing has a number of problems with ISA v3.0B:
>
> - The current radix mode of the MMU is not taken into account. tlbiel
>   is undefined if the R field does not match the current radix mode.
>
> - ISA v3.0B hash must flush the partition and process table caches.
>
> - ISA v3.0B radix must flush partition and process scoped translations,
>   partition and process table caches, and also the page walk cache.
>
> Add POWER9 cases to handle these, with radix vs hash determined by the
> host MMU mode.
>
> Signed-off-by: Nicholas Piggin

Reviewed-by: Aneesh Kumar K.V

> ---
>
> This is a relatively minimal version which does not churn code too
> much or remove flushing from the CPU deep state idle restore path.
> Should be suitable for stable after some upstream testing.
>
> Thanks,
> Nick
>
>  arch/powerpc/kernel/cpu_setup_power.S | 13 ++--
>  arch/powerpc/kernel/dt_cpu_ftrs.c     | 13 ++--
>  arch/powerpc/kernel/mce_power.c       | 56 ++-
>  3 files changed, 67 insertions(+), 15 deletions(-)
>
> diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
> index 10cb2896b2ae..610955fe8b81 100644
> --- a/arch/powerpc/kernel/cpu_setup_power.S
> +++ b/arch/powerpc/kernel/cpu_setup_power.S
> @@ -218,13 +218,20 @@ __init_tlb_power8:
>  	ptesync
>  1:	blr
>
> +/*
> + * Flush the TLB in hash mode. Hash must flush with RIC=2 once for process
> + * and one for partition scope to clear process and partition table entries.
> + */
>  __init_tlb_power9:
> -	li	r6,POWER9_TLB_SETS_HASH
> +	li	r6,POWER9_TLB_SETS_HASH - 1
>  	mtctr	r6
>  	li	r7,0xc00	/* IS field = 0b11 */
> +	li	r8,0
>  	ptesync
> -2:	tlbiel	r7
> -	addi	r7,r7,0x1000
> +	PPC_TLBIEL(7, 8, 2, 1, 0)
> +	PPC_TLBIEL(7, 8, 2, 0, 0)
> +2:	addi	r7,r7,0x1000
> +	PPC_TLBIEL(7, 8, 0, 0, 0)
>  	bdnz	2b
>  	ptesync
> 1:	blr
> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
> index 4c7656dc4e04..b0da3718437d 100644
> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> @@ -105,24 +105,15 @@ static void cpufeatures_flush_tlb(void)
>  	case PVR_POWER8:
>  	case PVR_POWER8E:
>  	case PVR_POWER8NVL:
> -		num_sets = POWER8_TLB_SETS;
> +		__flush_tlb_power8(POWER8_TLB_SETS);
>  		break;
>  	case PVR_POWER9:
> -		num_sets = POWER9_TLB_SETS_HASH;
> +		__flush_tlb_power9(POWER9_TLB_SETS_HASH);
>  		break;
>  	default:
> -		num_sets = 1;
>  		pr_err("unknown CPU version for boot TLB flush\n");
>  		break;
>  	}
> -
> -	asm volatile("ptesync" : : : "memory");
> -	rb = TLBIEL_INVAL_SET;
> -	for (i = 0; i < num_sets; i++) {
> -		asm volatile("tlbiel %0" : : "r" (rb));
> -		rb += 1 << TLBIEL_INVAL_SET_SHIFT;
> -	}
> -	asm volatile("ptesync" : : : "memory");
>  }
>
>  static void __restore_cpu_cpufeatures(void)
> diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
> index d24e689e893f..b76ca198e09c 100644
> --- a/arch/powerpc/kernel/mce_power.c
> +++ b/arch/powerpc/kernel/mce_power.c
> @@ -53,6 +53,60 @@ static void flush_tlb_206(unsigned int num_sets, unsigned int action)
>  	asm volatile("ptesync" : : : "memory");
>  }
>
> +static void flush_tlb_300(unsigned int num_sets, unsigned int action)
> +{
> +	unsigned long rb;
> +	unsigned int i;
> +	unsigned int r;
> +
> +	switch (action) {
> +	case TLB_INVAL_SCOPE_GLOBAL:
> +		rb = TLBIEL_INVAL_SET;
> +		break;
> +	case TLB_INVAL_SCOPE_LPID:
> +		rb = TLBIEL_INVAL_SET_LPID;
> +		break;
> +	default:
> +		BUG();
> +		break;
> +	}
> +
> +	asm volatile("ptesync" : : : "memory");
> +
> +	if (early_radix_enabled())
> +		r = 1;
> +	else
> +		r = 0;
> +
> +	/*
> +	 * First flush table/PWC caches with set 0, then flush the
> +	 * rest of the sets, partition scope. Radix must then do it
> +	 * all again with process scope. Hash just has to flush
> +	 * process table.
> +	 */
> +	asm volatile(PPC_TLBIEL(%0, %1, %2, %3, %4) : :
> +			"r"(rb), "r"(0), "i"(2), "i"(0), "r"(r));
> +	for (i = 1; i < num_sets; i++) {
> +		unsigned long set = i * (1 <
Re: [RFC v5 01/38] powerpc: Free up four 64K PTE bits in 4K backed HPTE pages
On Wed, 2017-07-05 at 14:21 -0700, Ram Pai wrote:
> Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6,
> in the 4K backed HPTE pages. These bits continue to be used
> for 64K backed HPTE pages in this patch, but will be freed
> up in the next patch. The bit numbers are big-endian as
> defined in the ISA 3.0.
>
> The patch does the following change to the 4k hpte backed
> 64K PTE's format.

The diagrams make the patch much easier to understand, thanks!

> NOTE: even though bits 3, 4, 5, 6, 7 are not used when
> the 64K PTE is backed by 4k HPTE, they continue to be
> used if the PTE gets backed by 64k HPTE. The next
> patch will decouple that as well, and truly release the
> bits.

Balbir Singh.
Re: [PATCH v6 0/7] perf report: Show branch type
On Thu, Apr 20, 2017 at 08:07:48PM +0800, Jin Yao wrote:
> v6:
>    Update according to the review comments from
>    Jiri Olsa. Major modifications are:
>
>    1. Move that multiline conditional code inside {} brackets.
>
>    2. Move branch_type_stat_display() from builtin-report.c to
>       branch.c. Move branch_type_str() from callchain.c to
>       branch.c.
>
>    3. Keep the original branch info display order, that is:
>       predicted, abort, cycles, iterations

Peter, are you ok with the kernel side of this?

thanks,
jirka

> v5:
> ---
>    Mainly the v5 patch series are updated according to
>    comments from Jiri Olsa.
>
>    The kernel part doesn't have functional change. It just
>    solves the merge issue.
>
>    In userspace, the functions of branch type counting and
>    branch type name resolving are moved to the new files:
>    util/branch.c, util/branch.h.
>
>    And refactor the branch info printing code for better
>    maintenance.
>
> Not changed (or just fix merge issue):
>   perf/core: Define the common branch type classification
>   perf/x86/intel: Record branch type
>   perf record: Create a new option save_type in --branch-filter
>
> New patches:
>   perf report: Refactor the branch info printing code
>   perf util: Create branch.c/.h for common branch functions
>
> Changed:
>   perf report: Show branch type statistics for stdio mode
>   perf report: Show branch type in callchain entry
>
> v4:
> ---
> 1. Describe the major changes in patch description.
>    Thanks for Peter Zijlstra's reminding.
>
> 2. Initialize branch type to 0 in intel_pmu_lbr_read_32 and
>    intel_pmu_lbr_read_64. Remove the invalid else code in
>    intel_pmu_lbr_filter.
>
> v3:
> ---
> 1. Move the JCC forward/backward and cross page computing from
>    kernel to userspace.
>
> 2. Use lookup table to replace original switch/case processing.
>
> Changed:
>   perf/core: Define the common branch type classification
>   perf/x86/intel: Record branch type
>   perf report: Show branch type statistics for stdio mode
>   perf report: Show branch type in callchain entry
>
> Not changed:
>   perf record: Create a new option save_type in --branch-filter
>
> v2:
> ---
> 1. Use 4 bits in perf_branch_entry to record branch type.
>
> 2. Pull out some common branch types from FAR_BRANCH. Now the branch
>    types defined in perf_event.h:
>
> Jin Yao (7):
>   perf/core: Define the common branch type classification
>   perf/x86/intel: Record branch type
>   perf record: Create a new option save_type in --branch-filter
>   perf report: Refactor the branch info printing code
>   perf util: Create branch.c/.h for common branch functions
>   perf report: Show branch type statistics for stdio mode
>   perf report: Show branch type in callchain entry
>
>  arch/x86/events/intel/lbr.c              |  53 +++++++-
>  include/uapi/linux/perf_event.h          |  29 ++++-
>  tools/include/uapi/linux/perf_event.h    |  29 ++++-
>  tools/perf/Documentation/perf-record.txt |   1 +
>  tools/perf/builtin-report.c              |  25 +++++
>  tools/perf/util/Build                    |   1 +
>  tools/perf/util/branch.c                 | 168 +++++++++++++++++++++
>  tools/perf/util/branch.h                 |  25 +++++
>  tools/perf/util/callchain.c              | 140 ++++++----------
>  tools/perf/util/callchain.h              |   5 +-
>  tools/perf/util/event.h                  |   3 +-
>  tools/perf/util/hist.c                   |   5 +-
>  tools/perf/util/machine.c                |  26 ++--
>  tools/perf/util/parse-branch-options.c   |   1 +
>  14 files changed, 427 insertions(+), 84 deletions(-)
>  create mode 100644 tools/perf/util/branch.c
>  create mode 100644 tools/perf/util/branch.h
>
> --
> 2.7.4
>
Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
PPC folks, maddy, does this work for you guys?

On Thu, Apr 20, 2017 at 08:07:49PM +0800, Jin Yao wrote:
> It is often useful to know the branch types while analyzing branch
> data. For example, a call is very different from a conditional branch.
>
> Currently we have to look it up in the binary, while the binary may
> later not be available, and even if the binary is available the user
> has to take some time. It is very useful for the user to check it
> directly in perf report.
>
> Perf already has support for disassembling the branch instruction
> to get the x86 branch type.
>
> To keep kernel and userspace consistent and make the classification
> more common, the patch adds the common branch type classification
> in perf_event.h.
>
> PERF_BR_NONE       : unknown
> PERF_BR_JCC        : conditional jump
> PERF_BR_JMP        : jump
> PERF_BR_IND_JMP    : indirect jump
> PERF_BR_CALL       : call
> PERF_BR_IND_CALL   : indirect call
> PERF_BR_RET        : return
> PERF_BR_SYSCALL    : syscall
> PERF_BR_SYSRET     : syscall return
> PERF_BR_IRQ        : hw interrupt/trap/fault
> PERF_BR_INT        : sw interrupt
> PERF_BR_IRET       : return from interrupt
> PERF_BR_FAR_BRANCH : not generic far branch type
>
> The patch also adds a new field type (4 bits) in perf_branch_entry
> to record the branch type.
>
> Since the disassembling of branch instructions has some overhead,
> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
> needs to disassemble the branch instruction and record the branch
> type.
>
> Change log
> ----------
>
> v6: Not changed.
>
> v5: Not changed. The v5 patch series just change the userspace.
>
> v4: Comparing to previous version, the major changes are:
>
> 1. Remove the PERF_BR_JCC_FWD/PERF_BR_JCC_BWD, they will be
>    computed later in userspace.
>
> 2. Remove the "cross" field in perf_branch_entry. The cross page
>    computing will be done later in userspace.
>
> Signed-off-by: Jin Yao
> ---
>  include/uapi/linux/perf_event.h       | 29 ++++++++-
>  tools/include/uapi/linux/perf_event.h | 29 ++++++++-
>  2 files changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index d09a9cd..69af012 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
>  	PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT	= 14, /* no flags */
>  	PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT	= 15, /* no cycles */
>
> +	PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT	= 16, /* save branch type */
> +
>  	PERF_SAMPLE_BRANCH_MAX_SHIFT		/* non-ABI */
>  };
>
> @@ -198,9 +200,32 @@ enum perf_branch_sample_type {
>  	PERF_SAMPLE_BRANCH_NO_FLAGS	= 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
>  	PERF_SAMPLE_BRANCH_NO_CYCLES	= 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
>
> +	PERF_SAMPLE_BRANCH_TYPE_SAVE	=
> +		1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
> +
>  	PERF_SAMPLE_BRANCH_MAX		= 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
>  };
>
> +/*
> + * Common flow change classification
> + */
> +enum {
> +	PERF_BR_NONE		= 0,	/* unknown */
> +	PERF_BR_JCC		= 1,	/* conditional jump */
> +	PERF_BR_JMP		= 2,	/* jump */
> +	PERF_BR_IND_JMP		= 3,	/* indirect jump */
> +	PERF_BR_CALL		= 4,	/* call */
> +	PERF_BR_IND_CALL	= 5,	/* indirect call */
> +	PERF_BR_RET		= 6,	/* return */
> +	PERF_BR_SYSCALL		= 7,	/* syscall */
> +	PERF_BR_SYSRET		= 8,	/* syscall return */
> +	PERF_BR_IRQ		= 9,	/* hw interrupt/trap/fault */
> +	PERF_BR_INT		= 10,	/* sw interrupt */
> +	PERF_BR_IRET		= 11,	/* return from interrupt */
> +	PERF_BR_FAR_BRANCH	= 12,	/* not generic far branch type */
> +	PERF_BR_MAX,
> +};
> +
>  #define PERF_SAMPLE_BRANCH_PLM_ALL \
>  	(PERF_SAMPLE_BRANCH_USER|\
>  	 PERF_SAMPLE_BRANCH_KERNEL|\
> @@ -999,6 +1024,7 @@ union perf_mem_data_src {
>   *     in_tx: running in a hardware transaction
>   *     abort: aborting a hardware transaction
>   *    cycles: cycles from last branch (or 0 if not supported)
> + *      type: branch type
>   */
>  struct perf_branch_entry {
>  	__u64	from;
> @@ -1008,7 +1034,8 @@ struct perf_branch_entry {
>  		in_tx:1,    /* in transaction */
>  		abort:1,    /* transaction abort */
>  		cycles:16,  /* cycle count to last branch */
> -		reserved:44;
> +		type:4,     /* branch type */
> +		reserved:40;
>  };
>
>  #endif /* _UAPI_LINUX_PERF_EVENT_H */
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index d09a9cd..69af012 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux
Re: [PATCH v12 01/10] powerpc/powernv: Data structure and macros definitions for IMC
Hi Maddy/Anju,

Anju T Sudhakar writes:
> From: Madhavan Srinivasan
>
> Create a new header file to add the data structures and
> macros needed for In-Memory Collection (IMC) counter support.
>
> Signed-off-by: Anju T Sudhakar
> Signed-off-by: Hemant Kumar
> Signed-off-by: Madhavan Srinivasan
> ---
>  arch/powerpc/include/asm/imc-pmu.h | 99 ++++++++++++++++++++++++++++++
>  1 file changed, 99 insertions(+)
>  create mode 100644 arch/powerpc/include/asm/imc-pmu.h
>
> diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
> new file mode 100644
> index 000000000000..ffaea0b9c13e
> --- /dev/null
> +++ b/arch/powerpc/include/asm/imc-pmu.h
> @@ -0,0 +1,99 @@
> +#ifndef PPC_POWERNV_IMC_PMU_DEF_H
> +#define PPC_POWERNV_IMC_PMU_DEF_H
> +
> +/*
> + * IMC Nest Performance Monitor counter support.
> + *
> + * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
> + *           (C) 2017 Anju T Sudhakar, IBM Corporation.
> + *           (C) 2017 Hemant K Shaw, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or later version.
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +/*
> + * For static allocation of some of the structures.
> + */
> +#define IMC_MAX_PMUS			32
> +
> +/*
> + * This macro is used for memory buffer allocation of
> + * event names and event string
> + */
> +#define IMC_MAX_NAME_VAL_LEN		96
> +
> +/*
> + * Currently Microcode supports a max of 256KB of counter memory
> + * in the reserved memory region. Max pages to mmap (considering 4K PAGESIZE).
> + */
> +#define IMC_MAX_PAGES			64

Ideally that sort of detail comes from the device tree. Otherwise old kernels will be unable to run on new hardware which supports more memory.

Actually, looking at where we use it, it seems like we don't need it to come from the device tree. Seems core IMC only ever uses one page.
Thread IMC gets the size indirectly via the device tree:

	if (of_property_read_u32(parent, "size", &pmu_ptr->counter_mem_size))

So we should be able to dynamically size vbase.

> +/*
> + * Compatibility macros for IMC devices
> + */
> +#define IMC_DTB_COMPAT		"ibm,opal-in-memory-counters"
> +#define IMC_DTB_UNIT_COMPAT	"ibm,imc-counters"
> +
> +/*
> + * Structure to hold memory address information for imc units.
> + */
> +struct imc_mem_info {
> +	u32 id;
> +	u64 *vbase[IMC_MAX_PAGES];
> +};

cheers
[PATCH v2 0/5] powerpc/mm: Fix kernel protection and implement STRICT_KERNEL_RWX on PPC32
This patch set implements STRICT_KERNEL_RWX on PPC32 after fixing a few issues related to kernel code page protection. At the end we take the opportunity to get rid of some unnecessary/outdated fixmap stuff.

Changes from v1 to v2:
* Rebased on latest linux-next following inclusion of STRICT_KERNEL_RWX for PPC64
* Removed from the series the two patches already applied.

Christophe Leroy (5):
  powerpc/mm: Ensure change_page_attr() doesn't invalidate pinned TLBs
  powerpc/mm: Fix kernel RAM protection after freeing unused memory on PPC32
  powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32
  powerpc/mm: declare some local functions static
  powerpc/mm: Simplify __set_fixmap()

 arch/powerpc/Kconfig                         |  2 +-
 arch/powerpc/include/asm/book3s/32/pgtable.h |  3 --
 arch/powerpc/include/asm/fixmap.h            | 10 +++--
 arch/powerpc/include/asm/nohash/32/pgtable.h |  3 --
 arch/powerpc/kernel/vmlinux.lds.S            |  2 +-
 arch/powerpc/mm/init_32.c                    |  6 +++
 arch/powerpc/mm/mem.c                        |  1 +
 arch/powerpc/mm/mmu_decl.h                   |  3 ++
 arch/powerpc/mm/pgtable_32.c                 | 66 ++++++++++++++++----------
 9 files changed, 61 insertions(+), 35 deletions(-)

--
2.12.0
[PATCH v2 1/5] powerpc/mm: Ensure change_page_attr() doesn't invalidate pinned TLBs
__change_page_attr() uses flush_tlb_page(). flush_tlb_page() uses the tlbie instruction, which also invalidates pinned TLBs, which is not what we expect.

This patch modifies the implementation to use flush_tlb_kernel_range() instead. This will make use of tlbia, which will preserve pinned TLBs.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/mm/pgtable_32.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index a9e4bfc025bc..991036f818bb 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -325,7 +325,7 @@ get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep, pmd_t **pmdp)

 #ifdef CONFIG_DEBUG_PAGEALLOC

-static int __change_page_attr(struct page *page, pgprot_t prot)
+static int __change_page_attr_noflush(struct page *page, pgprot_t prot)
 {
 	pte_t *kpte;
 	pmd_t *kpmd;
@@ -339,8 +339,6 @@ static int __change_page_attr(struct page *page, pgprot_t prot)
 	if (!get_pteptr(&init_mm, address, &kpte, &kpmd))
 		return -EINVAL;
 	__set_pte_at(&init_mm, address, kpte, mk_pte(page, prot), 0);
-	wmb();
-	flush_tlb_page(NULL, address);
 	pte_unmap(kpte);

 	return 0;
@@ -355,13 +353,17 @@ static int change_page_attr(struct page *page, int numpages, pgprot_t prot)
 {
 	int i, err = 0;
 	unsigned long flags;
+	struct page *start = page;

 	local_irq_save(flags);
 	for (i = 0; i < numpages; i++, page++) {
-		err = __change_page_attr(page, prot);
+		err = __change_page_attr_noflush(page, prot);
 		if (err)
 			break;
 	}
+	wmb();
+	flush_tlb_kernel_range((unsigned long)page_address(start),
+			       (unsigned long)page_address(page));
 	local_irq_restore(flags);
 	return err;
 }
--
2.12.0
[PATCH v2 2/5] powerpc/mm: Fix kernel RAM protection after freeing unused memory on PPC32
As seen below, although the init sections have been freed, the associated memory area is still marked as executable in the page tables.

~ dmesg
[    5.860093] Freeing unused kernel memory: 592K (c0570000 - c0604000)

~ cat /sys/kernel/debug/kernel_page_tables
---[ Start of kernel VM ]---
0xc0000000-0xc0497fff    4704K  rw  X  present dirty accessed shared
0xc0498000-0xc056ffff     864K  rw     present dirty accessed shared
0xc0570000-0xc059ffff     192K  rw  X  present dirty accessed shared
0xc05a0000-0xc7ffffff  125312K  rw     present dirty accessed shared
---[ vmalloc() Area ]---

This patch fixes that.

The implementation is done by reusing the change_page_attr() function implemented for CONFIG_DEBUG_PAGEALLOC.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/mm/mem.c        |  1 +
 arch/powerpc/mm/mmu_decl.h   |  3 +++
 arch/powerpc/mm/pgtable_32.c | 13 +++++++++---
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 8541f18694a4..b8cf4056d0d7 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -403,6 +403,7 @@ void free_initmem(void)
 {
 	ppc_md.progress = ppc_printk_progress;
 	free_initmem_default(POISON_FREE_INITMEM);
+	remap_init_ram();
 }

 #ifdef CONFIG_BLK_DEV_INITRD
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index d46128b22150..207af7ad3bda 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -94,6 +94,7 @@ extern void _tlbia(void);
 #ifdef CONFIG_PPC32

 extern void mapin_ram(void);
+void remap_init_ram(void);
 extern void setbat(int index, unsigned long virt, phys_addr_t phys,
 		   unsigned int size, pgprot_t prot);

@@ -105,6 +106,8 @@ struct hash_pte;
 extern struct hash_pte *Hash, *Hash_end;
 extern unsigned long Hash_size, Hash_mask;

+#else
+static inline void remap_init_ram(void) {}
 #endif /* CONFIG_PPC32 */

 extern unsigned long ioremap_bot;
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 991036f818bb..a87a0b12b032 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -323,8 +323,6 @@ get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep, pmd_t **pmdp)
 	return(retval);
 }

-#ifdef CONFIG_DEBUG_PAGEALLOC
-
 static int __change_page_attr_noflush(struct page *page, pgprot_t prot)
 {
 	pte_t *kpte;
@@ -347,7 +345,7 @@ static int __change_page_attr_noflush(struct page *page, pgprot_t prot)
 /*
  * Change the page attributes of an page in the linear mapping.
  *
- * THIS CONFLICTS WITH BAT MAPPINGS, DEBUG USE ONLY
+ * THIS DOES NOTHING WITH BAT MAPPINGS, DEBUG USE ONLY
  */
 static int change_page_attr(struct page *page, int numpages, pgprot_t prot)
 {
@@ -368,7 +366,16 @@ static int change_page_attr(struct page *page, int numpages, pgprot_t prot)
 	return err;
 }

+void remap_init_ram(void)
+{
+	struct page *page = virt_to_page(_sinittext);
+	unsigned long numpages = PFN_UP((unsigned long)_einittext) -
+				 PFN_DOWN((unsigned long)_sinittext);
+
+	change_page_attr(page, numpages, PAGE_KERNEL);
+}
+
+#ifdef CONFIG_DEBUG_PAGEALLOC
 void __kernel_map_pages(struct page *page, int numpages, int enable)
 {
 	if (PageHighMem(page))
--
2.12.0
[PATCH v2 3/5] powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32
This patch implements STRICT_KERNEL_RWX on PPC32.

As for CONFIG_DEBUG_PAGEALLOC, it deactivates BAT and LTLB mappings in order to allow page protection setup at the level of each page.

As BAT/LTLB mappings are deactivated, there might be a performance impact.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/Kconfig              |  2 +-
 arch/powerpc/kernel/vmlinux.lds.S |  2 +-
 arch/powerpc/mm/init_32.c         |  6 ++++++
 arch/powerpc/mm/pgtable_32.c      | 24 ++++++++++++++++++++++++
 4 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 36f858c37ca7..07c51f31f8a6 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -165,7 +165,7 @@ config PPC
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if COMPAT
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_TRACEHOOK
-	select ARCH_HAS_STRICT_KERNEL_RWX	if (PPC_BOOK3S_64 && !RELOCATABLE && !HIBERNATION)
+	select ARCH_HAS_STRICT_KERNEL_RWX	if ((PPC_BOOK3S_64 || PPC32) && !RELOCATABLE && !HIBERNATION)
 	select ARCH_OPTIONAL_KERNEL_RWX		if ARCH_HAS_STRICT_KERNEL_RWX
 	select HAVE_CBPF_JIT			if !PPC64
 	select HAVE_CONTEXT_TRACKING		if PPC64
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index b1a250560198..882628fa6987 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -8,7 +8,7 @@
 #include
 #include

-#ifdef CONFIG_STRICT_KERNEL_RWX
+#if defined(CONFIG_STRICT_KERNEL_RWX) && !defined(CONFIG_PPC32)
 #define STRICT_ALIGN_SIZE	(1 << 24)
 #else
 #define STRICT_ALIGN_SIZE	PAGE_SIZE
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 8a7c38b8d335..7d5fee1bb116 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -113,6 +113,12 @@ void __init MMU_setup(void)
 		__map_without_bats = 1;
 		__map_without_ltlbs = 1;
 	}
+#ifdef CONFIG_STRICT_KERNEL_RWX
+	if (rodata_enabled) {
+		__map_without_bats = 1;
+		__map_without_ltlbs = 1;
+	}
+#endif
 }

 /*
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index a87a0b12b032..0b70ccd53de8 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include

 #include "mmu_decl.h"

@@ -375,6 +376,29 @@ void remap_init_ram(void)
 	change_page_attr(page, numpages, PAGE_KERNEL);
 }

+#ifdef CONFIG_STRICT_KERNEL_RWX
+void mark_rodata_ro(void)
+{
+	struct page *page;
+	unsigned long numpages;
+
+	page = virt_to_page(_stext);
+	numpages = PFN_UP((unsigned long)_etext) -
+		   PFN_DOWN((unsigned long)_stext);
+
+	change_page_attr(page, numpages, PAGE_KERNEL_ROX);
+	/*
+	 * mark .rodata as read only. Use __init_begin rather than __end_rodata
+	 * to cover NOTES and EXCEPTION_TABLE.
+	 */
+	page = virt_to_page(__start_rodata);
+	numpages = PFN_UP((unsigned long)__init_begin) -
+		   PFN_DOWN((unsigned long)__start_rodata);
+
+	change_page_attr(page, numpages, PAGE_KERNEL_RO);
+}
+#endif
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 void __kernel_map_pages(struct page *page, int numpages, int enable)
 {
--
2.12.0
[PATCH v2 4/5] powerpc/mm: declare some local functions static
get_pteptr() and __mapin_ram_chunk() are only used locally, so define them static.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/book3s/32/pgtable.h | 3 ---
 arch/powerpc/include/asm/nohash/32/pgtable.h | 3 ---
 arch/powerpc/mm/pgtable_32.c                 | 4 ++--
 3 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 7fb755880409..17c8766777f1 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -294,9 +294,6 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
 #define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val(pte) >> 3 })
 #define __swp_entry_to_pte(x)		((pte_t) { (x).val << 3 })

-extern int get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep,
-		      pmd_t **pmdp);
-
 int map_kernel_page(unsigned long va, phys_addr_t pa, int flags);

 /* Generic accessors to PTE bits */
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 91314268f04f..589206bf0358 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -337,9 +337,6 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
 #define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val(pte) >> 3 })
 #define __swp_entry_to_pte(x)		((pte_t) { (x).val << 3 })

-extern int get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep,
-		      pmd_t **pmdp);
-
 int map_kernel_page(unsigned long va, phys_addr_t pa, int flags);

 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 0b70ccd53de8..ca559bbeb659 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -243,7 +243,7 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, int flags)
 /*
  * Map in a chunk of physical memory starting at start.
  */
-void __init __mapin_ram_chunk(unsigned long offset, unsigned long top)
+static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top)
 {
 	unsigned long v, s, f;
 	phys_addr_t p;
@@ -295,7 +295,7 @@ void __init mapin_ram(void)
  * Returns true (1) if PTE was found, zero otherwise. The pointer to
  * the PTE pointer is unmodified if PTE is not found.
  */
-int
+static int
 get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep, pmd_t **pmdp)
 {
 	pgd_t *pgd;
--
2.12.0
[PATCH v2 5/5] powerpc/mm: Simplify __set_fixmap()
__set_fixmap() uses __fix_to_virt() and then does the boundary checks by itself. Instead, we can use fix_to_virt(), which does the verification at build time. For this, we need to use it inline so that GCC can see the real value of idx at build time.

In the meantime, we remove the 'fixmaps' variable. This variable is set but has never been used from the beginning (commit 2c419bdeca1d9 ("[POWERPC] Port fixmap from x86 and use for kmap_atomic")).

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/fixmap.h | 10 +++---
 arch/powerpc/mm/pgtable_32.c      | 15 ---------------
 2 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/fixmap.h b/arch/powerpc/include/asm/fixmap.h
index 4508b322f2cd..6c40dfda5912 100644
--- a/arch/powerpc/include/asm/fixmap.h
+++ b/arch/powerpc/include/asm/fixmap.h
@@ -17,6 +17,7 @@
 #ifndef __ASSEMBLY__
 #include
 #include
+#include
 #ifdef CONFIG_HIGHMEM
 #include
 #include
@@ -62,9 +63,6 @@ enum fixed_addresses {
 	__end_of_fixed_addresses
 };

-extern void __set_fixmap (enum fixed_addresses idx,
-					phys_addr_t phys, pgprot_t flags);
-
 #define __FIXADDR_SIZE	(__end_of_fixed_addresses << PAGE_SHIFT)
 #define FIXADDR_START	(FIXADDR_TOP - __FIXADDR_SIZE)

@@ -72,5 +70,11 @@ extern void __set_fixmap (enum fixed_addresses idx,

 #include

+static inline void __set_fixmap(enum fixed_addresses idx,
+				phys_addr_t phys, pgprot_t flags)
+{
+	map_kernel_page(fix_to_virt(idx), phys, pgprot_val(flags));
+}
+
 #endif /* !__ASSEMBLY__ */
 #endif
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index ca559bbeb659..3418ad469f36 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -408,18 +408,3 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)
 	change_page_attr(page, numpages, enable ? PAGE_KERNEL : __pgprot(0));
 }
 #endif /* CONFIG_DEBUG_PAGEALLOC */
-
-static int fixmaps;
-
-void __set_fixmap (enum fixed_addresses idx, phys_addr_t phys, pgprot_t flags)
-{
-	unsigned long address = __fix_to_virt(idx);
-
-	if (idx >= __end_of_fixed_addresses) {
-		BUG();
-		return;
-	}
-
-	map_kernel_page(address, phys, pgprot_val(flags));
-	fixmaps++;
-}
--
2.12.0
Re: [linux-next] cpus stalls detected few hours after booting next kernel
On Fri, 2017-06-30 at 17:28 +1000, Nicholas Piggin wrote: > On Fri, 30 Jun 2017 10:52:18 +0530 > Abdul Haleem wrote: > > > On Fri, 2017-06-30 at 00:45 +1000, Nicholas Piggin wrote: > > > On Thu, 29 Jun 2017 20:23:05 +1000 > > > Nicholas Piggin wrote: > > > > > > > On Thu, 29 Jun 2017 19:36:14 +1000 > > > > Nicholas Piggin wrote: > > > > > > > > I don't *think* the replay-wakeup-interrupt patch is directly > > > > > involved, but > > > > > it's likely to be one of the idle patches. > > > > > > Okay this turned out to be misconfigured sleep states I added for the > > > simulator, sorry for the false alarm. > > > > > > > Although you have this in the backtrace. I wonder if that's a stuck > > > > lock in rcu_process_callbacks? > > > > > > So this spinlock becomes top of the list of suspects. Can you try > > > enabling lockdep and try to reproduce it? > > > > Yes, recreated again with CONFIG_LOCKDEP=y & CONFIG_DEBUG_LOCKDEP=y set. > > I do not see any difference in trace messages with and without LOCKDEP > > enabled. > > > > Please find the attached log file. > > Can you get an rcu_invoke_callback event trace that Paul suggested? Yes, I was able to collect the perf data for rcu_invoke_callback event on recent next kernel (4.12.0-next-20170705). the issue is rare to hit. After booting the next kernel, I started this command 'perf record -e rcu:rcu_invoke_callback -a -g -- cat' and waited for 30 minutes. five minutes after seeing the stalls messages, I did CTRL-C to end the perf command. @Nicholas : the perf.data report is too huge to attach here, shall I ping you the internal location of file on slack/mail ? Also the machine is in the same state if you want to use it ? > > Does this bug show up with just the powerpc next branch? > > Thanks, > Nick > -- Regard's Abdul Haleem IBM Linux Technology Centre
Re: [PATCH 1/5] powernv:idle: Move device-tree parsing to one place.
Hello Nicholas, On Fri, Jul 07, 2017 at 12:53:40AM +1000, Nicholas Piggin wrote: > On Wed, 5 Jul 2017 22:08:12 +0530 > "Gautham R. Shenoy" wrote: > > > From: "Gautham R. Shenoy" > > > > The details of the platform idle state are exposed by the firmware to > > the kernel via device tree. > > > > In the current code, we parse the device tree twice: > > > > 1) During the boot up in arch/powerpc/platforms/powernv/idle.c Here, > > the device tree is parsed to obtain the details of the > > supported_cpuidle_states which is used to determine the default idle > > state (which would be used when cpuidle is absent) and the deepest > > idle state (which would be used for cpu-hotplug). > > > > 2) During the powernv cpuidle driver initialization > > (drivers/cpuidle/cpuidle-powernv.c). Here we parse the device tree to > > populate the cpuidle driver's states. > > > > This patch moves all the device tree parsing to the platform idle > > code. It defines data-structures for recording the details of the > > parsed idle states. Any other kernel subsystem that is interested in > > the idle states (eg: cpuidle-powernv driver) can just use the > > in-kernel data structure instead of parsing the device tree all over > > again. > > > > Further, this helps to check the validity of states in one place and > > in case of invalid states (eg : stop states whose psscr values are > > erroneous) flag them as invalid, so that the other subsystems can be > > prevented from using those. > > > > Signed-off-by: Gautham R. Shenoy > > Hi, > > I think the overall direction is good. A few small things. Thanks for reviewing the patches. > > > > + > > +#define PNV_IDLE_NAME_LEN 16 > > +struct pnv_idle_state { > > + char name[PNV_IDLE_NAME_LEN]; > > + u32 flags; > > + u32 latency_ns; > > + u32 residency_ns; > > + u64 ctrl_reg_val; /* The ctrl_reg on POWER8 would be pmicr. */ > > + u64 ctrl_reg_mask; /* On POWER9 it is psscr */ > > + bool valid; > > +}; > > Do we use PMICR anywhere in the idle code?
What about allowing for some > machine-specific fields? PMICR is not used anywhere so far. I will change to to psscr_val and psscr_mask for now. If there is a use for pmicr n the future, we can change this to the union struct as you suggest. > > union { > struct { /* p9 */ > u64 psscr_val; > u64 psscr_mask; > }; > struct { /* p8 */ > u64 pmicr...; > > > > diff --git a/arch/powerpc/platforms/powernv/idle.c > > b/arch/powerpc/platforms/powernv/idle.c > > index 2abee07..b747bb5 100644 > > --- a/arch/powerpc/platforms/powernv/idle.c > > +++ b/arch/powerpc/platforms/powernv/idle.c > > @@ -58,6 +58,17 @@ > > static u64 pnv_deepest_stop_psscr_mask; > > static bool deepest_stop_found; > > > > +/* > > + * Data structure that stores details of > > + * all the platform idle states. > > + */ > > +struct pnv_idle_states pnv_idle; > > + > > +struct pnv_idle_states *get_pnv_idle_states(void) > > +{ > > + return &pnv_idle; > > +} > > I wouldn't have the wrapper function... but it's your code so it's > up to you. One thing though is that this function you have called get_ > just to return the pointer, but it does not take a reference or > allocate memory or initialize the structure. Other functions with the > same prefix do such things. Can we make something more consistent? I agree with the wrapper function. But then the alternative was to declare this variable as an extern so that cpuidle can access it. Is that preferable ? > > ... > > > +/** > > + * get_idle_prop_u32_array: Returns an array of u32 elements > > + * parsed from the device tree corresponding > > + * to the property provided in variable propname. > > + * > > + * @np: Pointer to device tree node "/ibm,opal/power-mgt" > > + * @nr_states: Expected number of elements. > > + * @propname : Name of the property whose values is an array of > > + * u32 elements > > + * > > + * Returns a pointer to a u32 array of size nr_states on success. > > + * Returns NULL on failure. 
> > + */ > > +static inline u32 *get_idle_prop_u32_array(struct device_node *np, > > + int nr_states, > > + const char *propname) > > +{ > > + u32 *ret_array; > > + int rc, count; > > + > > + count = of_property_count_u32_elems(np, propname); > > + rc = validate_dt_prop_sizes("ibm,cpu-idle-state-flags", nr_states, > > + propname, count); > > + if (rc) > > + return NULL; > > + > > + ret_array = kcalloc(nr_states, sizeof(*ret_array), GFP_KERNEL); > > + if (!ret_array) > > + return NULL; > > So I would say for this, how about moving the allocations into the caller? > You're still doing most of the error handling freeing there, so I would > say it's more balanced if you do that. Sure, that makes sense. I will move the allocation to the main f
Re: [PATCH 2/5] powernv:idle: Change return type of pnv_probe_idle_states to int
Hello Nicholas, On Fri, Jul 07, 2017 at 01:01:49AM +1000, Nicholas Piggin wrote: > On Wed, 5 Jul 2017 22:08:13 +0530 > "Gautham R. Shenoy" wrote: > > > From: "Gautham R. Shenoy" > > > > In the current idle initialization code, if there are failures in > > pnv_probe_idle_states, then no platform idle state is > > enabled. However, since the error is not propagated to the top-level > > function pnv_init_idle_states, we continue initialization in this > > top-level function even though this will never be used. > > > > Hence change the return type of pnv_probe_idle_states from void to > > int and in case of failures, bail out early on in > > pnv_init_idle_states. > > > > Signed-off-by: Gautham R. Shenoy > > Looks good to me. > > Reviewed-by: Nicholas Piggin > > I wonder if the warnings are strong enough here to let people know > idle won't be used so power consumption will be high and performance > significantly reduced on SMT machines? Good point. Will try to print an error message to this effect. -- Thanks and Regards gautham.
Re: [PATCH 3/5] powernv:idle: Define idle init function for power8
Hi Nicholas, On Fri, Jul 07, 2017 at 01:06:46AM +1000, Nicholas Piggin wrote: > On Wed, 5 Jul 2017 22:08:14 +0530 > "Gautham R. Shenoy" wrote: > > > From: "Gautham R. Shenoy" > > > > In this patch we define a new function named pnv_power8_idle_init(). > > > > We move the following code from pnv_init_idle_states() into this newly > > defined function. > >a) That patches out pnv_fastsleep_workaround_at_entry/exit when > > no states with OPAL_PM_SLEEP_ENABLED_ER1 are present. > >b) Creating a sysfs control to choose how the workaround has to be > >applied when a OPAL_PM_SLEEP_ENABLED_ER1 state is present. > >c) Set ppc_md.power_save to power7_idle when OPAL_PM_NAP_ENABLED is > >present. > > > > With this, all the power8 specific initializations are in one place. > > > > Signed-off-by: Gautham R. Shenoy > > --- > > arch/powerpc/platforms/powernv/idle.c | 59 > > --- > > 1 file changed, 40 insertions(+), 19 deletions(-) > > > > diff --git a/arch/powerpc/platforms/powernv/idle.c > > b/arch/powerpc/platforms/powernv/idle.c > > index a5990d9..c400ff9 100644 > > --- a/arch/powerpc/platforms/powernv/idle.c > > +++ b/arch/powerpc/platforms/powernv/idle.c > > @@ -564,6 +564,44 @@ static void __init pnv_power9_idle_init(void) > > pnv_first_deep_stop_state); > > } > > > > + > > +static void __init pnv_power8_idle_init(void) > > +{ > > + int i; > > + bool has_nap = false; > > + bool has_sleep_er1 = false; > > + int dt_idle_states = pnv_idle.nr_states; > > + > > + for (i = 0; i < dt_idle_states; i++) { > > + struct pnv_idle_state *state = &pnv_idle.states[i]; > > + > > + if (state->flags & OPAL_PM_NAP_ENABLED) > > + has_nap = true; > > + if (state->flags & OPAL_PM_SLEEP_ENABLED_ER1) > > + has_sleep_er1 = true; > > + } > > + > > + if (!has_sleep_er1) { > > + patch_instruction( > > + (unsigned int *)pnv_fastsleep_workaround_at_entry, > > + PPC_INST_NOP); > > + patch_instruction( > > + (unsigned int *)pnv_fastsleep_workaround_at_exit, > > + PPC_INST_NOP); > > + } else { > > + /* > > 
+* OPAL_PM_SLEEP_ENABLED_ER1 is set. It indicates that > > +* workaround is needed to use fastsleep. Provide sysfs > > +* control to choose how this workaround has to be applied. > > +*/ > > + device_create_file(cpu_subsys.dev_root, > > + &dev_attr_fastsleep_workaround_applyonce); > > + } > > + > > + if (has_nap) > > + ppc_md.power_save = power7_idle; > > +} > > + > > /* > > * Returns 0 if prop1_len == prop2_len. Else returns -1 > > */ > > @@ -837,6 +875,8 @@ static int __init pnv_probe_idle_states(void) > > > > if (cpu_has_feature(CPU_FTR_ARCH_300)) > > pnv_power9_idle_init(); > > + else > > + pnv_power8_idle_init(); > > > > for (i = 0; i < dt_idle_states; i++) { > > if (!pnv_idle.states[i].valid) > > @@ -858,22 +898,6 @@ static int __init pnv_init_idle_states(void) > > if (pnv_probe_idle_states()) > > goto out; > > > > - if (!(supported_cpuidle_states & OPAL_PM_SLEEP_ENABLED_ER1)) { > > - patch_instruction( > > - (unsigned int *)pnv_fastsleep_workaround_at_entry, > > - PPC_INST_NOP); > > - patch_instruction( > > - (unsigned int *)pnv_fastsleep_workaround_at_exit, > > - PPC_INST_NOP); > > So previously this would run on POWER9 and patch out those branches. > But POWER9 never runs that code, so no problem. Good cleanup. And that's what I thought, but on checking the assembly code, I found that pnv_fastsleep_workaround_at_exit is executed on POWER9. Will fix this! > > Reviewed-by: Nicholas Piggin > -- Thanks and Regards gautham.
Re: [git pull] vfs.git part 1
Al Viro writes: > vfs.git topology is rather convoluted this cycle, so > I'm afraid that it'll take more pull requests than usual ;-/ > > The first pile is #work.misc-set_fs. Assorted getting rid > of cargo-culted access_ok(), cargo-culted set_fs() and > field-by-field copyouts. The same description applies to > a lot of stuff in other branches - this is just the stuff that > didn't fit into a more specific topical branch. > > The following changes since commit c86daad2c25bfd4a33d48b7691afaa96d9c5ab46: > > Merge branch 'for-linus' of > git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input (2017-05-26 16:45:13 > -0700) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git work.misc-set_fs > > for you to fetch changes up to 8c6657cb50cb037ff58b3f6a547c6569568f3527: > > Switch flock copyin/copyout primitives to copy_{from,to}_user() (2017-06-26 > 23:52:44 -0400) This commit seems to have broken networking on a bunch of my PPC machines (64-bit kernel, 32-bit userspace). # first bad commit: [8c6657cb50cb037ff58b3f6a547c6569568f3527] Switch flock copyin/copyout primitives to copy_{from,to}_user() The symptom is eth0 doesn't get address via dhcp. Reverting it on top of master (9f45efb928) everything works OK again. Trying to bring networking up manually gives: # ifup eth0 ifup: failed to lock lockfile /run/network/ifstate.eth0: Invalid argument strace shows: 5647 fcntl64(3, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EINVAL (Invalid argument) 5647 write(2, "ifup: failed to lock lockfile /r"..., 74) = 74 vs the working case: 6005 fcntl64(3, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0 Patch coming. cheers
[PATCH] fs/fcntl: Fix F_GET/SETLK etc. for compat processes
Commit 8c6657cb50cb ("Switch flock copyin/copyout primitives to copy_{from,to}_user()") added copy_flock_fields(from, to), but then in all cases called it with arguments of (to, from). eg: static int get_compat_flock(struct flock *kfl, struct compat_flock __user *ufl) { struct compat_flock fl; if (copy_from_user(&fl, ufl, sizeof(struct compat_flock))) return -EFAULT; copy_flock_fields(*kfl, fl); return 0; } We are reading the compat_flock ufl from userspace, into flock kfl. First we copy all of ufl into fl on the stack, and then we want to assign each field of fl to kfl. So we are copying from fl and to kfl. But as written the copy_flock_fields() macro takes the arguments in the other order. copy_to/from_user() take "to" as the first argument, so change the order of arguments in the copy_flock_fields() macro, rather than changing the callers. Fixes: 8c6657cb50cb ("Switch flock copyin/copyout primitives to copy_{from,to}_user()") Signed-off-by: Michael Ellerman --- fs/fcntl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/fcntl.c b/fs/fcntl.c index b6bd89628025..f40e3a9c10a5 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -520,7 +520,7 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd, #ifdef CONFIG_COMPAT /* careful - don't use anywhere else */ -#define copy_flock_fields(from, to)\ +#define copy_flock_fields(to, from)\ (to).l_type = (from).l_type;\ (to).l_whence = (from).l_whence;\ (to).l_start = (from).l_start; \ -- 2.7.4
[GIT PULL] Please pull powerpc/linux.git powerpc-4.13-1 tag
Hi Linus, Please pull powerpc updates for 4.13. No conflicts or anything I'm aware of. I did merge my own fixes branch into my next, so I had to generate the diffstat below by hand, but I'm pretty sure it's correct. The following changes since commit 5ed02dbb497422bf225783f46e6eadd237d23d6b: Linux 4.12-rc3 are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.13-1 for you to fetch changes up to 1e0fc9d1eb2b0241a03e0a02bcdb9b5b641b9d35: powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs (2017-07-04 11:37:44 +1000) powerpc updates for 4.13 Highlights include: - Support for STRICT_KERNEL_RWX on 64-bit server CPUs. - Platform support for FSP2 (476fpe) board - Enable ZONE_DEVICE on 64-bit server CPUs. - Generic & powerpc spin loop primitives to optimise busy waiting - Convert VDSO update function to use new update_vsyscall() interface - Optimisations to hypercall/syscall/context-switch paths - Improvements to the CPU idle code on Power8 and Power9. As well as many other fixes and improvements. Thanks to: Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman Khandual, Anton Blanchard, Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe Lombard, Colin Ian King, Dan Carpenter, Gautham R. Shenoy, Hari Bathini, Ian Munsie, Ivan Mikhaylov, Javier Martinez Canillas, Madhavan Srinivasan, Masahiro Yamada, Matt Brown, Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pavel Machek, Russell Currey, Santosh Sivaraj, Stephen Rothwell, Thiago Jung Bauermann, Yang Li. Akshay Adiga (2): powerpc/powernv/idle: Restore SPRs for deep idle states via stop API. 
powerpc/powernv/idle: Clear r12 on wakeup from stop lite Andrew Donnellan (1): MAINTAINERS: cxl: update maintainership Andrew Jeffery (1): powerpc: Tweak copy selection parameter in __copy_tofrom_user_power7() Anshuman Khandual (2): powerpc/mm: Add comments to the vmemmap layout powerpc/mm: Add comments on vmemmap physical mapping Anton Blanchard (2): powerpc: Add HAVE_IRQ_TIME_ACCOUNTING powerpc/mm: Wire up hpte_removebolted for powernv Balbir Singh (16): powerpc/mm/ptdump: Dump the first entry of the linear mapping as well powerpc/mm/hash: Do a local flush if possible when no batch is active powerpc/mm/book(e)(3s)/64: Add page table accounting powerpc/mm/book(e)(3s)/32: Add page table accounting powerpc/mm/hugetlb: Add support for page accounting powerpc/mm: Trace tlbie(l) instructions powerpc/pseries: Fix passing of pp0 in updatepp() and updateboltedpp() powerpc/mm/radix: Fix execute permissions for interrupt_vectors powerpc/kprobes: Move kprobes over to patch_instruction() powerpc/kprobes/optprobes: Use patch_instruction() powerpc/xmon: Add patch_instruction() support for xmon powerpc/lib/code-patching: Use alternate map for patch_instruction() powerpc/vmlinux.lds: Align __init_begin to 16M powerpc/mm/hash: Implement mark_rodata_ro() for hash powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs Benjamin Herrenschmidt (3): powerpc: Only do ERAT invalidate on radix context switch on P9 DD1 powerpc/64s: Invalidate ERAT on powersave wakeup for POWER9 powerpc/xive: Silence message about VP block allocation Christophe Leroy (13): powerpc/mm: Remove __this_fixmap_does_not_exist() powerpc/mm: Only call store_updates_sp() on stores in do_page_fault() powerpc/mm: Remove a redundant test in do_page_fault() powerpc/mm: Evaluate user_mode(regs) only once in do_page_fault() powerpc/mm: The 8xx doesn't call do_page_fault() for breakpoints powerpc/40x: Clear MSR_DR in one insn instead of two powerpc/8xx: 
fix mpc8xx_get_irq() return on no irq powerpc: Handle simultaneous interrupts at once powerpc: Discard ffs()/__ffs() function and use builtin functions instead powerpc: Use builtin functions for fls()/__fls()/fls64() powerpc: Replace ffz() by equivalent generic function powerpc: Remove __ilog2()s and use generic ones powerpc/mm: Rename map_page() to map_kernel_page() on 32-bit Christophe Lombard (1): cxl: Export library to support IBM XSL Colin Ian King (1): powerpc: Fix some spelling mistakes Dan Carpenter (1): cxl: Unlock on error in probe Gautham R. Shenoy (5): powerpc/powernv/idle: Correctly initialize core_idle_state_ptr powerpc/powernv/idle: Decouple Timebase restore & Per-core SPRs restore powerpc/powernv/idle: Restore LPCR on wakeup from deep-stop powerpc/powernv/idle: Use Requeste
Applied "ASoC: imx-ssi: add check on platform_get_irq return value" to the asoc tree
The patch ASoC: imx-ssi: add check on platform_get_irq return value has been applied to the asoc tree at git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark >From ae1fbdff6dbcdfee9daee69fa1e7d26d1f31d1c7 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Fri, 30 Jun 2017 17:17:35 -0500 Subject: [PATCH] ASoC: imx-ssi: add check on platform_get_irq return value Check return value from call to platform_get_irq(), so in case of failure print error message and propagate the return value. Signed-off-by: Gustavo A. R. Silva Acked-by: Nicolin Chen Signed-off-by: Mark Brown --- sound/soc/fsl/imx-ssi.c | 4 1 file changed, 4 insertions(+) diff --git a/sound/soc/fsl/imx-ssi.c b/sound/soc/fsl/imx-ssi.c index b95132e2f9dc..06790615e04e 100644 --- a/sound/soc/fsl/imx-ssi.c +++ b/sound/soc/fsl/imx-ssi.c @@ -527,6 +527,10 @@ static int imx_ssi_probe(struct platform_device *pdev) } ssi->irq = platform_get_irq(pdev, 0); + if (ssi->irq < 0) { + dev_err(&pdev->dev, "Failed to get IRQ: %d\n", ssi->irq); + return ssi->irq; + } ssi->clk = devm_clk_get(&pdev->dev, NULL); if (IS_ERR(ssi->clk)) { -- 2.13.2
Re: [PATCH 4/5] powernv:idle: Move initialization of sibling pacas to pnv_alloc_idle_core_states
On Fri, Jul 07, 2017 at 01:16:09AM +1000, Nicholas Piggin wrote: > On Wed, 5 Jul 2017 22:08:15 +0530 > "Gautham R. Shenoy" wrote: > > > From: "Gautham R. Shenoy" > > > > On POWER9 DD1, in order to get around a hardware issue, we store in > > every CPU thread's paca the paca pointers of all its siblings. > > > > Move this code into pnv_alloc_idle_core_states() soon after the space > > for saving the sibling pacas is allocated. > > > > Signed-off-by: Gautham R. Shenoy > > > - if (cpu_has_feature(CPU_FTR_POWER9_DD1)) { > > - int cpu; > > - > > - pr_info("powernv: idle: Saving PACA pointers of all CPUs in > > their thread sibling PACA\n"); > > - for_each_possible_cpu(cpu) { > > - int base_cpu = cpu_first_thread_sibling(cpu); > > - int idx = cpu_thread_in_core(cpu); > > - int i; > > - > > You could move the thread_sibling_pacas allocation to here? > > Speaking of which... core_idle_state and thread_sibling_pacas are > allocated with kmalloc_node... What happens if we take an SLB miss > in the idle wakeup code on these guys? Nothing good I think. Perhaps > we should put them into the pacas or somewhere in bolted memory. Yes, though the SLB miss hasn't been encountered in practice so far! While one can define thread_sibling_pacas in PACA, it doesn't make sense to allocate space for core_idle_state in PACA since the allocated value of the secondary threads will never be used. What is the right way to ensure that these allocations fall in the bolted range? > > Good cleanup though. > > Reviewed-by: Nicholas Piggin >
Re: [PATCH] fs/fcntl: Fix F_GET/SETLK etc. for compat processes
On Fri, Jul 07, 2017 at 10:48:51PM +1000, Michael Ellerman wrote: > Commit 8c6657cb50cb ("Switch flock copyin/copyout primitives to > copy_{from,to}_user()") added copy_flock_fields(from, to), but then in all > cases > called it with arguments of (to, from). eg: > > static int get_compat_flock(struct flock *kfl, struct compat_flock __user > *ufl) > { > struct compat_flock fl; > > if (copy_from_user(&fl, ufl, sizeof(struct compat_flock))) > return -EFAULT; > copy_flock_fields(*kfl, fl); > return 0; > } > > We are reading the compat_flock ufl from userspace, into flock kfl. First we > copy all of ufl into fl on the stack, and then we want to assign each field of > fl to kfl. So we are copying from fl and to kfl. But as written the > copy_flock_fields() macro takes the arguments in the other order. > > copy_to/from_user() take "to" as the first argument, so change the order of > arguments in the copy_flock_fields() macro, rather than changing the callers. D'oh... Acked-by: Al Viro
Re: [git pull] vfs.git part 1
On Fri, Jul 7, 2017 at 5:46 AM, Michael Ellerman wrote: > Al Viro writes: > >> >> Switch flock copyin/copyout primitives to copy_{from,to}_user() >> (2017-06-26 23:52:44 -0400) > > This commit seems to have broken networking on a bunch of my PPC > machines (64-bit kernel, 32-bit userspace). Bah. I think that commit is entirely broken, due to having the arguments to the "copy_flock_fields()" in the wrong order. The copy_flock_fields() macro has the arguments in order , but all the users seem to do it the other way around. I think it would have been more obvious if the put_compat_flock*() source argument had been "const". > Patch coming. I'm not seeing a patch, so I did my own. But it's _entirely_ untested. Does the attached fix things for you? Linus fs/fcntl.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/fs/fcntl.c b/fs/fcntl.c index b6bd89628025..eeb19e22fd08 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -527,43 +527,43 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd, (to).l_len = (from).l_len; \ (to).l_pid = (from).l_pid; -static int get_compat_flock(struct flock *kfl, struct compat_flock __user *ufl) +static int get_compat_flock(struct flock *kfl, const struct compat_flock __user *ufl) { struct compat_flock fl; if (copy_from_user(&fl, ufl, sizeof(struct compat_flock))) return -EFAULT; - copy_flock_fields(*kfl, fl); + copy_flock_fields(fl, *kfl); return 0; } -static int get_compat_flock64(struct flock *kfl, struct compat_flock64 __user *ufl) +static int get_compat_flock64(struct flock *kfl, const struct compat_flock64 __user *ufl) { struct compat_flock64 fl; if (copy_from_user(&fl, ufl, sizeof(struct compat_flock64))) return -EFAULT; - copy_flock_fields(*kfl, fl); + copy_flock_fields(fl, *kfl); return 0; } -static int put_compat_flock(struct flock *kfl, struct compat_flock __user *ufl) +static int put_compat_flock(const struct flock *kfl, struct compat_flock __user *ufl) { struct compat_flock fl; memset(&fl, 0, sizeof(struct 
compat_flock)); - copy_flock_fields(fl, *kfl); + copy_flock_fields(*kfl, fl); if (copy_to_user(ufl, &fl, sizeof(struct compat_flock))) return -EFAULT; return 0; } -static int put_compat_flock64(struct flock *kfl, struct compat_flock64 __user *ufl) +static int put_compat_flock64(const struct flock *kfl, struct compat_flock64 __user *ufl) { struct compat_flock64 fl; memset(&fl, 0, sizeof(struct compat_flock64)); - copy_flock_fields(fl, *kfl); + copy_flock_fields(*kfl, fl); if (copy_to_user(ufl, &fl, sizeof(struct compat_flock64))) return -EFAULT; return 0;
Re: [PATCH v2 2/4] selftests/ftrace: Add a test to probe module functions
On 2017/07/03 12:51PM, Masami Hiramatsu wrote: > On Mon, 3 Jul 2017 12:27:33 +0900 > Masami Hiramatsu wrote: > > > On Thu, 29 Jun 2017 19:05:37 +0530 > > "Naveen N. Rao" wrote: > > > > > Add a kprobes test to ensure that we are able to add a probe on a > > > module function using 'p :' format, without having to > > > specify a probe name. > > > > > > Suggested-by: Masami Hiramatsu > > > Acked-by: Masami Hiramatsu > > > Signed-off-by: Naveen N. Rao > > > --- > > > .../testing/selftests/ftrace/test.d/kprobe/probe_module.tc | 14 > > > ++ > > > 1 file changed, 14 insertions(+) > > > create mode 100644 > > > tools/testing/selftests/ftrace/test.d/kprobe/probe_module.tc > > > > > > diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/probe_module.tc > > > b/tools/testing/selftests/ftrace/test.d/kprobe/probe_module.tc > > > new file mode 100644 > > > index ..ea7657041ba6 > > > --- /dev/null > > > +++ b/tools/testing/selftests/ftrace/test.d/kprobe/probe_module.tc > > > @@ -0,0 +1,14 @@ > > > +#!/bin/sh > > > +# description: Kprobe dynamic event - probing module > > > + > > > +[ -f kprobe_events ] || exit_unsupported # this is configurable > > > + > > > +echo 0 > events/enable > > > +echo > kprobe_events > > > +export MOD=`lsmod | head -n 2 | tail -n 1 | cut -f1 -d" "` > > > +export FUNC=`grep -m 1 ".* t .*\\[$MOD\\]" /proc/kallsyms | xargs | cut > > > -f3 -d" "` > > > +[ "x" != "x$MOD" -a "y" != "y$FUNC" ] || exit_untested > > > > Could you also add below case? > > > > echo p:probe_$MOD/$FUNC $MOD/$FUNC > kprobe_events > > Oops, it should be something like > > echo "p:test_${MOD}_${FUNC} $MOD/$FUNC" > kprobe_events > > since we would like to avoid adding new group name for it. > > (Adding new group name should be a separated one.) 
> > Thank you, > > > > > This is for "new event with name on module" case, your one is for "new > > event without name on module (automatic name generation)" > > > > We should have different test case, because those kicks slightly different > > parts in kprobe tracer. Sure. Will make changes to the two tests here and re-spin. Thanks, Naveen
Re: [git pull] vfs.git part 1
On Fri, Jul 7, 2017 at 8:59 AM, Linus Torvalds wrote: > >> Patch coming. > > I'm not seeing a patch, so I did my own. But it's _entirely_ untested. > Does the attached fix things for you? Oh, I see you sent a patch to the list but didn't cc me like in this thread. Hmm. Al - I'd like to add the "const" parts at least. How the ordering gets fixed (I changed it in the users of the macro, Michael changed the macro itself) I don't much care about. Can you get me a pull request soon since this presumably also breaks every other compat case, and it just happened that power was the one that noticed it first.. Or I can just commit my version, but I guess Michael's is at least tested.. Linus
Re: Today's linux-next build fail on powerpc
On Thu, Jul 6, 2017 at 9:00 AM, Abdul Haleem wrote: > Hi Luis, > > next-20170705 fails to build on powerpc with below errors. Hi, I had sent a fix yesterday. Had you chance to test it? -- With Best Regards, Andy Shevchenko
Re: [git pull] vfs.git part 1
On Fri, Jul 7, 2017 at 8:59 AM, Linus Torvalds wrote: > > The copy_flock_fields() macro has the arguments in order , > but all the users seem to do it the other way around. Looking more at it, I think I'd also like copy_flock_fields() to take pointer arguments, to match all the code around it (both copy_to/from_user and the memset calls. The actual order of arguments I suspect Michael's patch did better - make the copy_flock_fields() just match the order of memcpy() and copy_to/from_user(), both of which have order. So I think my preferred patch would be something like this, even if it is bigger than either. Comments? Michael, does this work for your case? Linus fs/fcntl.c | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/fs/fcntl.c b/fs/fcntl.c index b6bd89628025..3b01b646e528 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -520,50 +520,50 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd, #ifdef CONFIG_COMPAT /* careful - don't use anywhere else */ -#define copy_flock_fields(from, to)\ - (to).l_type = (from).l_type;\ - (to).l_whence = (from).l_whence;\ - (to).l_start = (from).l_start; \ - (to).l_len = (from).l_len; \ - (to).l_pid = (from).l_pid; - -static int get_compat_flock(struct flock *kfl, struct compat_flock __user *ufl) +#define copy_flock_fields(dst, src)\ + (dst)->l_type = (src)->l_type; \ + (dst)->l_whence = (src)->l_whence; \ + (dst)->l_start = (src)->l_start;\ + (dst)->l_len = (src)->l_len;\ + (dst)->l_pid = (src)->l_pid; + +static int get_compat_flock(struct flock *kfl, const struct compat_flock __user *ufl) { struct compat_flock fl; if (copy_from_user(&fl, ufl, sizeof(struct compat_flock))) return -EFAULT; - copy_flock_fields(*kfl, fl); + copy_flock_fields(kfl, &fl); return 0; } -static int get_compat_flock64(struct flock *kfl, struct compat_flock64 __user *ufl) +static int get_compat_flock64(struct flock *kfl, const struct compat_flock64 __user *ufl) { struct compat_flock64 fl; if (copy_from_user(&fl, ufl, 
sizeof(struct compat_flock64))) return -EFAULT; - copy_flock_fields(*kfl, fl); + copy_flock_fields(kfl, &fl); return 0; } -static int put_compat_flock(struct flock *kfl, struct compat_flock __user *ufl) +static int put_compat_flock(const struct flock *kfl, struct compat_flock __user *ufl) { struct compat_flock fl; memset(&fl, 0, sizeof(struct compat_flock)); - copy_flock_fields(fl, *kfl); + copy_flock_fields(&fl, kfl); if (copy_to_user(ufl, &fl, sizeof(struct compat_flock))) return -EFAULT; return 0; } -static int put_compat_flock64(struct flock *kfl, struct compat_flock64 __user *ufl) +static int put_compat_flock64(const struct flock *kfl, struct compat_flock64 __user *ufl) { struct compat_flock64 fl; memset(&fl, 0, sizeof(struct compat_flock64)); - copy_flock_fields(fl, *kfl); + copy_flock_fields(&fl, kfl); if (copy_to_user(ufl, &fl, sizeof(struct compat_flock64))) return -EFAULT; return 0;
Re: [PATCH 5/5] powernv:idle: Disable LOSE_FULL_CONTEXT states when stop-api fails.
On Fri, Jul 07, 2017 at 01:29:16AM +1000, Nicholas Piggin wrote:
> On Wed, 5 Jul 2017 22:08:16 +0530
> "Gautham R. Shenoy" wrote:
>
> > From: "Gautham R. Shenoy"
> >
> > Currently, we use the opal call opal_slw_set_reg() to inform the
> > Sleep-Winkle Engine (SLW) to restore the contents of some of the
> > Hypervisor state on wakeup from deep idle states that lose full
> > hypervisor context (characterized by the flag
> > OPAL_PM_LOSE_FULL_CONTEXT).
> >
> > However, the current code has a bug in that if opal_slw_set_reg()
> > fails, we don't disable the use of these deep states (winkle on
> > POWER8, stop4 onwards on POWER9).
> >
> > This patch fixes this bug by ensuring that if the sleep winkle
> > engine is unable to restore the hypervisor states in
> > pnv_save_sprs_for_deep_states(), then we mark as invalid the states
> > which lose full context.
> >
> > As a side-effect, since supported_cpuidle_states in
> > pnv_probe_idle_states() consists of flags of only the valid states,
> > this patch will ensure that no other subsystem in the kernel can use
> > the states which lose full context on stop-api failures.
>
> Looks good. Is there something minimal we can do for stable here?
>
> Aside question, do we need to restore LPCR at all with the SLW engine?
> It gets set up again by the idle wakeup code.
>
> And does POWER9 really need MSR and PSSCR restored by SLW? (going a bit
> off topic here, I'm just curious)

MSR needs to be restored so that we wake up with the right endianness
and with IR,DR disabled. PSSCR is set to a value so that in case of a
special wakeup from a deep stop, the SLW can program the core to go back
to the stop level provided by the PSSCR value via the stop-api.

>
> Thanks,
> Nick
>

--
Thanks and Regards
gautham.
Re: [git pull] vfs.git part 1
On Fri, Jul 07, 2017 at 10:35:41AM -0700, Linus Torvalds wrote:
> Comments? Michael, does this work for your case?

Looks sane...

> +++ b/fs/fcntl.c
> @@ -520,50 +520,50 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd,
>
>  #ifdef CONFIG_COMPAT
>  /* careful - don't use anywhere else */
> -#define copy_flock_fields(from, to)		\
> -	(to).l_type = (from).l_type;		\
> -	(to).l_whence = (from).l_whence;	\
> -	(to).l_start = (from).l_start;		\
> -	(to).l_len = (from).l_len;		\
> -	(to).l_pid = (from).l_pid;
> -
> -static int get_compat_flock(struct flock *kfl, struct compat_flock __user *ufl)
> +#define copy_flock_fields(dst, src)		\
> +	(dst)->l_type = (src)->l_type;		\
> +	(dst)->l_whence = (src)->l_whence;	\
> +	(dst)->l_start = (src)->l_start;	\
> +	(dst)->l_len = (src)->l_len;		\
> +	(dst)->l_pid = (src)->l_pid;
> +
> +static int get_compat_flock(struct flock *kfl, const struct compat_flock __user *ufl)
> {
> 	struct compat_flock fl;
>
> 	if (copy_from_user(&fl, ufl, sizeof(struct compat_flock)))
> 		return -EFAULT;
> -	copy_flock_fields(*kfl, fl);
> +	copy_flock_fields(kfl, &fl);
> 	return 0;
> }
>
> -static int get_compat_flock64(struct flock *kfl, struct compat_flock64 __user *ufl)
> +static int get_compat_flock64(struct flock *kfl, const struct compat_flock64 __user *ufl)
> {
> 	struct compat_flock64 fl;
>
> 	if (copy_from_user(&fl, ufl, sizeof(struct compat_flock64)))
> 		return -EFAULT;
> -	copy_flock_fields(*kfl, fl);
> +	copy_flock_fields(kfl, &fl);
> 	return 0;
> }
>
> -static int put_compat_flock(struct flock *kfl, struct compat_flock __user *ufl)
> +static int put_compat_flock(const struct flock *kfl, struct compat_flock __user *ufl)
> {
> 	struct compat_flock fl;
>
> 	memset(&fl, 0, sizeof(struct compat_flock));
> -	copy_flock_fields(fl, *kfl);
> +	copy_flock_fields(&fl, kfl);
> 	if (copy_to_user(ufl, &fl, sizeof(struct compat_flock)))
> 		return -EFAULT;
> 	return 0;
> }
>
> -static int put_compat_flock64(struct flock *kfl, struct compat_flock64 __user *ufl)
> +static int put_compat_flock64(const struct flock *kfl, struct compat_flock64 __user *ufl)
> {
> 	struct compat_flock64 fl;
>
> 	memset(&fl, 0, sizeof(struct compat_flock64));
> -	copy_flock_fields(fl, *kfl);
> +	copy_flock_fields(&fl, kfl);
> 	if (copy_to_user(ufl, &fl, sizeof(struct compat_flock64)))
> 		return -EFAULT;
> 	return 0;
[PATCH 2/2] powerpc/mm/radix: Synchronize updates to the process table
When writing to the process table, we need to ensure the store is
visible to a subsequent access by the MMU. We assume we never have the
PID active while doing the update, so a ptesync/isync pair should
hopefully be a big enough hammer for our purpose.

Signed-off-by: Benjamin Herrenschmidt
---
Note: Architecturally, we also need to use a tlbie(l) with RIC=2 to
flush the process table cache. However this is (very) expensive and we
know that POWER9 will invalidate its cache when hitting the mtpid
instruction. To be safe, we should add the tlbie for any ARCH300
processor we don't know about though. (Aneesh, Nick do we need a ftr
bit ?)

 arch/powerpc/mm/mmu_context_book3s64.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index 9404b5e..e3e2803 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -138,6 +138,14 @@ static int radix__init_new_context(struct mm_struct *mm)
 	rts_field = radix__get_tree_size();
 	process_tb[index].prtb0 = cpu_to_be64(rts_field | __pa(mm->pgd) | RADIX_PGD_INDEX_SIZE);
+	/*
+	 * Order the above store with subsequent update of the PID
+	 * register (at which point HW can start loading/caching
+	 * the entry) and the corresponding load by the MMU from
+	 * the L2 cache.
+	 */
+	asm volatile("ptesync;isync" : : : "memory");
+
 	mm->context.npu_context = NULL;
 
 	return index;
Re: [PATCH] powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9
Hi Nicholas,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.12 next-20170707]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-powernv-Fix-local-TLB-flush-for-boot-and-MCE-on-POWER9/20170708-011225
base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/dt_cpu_ftrs.c: In function 'cpufeatures_flush_tlb':
>> arch/powerpc/kernel/dt_cpu_ftrs.c:98:18: error: unused variable 'num_sets' [-Werror=unused-variable]
     unsigned int i, num_sets;
                     ^~~~~~~~
>> arch/powerpc/kernel/dt_cpu_ftrs.c:98:15: error: unused variable 'i' [-Werror=unused-variable]
     unsigned int i, num_sets;
                  ^
>> arch/powerpc/kernel/dt_cpu_ftrs.c:97:16: error: unused variable 'rb' [-Werror=unused-variable]
     unsigned long rb;
                   ^~
   cc1: all warnings being treated as errors

vim +/num_sets +98 arch/powerpc/kernel/dt_cpu_ftrs.c

5a61ef74 Nicholas Piggin 2017-05-09   91  } system_registers;
5a61ef74 Nicholas Piggin 2017-05-09   92
5a61ef74 Nicholas Piggin 2017-05-09   93  static void (*init_pmu_registers)(void);
5a61ef74 Nicholas Piggin 2017-05-09   94
5a61ef74 Nicholas Piggin 2017-05-09   95  static void cpufeatures_flush_tlb(void)
5a61ef74 Nicholas Piggin 2017-05-09   96  {
5a61ef74 Nicholas Piggin 2017-05-09  @97  	unsigned long rb;
5a61ef74 Nicholas Piggin 2017-05-09  @98  	unsigned int i, num_sets;
5a61ef74 Nicholas Piggin 2017-05-09   99
5a61ef74 Nicholas Piggin 2017-05-09  100  	/*
5a61ef74 Nicholas Piggin 2017-05-09  101  	 * This is a temporary measure to keep equivalent TLB flush as the

:::::: The code at line 98 was first introduced by commit
:::::: 5a61ef74f269f2573f48fa53607a8911216c3326 powerpc/64s: Support new device tree binding for discovering CPU features

:::::: TO: Nicholas Piggin
:::::: CC: Michael Ellerman

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Re: [git pull] vfs.git part 1
Linus Torvalds writes:
> On Fri, Jul 7, 2017 at 8:59 AM, Linus Torvalds
> wrote:
>>
>> The copy_flock_fields() macro has the arguments in <from, to> order,
>> but all the users seem to do it the other way around.
>
> Looking more at it, I think I'd also like copy_flock_fields() to take
> pointer arguments, to match all the code around it (both
> copy_to/from_user and the memset calls).
>
> The actual order of arguments I suspect Michael's patch did better -
> make the copy_flock_fields() just match the order of memcpy() and
> copy_to/from_user(), both of which have <dst, src> order.
>
> So I think my preferred patch would be something like this, even if it
> is bigger than either.
>
> Comments? Michael, does this work for your case?

Yeah that works, as committed in your tree.

Sorry for the slow reply, our time zones don't line up all that well :)

cheers
Re: [git pull] vfs.git part 1
Linus Torvalds writes:
> On Fri, Jul 7, 2017 at 8:59 AM, Linus Torvalds
> wrote:
>>
>>> Patch coming.
>>
>> I'm not seeing a patch, so I did my own. But it's _entirely_ untested.
>> Does the attached fix things for you?
>
> Oh, I see you sent a patch to the list but didn't cc me like in this thread.

Oops, I sent it To you, but I forgot to make it a reply to this thread,
which was daft.

cheers
Re: [PATCH 5/5] mtd: powernv_flash: Use opal_async_wait_response_interruptible()
On Thu, Jun 29, 2017 at 04:54:13PM +1000, Cyril Bur wrote:
> The OPAL calls performed in this driver shouldn't be using
> opal_async_wait_response() as this performs a wait_event() which, on
> long running OPAL calls, could result in hung task warnings. wait_event()
> also prevents timely signal delivery, which is also undesirable.
>
> This patch also attempts to quieten down the use of dev_err() when
> errors haven't actually occurred, and to return better information up
> the stack rather than always -EIO.
>
> Signed-off-by: Cyril Bur
> ---
>  drivers/mtd/devices/powernv_flash.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)

Seems OK:

Acked-by: Brian Norris