[PATCH V2 0/3] Add new PowerPC specific ELF core notes
This patch series adds five new ELF core note sections which can be used with
the existing ptrace requests PTRACE_GETREGSET/PTRACE_SETREGSET to access the
transactional memory and miscellaneous register sets on the PowerPC platform.
A test program exercising these new ELF core note types on a POWER8 system
follows.

RFC: https://lkml.org/lkml/2014/4/1/292
V1:  https://lkml.org/lkml/2014/4/2/43

Changes in V2
=============
(1) Removed all the power specific ptrace requests corresponding to the new
    NT_PPC_* ELF core note types. All the register sets can now be accessed
    from ptrace through PTRACE_GETREGSET/PTRACE_SETREGSET using the individual
    NT_PPC_* core note type instead
(2) Fixed a couple of attribute values for the REGSET_TM_CGPR register set
(3) Renamed flush_tmreg_to_thread as flush_tmregs_to_thread
(4) Fixed 32 bit checkpointed GPR support
(5) Changed commit messages accordingly

Outstanding Issues
==================
(1) The running DSCR register value inside a transaction does not seem to be
    saved at thread.dscr when the process stops for ptrace examination.
Test programs
=============
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

typedef long long u64;
typedef unsigned int u32;
typedef __vector128 vector128;

/* TM CFPR */
struct tm_cfpr {
	u64 fpr[32];
	u64 fpscr;
};

/* TM CVMX */
struct tm_cvmx {
	vector128 vr[32] __attribute__((aligned(16)));
	vector128 vscr __attribute__((aligned(16)));
	u32 vrsave;
};

/* TM SPR */
struct tm_spr_regs {
	u64 tm_tfhar;
	u64 tm_texasr;
	u64 tm_tfiar;
	u64 tm_orig_msr;
	u64 tm_tar;
	u64 tm_ppr;
	u64 tm_dscr;
};

/* Miscellaneous registers */
struct misc_regs {
	u64 dscr;
	u64 ppr;
	u64 tar;
};

/* TM instructions */
#define TBEGIN	".long 0x7C00051D ;"
#define TEND	".long 0x7C00055D ;"

/* SPR number */
#define SPRN_DSCR	0x3
#define SPRN_TAR	815

/* ELF core notes */
#define NT_PPC_TM_SPR	0x103	/* PowerPC transactional memory special registers */
#define NT_PPC_TM_CGPR	0x104	/* PowerpC transactional memory checkpointed GPR */
#define NT_PPC_TM_CFPR	0x105	/* PowerPC transactional memory checkpointed FPR */
#define NT_PPC_TM_CVMX	0x106	/* PowerPC transactional memory checkpointed VMX */
#define NT_PPC_MISC	0x107	/* PowerPC miscellaneous registers */

#define VAL1 1
#define VAL2 2
#define VAL3 3
#define VAL4 4

int main(int argc, char *argv[])
{
	struct tm_spr_regs *tmr1;
	struct pt_regs *pregs1, *pregs2;
	struct tm_cfpr *fpr, *fpr1;
	struct misc_regs *dbr1;
	struct iovec iov;

	pid_t child;
	int ret = 0, status = 0, i = 0, flag = 1;

	pregs2 = (struct pt_regs *) malloc(sizeof(struct pt_regs));
	fpr = (struct tm_cfpr *) malloc(sizeof(struct tm_cfpr));

	child = fork();
	if (child < 0) {
		printf("fork() failed \n");
		exit(-1);
	}

	/* Child code */
	if (child == 0) {
		asm __volatile__(
			"6: ;"			/* TM checkpointed values */
			"li 1, %[val1];"	/* GPR[1] */
			".long 0x7C210166;"	/* FPR[1] */
			"li 2, %[val2];"	/* GPR[2] */
			".long 0x7C420166;"	/* FPR[2] */
			"mtspr %[tar], 1;"	/* TAR */
			"mtspr %[dscr], 2;"	/* DSCR */
			"1: ;"
			TBEGIN			/* TM running values */
			"beq 2f ;"
			"li 1, %[val3];"	/* GPR[1] */
			".long 0x7C210166;"	/* FPR[1] */
			"li 2, %[val4];"	/* GPR[2] */
			".long 0x7C420166;"	/* FPR[2] */
			"mtspr %[tar], 1;"	/* TAR */
			"mtspr %[dscr], 2;"	/* DSCR */
			"b .;"
			TEND
			"2: ;"			/* Abort handler */
			"b 1b;"			/* Start from TBEGIN */
			"3: ;"
			"b 6b;"			/* Start all over again */
			:: [dscr]"i"(SPRN_DSCR), [tar]"i"(SPRN_TAR), [val1]"i"(VAL1),
			   [val2]"i"(VAL2), [val3]"i"(VAL3), [val4]"i"(VAL4)
			: "memory", "r7");
	}

	/* Parent */
	if (child) {
		do {
			memset(pregs2, 0 , sizeof(struct pt_regs));
			memset(fpr, 0 , sizeof(struct tm_cfpr));

			/* Wait till child hits "b ." instruction */
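As a rough illustration of the PTRACE_GETREGSET access pattern the cover letter describes, the sketch below wraps the request in a small helper. It is not taken from the test program: `read_regset` is a hypothetical name, the tracee is assumed to be already attached and stopped, and the note number is the one proposed in this series, which may not match what a given kernel ends up shipping.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/uio.h>

/* Value proposed by this series; defined locally as an assumption. */
#ifndef NT_PPC_MISC
#define NT_PPC_MISC 0x107
#endif

/* Same layout as struct misc_regs in the test program. */
struct misc_regs {
	uint64_t dscr;
	uint64_t ppr;
	uint64_t tar;
};

/*
 * Hypothetical helper: fetch one register set from a stopped tracee.
 * Returns the number of bytes the kernel filled in, or -1 on error
 * (errno is left as set by ptrace()).
 */
static long read_regset(pid_t pid, unsigned int note_type,
			void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	/* The note type travels in the "addr" argument of the request. */
	if (ptrace(PTRACE_GETREGSET, pid,
		   (void *)(uintptr_t)note_type, &iov) < 0)
		return -1;

	/* The kernel updates iov_len to the amount actually copied. */
	return (long)iov.iov_len;
}
```

A parent that has waited for the child to stop could then call `read_regset(child, NT_PPC_MISC, &regs, sizeof(regs))` and compare `regs.dscr` and `regs.tar` against the values the child wrote. Writing a set back would use PTRACE_SETREGSET with the same iovec convention.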
[PATCH V2 1/3] elf: Add some new PowerPC specific note sections
This patch adds four new note sections for transactional memory and one note
section for some miscellaneous registers. This addition of new elf note
sections extends the existing elf ABI without affecting it in any manner.

Signed-off-by: Anshuman Khandual
---
 include/uapi/linux/elf.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index ef6103b..4040124 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -379,6 +379,11 @@ typedef struct elf64_shdr {
 #define NT_PPC_VMX	0x100		/* PowerPC Altivec/VMX registers */
 #define NT_PPC_SPE	0x101		/* PowerPC SPE/EVR registers */
 #define NT_PPC_VSX	0x102		/* PowerPC VSX registers */
+#define NT_PPC_TM_SPR	0x103		/* PowerPC TM special registers */
+#define NT_PPC_TM_CGPR	0x104		/* PowerpC TM checkpointed GPR */
+#define NT_PPC_TM_CFPR	0x105		/* PowerPC TM checkpointed FPR */
+#define NT_PPC_TM_CVMX	0x106		/* PowerPC TM checkpointed VMX */
+#define NT_PPC_MISC	0x107		/* PowerPC miscellaneous registers */
 #define NT_386_TLS	0x200		/* i386 TLS slots (struct user_desc) */
 #define NT_386_IOPERM	0x201		/* x86 io permission bitmap (1=deny) */
 #define NT_X86_XSTATE	0x202		/* x86 extended state using xsave */
-- 
1.7.11.7

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
This patch enables get and set of the transactional memory related register
sets through the PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing
four new powerpc specific register sets, i.e. REGSET_TM_SPR, REGSET_TM_CGPR,
REGSET_TM_CFPR and REGSET_TM_CVMX, corresponding to the following new ELF core
note types added earlier in this series.

(1) NT_PPC_TM_SPR
(2) NT_PPC_TM_CGPR
(3) NT_PPC_TM_CFPR
(4) NT_PPC_TM_CVMX

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/include/asm/switch_to.h |   8 +
 arch/powerpc/kernel/process.c        |  24 ++
 arch/powerpc/kernel/ptrace.c         | 683 +--
 3 files changed, 687 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 0e83e7d..2737f46 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -80,6 +80,14 @@ static inline void flush_spe_to_thread(struct task_struct *t)
 }
 #endif
 
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+extern void flush_tmregs_to_thread(struct task_struct *);
+#else
+static inline void flush_tmregs_to_thread(struct task_struct *t)
+{
+}
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+
 static inline void clear_task_ebb(struct task_struct *t)
 {
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 31d0215..e247898 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -695,6 +695,30 @@ static inline void __switch_to_tm(struct task_struct *prev)
 	}
 }
 
+void flush_tmregs_to_thread(struct task_struct *tsk)
+{
+	/*
+	 * If task is not current, it should have been flushed
+	 * already to it's thread_struct during __switch_to().
+	 */
+	if (tsk != current)
+		return;
+
+	preempt_disable();
+	if (tsk->thread.regs) {
+		/*
+		 * If we are still current, the TM state need to
+		 * be flushed to thread_struct as it will be still
+		 * present in the current cpu.
+		 */
+		if (MSR_TM_ACTIVE(tsk->thread.regs->msr)) {
+			__switch_to_tm(tsk);
+			tm_recheckpoint_new_task(tsk);
+		}
+	}
+	preempt_enable();
+}
+
 /*
  * This is called if we are on the way out to userspace and the
  * TIF_RESTORE_TM flag is set. It checks if we need to reload
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 2e3d2bf..92faded 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -357,6 +357,17 @@ static int gpr_set(struct task_struct *target, const struct user_regset *regset,
 	return ret;
 }
 
+/*
+ * When any transaction is active, "thread_struct->transact_fp" holds
+ * the current running value of all FPR registers and "thread_struct->
+ * fp_state" holds the last checkpointed FPR registers state for the
+ * current transaction.
+ *
+ * struct data {
+ * 	u64	fpr[32];
+ * 	u64	fpscr;
+ * };
+ */
 static int fpr_get(struct task_struct *target, const struct user_regset *regset,
 		   unsigned int pos, unsigned int count,
 		   void *kbuf, void __user *ubuf)
@@ -365,21 +376,41 @@ static int fpr_get(struct task_struct *target, const struct user_regset *regset,
 	u64 buf[33];
 	int i;
 #endif
-	flush_fp_to_thread(target);
+	if (MSR_TM_ACTIVE(target->thread.regs->msr)) {
+		flush_fp_to_thread(target);
+		flush_altivec_to_thread(target);
+		flush_tmregs_to_thread(target);
+	} else {
+		flush_fp_to_thread(target);
+	}
 
 #ifdef CONFIG_VSX
 	/* copy to local buffer then write that out */
-	for (i = 0; i < 32 ; i++)
-		buf[i] = target->thread.TS_FPR(i);
-	buf[32] = target->thread.fp_state.fpscr;
+	if (MSR_TM_ACTIVE(target->thread.regs->msr)) {
+		for (i = 0; i < 32 ; i++)
+			buf[i] = target->thread.TS_TRANS_FPR(i);
+		buf[32] = target->thread.transact_fp.fpscr;
+	} else {
+		for (i = 0; i < 32 ; i++)
+			buf[i] = target->thread.TS_FPR(i);
+		buf[32] = target->thread.fp_state.fpscr;
+	}
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
 
 #else
-	BUILD_BUG_ON(offsetof(struct thread_fp_state, fpscr) !=
-		     offsetof(struct thread_fp_state, fpr[32][0]));
+	if (MSR_TM_ACTIVE(tsk->thread.regs->msr)) {
+		BUILD_BUG_ON(offsetof(struct transact_fp, fpscr) !=
+				offsetof(struct transact_fp, fpr[32][0]));
 
-	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+		return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				&target->thread.transact_fp, 0, -1);
+
[PATCH V2 3/3] powerpc, ptrace: Enable support for miscellaneous registers
This patch enables get and set of the miscellaneous registers through the
ptrace PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing a new
powerpc specific register set, REGSET_MISC, corresponding to the new ELF core
note NT_PPC_MISC added earlier in this series.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/kernel/ptrace.c | 81
 1 file changed, 81 insertions(+)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 92faded..3332dd8 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -1054,6 +1054,76 @@ static int tm_cvmx_set(struct task_struct *target, const struct user_regset *reg
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
 /*
+ * Miscellaneous Registers
+ *
+ * struct {
+ * 	unsigned long dscr;
+ * 	unsigned long ppr;
+ * 	unsigned long tar;
+ * };
+ */
+static int misc_get(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   void *kbuf, void __user *ubuf)
+{
+	int ret;
+
+	/* DSCR register */
+	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				&target->thread.dscr, 0,
+				sizeof(unsigned long));
+
+	BUILD_BUG_ON(offsetof(struct thread_struct, dscr) + sizeof(unsigned long) +
+			sizeof(unsigned long) != offsetof(struct thread_struct, ppr));
+
+	/* PPR register */
+	if (!ret)
+		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				&target->thread.ppr, sizeof(unsigned long),
+				2 * sizeof(unsigned long));
+
+	BUILD_BUG_ON(offsetof(struct thread_struct, ppr) + sizeof(unsigned long)
+			!= offsetof(struct thread_struct, tar));
+	/* TAR register */
+	if (!ret)
+		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				&target->thread.tar, 2 * sizeof(unsigned long),
+				3 * sizeof(unsigned long));
+	return ret;
+}
+
+static int misc_set(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+
+	/* DSCR register */
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+				&target->thread.dscr, 0,
+				sizeof(unsigned long));
+
+	BUILD_BUG_ON(offsetof(struct thread_struct, dscr) + sizeof(unsigned long) +
+			sizeof(unsigned long) != offsetof(struct thread_struct, ppr));
+
+	/* PPR register */
+	if (!ret)
+		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+				&target->thread.ppr, sizeof(unsigned long),
+				2 * sizeof(unsigned long));
+
+	BUILD_BUG_ON(offsetof(struct thread_struct, ppr) + sizeof(unsigned long)
+			!= offsetof(struct thread_struct, tar));
+
+	/* TAR register */
+	if (!ret)
+		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+				&target->thread.tar, 2 * sizeof(unsigned long),
+				3 * sizeof(unsigned long));
+	return ret;
+}
+
+/*
  * These are our native regset flavors.
  */
 enum powerpc_regset {
@@ -1074,6 +1144,7 @@ enum powerpc_regset {
 	REGSET_TM_CFPR,		/* TM checkpointed FPR */
 	REGSET_TM_CVMX,		/* TM checkpointed VMX */
 #endif
+	REGSET_MISC		/* Miscellaneous */
 };
 
 static const struct user_regset native_regsets[] = {
@@ -1130,6 +1201,11 @@ static const struct user_regset native_regsets[] = {
 		.get = tm_cvmx_get, .set = tm_cvmx_set
 	},
 #endif
+	[REGSET_MISC] = {
+		.core_note_type = NT_PPC_MISC, .n = 3,
+		.size = sizeof(u64), .align = sizeof(u64),
+		.get = misc_get, .set = misc_set
+	},
 };
 
 static const struct user_regset_view user_ppc_native_view = {
@@ -1459,6 +1535,11 @@ static const struct user_regset compat_regsets[] = {
 		.get = tm_cvmx_get, .set = tm_cvmx_set
 	},
 #endif
+	[REGSET_MISC] = {
+		.core_note_type = NT_PPC_MISC, .n = 3,
+		.size = sizeof(u64), .align = sizeof(u64),
+		.get = misc_get, .set = misc_set
+	},
 };
 
 static const struct user_regset_view user_ppc_compat_view = {
-- 
1.7.11.7
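The three back-to-back copy calls in misc_get() each cover one 8-byte window of the register set: [0,8) for DSCR, [8,16) for PPR and [16,24) for TAR. The following is a rough userspace model of that windowing, written on the assumption that the kernel helper copies only the overlap between the caller's (pos, count) request and the window it is given. `window_copyout` and `misc_get_model` are hypothetical names and do not exist in the kernel.

```c
#include <stdint.h>
#include <string.h>

struct misc_regs {
	uint64_t dscr;
	uint64_t ppr;
	uint64_t tar;
};

/*
 * Model of one windowed copy: if the request still has bytes left and
 * its position lies inside [start, end), copy the overlapping part of
 * "data" and advance the request. Callers must present the windows in
 * ascending, contiguous order so that *pos >= start always holds.
 */
static void window_copyout(unsigned int *pos, unsigned int *count,
			   unsigned char **out, const void *data,
			   unsigned int start, unsigned int end)
{
	unsigned int copy;

	if (*count == 0 || *pos >= end)
		return;

	copy = end - *pos;
	if (copy > *count)
		copy = *count;

	memcpy(*out, (const unsigned char *)data + (*pos - start), copy);
	*out += copy;
	*pos += copy;
	*count -= copy;
}

/* Model of misc_get(): returns the number of bytes written to buf. */
static unsigned int misc_get_model(const struct misc_regs *regs,
				   unsigned int pos, unsigned int count,
				   void *buf)
{
	unsigned char *out = buf;

	window_copyout(&pos, &count, &out, &regs->dscr, 0, 8);
	window_copyout(&pos, &count, &out, &regs->ppr, 8, 16);
	window_copyout(&pos, &count, &out, &regs->tar, 16, 24);

	return (unsigned int)(out - (unsigned char *)buf);
}
```

With this model a request of pos = 8, count = 8 skips the DSCR window, copies PPR, and leaves nothing for the TAR window, which is how a partial read of the set would be expected to behave.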
[V6 08/11] powerpc, lib: Add new branch analysis support functions
This patch adds generic powerpc branch analysis support to the code patching
library, which the subsequent patch uses for SW based filtering of branch
records in perf.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/include/asm/code-patching.h | 16 +++
 arch/powerpc/lib/code-patching.c         | 80
 2 files changed, 96 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 97e02f9..39919d4 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -22,6 +22,16 @@
 #define BRANCH_SET_LINK	0x1
 #define BRANCH_ABSOLUTE	0x2
 
+#define XL_FORM_LR	0x4C20
+#define XL_FORM_CTR	0x4C000420
+#define XL_FORM_TAR	0x4C000460
+
+#define BO_ALWAYS	0x0280
+#define BO_CTR		0x0200
+#define BO_CRBI_OFF	0x0080
+#define BO_CRBI_ON	0x0180
+#define BO_CRBI_HINT	0x0040
+
 unsigned int create_branch(const unsigned int *addr,
 			   unsigned long target, int flags);
 unsigned int create_cond_branch(const unsigned int *addr,
@@ -56,4 +66,10 @@ static inline unsigned long ppc_function_entry(void *func)
 #endif
 }
 
+/* Perf branch filters */
+bool instr_is_return_branch(unsigned int instr);
+bool instr_is_conditional_branch(unsigned int instr);
+bool instr_is_func_call(unsigned int instr);
+bool instr_is_indirect_func_call(unsigned int instr);
+
 #endif /* _ASM_POWERPC_CODE_PATCHING_H */
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index d5edbeb..a06f8b3 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -77,6 +77,7 @@ static unsigned int branch_opcode(unsigned int instr)
 	return (instr >> 26) & 0x3F;
 }
 
+/* Forms of branch instruction */
 static int instr_is_branch_iform(unsigned int instr)
 {
 	return branch_opcode(instr) == 18;
@@ -87,6 +88,85 @@ static int instr_is_branch_bform(unsigned int instr)
 	return branch_opcode(instr) == 16;
 }
 
+static int instr_is_branch_xlform(unsigned int instr)
+{
+	return branch_opcode(instr) == 19;
+}
+
+/* Classification of XL-form instruction */
+static int is_xlform_lr(unsigned int instr)
+{
+	return (instr & XL_FORM_LR) == XL_FORM_LR;
+}
+
+/* BO field analysis (B-form or XL-form) */
+static int is_bo_always(unsigned int instr)
+{
+	return (instr & BO_ALWAYS) == BO_ALWAYS;
+}
+
+/* Link bit is set */
+static int is_branch_link_set(unsigned int instr)
+{
+	return (instr & BRANCH_SET_LINK) == BRANCH_SET_LINK;
+}
+
+/*
+ * Generic software implemented branch filters used
+ * by perf branch stack sampling when PMU does not
+ * process them for some reason.
+ */
+
+/* PERF_SAMPLE_BRANCH_ANY_RETURN */
+bool instr_is_return_branch(unsigned int instr)
+{
+	/*
+	 * Conditional and unconditional branch to LR register
+	 * without seting the link register.
+	 */
+	if (is_xlform_lr(instr) && !is_branch_link_set(instr))
+		return true;
+
+	return false;
+}
+
+/* PERF_SAMPLE_BRANCH_COND */
+bool instr_is_conditional_branch(unsigned int instr)
+{
+	/* I-form instruction - excluded */
+	if (instr_is_branch_iform(instr))
+		return false;
+
+	/* B-form or XL-form instruction */
+	if (instr_is_branch_bform(instr) || instr_is_branch_xlform(instr)) {
+
+		/* Not branch always */
+		if (!is_bo_always(instr))
+			return true;
+	}
+	return false;
+}
+
+/* PERF_SAMPLE_BRANCH_ANY_CALL */
+bool instr_is_func_call(unsigned int instr)
+{
+	/* LR should be set */
+	if (is_branch_link_set(instr))
+		return true;
+
+	return false;
+}
+
+/* PERF_SAMPLE_BRANCH_IND_CALL */
+bool instr_is_indirect_func_call(unsigned int instr)
+{
+	/* XL-form instruction with LR set */
+	if (instr_is_branch_xlform(instr) && is_branch_link_set(instr))
+		return true;
+
+	return false;
+}
+
 int instr_is_relative_branch(unsigned int instr)
 {
 	if (instr & BRANCH_ABSOLUTE)
-- 
1.7.11.7
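To make the four classifications above concrete, here is a standalone userspace sketch that decodes the same fields (primary opcode, BO, extended opcode, LK bit) and checks them against a few well-known encodings. It is not the patch's code: the `sketch_*` names are hypothetical, and the XL-form test is deliberately stricter than the mask comparison in the patch, since it also matches the extended opcode (16 for bclr, 528 for bcctr, 560 for bctar) so that non-branch opcode-19 instructions such as isync are rejected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field extraction for a 32-bit PowerPC instruction word. */
static unsigned int op_field(uint32_t instr) { return (instr >> 26) & 0x3F; }
static unsigned int bo_field(uint32_t instr) { return (instr >> 21) & 0x1F; }
static unsigned int xo_field(uint32_t instr) { return (instr >> 1) & 0x3FF; }
static bool lk_bit(uint32_t instr) { return (instr & 0x1) != 0; }

enum { OP_BC = 16, OP_B = 18, OP_XL = 19 };
enum { XO_BCLR = 16, XO_BCCTR = 528, XO_BCTAR = 560 };

/* XL-form branch: bclr, bcctr or bctar (with or without LK). */
static bool is_xl_branch(uint32_t instr)
{
	unsigned int xo = xo_field(instr);

	return op_field(instr) == OP_XL &&
	       (xo == XO_BCLR || xo == XO_BCCTR || xo == XO_BCTAR);
}

/* BO = 0b1z1zz means "branch always". */
static bool bo_always(uint32_t instr)
{
	return (bo_field(instr) & 0x14) == 0x14;
}

/* Return: branch to LR that does not update LR. */
static bool sketch_is_return(uint32_t instr)
{
	return op_field(instr) == OP_XL && xo_field(instr) == XO_BCLR &&
	       !lk_bit(instr);
}

/* Conditional: B-form or XL-form branch whose BO is not "always". */
static bool sketch_is_conditional(uint32_t instr)
{
	return (op_field(instr) == OP_BC || is_xl_branch(instr)) &&
	       !bo_always(instr);
}

/* Call: any branch with the LK bit set. */
static bool sketch_is_call(uint32_t instr)
{
	unsigned int op = op_field(instr);

	return (op == OP_B || op == OP_BC || is_xl_branch(instr)) &&
	       lk_bit(instr);
}

/* Indirect call: XL-form branch with the LK bit set. */
static bool sketch_is_indirect_call(uint32_t instr)
{
	return is_xl_branch(instr) && lk_bit(instr);
}
```

For example, `blr` (0x4E800020) classifies as a return and not as conditional, `bl` (0x48000001) as a direct call, `bctrl` (0x4E800421) as an indirect call, and `beq` (0x41820008) as conditional.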
[V6 02/11] perf, tool: Conditional branch filter 'cond' added to perf record
This patch adds perf record support for the new branch stack filter criterion
PERF_SAMPLE_BRANCH_COND.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
Reviewed-by: Andi Kleen
---
 tools/perf/builtin-record.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8ce62ef..dfe6b9d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -583,6 +583,7 @@ static const struct branch_mode branch_modes[] = {
 	BRANCH_OPT("abort_tx", PERF_SAMPLE_BRANCH_ABORT_TX),
 	BRANCH_OPT("in_tx", PERF_SAMPLE_BRANCH_IN_TX),
 	BRANCH_OPT("no_tx", PERF_SAMPLE_BRANCH_NO_TX),
+	BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
 	BRANCH_END
 };
 
-- 
1.7.11.7
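For callers that use perf_event_open() directly rather than the perf tool, the request that `-j cond` stands for can be sketched roughly as below. This is an illustration only: `init_cond_branch_attr` is a hypothetical helper, the event choice and sample period are arbitrary, and it assumes the installed uapi header already carries PERF_SAMPLE_BRANCH_COND.

```c
#include <string.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: fill in an attr asking for branch stack samples
 * restricted to user-space conditional branches.
 */
static void init_cond_branch_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);

	/* Any sampling event would do; branch misses chosen arbitrarily. */
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_BRANCH_MISSES;
	attr->sample_period = 100000;

	/* Ask for the branch stack in every sample ... */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;

	/* ... and filter it down to user-level conditional branches. */
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
				   PERF_SAMPLE_BRANCH_COND;

	attr->exclude_kernel = 1;
	attr->exclude_hv = 1;
}
```

The filled-in attr would then be passed to the perf_event_open system call in the usual way; whether the request is accepted depends on the PMU and kernel support added elsewhere in this series.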
[V6 03/11] x86, perf: Add conditional branch filtering support
This patch adds conditional branch filtering support, enabling it for
PERF_SAMPLE_BRANCH_COND in perf branch stack sampling framework by utilizing
an available software filter X86_BR_JCC.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
Reviewed-by: Andi Kleen
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index d82d155..9dd2459 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -384,6 +384,9 @@ static void intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
 	if (br_type & PERF_SAMPLE_BRANCH_NO_TX)
 		mask |= X86_BR_NO_TX;
 
+	if (br_type & PERF_SAMPLE_BRANCH_COND)
+		mask |= X86_BR_JCC;
+
 	/*
 	 * stash actual user request into reg, it may
 	 * be used by fixup code for some CPU
@@ -678,6 +681,7 @@ static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	 * NHM/WSM erratum: must include IND_JMP to capture IND_CALL
 	 */
 	[PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
+	[PERF_SAMPLE_BRANCH_COND] = LBR_JCC,
 };
 
 static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
@@ -689,6 +693,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	[PERF_SAMPLE_BRANCH_ANY_CALL] = LBR_REL_CALL | LBR_IND_CALL
 				| LBR_FAR,
 	[PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL,
+	[PERF_SAMPLE_BRANCH_COND] = LBR_JCC,
 };
 
 /* core */
-- 
1.7.11.7
[V6 09/11] powerpc, perf: Enable SW filtering in branch stack sampling framework
This patch enables SW based post processing of BHRB captured branches so that
more user defined branch filtration criteria can be met in the perf branch
stack sampling framework. These changes increase the number of branch filters
and their valid combinations on any powerpc64 server platform with BHRB
support. A summary of the code changes follows.

(1) struct cpu_hw_events

	Introduced two new variables to track the various filter values and
	masks

	(a) bhrb_sw_filter	Tracks SW implemented branch filter flags
	(b) bhrb_filter		Tracks both (SW and HW) branch filter flags

(2) Event creation

	The kernel figures out the supported BHRB branch filters through the
	PMU callback 'bhrb_filter_map'. This function finds out how many of
	the requested branch filters can be supported in the PMU HW. It does
	not try to invalidate any branch filter combinations, and event
	creation does not error out because of a lack of HW based branch
	filters. Meanwhile it tracks the overall supported branch filters in
	the 'bhrb_filter' variable.

	Once the PMU callback returns, the kernel processes the user branch
	filter request against the available SW filters (bhrb_sw_filter_map)
	while looking at 'bhrb_filter'. During this phase all the branch
	filters which are still pending from the user requested list have to
	be supported in SW, failing which the event creation errors out.

(3) SW branch filter

	During the BHRB data capture inside the PMU interrupt context, each
	captured 'perf_branch_entry.from' is checked for compliance with the
	applicable SW branch filters. If the entry does not conform to the
	filter requirements, it is discarded from the final perf branch stack
	buffer.

(4) Supported SW based branch filters

	(a) PERF_SAMPLE_BRANCH_ANY_RETURN
	(b) PERF_SAMPLE_BRANCH_IND_CALL
	(c) PERF_SAMPLE_BRANCH_ANY_CALL
	(d) PERF_SAMPLE_BRANCH_COND

	Please refer to the patch to understand the classification of
	instructions into these branch filter categories.

(5) Multiple branch filter semantics

	The Book3S server implementation follows the same OR semantics (as
	implemented in x86) when dealing with multiple branch filters at any
	point of time. SW branch filter analysis is carried out on the data
	set captured in the PMU HW, so the resulting set of data (after
	applying the SW filters) will inherently be an AND with the HW
	captured set. Hence any combination of HW and SW branch filters will
	be invalid. HW based branch filters are more efficient and faster
	than SW implemented branch filters, so the PMU should first decide
	whether it can support all the requested branch filters itself. If it
	can support all the branch filters in an OR manner, we don't apply any
	SW branch filter on top of the HW captured set (which is the final
	set). This preserves the OR semantic of multiple branch filters as
	required. But if the PMU cannot support all the requested branch
	filters in an OR manner, it should not apply any of its filters and
	should leave it up to the SW to handle them all. It is the PMU code's
	responsibility to uphold this protocol in order to conform to the
	overall OR semantic of the perf branch stack sampling framework.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/include/asm/perf_event_server.h |   6 +-
 arch/powerpc/perf/core-book3s.c              | 188 ++-
 arch/powerpc/perf/power8-pmu.c               |   2 +-
 3 files changed, 187 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 9ed73714..93a9a8a 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -19,6 +19,10 @@
 #define MAX_EVENT_ALTERNATIVES	8
 #define MAX_LIMITED_HWCOUNTERS	2
 
+#define for_each_branch_sample_type(x) \
+for ((x) = PERF_SAMPLE_BRANCH_USER; \
+	(x) < PERF_SAMPLE_BRANCH_MAX; (x) <<= 1)
+
 /*
  * This struct provides the constants and functions needed to
  * describe the PMU on a particular POWER-family CPU.
@@ -35,7 +39,7 @@ struct power_pmu {
 				unsigned long *valp);
 	int		(*get_alternatives)(u64 event_id, unsigned int flags,
 				u64 alt[]);
-	u64		(*bhrb_filter_map)(u64 branch_sample_type);
+	u64		(*bhrb_filter_map)(u64 branch_sample_type, u64 *bhrb_filter);
 	void		(*config_bhrb)(u64 pmu_bhrb_filter);
 	void		(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
 	int		(*limited_pmc_event)(u64 event_id);
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/pe
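The HW-or-SW protocol and the OR semantic described in this commit message can be modelled in a few lines of userspace C. The sketch below is only an illustration under simplifying assumptions: the `F_*` bits are hypothetical stand-ins for the PERF_SAMPLE_BRANCH_* flags, each entry carries a precomputed `kinds` mask instead of being decoded from the instruction at its 'from' address, and `split_filters` and `apply_sw_filters` are made-up names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the branch type filter flags. */
enum {
	F_ANY_CALL   = 1u << 0,
	F_ANY_RETURN = 1u << 1,
	F_IND_CALL   = 1u << 2,
	F_COND       = 1u << 3,
};

struct branch_entry {
	uint64_t from;
	uint64_t to;
	unsigned int kinds;	/* filters this branch satisfies */
};

/*
 * Either the PMU handles every requested filter in an OR manner, or it
 * handles none of them and SW takes over the whole request. Mixing the
 * two would turn the OR into an AND with the HW captured set.
 */
static void split_filters(unsigned int requested, bool hw_covers_all,
			  unsigned int *hw, unsigned int *sw)
{
	if (hw_covers_all) {
		*hw = requested;
		*sw = 0;
	} else {
		*hw = 0;
		*sw = requested;
	}
}

/*
 * Drop, in place, every entry that matches none of the SW filters.
 * Returns the number of entries kept.
 */
static size_t apply_sw_filters(struct branch_entry *entries, size_t n,
			       unsigned int sw)
{
	size_t i, kept = 0;

	if (sw == 0)
		return n;	/* HW already produced the final set */

	for (i = 0; i < n; i++) {
		if (entries[i].kinds & sw)	/* OR across filters */
			entries[kept++] = entries[i];
	}
	return kept;
}
```

With a request of `F_COND | F_ANY_RETURN` that the HW cannot satisfy on its own, every captured entry is kept if it is a conditional branch or a return, and dropped otherwise.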
[V6 00/11] perf: New conditional branch filter
This patchset is a re-spin of the original branch stack sampling patchset
which introduced the new PERF_SAMPLE_BRANCH_COND branch filter. This patchset
also enables SW based branch filtering support for book3s powerpc platforms
which have PMU HW backed branch stack sampling support.

Summary of code changes in this patchset:

(1) Introduces a new PERF_SAMPLE_BRANCH_COND branch filter
(2) Adds the "cond" branch filter option in the "perf record" tool
(3) Enables PERF_SAMPLE_BRANCH_COND on X86 platforms
(4) Enables PERF_SAMPLE_BRANCH_COND on the POWER8 platform
(5) Updates the documentation regarding the "perf record" tool
(6) Adds some new powerpc instruction analysis functions in the code-patching
    library
(7) Enables SW based branch filter support for powerpc book3s
(8) Changes the BHRB configuration in POWER8 to accommodate SW branch filters

With this new SW enablement, the branch filter support for book3s platforms
has been extended to include all the combinations discussed below with a
sample test application program (included here).

Changes in V2
=============
(1) Enabled PPC64 SW branch filtering support
(2) Incorporated changes required for all previous comments

Changes in V3
=============
(1) Split the SW branch filter enablement into multiple patches
(2) Added PMU neutral SW branch filtering code, PMU specific HW branch
    filtering code
(3) Added new instruction analysis functionality into powerpc code-patching
    library
(4) Changed name for some of the functions
(5) Fixed couple of spelling mistakes
(6) Changed code documentation in multiple places

Changes in V4
=============
(1) Changed the commit message for patch (01/10)
(2) Changed the patch (02/10) to accommodate review comments from Michael
    Ellerman
(3) Rebased the patchset against latest Linus's tree

Changes in V5
=============
(1) Added a precursor patch to cleanup the indentation problem in
    power_pmu_bhrb_read
(2) Added a precursor patch to re-arrange P8 PMU BHRB filter config which
    improved the clarity
(3) Merged the previous 10th patch into the 8th patch
(4) Moved SW based branch analysis code from core perf into code-patching
    library as suggested by Michael
(5) Simplified the logic in branch analysis library
(6) Fixed some ambiguities in documentation at various places
(7) Added some more in-code documentation blocks at various places
(8) Renamed some local variable and function names
(9) Fixed some indentation and white space errors in the code
(10) Implemented almost all the review comments and suggestions made by
     Michael Ellerman on V4 patchset
(11) Enabled privilege mode SW branch filter
(12) Simplified and generalized the SW implemented conditional branch filter
(13) PERF_SAMPLE_BRANCH_COND filter is now supported only through SW
     implementation
(14) Adjusted other patches to deal with the above changes

Changes in V6
=============
(1) Rebased the patchset against the master
(2) Added "Reviewed-by: Andi Kleen" in the first four patches in the series
    which changes the generic or X86 perf code.
    [https://lkml.org/lkml/2014/4/7/130]

HW implemented branch filters
=============================

(1) perf record -j any_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object  Source Symbol  Target Shared Object  Target Symbol
# ........  .......  ....................  .............  ....................  .............
#
     7.85%  cprog    cprog                 [.] sw_3_1     cprog                 [.] success_3_1_2
     5.66%  cprog    cprog                 [.] sw_3_1     cprog                 [.] sw_3_1_2
     5.65%  cprog    cprog                 [.] hw_1_1     cprog                 [.] symbol1
     5.42%  cprog    cprog                 [.] sw_3_1     cprog                 [.] sw_3_1_3
     5.40%  cprog    cprog                 [.] callme     cprog                 [.] hw_1_1
     5.40%  cprog    cprog                 [.] sw_3_1     cprog                 [.] success_3_1_1
     5.40%  cprog    cprog                 [.] sw_3_1     cprog                 [.] sw_3_1_1
     5.39%  cprog    cprog                 [.] sw_4_2     cprog                 [.] lr_addr
     5.39%  cprog    cprog                 [.] callme     cprog                 [.] sw_4_2
     5.39%  cprog    [unknown]             [.]            cprog                 [.] ctr_addr
     5.38%  cprog    cprog                 [.] hw_1_2     cprog                 [.] symbol2
     5.38%  cprog    cprog                 [.] callme     cprog                 [.] hw_1_2
     5.16%  cprog    cprog                 [.] sw_3_1     cprog                 [.] success_3_1_3
     5.15%  cprog    cprog                 [.] callme     cprog                 [.] sw_3_2
     5.14%
[V6 07/11] powerpc, perf: Change the name of HW PMU branch filter tracking variable
This patch simply renames the variable 'bhrb_filter' to 'bhrb_hw_filter'. This
makes room for one more variable, added in the subsequent patch, which will
track SW filters in the generic powerpc book3s code. This patch does not
change any functionality.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/perf/core-book3s.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 66bea54..1d7e909 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -47,7 +47,7 @@ struct cpu_hw_events {
 	int n_txn_start;
 
 	/* BHRB bits */
-	u64				bhrb_filter;	/* BHRB HW branch filter */
+	u64				bhrb_hw_filter;	/* BHRB HW branch filter */
 	int				bhrb_users;
 	void				*bhrb_context;
 	struct	perf_branch_stack	bhrb_stack;
@@ -1298,7 +1298,7 @@ static void power_pmu_enable(struct pmu *pmu)
 
 	mb();
 	if (cpuhw->bhrb_users)
-		ppmu->config_bhrb(cpuhw->bhrb_filter);
+		ppmu->config_bhrb(cpuhw->bhrb_hw_filter);
 
 	write_mmcr0(cpuhw, mmcr0);
 
@@ -1405,7 +1405,7 @@ nocheck:
  out:
 	if (has_branch_stack(event)) {
 		power_pmu_bhrb_enable(event);
-		cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
+		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
 					event->attr.branch_sample_type);
 	}
 
@@ -1788,10 +1788,10 @@ static int power_pmu_event_init(struct perf_event *event)
 	err = power_check_constraints(cpuhw, events, cflags, n + 1);
 
 	if (has_branch_stack(event)) {
-		cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
+		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
 					event->attr.branch_sample_type);
 
-		if(cpuhw->bhrb_filter == -1)
+		if(cpuhw->bhrb_hw_filter == -1)
 			return -EOPNOTSUPP;
 	}
 
-- 
1.7.11.7
[V6 04/11] perf, documentation: Description for conditional branch filter
This patch adds documentation for the conditional branch filter.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
Reviewed-by: Andi Kleen
---
 tools/perf/Documentation/perf-record.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index c71b0f3..d460049 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -184,9 +184,10 @@ following filters are defined:
 	- in_tx: only when the target is in a hardware transaction
 	- no_tx: only when the target is not in a hardware transaction
 	- abort_tx: only when the target is a hardware transaction abort
+	- cond: conditional branches
 
 +
-The option requires at least one branch type among any, any_call, any_ret, ind_call.
+The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
 The privilege levels may be omitted, in which case, the privilege levels of the associated
 event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege
 levels are subject to permissions.  When sampling on multiple events, branch stack sampling
-- 
1.7.11.7
[V6 11/11] powerpc, perf: Enable privilege mode SW branch filters
This patch enables privilege mode SW branch filters. Also modifies POWER8 PMU branch filter configuration so that the privilege mode branch filter implemented as part of base PMU event configuration is reflected in bhrb filter mask. As a result, the SW will skip and not try to process the privilege mode branch filters itself. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/core-book3s.c | 53 +++-- arch/powerpc/perf/power8-pmu.c | 13 -- 2 files changed, 52 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index a94cc43..297cddb 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -26,6 +26,9 @@ #define BHRB_PREDICTION0x0001 #define BHRB_EA0xFFFCUL +#define POWER_ADDR_USER0 +#define POWER_ADDR_KERNEL 1 + struct cpu_hw_events { int n_events; int n_percpu; @@ -450,10 +453,10 @@ static bool check_instruction(unsigned int *addr, u64 sw_filter) * Access the instruction contained in the address and check * whether it complies with the applicable SW branch filters. */ -static bool keep_branch(u64 from, u64 sw_filter) +static bool keep_branch(u64 from, u64 to, u64 sw_filter) { unsigned int instr; - bool ret; + bool to_plm, ret, flag; /* * The "from" branch for every branch record has to go @@ -463,6 +466,37 @@ static bool keep_branch(u64 from, u64 sw_filter) if (sw_filter == 0) return true; + to_plm = is_kernel_addr(to) ? POWER_ADDR_KERNEL : POWER_ADDR_USER; + + /* +* Applying privilege mode SW branch filters first on the +* 'to' address makes an AND semantic with the SW generic +* branch filters (OR with each other) being applied on the +* from address there after. 
+*/ + + /* Ignore PERF_SAMPLE_BRANCH_HV */ + sw_filter &= ~PERF_SAMPLE_BRANCH_HV; + + /* Privilege mode branch filters for "TO" address */ + if (sw_filter & PERF_SAMPLE_BRANCH_PLM_ALL) { + flag = false; + + if (sw_filter & PERF_SAMPLE_BRANCH_USER) { + if(to_plm == POWER_ADDR_USER) + flag = true; + } + + if (sw_filter & PERF_SAMPLE_BRANCH_KERNEL) { + if(to_plm == POWER_ADDR_KERNEL) + flag = true; + } + + if (!flag) + return false; + } + + /* Generic branch filters for "FROM" address */ if (is_kernel_addr(from)) { return check_instruction((unsigned int *) from, sw_filter); } else { @@ -501,15 +535,6 @@ static int all_filters_covered(u64 branch_sample_type, u64 bhrb_filter) if (!(branch_sample_type & x)) continue; /* -* Privilege filter requests have been already -* taken care during the base PMU configuration. -*/ - if ((x == PERF_SAMPLE_BRANCH_USER) - || (x == PERF_SAMPLE_BRANCH_KERNEL) - || (x == PERF_SAMPLE_BRANCH_HV)) - continue; - - /* * Requested filter not available either * in PMU or in SW. 
*/ @@ -520,7 +545,10 @@ static int all_filters_covered(u64 branch_sample_type, u64 bhrb_filter) } /* SW implemented branch filters */ -static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_ANY_CALL, +static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_USER, + PERF_SAMPLE_BRANCH_KERNEL, + PERF_SAMPLE_BRANCH_HV, + PERF_SAMPLE_BRANCH_ANY_CALL, PERF_SAMPLE_BRANCH_COND, PERF_SAMPLE_BRANCH_ANY_RETURN, PERF_SAMPLE_BRANCH_IND_CALL }; @@ -624,6 +652,7 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw) /* Apply SW branch filters and drop the entry if required */ if (!keep_branch(cpuhw->bhrb_entries[u_index].from, + cpuhw->bhrb_entries[u_index].to, cpuhw->bhrb_sw_filter)) u_index--; u_index++; diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c index 4743bde..b6e21da 100644 --- a/arch/powerpc/perf/power8-pmu.c +++ b/arch/powerpc/perf/power8-pmu.c @@ -649,9 +649,19 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter) * filter configuration. BHRB is always recorded along with a * r
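The privilege check that this patch adds to keep_branch() can be sketched in isolation as follows. This is only an illustration, not the kernel code: the BR_* names are local stand-ins whose bit values mirror enum perf_branch_sample_type, and sketch_is_kernel_addr() is a hypothetical replacement for is_kernel_addr() that assumes the 64-bit kernel mapping starts at 0xc000000000000000.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; bit values mirror enum perf_branch_sample_type. */
#define BR_USER    (1U << 0)
#define BR_KERNEL  (1U << 1)
#define BR_HV      (1U << 2)
#define BR_PLM_ALL (BR_USER | BR_KERNEL | BR_HV)

/* Hypothetical stand-in for is_kernel_addr(): assumes the kernel
 * mapping begins at 0xc000000000000000 on 64-bit. */
static bool sketch_is_kernel_addr(uint64_t addr)
{
	return addr >= 0xc000000000000000ULL;
}

/* Returns true when the branch target passes the privilege filters.
 * The caller would then apply the generic filters to the "from" side. */
static bool to_passes_plm(uint64_t to, uint64_t sw_filter)
{
	bool to_kernel = sketch_is_kernel_addr(to);

	sw_filter &= ~(uint64_t)BR_HV;	/* HV requests are ignored */

	if (!(sw_filter & BR_PLM_ALL))
		return true;		/* no privilege filter requested */

	if ((sw_filter & BR_USER) && !to_kernel)
		return true;
	if ((sw_filter & BR_KERNEL) && to_kernel)
		return true;
	return false;
}
```

Because this check returns early, the privilege filters combine with AND against whatever the generic filters decide afterwards, while USER and KERNEL combine with OR among themselves.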
[V6 10/11] power8, perf: Adapt BHRB PMU configuration to work with SW filters
Powerpc kernel now supports SW based branch filters for book3s systems with
some specific requirements while dealing with HW supported branch filters in
order to achieve overall OR semantics prevailing in perf branch stack sampling
framework. This patch adapts the BHRB branch filter configuration to meet
those protocols. POWER8 PMU can only handle one HW based branch filter request
at any point of time. For all other combinations PMU will pass it on to the SW.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/perf/power8-pmu.c | 50 --
 1 file changed, 43 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 699b1dd..4743bde 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -635,6 +635,16 @@ static int power8_generic_events[] = {
 
 static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
 {
+	u64 x, pmu_bhrb_filter;
+	pmu_bhrb_filter = 0;
+	*bhrb_filter = 0;
+
+	/* No branch filter requested */
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) {
+		*bhrb_filter = PERF_SAMPLE_BRANCH_ANY;
+		return pmu_bhrb_filter;
+	}
+
 	/* BHRB and regular PMU events share the same privilege state
 	 * filter configuration. BHRB is always recorded along with a
 	 * regular PMU event. As the privilege state filter is handled
@@ -645,16 +655,42 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
 	/* Ignore user, kernel, hv bits */
 	branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
 
-	/* No branch filter requested */
-	if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
-		return 0;
+	/*
+	 * P8 does not support oring of PMU HW branch filters. Hence
+	 * if multiple branch filters are requested which includes filters
+	 * supported in PMU, still go ahead and clear the PMU based HW branch
+	 * filter component as in this case all the filters will be processed
+	 * in SW.
+	 */
 
-	if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) {
-		return POWER8_MMCRA_IFM1;
+	for_each_branch_sample_type(x) {
+		/* Ignore privilege branch filters */
+		if ((x == PERF_SAMPLE_BRANCH_USER)
+			|| (x == PERF_SAMPLE_BRANCH_KERNEL)
+			|| (x == PERF_SAMPLE_BRANCH_HV))
+			continue;
+
+		if (!(branch_sample_type & x))
+			continue;
+
+		/* Supported individual PMU branch filters */
+		if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
+			branch_sample_type &= ~PERF_SAMPLE_BRANCH_ANY_CALL;
+			if (branch_sample_type) {
+				/* Multiple branch filters will be processed in SW */
+				pmu_bhrb_filter = 0;
+				*bhrb_filter = 0;
+				return pmu_bhrb_filter;
+			} else {
+				/* Individual branch filter will be processed in PMU */
+				pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
+				*bhrb_filter |= PERF_SAMPLE_BRANCH_ANY_CALL;
+				return pmu_bhrb_filter;
+			}
+		}
 	}
 
-	/* Every thing else is unsupported */
-	return -1;
+	return pmu_bhrb_filter;
 }
 
 static void power8_config_bhrb(u64 pmu_bhrb_filter)
--
1.7.11.7
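The decision that power8_bhrb_filter_map() makes after this patch can be condensed into a small standalone sketch. It is an illustration under assumptions rather than the patch itself: the BR_* names are local stand-ins mirroring enum perf_branch_sample_type, SKETCH_IFM1 is a placeholder for POWER8_MMCRA_IFM1 (its real value is not shown in this mail), and the loop over branch sample types is folded into the single equivalent comparison.

```c
#include <stdint.h>

/* Local stand-ins; bit values mirror enum perf_branch_sample_type. */
#define BR_USER      (1U << 0)
#define BR_KERNEL    (1U << 1)
#define BR_HV        (1U << 2)
#define BR_ANY       (1U << 3)
#define BR_ANY_CALL  (1U << 4)
#define BR_COND      (1U << 10)
#define BR_PLM_ALL   (BR_USER | BR_KERNEL | BR_HV)

/* Placeholder for POWER8_MMCRA_IFM1; the real value is defined in
 * power8-pmu.c and is not reproduced here. */
#define SKETCH_IFM1  0x1ULL

/* Returns the HW filter bits to program and reports through
 * *hw_handled which requested filters the PMU takes care of.
 * Anything not reported there is left for the SW filters. */
static uint64_t sketch_filter_map(uint64_t type, uint64_t *hw_handled)
{
	*hw_handled = 0;

	/* "any" means no filtering at all */
	if (type & BR_ANY) {
		*hw_handled = BR_ANY;
		return 0;
	}

	/* Privilege bits are handled by the base PMU event */
	type &= ~(uint64_t)BR_PLM_ALL;

	/* Only a lone any_call request can be done in HW */
	if (type == BR_ANY_CALL) {
		*hw_handled = BR_ANY_CALL;
		return SKETCH_IFM1;
	}

	/* Combinations and everything else go to SW */
	return 0;
}
```

A request for any_call plus cond, for example, returns 0 with nothing marked as handled, so both filters would be applied in software.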
[V6 01/11] perf: Add PERF_SAMPLE_BRANCH_COND
This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which
will extend the existing perf ABI. Various architectures can provide
this functionality either with HW filtering support (if present) or
with SW filtering of captured branch instructions.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
Reviewed-by: Andi Kleen
---
 include/uapi/linux/perf_event.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 853bc1c..696f69b4 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -163,8 +163,9 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_ABORT_TX = 1U << 7, /* transaction aborts */
 	PERF_SAMPLE_BRANCH_IN_TX    = 1U << 8, /* in transaction */
 	PERF_SAMPLE_BRANCH_NO_TX    = 1U << 9, /* not in transaction */
+	PERF_SAMPLE_BRANCH_COND     = 1U << 10, /* conditional branches */
 
-	PERF_SAMPLE_BRANCH_MAX      = 1U << 10, /* non-ABI */
+	PERF_SAMPLE_BRANCH_MAX      = 1U << 11, /* non-ABI */
 };
 
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
--
1.7.11.7
[V6 05/11] powerpc, perf: Re-arrange BHRB processing
This patch cleans up some existing indentation problem and re-organizes the BHRB processing code with an helper function named `update_branch_entry` making it more readable. This patch does not change any functionality. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/core-book3s.c | 102 1 file changed, 52 insertions(+), 50 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 4520c93..66bea54 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -402,11 +402,21 @@ static __u64 power_pmu_bhrb_to(u64 addr) return target - (unsigned long)&instr + addr; } +/* Update individual branch entry */ +void update_branch_entry(struct cpu_hw_events *cpuhw, int u_index, u64 from, u64 to, int pred) +{ + cpuhw->bhrb_entries[u_index].from = from; + cpuhw->bhrb_entries[u_index].to = to; + cpuhw->bhrb_entries[u_index].mispred = pred; + cpuhw->bhrb_entries[u_index].predicted = ~pred; + return; +} + /* Processing BHRB entries */ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw) { u64 val; - u64 addr; + u64 addr, tmp; int r_index, u_index, pred; r_index = 0; @@ -417,62 +427,54 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw) if (!val) /* Terminal marker: End of valid BHRB entries */ break; - else { - addr = val & BHRB_EA; - pred = val & BHRB_PREDICTION; - if (!addr) - /* invalid entry */ - continue; + addr = val & BHRB_EA; + pred = val & BHRB_PREDICTION; - /* Branches are read most recent first (ie. mfbhrb 0 is -* the most recent branch). -* There are two types of valid entries: -* 1) a target entry which is the to address of a -*computed goto like a blr,bctr,btar. The next -*entry read from the bhrb will be branch -*corresponding to this target (ie. the actual -*blr/bctr/btar instruction). -* 2) a from address which is an actual branch. If a -*target entry proceeds this, then this is the -*matching branch for that target. 
If this is not -*following a target entry, then this is a branch -*where the target is given as an immediate field -*in the instruction (ie. an i or b form branch). -*In this case we need to read the instruction from -*memory to determine the target/to address. + if (!addr) + /* invalid entry */ + continue; + + /* Branches are read most recent first (ie. mfbhrb 0 is +* the most recent branch). +* There are two types of valid entries: +* 1) a target entry which is the to address of a +*computed goto like a blr,bctr,btar. The next +*entry read from the bhrb will be branch +*corresponding to this target (ie. the actual +*blr/bctr/btar instruction). +* 2) a from address which is an actual branch. If a +*target entry proceeds this, then this is the +*matching branch for that target. If this is not +*following a target entry, then this is a branch +*where the target is given as an immediate field +*in the instruction (ie. an i or b form branch). +*In this case we need to read the instruction from +*memory to determine the target/to address. +*/ + if (val & BHRB_TARGET) { + /* Target branches use two entries +* (ie. computed gotos/XL form) */ + tmp = addr; + /* Get from address in next entry */ + val = read_bhrb(r_index++); + addr = val & BHRB_EA; if (val & BHRB_TARGET) { - /* Target branches use two entries -* (ie. computed gotos/XL form) -*/ - cpuhw->bhrb_entries[u_index].to = addr; - cpuhw->bhrb_entries[u_index].mispred = pred; - cpuhw->bhrb_entries[u_index].predicted = ~pred; - -
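The two kinds of BHRB entries described in the comment above can be told apart by masking the raw value. The following is a minimal sketch, not the kernel code: the SK_* masks are assumptions (the archive appears to have shortened the hex constants, so full 64-bit widths and a target flag in bit 1 are assumed here), and struct sk_bhrb_entry is a hypothetical container.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed mask values; the mail shows them only in shortened form. */
#define SK_BHRB_PREDICTION 0x1ULL
#define SK_BHRB_TARGET     0x2ULL
#define SK_BHRB_EA         0xFFFFFFFFFFFFFFFCULL

/* Hypothetical decoded view of one raw BHRB value. */
struct sk_bhrb_entry {
	uint64_t addr;		/* effective address with flag bits cleared */
	bool is_target;		/* "to" address of a computed branch */
	bool mispredicted;	/* prediction bit, stored as mispred in the patch */
	bool valid;		/* a zero address marks an invalid entry */
};

static struct sk_bhrb_entry sk_decode_bhrb(uint64_t val)
{
	struct sk_bhrb_entry e;

	e.addr = val & SK_BHRB_EA;
	e.is_target = (val & SK_BHRB_TARGET) != 0;
	e.mispredicted = (val & SK_BHRB_PREDICTION) != 0;
	e.valid = e.addr != 0;
	return e;
}
```

When is_target is set, the entry read next would supply the matching "from" address; otherwise the target has to be computed from the branch instruction itself.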
[V6 06/11] powerpc, perf: Re-arrange PMU based branch filter processing in POWER8
This patch does some code re-arrangements to make it clear that it ignores
any separate privilege level branch filter request and does not support
any combinations of HW PMU branch filters.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/perf/power8-pmu.c | 21 +++--
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index fe2763b..13f47f5 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -635,8 +635,6 @@ static int power8_generic_events[] = {
 
 static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 {
-	u64 pmu_bhrb_filter = 0;
-
 	/* BHRB and regular PMU events share the same privilege state
 	 * filter configuration. BHRB is always recorded along with a
 	 * regular PMU event. As the privilege state filter is handled
@@ -644,20 +642,15 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 	 * PMU event, we ignore any separate BHRB specific request.
 	 */
 
-	/* No branch filter requested */
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY)
-		return pmu_bhrb_filter;
-
-	/* Invalid branch filter options - HW does not support */
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
-		return -1;
+	/* Ignore user, kernel, hv bits */
+	branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
 
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL)
-		return -1;
+	/* No branch filter requested */
+	if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
+		return 0;
 
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
-		pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
-		return pmu_bhrb_filter;
+	if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) {
+		return POWER8_MMCRA_IFM1;
 	}
 
 	/* Every thing else is unsupported */
--
1.7.11.7
Re: Boot problems with a PA6T board
Hi Michael,

Thanks a lot for your answer. They reasoned that "starting cpu hw idx 0... failed" is reported because that core of the CPU is already up and running.

I have built a git kernel from 2014-04-02.

-> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux-git
-> git show 3e75c6de1ac33fe3500f44573d9212dc82c99f59
-> git checkout -f 3e75c6de1ac33fe3500f44573d9212dc82c99f59; git clean -fdx

This kernel booted and showed a Kernel Panic with the following error message:

Oops: Machine check, sig: 7 [#1]

Rgds,
Christian

On 05.05.2014 07:48, Michael Ellerman wrote:
> On Sun, 2014-05-04 at 18:02 +0200, Christian Zigotzky wrote:
>> Hi All,
>>
>> The RC 1, 2, and 3 of the kernel 3.15 don't boot on my PA6T board with a Radeon HD 6870 graphics card.
>>
>> Screenshot: http://forum.hyperion-entertainment.biz/download/file.php?id=1060&mode=view
>>
>> The kernel 3.14 starts without any problems. Has anyone a tip for me, please?
>
> The line that says "starting cpu hw idx 0... failed" looks a little worrying. Do you see that on 3.14 as well?
>
> Otherwise bisection is probably your best bet.
>
> cheers
Re: [PATCH V5] KVM: PPC: BOOK3S: PR: Enable Little Endian PR guest
On 05/05/2014 05:09 AM, Aneesh Kumar K.V wrote:
> This patch makes sure we inherit the LE bit correctly in different cases
> so that we can run a Little Endian distro in PR mode.
>
> Signed-off-by: Aneesh Kumar K.V

Thanks, applied to kvm-ppc-queue.

Alex
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: Although it's optional IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values. Signed-off-by: Aneesh Kumar K.V --- Changes from V3: * Use make_dsisr instead of checking feature flag to decide whether to use saved dsisr or not arch/powerpc/include/asm/disassemble.h | 34 +++ arch/powerpc/kernel/align.c| 34 +-- arch/powerpc/kvm/book3s_emulate.c | 43 -- 3 files changed, 40 insertions(+), 71 deletions(-) diff --git a/arch/powerpc/include/asm/disassemble.h b/arch/powerpc/include/asm/disassemble.h index 856f8deb557a..6330a61b875a 100644 --- a/arch/powerpc/include/asm/disassemble.h +++ b/arch/powerpc/include/asm/disassemble.h @@ -81,4 +81,38 @@ static inline unsigned int get_oc(u32 inst) { return (inst >> 11) & 0x7fff; } + +#define IS_XFORM(inst) (get_op(inst) == 31) +#define IS_DSFORM(inst)(get_op(inst) >= 56) + +/* + * Create a DSISR value from the instruction + */ +static inline unsigned make_dsisr(unsigned instr) +{ + unsigned dsisr; + + + /* bits 6:15 --> 22:31 */ + dsisr = (instr & 0x03ff) >> 16; + + if (IS_XFORM(instr)) { + /* bits 29:30 --> 15:16 */ + dsisr |= (instr & 0x0006) << 14; + /* bit 25 -->17 */ + dsisr |= (instr & 0x0040) << 8; + /* bits 21:24 --> 18:21 */ + dsisr |= (instr & 0x0780) << 3; + } else { + /* bit 5 -->17 */ + dsisr |= (instr & 0x0400) >> 12; + /* bits 1: 4 --> 18:21 */ + dsisr |= (instr & 0x7800) >> 17; + /* bits 30:31 --> 12:13 */ + if (IS_DSFORM(instr)) + dsisr |= (instr & 0x0003) << 18; + } + + return dsisr; +} #endif /* __ASM_PPC_DISASSEMBLE_H__ */ diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c index 94908af308d8..34f55524d456 100644 --- a/arch/powerpc/kernel/align.c +++ b/arch/powerpc/kernel/align.c @@ -25,14 +25,13 @@ #include #include #include +#include struct aligninfo { unsigned char len; unsigned char flags; }; -#define IS_XFORM(inst) (((inst) >> 26) == 31) -#define IS_DSFORM(inst)(((inst) >> 26) >= 56) #define 
INVALID { 0, 0 } @@ -192,37 +191,6 @@ static struct aligninfo aligninfo[128] = { }; /* - * Create a DSISR value from the instruction - */ -static inline unsigned make_dsisr(unsigned instr) -{ - unsigned dsisr; - - - /* bits 6:15 --> 22:31 */ - dsisr = (instr & 0x03ff) >> 16; - - if (IS_XFORM(instr)) { - /* bits 29:30 --> 15:16 */ - dsisr |= (instr & 0x0006) << 14; - /* bit 25 -->17 */ - dsisr |= (instr & 0x0040) << 8; - /* bits 21:24 --> 18:21 */ - dsisr |= (instr & 0x0780) << 3; - } else { - /* bit 5 -->17 */ - dsisr |= (instr & 0x0400) >> 12; - /* bits 1: 4 --> 18:21 */ - dsisr |= (instr & 0x7800) >> 17; - /* bits 30:31 --> 12:13 */ - if (IS_DSFORM(instr)) - dsisr |= (instr & 0x0003) << 18; - } - - return dsisr; -} - -/* * The dcbz (data cache block zero) instruction * gives an alignment fault if used on non-cacheable * memory. We handle the fault mainly for the diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 99d40f8977e8..04c38f049dfd 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -569,48 +569,14 @@ unprivileged: u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst) { - u32 dsisr = 0; - - /* -* This is what the spec says about DSISR bits (not mentioned = 0): -* -* 12:13[DS]Set to bits 30:31 -* 15:16[X] Set to bits 29:30 -* 17 [X] Set to bit 25 -* [D/DS] Set to bit 5 -* 18:21[X] Set to bits 21:24 -* [D/DS] Set to bits 1:4 -* 22:26Set to bits 6:10 (RT/RS/FRT/FRS) -* 27:31Set to bits 11:15 (RA) -*/ - - switch (get_op(inst)) { - /* D-form */ - case OP_LFS: - case OP_LFD: - case OP_STFD: - case OP_STFS: - dsisr |= (inst >> 12) & 0x4000; /* bit 17 */ - dsisr |= (inst >> 17) & 0x3c00; /* bits 18:21 */ - break; - /* X-form */ - case 31: - dsisr |= (inst << 14) & 0x18000; /* bits 15:16 */ - dsisr |= (inst <
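For readers who want to try the helper outside the kernel, here is a standalone sketch of make_dsisr(). The hex masks in the archived diff appear to have lost their runs of zeros, so the full-width constants below are reconstructed from the bit-range comments (IBM numbering, where bit 0 is the most significant bit of the 32-bit instruction); treat them as an assumption rather than a verbatim copy of the patch. sk_get_op() is a local stand-in for get_op().

```c
#include <stdint.h>

/* Primary opcode: IBM bits 0:5 of the instruction word. */
static inline unsigned sk_get_op(uint32_t inst)
{
	return inst >> 26;
}

#define SK_IS_XFORM(inst)  (sk_get_op(inst) == 31)
#define SK_IS_DSFORM(inst) (sk_get_op(inst) >= 56)

/* Build a DSISR value from the instruction image. */
static inline unsigned sk_make_dsisr(uint32_t instr)
{
	unsigned dsisr;

	/* bits 6:15 --> 22:31 (RT/RS and RA fields) */
	dsisr = (instr & 0x03ff0000u) >> 16;

	if (SK_IS_XFORM(instr)) {
		/* bits 29:30 --> 15:16 */
		dsisr |= (instr & 0x00000006u) << 14;
		/* bit 25 --> 17 */
		dsisr |= (instr & 0x00000040u) << 8;
		/* bits 21:24 --> 18:21 */
		dsisr |= (instr & 0x00000780u) << 3;
	} else {
		/* bit 5 --> 17 */
		dsisr |= (instr & 0x04000000u) >> 12;
		/* bits 1:4 --> 18:21 */
		dsisr |= (instr & 0x78000000u) >> 17;
		/* bits 30:31 --> 12:13 */
		if (SK_IS_DSFORM(instr))
			dsisr |= (instr & 0x00000003u) << 18;
	}

	return dsisr;
}
```

As a quick check of the D-form path, an lfd-style word with primary opcode 50, RT=1 and RA=2 (0xC8220000) contributes 0x22 from the register fields and 0x2400 from opcode bits 1:4.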
Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote: We reserve 5% of total ram for CMA allocation and not using that can result in us running out of numa node memory with specific configuration. One caveat is we may not have node local hpt with pinned vcpu configuration. But currently libvirt also pins the vcpu to cpuset after creating hash page table. I don't understand the problem. Can you please elaborate? Alex Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 23 ++- 1 file changed, 6 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index fb25ebc0af0c..f32896ffd784 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -52,7 +52,7 @@ static void kvmppc_rmap_reset(struct kvm *kvm); long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp) { - unsigned long hpt; + unsigned long hpt = 0; struct revmap_entry *rev; struct page *page = NULL; long order = KVM_DEFAULT_HPT_ORDER; @@ -64,22 +64,11 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp) } kvm->arch.hpt_cma_alloc = 0; - /* -* try first to allocate it from the kernel page allocator. -* We keep the CMA reserved for failed allocation. 
-*/ - hpt = __get_free_pages(GFP_KERNEL | __GFP_ZERO | __GFP_REPEAT | - __GFP_NOWARN, order - PAGE_SHIFT); - - /* Next try to allocate from the preallocated pool */ - if (!hpt) { - VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER); - page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT)); - if (page) { - hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page)); - kvm->arch.hpt_cma_alloc = 1; - } else - --order; + VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER); + page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT)); + if (page) { + hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page)); + kvm->arch.hpt_cma_alloc = 1; } /* Lastly try successively smaller sizes from the page allocator */ ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] KVM: PPC: BOOK3S: PR: Fix WARN_ON with debug options on
On 05/04/2014 07:26 PM, Aneesh Kumar K.V wrote: With debug option "sleep inside atomic section checking" enabled we get the below WARN_ON during a PR KVM boot. This is because upstream now have PREEMPT_COUNT enabled even if we have preempt disabled. Fix the warning by adding preempt_disable/enable around floating point and altivec enable. WARNING: at arch/powerpc/kernel/process.c:156 Modules linked in: kvm_pr kvm CPU: 1 PID: 3990 Comm: qemu-system-ppc Tainted: GW 3.15.0-rc1+ #4 task: c000eb85b3a0 ti: c000ec59c000 task.ti: c000ec59c000 NIP: c0015c84 LR: d3334644 CTR: c0015c00 REGS: c000ec59f140 TRAP: 0700 Tainted: GW (3.15.0-rc1+) MSR: 80029032 CR: 4224 XER: 2000 CFAR: c0015c24 SOFTE: 1 GPR00: d3334644 c000ec59f3c0 c0e2fa40 c000e2f8 GPR04: 0800 2000 0001 8000 GPR08: 0001 0001 2000 c0015c00 GPR12: d333da18 cfb80900 GPR16: 3fffce4e0fa1 GPR20: 0010 0001 0002 100b9a38 GPR24: 0002 0013 GPR28: c000eb85b3a0 2000 c000e2f8 NIP [c0015c84] .enable_kernel_fp+0x84/0x90 LR [d3334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr] Call Trace: [c000ec59f3c0] [0010] 0x10 (unreliable) [c000ec59f430] [d3334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr] [c000ec59f4c0] [d324b380] .kvmppc_set_msr+0x30/0x50 [kvm] [c000ec59f530] [d3337cac] .kvmppc_core_emulate_op_pr+0x16c/0x5e0 [kvm_pr] [c000ec59f5f0] [d324a944] .kvmppc_emulate_instruction+0x284/0xa80 [kvm] [c000ec59f6c0] [d3336888] .kvmppc_handle_exit_pr+0x488/0xb70 [kvm_pr] [c000ec59f790] [d3338d34] kvm_start_lightweight+0xcc/0xdc [kvm_pr] [c000ec59f960] [d3336288] .kvmppc_vcpu_run_pr+0xc8/0x190 [kvm_pr] [c000ec59f9f0] [d324c880] .kvmppc_vcpu_run+0x30/0x50 [kvm] [c000ec59fa60] [d3249e74] .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [kvm] [c000ec59faf0] [d3244948] .kvm_vcpu_ioctl+0x478/0x760 [kvm] [c000ec59fcb0] [c0224e34] .do_vfs_ioctl+0x4d4/0x790 [c000ec59fd90] [c0225148] .SyS_ioctl+0x58/0xb0 [c000ec59fe30] [c000a1e4] syscall_exit+0x0/0x98 Signed-off-by: Aneesh Kumar K.V Thanks, applied to kvm-ppc-queue. 
Alex
Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest
On 05/04/2014 07:30 PM, Aneesh Kumar K.V wrote:
> Signed-off-by: Aneesh Kumar K.V

No patch description, no proper explanations anywhere why you're doing what. All of that in a pretty sensitive piece of code. There's no way this patch can go upstream in its current form.

Alex
Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
On 05/05/2014 03:27 AM, Gavin Shan wrote:
> The series of patches intends to support EEH for PCI devices, which have been
> passed through to PowerKVM based guest via VFIO. The implementation is
> straightforward based on the issues or problems we have to resolve to support
> EEH for PowerKVM based guest.
>
> - Emulation for EEH RTAS requests. Thankfully, we already have infrastructure
>   to emulate XICS. Without introducing new mechanism, we just extend that
>   existing infrastructure to support EEH RTAS emulation. EEH RTAS requests
>   initiated from guest are posted to host where the requests get handled or
>   delivered to underlying firmware for further handling. For that, the host kernel
>   has to maintain the PCI address (host domain/bus/slot/function to guest's
>   PHB BUID/bus/slot/function) mapping via KVM VFIO device. The address mapping
>   will be built when initializing VFIO device in QEMU and destroyed when the
>   VFIO device in QEMU goes offline or the VM is destroyed.

Do you also expose all those interfaces to user space? VFIO is as much about user space device drivers as it is about device assignment.

I would like to first see an implementation that doesn't touch KVM emulation code at all but instead routes everything through QEMU. As a second step we can then accelerate performance critical paths inside of KVM.

That way we ensure that user space device drivers have all the power over a device they need to drive it.

Alex
Re: [PATCH] powerpc: move epapr paravirt init of power_save to an initcall
On 04/30/2014 11:09 PM, Alexander Graf wrote: On 30.04.14 22:03, Stuart Yoder wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Wednesday, April 30, 2014 2:56 PM To: Yoder Stuart-B08248; b...@kernel.crashing.org; Wood Scott-B07421 Cc: linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCH] powerpc: move epapr paravirt init of power_save to an initcall On 30.04.14 21:54, Stuart Yoder wrote: From: Stuart Yoder some restructuring of epapr paravirt init resulted in ppc_md.power_save being set, and then overwritten to NULL during machine_init. This patch splits the initialization of ppc_md.power_save out into a postcore init call. Signed-off-by: Stuart Yoder --- arch/powerpc/kernel/epapr_paravirt.c | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/epapr_paravirt.c b/arch/powerpc/kernel/epapr_paravirt.c index 6300c13..c49b69c 100644 --- a/arch/powerpc/kernel/epapr_paravirt.c +++ b/arch/powerpc/kernel/epapr_paravirt.c @@ -52,11 +52,6 @@ static int __init early_init_dt_scan_epapr(unsigned long node, #endif } -#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64) -if (of_get_flat_dt_prop(node, "has-idle", NULL)) -ppc_md.power_save = epapr_ev_idle; -#endif - epapr_paravirt_enabled = true; return 1; @@ -69,3 +64,23 @@ int __init epapr_paravirt_early_init(void) return 0; } +static int __init epapr_idle_init_dt_scan(unsigned long node, + const char *uname, + int depth, void *data) +{ +#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64) +if (of_get_flat_dt_prop(node, "has-idle", NULL)) +ppc_md.power_save = epapr_ev_idle; +#endif +return 0; +} + +static int __init epapr_idle_init(void) +{ +if (epapr_paravirt_enabled) +of_scan_flat_dt(epapr_idle_init_dt_scan, NULL); Doesn't this scan all nodes? We only want to match on /hypervisor/has-idle, no? I cut/pasted from the approach the existing code in that file took, but yes you're right we just need the one property. 
Let me respin that to look at the hypervisor node only. Yeah, the same commit that introduced the breakage on has-idle also removed the explicit check for /hypervisor. Laurentiu, was this change on purpose? Alex, IIRC, at that time i had to switch from the normal "of" functions to a completely different api that's available in early init stage. This early "of" api is pretty limited (e.g. doesn't have a way to address a specific node) and i had to use that function that scans the whole tree. --- Best Regards, Laurentiu ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: move epapr paravirt init of power_save to an initcall
On 05/05/2014 02:17 PM, Tudor Laurentiu wrote: On 04/30/2014 11:09 PM, Alexander Graf wrote: On 30.04.14 22:03, Stuart Yoder wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Wednesday, April 30, 2014 2:56 PM To: Yoder Stuart-B08248; b...@kernel.crashing.org; Wood Scott-B07421 Cc: linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCH] powerpc: move epapr paravirt init of power_save to an initcall On 30.04.14 21:54, Stuart Yoder wrote: From: Stuart Yoder some restructuring of epapr paravirt init resulted in ppc_md.power_save being set, and then overwritten to NULL during machine_init. This patch splits the initialization of ppc_md.power_save out into a postcore init call. Signed-off-by: Stuart Yoder --- arch/powerpc/kernel/epapr_paravirt.c | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/epapr_paravirt.c b/arch/powerpc/kernel/epapr_paravirt.c index 6300c13..c49b69c 100644 --- a/arch/powerpc/kernel/epapr_paravirt.c +++ b/arch/powerpc/kernel/epapr_paravirt.c @@ -52,11 +52,6 @@ static int __init early_init_dt_scan_epapr(unsigned long node, #endif } -#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64) -if (of_get_flat_dt_prop(node, "has-idle", NULL)) -ppc_md.power_save = epapr_ev_idle; -#endif - epapr_paravirt_enabled = true; return 1; @@ -69,3 +64,23 @@ int __init epapr_paravirt_early_init(void) return 0; } +static int __init epapr_idle_init_dt_scan(unsigned long node, + const char *uname, + int depth, void *data) +{ +#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64) +if (of_get_flat_dt_prop(node, "has-idle", NULL)) +ppc_md.power_save = epapr_ev_idle; +#endif +return 0; +} + +static int __init epapr_idle_init(void) +{ +if (epapr_paravirt_enabled) +of_scan_flat_dt(epapr_idle_init_dt_scan, NULL); Doesn't this scan all nodes? We only want to match on /hypervisor/has-idle, no? I cut/pasted from the approach the existing code in that file took, but yes you're right we just need the one property. 
Let me respin that to look at the hypervisor node only. Yeah, the same commit that introduced the breakage on has-idle also removed the explicit check for /hypervisor. Laurentiu, was this change on purpose? Alex, IIRC, at that time i had to switch from the normal "of" functions to a completely different api that's available in early init stage. This early "of" api is pretty limited (e.g. doesn't have a way to address a specific node) and i had to use that function that scans the whole tree. Ok, so it is an accident. Could you please post a patch that checks that the node we're looking at is called "hypervisor"? The simple API should give you enough information for that at least. Maybe you could even check that the parent node is the root node. Alex ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
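The check requested above (only accept a node named "hypervisor" whose parent is the root) could look roughly like this inside a flat device tree scan callback. This is a hedged sketch, not the eventual patch: sk_is_hypervisor_node() is a hypothetical helper, and it assumes the callback sees the root node at depth 0 so that its direct children arrive with depth 1.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper for a scan callback that receives the node's
 * unit name and depth. Assumes the root is reported at depth 0, so a
 * direct child of the root has depth 1. */
static bool sk_is_hypervisor_node(const char *uname, int depth)
{
	return depth == 1 && strcmp(uname, "hypervisor") == 0;
}
```

A callback such as epapr_idle_init_dt_scan() would return early when this is false and only then look up the "has-idle" property.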
Re: [PATCH] powerpc: move epapr paravirt init of power_save to an initcall
On 05/05/2014 03:21 PM, Alexander Graf wrote: On 05/05/2014 02:17 PM, Tudor Laurentiu wrote: On 04/30/2014 11:09 PM, Alexander Graf wrote: On 30.04.14 22:03, Stuart Yoder wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Wednesday, April 30, 2014 2:56 PM To: Yoder Stuart-B08248; b...@kernel.crashing.org; Wood Scott-B07421 Cc: linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCH] powerpc: move epapr paravirt init of power_save to an initcall On 30.04.14 21:54, Stuart Yoder wrote: From: Stuart Yoder some restructuring of epapr paravirt init resulted in ppc_md.power_save being set, and then overwritten to NULL during machine_init. This patch splits the initialization of ppc_md.power_save out into a postcore init call. Signed-off-by: Stuart Yoder --- arch/powerpc/kernel/epapr_paravirt.c | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/epapr_paravirt.c b/arch/powerpc/kernel/epapr_paravirt.c index 6300c13..c49b69c 100644 --- a/arch/powerpc/kernel/epapr_paravirt.c +++ b/arch/powerpc/kernel/epapr_paravirt.c @@ -52,11 +52,6 @@ static int __init early_init_dt_scan_epapr(unsigned long node, #endif } -#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64) -if (of_get_flat_dt_prop(node, "has-idle", NULL)) -ppc_md.power_save = epapr_ev_idle; -#endif - epapr_paravirt_enabled = true; return 1; @@ -69,3 +64,23 @@ int __init epapr_paravirt_early_init(void) return 0; } +static int __init epapr_idle_init_dt_scan(unsigned long node, + const char *uname, + int depth, void *data) +{ +#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64) +if (of_get_flat_dt_prop(node, "has-idle", NULL)) +ppc_md.power_save = epapr_ev_idle; +#endif +return 0; +} + +static int __init epapr_idle_init(void) +{ +if (epapr_paravirt_enabled) +of_scan_flat_dt(epapr_idle_init_dt_scan, NULL); Doesn't this scan all nodes? We only want to match on /hypervisor/has-idle, no? 
I cut/pasted from the approach the existing code in that file took, but yes you're right we just need the one property. Let me respin that to look at the hypervisor node only. Yeah, the same commit that introduced the breakage on has-idle also removed the explicit check for /hypervisor. Laurentiu, was this change on purpose? Alex, IIRC, at that time i had to switch from the normal "of" functions to a completely different api that's available in early init stage. This early "of" api is pretty limited (e.g. doesn't have a way to address a specific node) and i had to use that function that scans the whole tree. Ok, so it is an accident. Could you please post a patch that checks that the node we're looking at is called "hypervisor"? The simple API should give you enough information for that at least. Maybe you could even check that the parent node is the root node. Just had a quick look and it looks that that early fdt api was improved with a function that allows specifying a starting path for the scan (of_scan_flat_dt_by_path() added in commit 57d74bcf3072b65bde5aa540cedc976a75c48e5c). So i think we can simply use that instead. --- Best Regards, Laurentiu ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: memcpy optimization for 64bit LE
Anton Blanchard wrote:

> Unaligned stores take alignment exceptions on POWER7 running in little-endian. This is a dumb little-endian base memcpy that prevents unaligned stores. Once booted the feature fixup code switches over to the VMX copy loops (which are already endian safe).
>
> The question is what we do before that switch over. The base 64bit memcpy takes alignment exceptions on POWER7 so we can't use it as is. Fixing the causes of alignment exception would slow it down, because we'd need to ensure all loads and stores are aligned either through rotate tricks or bytewise loads and stores. Either would be bad for all other 64bit platforms.
>
> [ I simplified the loop a bit - Anton ]

Got it. The 3 instructions that you have removed were modifying r5 for no reason, as the last instruction was always resetting r5 to its initial value.

Philippe
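To make the "bytewise loads and stores" option concrete, here is a minimal C model of a copy loop in which every access is one byte wide and therefore cannot be misaligned. It is only an illustration of the idea; the routine discussed in this thread is PowerPC assembly, and bytewise_memcpy() is a name made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative only: copy n bytes one byte at a time. As written, each
 * load and store touches a single byte, so none can be misaligned.
 */
void *bytewise_memcpy(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	assert(n == 0 || (dest != NULL && src != NULL));

	while (n--)
		*d++ = *s++;

	return dest;
}
```

Note that an optimising C compiler may rewrite such a loop into wider accesses, so this shows the access pattern rather than a guaranteed instruction sequence.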
Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
On Mon, 2014-05-05 at 13:56 +0200, Alexander Graf wrote:
> On 05/05/2014 03:27 AM, Gavin Shan wrote:
> > The series of patches intends to support EEH for PCI devices which have been passed through to a PowerKVM based guest via VFIO. The implementation is straightforward, based on the issues or problems we have to resolve to support EEH for a PowerKVM based guest.
> >
> > - Emulation for EEH RTAS requests. Thankfully, we already have infrastructure to emulate XICS. Without introducing a new mechanism, we just extend that existing infrastructure to support EEH RTAS emulation. EEH RTAS requests initiated from the guest are posted to the host, where the requests get handled or delivered to the underlying firmware for further handling. For that, the host kernel has to maintain the PCI address mapping (host domain/bus/slot/function to guest's PHB BUID/bus/slot/function) via the KVM VFIO device. The address mapping will be built when initializing the VFIO device in QEMU and destroyed when the VFIO device in QEMU goes offline or the VM is destroyed.
>
> Do you also expose all those interfaces to user space? VFIO is as much about user space device drivers as it is about device assignment.
>
> I would like to first see an implementation that doesn't touch KVM emulation code at all but instead routes everything through QEMU. As a second step we can then accelerate performance critical paths inside of KVM.
>
> That way we ensure that user space device drivers have all the power over a device they need to drive it.

+1
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
Alexander Graf writes:

> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>> Although it's optional IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values.
>>
>> Signed-off-by: Aneesh Kumar K.V
>> ---
>> Changes from V3:
>> * Use make_dsisr instead of checking feature flag to decide whether to use saved dsisr or not
>>
>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>> {
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +	return vcpu->arch.fault_dar;
>
> How about PA6T and G5s?

Paul mentioned that BOOK3S always had DAR value set on alignment interrupt. And the patch is to enable/collect correct DAR value when running with Little Endian PR guest. Now to limit the impact and to enable Little Endian PR guest, I ended up doing the conditional code only for book3s 64 for which we know for sure that we set DAR value.

-aneesh
Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
Alexander Graf writes:

> On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote:
>> We reserve 5% of total ram for CMA allocation and not using that can result in us running out of numa node memory with specific configuration. One caveat is we may not have node local hpt with pinned vcpu configuration. But currently libvirt also pins the vcpu to cpuset after creating hash page table.
>
> I don't understand the problem. Can you please elaborate?

Let's take a system with 100GB RAM. We reserve around 5GB for htab allocation. Now if we use the rest of the available memory for hugetlbfs (because we want all the guests to be backed by huge pages), we would end up in a situation where we have a few GB of free RAM and a 5GB CMA reserve area. Now if we allow hash page table allocation to consume the free space, we would end up hitting page allocation failures for other non-movable kernel allocations even though we still have 5GB of CMA reserve space free.

-aneesh
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
> Alexander Graf writes:
>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>> Although it's optional IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values.
>>>
>>> Signed-off-by: Aneesh Kumar K.V
>>> ---
>>> Changes from V3:
>>> * Use make_dsisr instead of checking feature flag to decide whether to use saved dsisr or not
>>>
>>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>> {
>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>> +	return vcpu->arch.fault_dar;
>>
>> How about PA6T and G5s?
>
> Paul mentioned that BOOK3S always had DAR value set on alignment interrupt. And the patch is to enable/collect correct DAR value when running with Little Endian PR guest. Now to limit the impact and to enable Little Endian PR guest, I ended up doing the conditional code only for book3s 64 for which we know for sure that we set DAR value.

Yes, and I'm asking whether we know that this statement holds true for PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least developed by IBM, I'd assume its semantics here are similar to POWER4, but for PA6T I wouldn't be so sure.

Alex
Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest
Alexander Graf writes:

> On 05/04/2014 07:30 PM, Aneesh Kumar K.V wrote:
>> Signed-off-by: Aneesh Kumar K.V
>
> No patch description, no proper explanations anywhere why you're doing what. All of that in a pretty sensitive piece of code. There's no way this patch can go upstream in its current form.

Sorry about being vague. Will add a better commit message. The goal is to export MPSS support to the guest if the host supports it. MPSS support is exported via the penc encoding in "ibm,segment-page-sizes". The actual format can be found in htab_dt_scan_page_sizes. When the guest memory is backed by hugetlbfs, we expose the penc encodings the host supports to the guest via kvmppc_add_seg_page_size.

Now the challenge for THP support is to make sure that our henter, hremove etc. decode the base page size and actual page size correctly from the hash table entry values. Most of the changes are there to do that. The rest is already handled by kvm.

NOTE: It is much easier to read the code after applying the patch rather than reading the diff. I have added comments around each step in the code.

-aneesh
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
Alexander Graf writes:

> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>> Alexander Graf writes:
>>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>>> Although it's optional IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values.
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V
>>>> ---
>>>> Changes from V3:
>>>> * Use make_dsisr instead of checking feature flag to decide whether to use saved dsisr or not
>>>>
>>>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>>> {
>>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>>> +	return vcpu->arch.fault_dar;
>>>
>>> How about PA6T and G5s?
>>
>> Paul mentioned that BOOK3S always had DAR value set on alignment interrupt. And the patch is to enable/collect correct DAR value when running with Little Endian PR guest. Now to limit the impact and to enable Little Endian PR guest, I ended up doing the conditional code only for book3s 64 for which we know for sure that we set DAR value.
>
> Yes, and I'm asking whether we know that this statement holds true for PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least developed by IBM, I'd assume its semantics here are similar to POWER4, but for PA6T I wouldn't be so sure.

I will have to defer to Paul on that question. But that should not prevent this patch from going upstream right ?

-aneesh
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
Olof Johansson writes:

> 2014-05-05 7:43 GMT-07:00 Alexander Graf :
>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>>> Alexander Graf writes:
>>>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>>>> Although it's optional IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values.
>>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V
>>>>> ---
>>>>> Changes from V3:
>>>>> * Use make_dsisr instead of checking feature flag to decide whether to use saved dsisr or not
>>>>>
>>>>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>>>> {
>>>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>>>> +	return vcpu->arch.fault_dar;
>>>>
>>>> How about PA6T and G5s?
>>>
>>> Paul mentioned that BOOK3S always had DAR value set on alignment interrupt. And the patch is to enable/collect correct DAR value when running with Little Endian PR guest. Now to limit the impact and to enable Little Endian PR guest, I ended up doing the conditional code only for book3s 64 for which we know for sure that we set DAR value.
>>
>> Yes, and I'm asking whether we know that this statement holds true for PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least developed by IBM, I'd assume its semantics here are similar to POWER4, but for PA6T I wouldn't be so sure.
>
> Thanks for looking out for us, obviously IBM doesn't (based on the reply a minute ago).

The reason I deferred the question to Paul is really because I don't know enough about PA6T and G5 to comment. I intentionally restricted the changes to BOOK3S_64 because I wanted to make sure I don't break anything else. It is in no way to hint that others don't care.

-aneesh
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
[Now without HTML email -- it's what you get for cc:ing me at work instead of my upstream email :)]

2014-05-05 7:43 GMT-07:00 Alexander Graf :
> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>> Alexander Graf writes:
>>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>>> Although it's optional IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values.
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V
>>>> ---
>>>> Changes from V3:
>>>> * Use make_dsisr instead of checking feature flag to decide whether to use saved dsisr or not
>>>>
>>>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>>> {
>>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>>> +	return vcpu->arch.fault_dar;
>>>
>>> How about PA6T and G5s?
>>
>> Paul mentioned that BOOK3S always had DAR value set on alignment interrupt. And the patch is to enable/collect correct DAR value when running with Little Endian PR guest. Now to limit the impact and to enable Little Endian PR guest, I ended up doing the conditional code only for book3s 64 for which we know for sure that we set DAR value.
>
> Yes, and I'm asking whether we know that this statement holds true for PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least developed by IBM, I'd assume its semantics here are similar to POWER4, but for PA6T I wouldn't be so sure.

Thanks for looking out for us, obviously IBM doesn't (based on the reply a minute ago).

In the end, since there's been no work to enable KVM on PA6T, I'm not too worried. I guess it's one more thing to sort out (and check for) whenever someone does that.

I definitely don't have cycles to deal with that myself at this time. I can help find hardware for someone who wants to, but even then I'm guessing the interest is pretty limited.

-Olof
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
> On 05.05.2014 at 16:57, Olof Johansson wrote:
>
> [Now without HTML email -- it's what you get for cc:ing me at work instead of my upstream email :)]
>
> 2014-05-05 7:43 GMT-07:00 Alexander Graf :
>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>>> Alexander Graf writes:
>>>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>>>> Although it's optional IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values.
>>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V
>>>>> ---
>>>>> Changes from V3:
>>>>> * Use make_dsisr instead of checking feature flag to decide whether to use saved dsisr or not
>>>>>
>>>>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>>>> {
>>>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>>>> +	return vcpu->arch.fault_dar;
>>>>
>>>> How about PA6T and G5s?
>>>
>>> Paul mentioned that BOOK3S always had DAR value set on alignment interrupt. And the patch is to enable/collect correct DAR value when running with Little Endian PR guest. Now to limit the impact and to enable Little Endian PR guest, I ended up doing the conditional code only for book3s 64 for which we know for sure that we set DAR value.
>>
>> Yes, and I'm asking whether we know that this statement holds true for PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least developed by IBM, I'd assume its semantics here are similar to POWER4, but for PA6T I wouldn't be so sure.
>
> Thanks for looking out for us, obviously IBM doesn't (based on the reply a minute ago).
>
> In the end, since there's been no work to enable KVM on PA6T, I'm not too worried. I guess it's one more thing to sort out (and check for) whenever someone does that.
>
> I definitely don't have cycles to deal with that myself at this time. I can help find hardware for someone who wants to, but even then I'm guessing the interest is pretty limited.

I know of at least 1 person who successfully runs PR KVM on a PA6T, so it's neither neglected nor non-working. If you can get me access to a pa6t system I can easily check whether alignment interrupts generate dar and dsisr properly :).

Alex
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
> On 05.05.2014 at 16:50, "Aneesh Kumar K.V" wrote:
>
> Alexander Graf writes:
>
>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>>> Alexander Graf writes:
>>>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>>>> Although it's optional IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values.
>>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V
>>>>> ---
>>>>> Changes from V3:
>>>>> * Use make_dsisr instead of checking feature flag to decide whether to use saved dsisr or not
>>>>>
>>>>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>>>> {
>>>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>>>> +	return vcpu->arch.fault_dar;
>>>>
>>>> How about PA6T and G5s?
>>>
>>> Paul mentioned that BOOK3S always had DAR value set on alignment interrupt. And the patch is to enable/collect correct DAR value when running with Little Endian PR guest. Now to limit the impact and to enable Little Endian PR guest, I ended up doing the conditional code only for book3s 64 for which we know for sure that we set DAR value.
>>
>> Yes, and I'm asking whether we know that this statement holds true for PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least developed by IBM, I'd assume its semantics here are similar to POWER4, but for PA6T I wouldn't be so sure.
>
> I will have to defer to Paul on that question. But that should not prevent this patch from going upstream right ?

Regressions are big no-gos.

Alex

> -aneesh
Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
> On 05.05.2014 at 16:35, "Aneesh Kumar K.V" wrote:
>
> Alexander Graf writes:
>
>>> On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote:
>>> We reserve 5% of total ram for CMA allocation and not using that can result in us running out of numa node memory with specific configuration. One caveat is we may not have node local hpt with pinned vcpu configuration. But currently libvirt also pins the vcpu to cpuset after creating hash page table.
>>
>> I don't understand the problem. Can you please elaborate?
>
> Let's take a system with 100GB RAM. We reserve around 5GB for htab allocation. Now if we use rest of available memory for hugetlbfs (because we want all the guest to be backed by huge pages), we would end up in a situation where we have a few GB of free RAM and 5GB of CMA reserve area. Now if we allow hash page table allocation to consume the free space, we would end up hitting page allocation failure for other non movable kernel allocation even though we still have 5GB CMA reserve space free.

Isn't this a greater problem? We should start swapping before we hit the point where non movable kernel allocation fails, no?

The fact that KVM uses a good number of normal kernel pages is maybe suboptimal, but shouldn't be a critical problem.

Alex

> -aneesh
Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
Alexander Graf writes:

>> On 05.05.2014 at 16:35, "Aneesh Kumar K.V" wrote:
>>
>> Alexander Graf writes:
>>
>>>> On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote:
>>>> We reserve 5% of total ram for CMA allocation and not using that can result in us running out of numa node memory with specific configuration. One caveat is we may not have node local hpt with pinned vcpu configuration. But currently libvirt also pins the vcpu to cpuset after creating hash page table.
>>>
>>> I don't understand the problem. Can you please elaborate?
>>
>> Let's take a system with 100GB RAM. We reserve around 5GB for htab allocation. Now if we use rest of available memory for hugetlbfs (because we want all the guest to be backed by huge pages), we would end up in a situation where we have a few GB of free RAM and 5GB of CMA reserve area. Now if we allow hash page table allocation to consume the free space, we would end up hitting page allocation failure for other non movable kernel allocation even though we still have 5GB CMA reserve space free.
>
> Isn't this a greater problem? We should start swapping before we hit the point where non movable kernel allocation fails, no?

But there is nothing much to swap, because most of the memory is reserved for guest RAM via hugetlbfs.

> The fact that KVM uses a good number of normal kernel pages is maybe suboptimal, but shouldn't be a critical problem.

Yes. But in this case we could do better, couldn't we? We already have a large part of guest RAM kept aside for htab allocation which cannot be used for non-movable allocations. And with the current code we ignore that reserve space and use other areas for hash page table allocation. We actually hit this case on one of the test boxes.

KVM guest htab at c01e5000 (order 30), LPID 1
libvirtd invoked oom-killer: gfp_mask=0x2000d0, order=0, oom_score_adj=0
libvirtd cpuset=/ mems_allowed=0,16
CPU: 72 PID: 20044 Comm: libvirtd Not tainted 3.10.23-1401.pkvm2_1.4.ppc64 #1
Call Trace:
[c01e3b63f150] [c0017330] .show_stack+0x130/0x200 (unreliable)
[c01e3b63f220] [c087a888] .dump_stack+0x28/0x3c
[c01e3b63f290] [c0876a4c] .dump_header+0xbc/0x228
[c01e3b63f360] [c01dd838] .oom_kill_process+0x318/0x4c0
[c01e3b63f440] [c01de258] .out_of_memory+0x518/0x550
[c01e3b63f520] [c01e5aac] .__alloc_pages_nodemask+0xb3c/0xbf0
[c01e3b63f700] [c0243580] .new_slab+0x440/0x490
[c01e3b63f7a0] [c08781fc] .__slab_alloc+0x17c/0x618
[c01e3b63f8d0] [c02467fc] .kmem_cache_alloc_node_trace+0xcc/0x300
[c01e3b63f990] [c010f62c] .alloc_fair_sched_group+0xfc/0x200
[c01e3b63fa60] [c0104f00] .sched_create_group+0x50/0xe0
[c01e3b63fae0] [c0104fc0] .cpu_cgroup_css_alloc+0x30/0x80
[c01e3b63fb60] [c01513ec] .cgroup_mkdir+0x2bc/0x6e0
[c01e3b63fc50] [c0275aec] .vfs_mkdir+0x14c/0x220
[c01e3b63fcf0] [c027a734] .SyS_mkdirat+0x94/0x110
[c01e3b63fdb0] [c027a7e4] .SyS_mkdir+0x34/0x50
[c01e3b63fe30] [c0009f54] syscall_exit+0x0/0x98
Node 0 DMA free:23424kB min:23424kB low:29248kB high:35136kB active_anon:0kB inactive_anon:128kB active_file:256kB inactive_file:384kB unevictable:9536kB isolated(anon):0kB isolated(file):0kB present:67108864kB managed:65931776kB mlocked:9536kB dirty:64kB writeback:0kB mapped:5376kB shmem:0kB slab_reclaimable:23616kB slab_unreclaimable:1237056kB kernel_stack:18256kB pagetables:1088kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:78 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Node 16 DMA free:5787008kB min:21376kB low:26688kB high:32064kB active_anon:1984kB inactive_anon:2112kB active_file:896kB inactive_file:64kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:67108864kB managed:60060032kB mlocked:0kB dirty:128kB writeback:3712kB mapped:0kB shmem:0kB slab_reclaimable:23424kB slab_unreclaimable:826048kB kernel_stack:576kB pagetables:1408kB unstable:0kB bounce:0kB free_cma:5767040kB writeback_tmp:0kB pages_scanned:756 all_unreclaimable? yes
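The zone figures in the OOM report can be checked with a little arithmetic. The sketch below is a simplified model, not the kernel's watermark code, and both function names are hypothetical. It assumes that an allocation which may not be placed in the CMA area only sees free minus free_cma, and is refused once that remainder is at or below the zone's min watermark.

```c
#include <assert.h>
#include <stdbool.h>

/* Free memory (kB) visible to an allocation that cannot use CMA pages. */
unsigned long non_cma_free_kb(unsigned long free_kb, unsigned long free_cma_kb)
{
	assert(free_cma_kb <= free_kb);
	return free_kb - free_cma_kb;
}

/* Simplified model: true when such an allocation would be refused. */
bool below_min_watermark(unsigned long free_kb, unsigned long free_cma_kb,
			 unsigned long min_kb)
{
	return non_cma_free_kb(free_kb, free_cma_kb) <= min_kb;
}
```

With the Node 16 values (free 5787008 kB, free_cma 5767040 kB, min 21376 kB) only 19968 kB lies outside the CMA area, which is below min. Node 0 has no free CMA memory at all and its free value equals min (23424 kB). Under this model both nodes refuse the slab allocation even though more than 5 GB of CMA memory is still free.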
[PATCH v3] powerpc/fsl: Added binding for Freescale CoreNet coherency fabric (CCF)
From: Diana Craciun

The CoreNet coherency fabric is a fabric-oriented, connectivity infrastructure that enables the implementation of coherent, multicore systems. The CCF acts as a central interconnect for cores, platform-level caches, memory subsystem, peripheral devices and I/O host bridges in the system.

Signed-off-by: Diana Craciun
---
v3:
- added port ID mapping
- removed fsl,corenetx-cf

 .../devicetree/bindings/powerpc/fsl/ccf.txt  | 42 ++
 .../devicetree/bindings/powerpc/fsl/cpus.txt |  8 +
 .../devicetree/bindings/powerpc/fsl/pamu.txt |  8 +
 3 files changed, 58 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/ccf.txt

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/ccf.txt b/Documentation/devicetree/bindings/powerpc/fsl/ccf.txt
new file mode 100644
index 000..1263c29
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/fsl/ccf.txt
@@ -0,0 +1,42 @@
+Freescale CoreNet Coherency Fabric(CCF) Device Tree Binding
+
+DESCRIPTION
+
+The CoreNet coherency fabric is a fabric-oriented, connectivity infrastructure
+that enables the implementation of coherent, multicore systems.
+
+Required properties:
+
+- compatible :
+	fsl,corenet1-cf - CoreNet coherency fabric version 1. Example chips: T4240,
+	B4860
+	fsl,corenet2-cf - CoreNet coherency fabric version 2. Example chips: P5020,
+	P4080, P3041, P2041
+	fsl,corenet-cf - It is used to represent the common registers between
+	CCF version 1 and CCF version 2. This compatible is retained for
+	compatibility reasons as it was already used for both CCF version 1 chips
+	and CCF version 2 chips.
+
+- reg :
+	A standard property. Represents the CCF registers.
+
+- interrupts :
+	Interrupt mapping for CCF error interrupt.
+
+- fsl,ccf-num-csdids:
+	Specifies the number of Coherency Subdomain ID Port Mapping
+	Registers that are supported by the CCF.
+
+- fsl,ccf-num-snoopids:
+	Specifies the number of Snoop ID Port Mapping Registers that
+	are supported by CCF.
+
+Example:
+
+	corenet-cf@18000 {
+		compatible = "fsl,corenet2-cf", "fsl,corenet-cf";
+		reg = <0x18000 0x1000>;
+		interrupts = <16 2 1 31>;
+		fsl,ccf-num-csdids = <32>;
+		fsl,ccf-num-snoopids = <32>;
+	};
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
index 922c30a..09dbc5f 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
@@ -20,3 +20,11 @@ PROPERTIES
 	a property named fsl,eref-[CAT], where [CAT] is the abbreviated category
 	name with all uppercase letters converted to lowercase, indicates that
 	the category is supported by the implementation.
+
+   - fsl,portid-mapping :
+	The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
+	registers which are part of the CoreNet Coherency fabric (CCF) provide a
+	CoreNet Coherency Subdomain ID/CoreNet Snoop ID to cpu mapping functions.
+	Certain bits from these registers should be set if the coresponding CPU
+	should be snooped. This property defines a bitmask which selects the bit that
+	should be set if this cpu should be snooped.
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
index 1f5e329..827c637 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
@@ -26,6 +26,13 @@ Required properties:
 	A standard property.
 - #size-cells :
 	A standard property.
+- fsl,portid-mapping :
+	The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
+	registers which are part of the CoreNet Coherency fabric (CCF) provide a
+	CoreNet Coherency Subdomain ID/CoreNet Snoop ID to pamu mapping functions.
+	Certain bits from these registers should be set if PAMUs should be snooped.
+	This property defines a bitmask which selects the bits that should be set
+	if PAMUs should be snooped.
 
 Optional properties:
 - reg :
@@ -88,6 +95,7 @@ Example:
 		compatible = "fsl,pamu-v1.0", "fsl,pamu";
 		reg = <0x2 0x5000>;
 		ranges = <0 0x2 0x5000>;
+		fsl,portid-mapping = <0xf8>;
 		#address-cells = <1>;
 		#size-cells = <1>;
 		interrupts = <
-- 
1.7.11.7
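One way a consumer of fsl,portid-mapping might apply the bitmask is sketched below. This is an illustration under stated assumptions only: the helper names are hypothetical, no real CCF register layout is implied, and the property value is simply treated as a 32-bit mask that is ORed into a port mapping register value to enable snooping and cleared from it to disable snooping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: set the bits selected by the property's mask. */
uint32_t snoop_enable(uint32_t reg_val, uint32_t portid_mask)
{
	assert(portid_mask != 0);	/* a zero mask would select nothing */
	return reg_val | portid_mask;
}

/* Hypothetical helper: clear the bits selected by the property's mask. */
uint32_t snoop_disable(uint32_t reg_val, uint32_t portid_mask)
{
	assert(portid_mask != 0);
	return reg_val & ~portid_mask;
}
```

With the mask value shown in the PAMU example above (0xf8 as printed), enabling snooping on a register that currently holds 0x1 gives 0xf9, and disabling it again restores 0x1.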
[PATCH v2] powerpc/fsl: Updated device trees for platforms with corenet version 2
From: Diana Craciun Updated the device trees according to the corenet-cf binding definition. Signed-off-by: Diana Craciun --- arch/powerpc/boot/dts/b4860emu.dts | 7 ++- arch/powerpc/boot/dts/fsl/b4420si-post.dtsi | 4 arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi | 2 ++ arch/powerpc/boot/dts/fsl/b4860si-post.dtsi | 4 arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi | 4 arch/powerpc/boot/dts/fsl/b4si-post.dtsi| 3 ++- arch/powerpc/boot/dts/fsl/t4240si-post.dtsi | 3 ++- arch/powerpc/boot/dts/fsl/t4240si-pre.dtsi | 12 arch/powerpc/boot/dts/t4240emu.dts | 15 ++- 9 files changed, 42 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/boot/dts/b4860emu.dts b/arch/powerpc/boot/dts/b4860emu.dts index 7290021..85646b4 100644 --- a/arch/powerpc/boot/dts/b4860emu.dts +++ b/arch/powerpc/boot/dts/b4860emu.dts @@ -61,21 +61,25 @@ device_type = "cpu"; reg = <0 1>; next-level-cache = <&L2>; + fsl,portid-mapping = <0x8000>; }; cpu1: PowerPC,e6500@2 { device_type = "cpu"; reg = <2 3>; next-level-cache = <&L2>; + fsl,portid-mapping = <0x8000>; }; cpu2: PowerPC,e6500@4 { device_type = "cpu"; reg = <4 5>; next-level-cache = <&L2>; + fsl,portid-mapping = <0x8000>; }; cpu3: PowerPC,e6500@6 { device_type = "cpu"; reg = <6 7>; next-level-cache = <&L2>; + fsl,portid-mapping = <0x8000>; }; }; }; @@ -157,7 +161,7 @@ }; corenet-cf@18000 { - compatible = "fsl,b4-corenet-cf"; + compatible = "fsl,corenet2-cf", "fsl,corenet-cf"; reg = <0x18000 0x1000>; interrupts = <16 2 1 0>; fsl,ccf-num-csdids = <32>; @@ -167,6 +171,7 @@ iommu@2 { compatible = "fsl,pamu-v1.0", "fsl,pamu"; reg = <0x2 0x4000>; + fsl,portid-mapping = <0x8000>; #address-cells = <1>; #size-cells = <1>; interrupts = < diff --git a/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi index 60566f99..d678944 100644 --- a/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi +++ b/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi @@ -76,10 +76,6 @@ compatible = "fsl,b4420-l3-cache-controller", "cache"; }; - 
corenet-cf@18000 { - compatible = "fsl,b4420-corenet-cf"; - }; - guts: global-utilities@e { compatible = "fsl,b4420-device-config", "fsl,qoriq-device-config-2.0"; }; diff --git a/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi b/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi index 2419731..338af7e 100644 --- a/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi +++ b/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi @@ -66,12 +66,14 @@ reg = <0 1>; clocks = <&mux0>; next-level-cache = <&L2>; + fsl,portid-mapping = <0x8000>; }; cpu1: PowerPC,e6500@2 { device_type = "cpu"; reg = <2 3>; clocks = <&mux0>; next-level-cache = <&L2>; + fsl,portid-mapping = <0x8000>; }; }; }; diff --git a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi index cbc354b..582381d 100644 --- a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi +++ b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi @@ -120,10 +120,6 @@ compatible = "fsl,b4860-l3-cache-controller", "cache"; }; - corenet-cf@18000 { - compatible = "fsl,b4860-corenet-cf"; - }; - guts: global-utilities@e { compatible = "fsl,b4860-device-config", "fsl,qoriq-device-config-2.0"; }; diff --git a/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi b/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi index 142ac86..1948f73 100644 --- a/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi +++ b/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi @@ -66,24 +66,28 @@ reg = <0 1>; clocks = <&mux0>; next-level-cache = <&L2>; + fsl,portid-mapping = <0x8000>; }; cpu1: PowerPC,e6500@2 { device_type = "cpu"; reg = <2 3>; clocks = <&mu
[PATCH v2] powerpc/mpc85xx: Remove P1023 RDS support
P1023RDS is no longer supported/manufactured by Freescale while P1023RDB is. Signed-off-by: Lijun Pan --- arch/powerpc/boot/dts/p1023rds.dts | 219 - arch/powerpc/configs/mpc85xx_defconfig | 1 - arch/powerpc/configs/mpc85xx_smp_defconfig | 1 - arch/powerpc/platforms/85xx/Kconfig| 6 +- arch/powerpc/platforms/85xx/Makefile | 2 +- .../platforms/85xx/{p1023_rds.c => p1023_rdb.c}| 36 +--- 6 files changed, 10 insertions(+), 255 deletions(-) delete mode 100644 arch/powerpc/boot/dts/p1023rds.dts rename arch/powerpc/platforms/85xx/{p1023_rds.c => p1023_rdb.c} (75%) diff --git a/arch/powerpc/boot/dts/p1023rds.dts b/arch/powerpc/boot/dts/p1023rds.dts deleted file mode 100644 index beb6cb1..000 --- a/arch/powerpc/boot/dts/p1023rds.dts +++ /dev/null @@ -1,219 +0,0 @@ -/* - * P1023 RDS Device Tree Source - * - * Copyright 2010-2011 Freescale Semiconductor Inc. - * - * Author: Roy Zang - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions are met: - * * Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * * Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * * Neither the name of Freescale Semiconductor nor the - * names of its contributors may be used to endorse or promote products - * derived from this software without specific prior written permission. - * - * - * ALTERNATIVELY, this software may be distributed under the terms of the - * GNU General Public License ("GPL") as published by the Free Software - * Foundation, either version 2 of that License or (at your option) any - * later version. 
- * - * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY - * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED - * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE - * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY - * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES - * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; - * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND - * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS - * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ - -/include/ "fsl/p1023si-pre.dtsi" - -/ { - model = "fsl,P1023"; - compatible = "fsl,P1023RDS"; - #address-cells = <2>; - #size-cells = <2>; - interrupt-parent = <&mpic>; - - memory { - device_type = "memory"; - }; - - soc: soc@ff60 { - ranges = <0x0 0x0 0xff60 0x20>; - - i2c@3000 { - rtc@68 { - compatible = "dallas,ds1374"; - reg = <0x68>; - }; - }; - - spi@7000 { - fsl_dataflash@0 { - #address-cells = <1>; - #size-cells = <1>; - compatible = "atmel,at45db081d"; - reg = <0>; - spi-max-frequency = <4000>; /* input clock */ - partition@u-boot { - /* 512KB for u-boot Bootloader Image */ - label = "u-boot-spi"; - reg = <0x 0x0008>; - read-only; - }; - partition@dtb { - /* 512KB for DTB Image */ - label = "dtb-spi"; - reg = <0x0008 0x0008>; - read-only; - }; - }; - }; - - usb@22000 { - dr_mode = "host"; - phy_type = "ulpi"; - }; - }; - - lbc: localbus@ff605000 { - reg = <0 0xff605000 0 0x1000>; - - /* NOR Flash, BCSR */ - ranges = <0x0 0x0 0x0 0xee00 0x0200 - 0x1 0x0 0x0 0xe000 0x8000>; - - nor@0,0 { - #address-cells = <1>; - #size-cells = <1>; - compatible = "cfi-fl
Re: [PATCH v2] powerpc/mpc85xx: Remove P1023 RDS support
On Mon, 2014-05-05 at 13:23 -0500, Lijun Pan wrote:
> P1023RDS is no longer supported/manufactured by Freescale while P1023RDB is.
>
> Signed-off-by: Lijun Pan
> ---
>  arch/powerpc/boot/dts/p1023rds.dts | 219 -
>  arch/powerpc/configs/mpc85xx_defconfig | 1 -
>  arch/powerpc/configs/mpc85xx_smp_defconfig | 1 -
>  arch/powerpc/platforms/85xx/Kconfig | 6 +-
>  arch/powerpc/platforms/85xx/Makefile | 2 +-
>  .../platforms/85xx/{p1023_rds.c => p1023_rdb.c} | 36 +---
>  6 files changed, 10 insertions(+), 255 deletions(-)
>  delete mode 100644 arch/powerpc/boot/dts/p1023rds.dts
>  rename arch/powerpc/platforms/85xx/{p1023_rds.c => p1023_rdb.c} (75%)

What changed from v1?

If you want this patch merged, please respond to the comments on v1.

-Scott
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH v2] powerpc/mpc85xx: Remove P1023 RDS support
> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Monday, May 05, 2014 2:05 PM
> To: Pan Lijun-B44306
> Cc: linuxppc-...@ozlabs.org; Medve Emilian-EMMEDVE1
> Subject: Re: [PATCH v2] powerpc/mpc85xx: Remove P1023 RDS support
>
> On Mon, 2014-05-05 at 13:23 -0500, Lijun Pan wrote:
> > P1023RDS is no longer supported/manufactured by Freescale while P1023RDB is.
> >
> > Signed-off-by: Lijun Pan
> > ---
> >  arch/powerpc/boot/dts/p1023rds.dts | 219 -
> >  arch/powerpc/configs/mpc85xx_defconfig | 1 -
> >  arch/powerpc/configs/mpc85xx_smp_defconfig | 1 -
> >  arch/powerpc/platforms/85xx/Kconfig | 6 +-
> >  arch/powerpc/platforms/85xx/Makefile | 2 +-
> >  .../platforms/85xx/{p1023_rds.c => p1023_rdb.c} | 36 +---
> >  6 files changed, 10 insertions(+), 255 deletions(-)
> >  delete mode 100644 arch/powerpc/boot/dts/p1023rds.dts
> >  rename arch/powerpc/platforms/85xx/{p1023_rds.c => p1023_rdb.c} (75%)
>
> What changed from v1?

"Please wrap changelogs at no more than 75 columns."

> If you want this patch merged, please respond to the comments on v1.
>
> -Scott
Re: [PATCH v3] powerpc/fsl: Added binding for Freescale CoreNet coherency fabric (CCF)
On Mon, 2014-05-05 at 18:58 +0300, Diana Craciun wrote:
> diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> index 922c30a..09dbc5f 100644
> --- a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> +++ b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> @@ -20,3 +20,11 @@ PROPERTIES
>  a property named fsl,eref-[CAT], where [CAT] is the abbreviated category
>  name with all uppercase letters converted to lowercase, indicates that
>  the category is supported by the implementation.
> +
> + - fsl,portid-mapping :
> + The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
> + registers which are part of the CoreNet Coherency fabric (CCF) provide a
> + CoreNet Coherency Subdomain ID/CoreNet Snoop ID to cpu mapping functions.
> + Certain bits from these registers should be set if the corresponding CPU
> + should be snooped. This property defines a bitmask which selects the bit that
> + should be set if this cpu should be snooped.

Please follow existing formatting in this file.

> diff --git a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> index 1f5e329..827c637 100644
> --- a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> +++ b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> @@ -26,6 +26,13 @@ Required properties:
>  A standard property.
>  - #size-cells:
>  A standard property.
> +- fsl,portid-mapping :
> + The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
> + registers which are part of the CoreNet Coherency fabric (CCF) provide a
> + CoreNet Coherency Subdomain ID/CoreNet Snoop ID to pamu mapping functions.
> + Certain bits from these registers should be set if PAMUs should be snooped.
> + This property defines a bitmask which selects the bits that should be set
> + if PAMUs should be snooped.

This can't be a required property since existing trees don't have it --
in addition to allowing for the possibility of a PAMU where the snoop ID
is not known or where the snoop domain mechanism does not exist.

-Scott
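For readers following the binding text, the bitmask semantics quoted above can be sketched in C. This is only an illustration, not code from any driver: the helper name `snoop_mask_for_cpus`, the array layout, and the 32-bit register width are assumptions, and the mask values in the usage note are made-up examples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: OR together the
 * fsl,portid-mapping masks of every CPU that should be snooped.
 * The result is the set of bits to set in a (CSDID or snoop ID)
 * port mapping register, as the binding text describes.
 */
uint32_t snoop_mask_for_cpus(const uint32_t *portid_masks,
			     const int *snooped, size_t ncpus)
{
	uint32_t reg = 0;
	size_t i;

	assert(portid_masks != NULL && snooped != NULL);

	for (i = 0; i < ncpus; i++) {
		/* Each CPU contributes its own bit only if it is snooped. */
		if (snooped[i])
			reg |= portid_masks[i];
	}
	return reg;
}
```

With four CPUs carrying masks 0x8000, 0x4000, 0x2000 and 0x1000, snooping the first and third would give 0x8000 | 0x2000. A PAMU node would contribute its own mask in the same way.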
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On 05.05.14 16:57, Olof Johansson wrote:
> [Now without HTML email -- it's what you get for cc:ing me at work
> instead of my upstream email :)]
>
> 2014-05-05 7:43 GMT-07:00 Alexander Graf :
>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>>> Alexander Graf writes:
>>>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>>>> Although it's optional IBM POWER cpus always had DAR value set on
>>>>> alignment interrupt. So don't try to compute these values.
>>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V
>>>>> ---
>>>>> Changes from V3:
>>>>> * Use make_dsisr instead of checking feature flag to decide whether
>>>>>   to use saved dsisr or not
>>>>>
>>>>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>>>> {
>>>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>>>> + return vcpu->arch.fault_dar;
>>>>
>>>> How about PA6T and G5s?
>>>
>>> Paul mentioned that BOOK3S always had DAR value set on alignment
>>> interrupt. And the patch is to enable/collect correct DAR value when
>>> running with Little Endian PR guest. Now to limit the impact and to
>>> enable Little Endian PR guest, I ended up doing the conditional code
>>> only for book3s 64 for which we know for sure that we set DAR value.
>>
>> Yes, and I'm asking whether we know that this statement holds true for
>> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is
>> at least developed by IBM, I'd assume its semantics here are similar to
>> POWER4, but for PA6T I wouldn't be so sure.
>
> Thanks for looking out for us, obviously IBM doesn't (based on the
> reply a minute ago).
>
> In the end, since there's been no work to enable KVM on PA6T, I'm not
> too worried. I guess it's one more thing to sort out (and check for)
> whenever someone does that. I definitely don't have cycles to deal with
> that myself at this time. I can help find hardware for someone who
> wants to, but even then I'm guessing the interest is pretty limited.
>
> -Olof

Just for info: "PR" KVM works great on my PA6T machine. I booted the
Lubuntu 14.04 PowerPC live DVD on a QEMU virtual machine with "PR" KVM
successfully. But Mac OS X Jaguar, Panther, and Tiger don't boot with KVM
on Mac-on-Linux and QEMU. See
http://forum.hyperion-entertainment.biz/viewtopic.php?f=35&t=1747.

-- Christian
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
2014-05-05 8:03 GMT-07:00 Aneesh Kumar K.V :
> Olof Johansson writes:
>
>> 2014-05-05 7:43 GMT-07:00 Alexander Graf :
>>
>>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>>>> Alexander Graf writes:
>>>>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>>>>> Although it's optional IBM POWER cpus always had DAR value set on
>>>>>> alignment interrupt. So don't try to compute these values.
>>>>>>
>>>>>> Signed-off-by: Aneesh Kumar K.V
>>>>>> ---
>>>>>> Changes from V3:
>>>>>> * Use make_dsisr instead of checking feature flag to decide whether
>>>>>>   to use saved dsisr or not
>>>>>>
>>>>>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>>>>> {
>>>>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>>>>> + return vcpu->arch.fault_dar;
>>>>>
>>>>> How about PA6T and G5s?
>>>>
>>>> Paul mentioned that BOOK3S always had DAR value set on alignment
>>>> interrupt. And the patch is to enable/collect correct DAR value when
>>>> running with Little Endian PR guest. Now to limit the impact and to
>>>> enable Little Endian PR guest, I ended up doing the conditional code
>>>> only for book3s 64 for which we know for sure that we set DAR value.
>>>
>>> Yes, and I'm asking whether we know that this statement holds true for
>>> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at
>>> least developed by IBM, I'd assume its semantics here are similar to
>>> POWER4, but for PA6T I wouldn't be so sure.
>>
>> Thanks for looking out for us, obviously IBM doesn't (based on the reply a
>> minute ago).
>
> The reason I deferred the question to Paul is really because I don't
> know enough about PA6T and G5 to comment. I intentionally restricted the
> changes to BOOK3S_64 because I wanted to make sure I don't break
> anything else. It is in no way to hint that others don't care.

Ah, I see -- the disconnect is that you don't think PA6T and 970 are
64-bit book3s CPUs. They are.

-Olof
RE: [PATCH] powerpc: Fix comment around arch specific definition of RECLAIM_DISTANCE
> -----Original Message-----
> From: Preeti U Murthy [mailto:pre...@linux.vnet.ibm.com]
> Sent: Monday, May 05, 2014 1:17 AM
> To: linuxppc-dev@lists.ozlabs.org
> Cc: b...@kernel.crashing.org; an...@samba.org; Motohiro Kosaki JP
> Subject: [PATCH] powerpc: Fix comment around arch specific definition of RECLAIM_DISTANCE
>
> Commit 32e45ff43eaf5c17f changed the default value of RECLAIM_DISTANCE to 30.
> However the comment around arch specific definition of RECLAIM_DISTANCE is not
> updated to reflect the same. Correct the value mentioned in the comment.
>
> Signed-off-by: Preeti U Murthy
> Cc: Anton Blanchard
> Cc: Benjamin Herrenschmidt
> Cc: KOSAKI Motohiro

Acked-by: KOSAKI Motohiro
Re: [PATCH 4/6] powerpc/corenet: Create the dts components for the DPAA FMan
On Sat, 2014-05-03 at 05:02 -0500, Emil Medve wrote: > Hello Scott, > > > On 04/21/2014 05:11 PM, Scott Wood wrote: > > On Fri, 2014-04-18 at 07:21 -0500, Shruti Kanetkar wrote: > >> +fman@40 { > >> + mdio@f1000 { > >> + #address-cells = <1>; > >> + #size-cells = <0>; > >> + compatible = "fsl,fman-xmdio"; > >> + reg = <0xf1000 0x1000>; > >> + }; > >> +}; > > > > I'd like to see a complete fman binding before we start adding pieces. > > The driver for the FMan 10 Gb/s MDIO has upstreamed a couple of years > ago: '9f35a73 net/fsl: introduce Freescale 10G MDIO driver', granted > without a binding writeup. Pushing driver code through the netdev tree does not establish device tree ABI. Binding documents and dts files do. > This patch series should probably include a > binding blurb. However, let's not gate this patchset on a complete > binding for the FMan I at least want to see enough of the FMan binding to have confidence that what we're adding now is correct. > As you know we don't own the FMan work and the FMan work is... not ready > for upstreaming. I'm not asking for a driver, just a binding that describes hardware. Is there any reason why the fman node needs to be anywhere near as complicated as it is in the SDK, if we're limiting it to actual hardware description? Do we really need to have nodes for all the sub-blocks? > In an attempt to make some sort of progress we've > decided to upstream the pieces that are less controversial and MDIO is > an obvious candidate > > >> +fman@40 { > >> + mdio0: mdio@e1120 { > >> + #address-cells = <1>; > >> + #size-cells = <0>; > >> + compatible = "fsl,fman-mdio"; > >> + reg = <0xe1120 0xee0>; > >> + }; > >> +}; > > > > What is the difference between "fsl,fman-mdio" and "fsl,fman-xmdio"? I > > don't see the latter on the list of compatibles in patch 3/6. > > 'fsl,fman-mdio' is the 1 Gb/s MDIO (Clause 22 only). 'fsl,fman-xmdio' is > the 10 Gb/s MDIO (Clause 45 only). We can respin this patch wi > "respin this patch wi..."? 
> I believe 'fsl,fman-mdio' (and others on that list) was added > gratuitously as the FMan MDIO is completely compatible with the > eTSEC/gianfar MDIO driver, but we can deal with that later It's still good to identify the specific device, even if it's believed to be 100% compatible. Plus, IIRC there's been enough badness in the eTSEC MDIO binding that it'd be good to steer clear of it. > > Within each category, is the exact fman version discoverable from the > > mdio registers? > > No, but that's irrelevant as that's not the difference between the two > compatibles It's relevant because it means the compatible string should have a block version number in it, or at least some other way in the MDIO node to indicate the block version. > >> +fman@50 { > >> + #address-cells = <1>; > >> + #size-cells = <1>; > >> + compatible = "simple-bus"; > > > > Why is this simple-bus? > > Because that's the translation type for the FMan sub-nodes. What do you mean by "translation type"? > We need it now to get the MDIO nodes probed No. "simple-bus" is stating an attribute of the hardware, that the child nodes represent simple memory-mapped devices that can be used without special bus knowledge. I don't think that applies here. You can get the MDIO node probed without misusing simple-bus by adding the fman node's compatible to the probe list in the kernel code. This sort of thing is why I want to see what the rest of the fman binding will look like. > and we'll needed later to probe other nodes/devices that will have > standalone drivers: MAC, MURAM. etc. How are they truly standalone? The exist in service to the greater entity that is fman. They presumably work together in some fashion. 
> >> + /* mdio nodes for fman v3 @ 0x50 */ > >> + mdio@fc000 { > >> + #address-cells = <1>; > >> + #size-cells = <0>; > >> + reg = <0xfc000 0x1000>; > >> + }; > >> + > >> + mdio@fd000 { > >> + #address-cells = <1>; > >> + #size-cells = <0>; > >> + reg = <0xfd000 0x1000>; > >> + }; > >> +}; > > > > Where's the compatible? Why is this file different from all the others? > > The FMan v3 MDIO block (supports both Clause 22/45) is compatible with > the FMan v2 10 Gb/s MDIO (the xgmac-mdio driver). However, the driver > needs a small clean-up patch (still in internal review) that will get it > working for FMan v3 MDIO. This suggests that it is not 100% backwards compatible. > With that patch will add the compatible to these nodes. However, we > need these nodes now for the board level MDIO bus muxing support > (included in this patchset) If you need these nodes now then add the compatible property now. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 5/6] powerpc/corenet: Add DPAA FMan support to the SoC device tree(s)
On Sun, 2014-05-04 at 05:59 -0500, Emil Medve wrote: > Hello Scott, > > > On 04/21/2014 05:14 PM, Scott Wood wrote: > > On Fri, 2014-04-18 at 07:21 -0500, Shruti Kanetkar wrote: > >> FMan 1 Gb/s MACs (dTSEC and mEMAC) have support for SGMII PHYs. > >> Add support for the internal SerDes TBI PHYs > >> > >> Based on prior work by Andy Fleming > >> > >> Signed-off-by: Shruti Kanetkar > >> --- > >> arch/powerpc/boot/dts/fsl/b4860si-post.dtsi | 28 + > >> arch/powerpc/boot/dts/fsl/b4si-post.dtsi| 51 + > >> arch/powerpc/boot/dts/fsl/p1023si-post.dtsi | 14 +++ > >> arch/powerpc/boot/dts/fsl/p2041si-post.dtsi | 64 > >> arch/powerpc/boot/dts/fsl/p3041si-post.dtsi | 64 > >> arch/powerpc/boot/dts/fsl/p4080si-post.dtsi | 104 +++ > >> arch/powerpc/boot/dts/fsl/p5020si-post.dtsi | 64 > >> arch/powerpc/boot/dts/fsl/p5040si-post.dtsi | 128 +++ > >> arch/powerpc/boot/dts/fsl/t4240si-post.dtsi | 154 > >> > >> 9 files changed, 671 insertions(+) > >> > >> diff --git a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi > >> b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi > >> index cbc354b..45b0ff5 100644 > >> --- a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi > >> +++ b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi > >> @@ -172,6 +172,34 @@ > >>compatible = "fsl,b4860-rcpm", "fsl,qoriq-rcpm-2.0"; > >>}; > >> > >> +/include/ "qoriq-fman3-0-1g-4.dtsi" > >> +/include/ "qoriq-fman3-0-1g-5.dtsi" > >> +/include/ "qoriq-fman3-0-10g-0.dtsi" > >> +/include/ "qoriq-fman3-0-10g-1.dtsi" > >> + fman@40 { > >> + ethernet@e8000 { > >> + tbi-handle = <&tbi4>; > >> + }; > > > > Binding needed > > > > Where is the "reg" for these unit addresses? > > As I said, the bulk of the FMan work comes from another team. Here we > need just enough to hook up the MDIO and PHY nodes. Unit addresses must match reg. No reg, no unit address. > I'd really like to be able to make progress on this without waiting for that > moment in time > we can get the entire FMan binding in place Why is the fman binding such a big deal? 
> >> + mdio@e9000 { > >> + tbi4: tbi-phy@8 { > >> + reg = <0x8>; > >> + device_type = "tbi-phy"; > >> + }; > >> + }; > > > > Binding needed for tbi-phy device_type > > I guess that's fair (BTW, you accepted tbi-phy nodes/device-type before > without a binding) It's existing practice on eTSEC. FMan seemed like an opportunity to avoid carrying cruft forward. > > Why are we using device_type at all for this? > > That's what the upstream driver is looking for. Drivers should look for what the binding says -- not the other way around. > Anyway, most days PHYs can be discovered so they don't use/need > compatible properties. That's I guess part of the reason we don't have > bindings for them PHY nodes I don't see why there couldn't be a compatible that describes the standard programming interface. > However, what you can't discover is how they are wired to the MAC(s) so > we still need some nodes in the device tree to convey that. Also, when > looking for a specific kind of PHY, such as TBI, device_type works > easier then parsing compatibles from various vendors or so Don't you find the TBI by following the tbi-handle property? That said, I don't object to having a way to label a PHY as attached via TBI if that's useful. I'm giving a mild, non-nacking (given the history) objection to using device_type for that (given other history). -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On Mon, 2014-05-05 at 19:56 +0530, Aneesh Kumar K.V wrote:
>
> Paul mentioned that BOOK3S always had DAR value set on alignment
> interrupt. And the patch is to enable/collect correct DAR value when
> running with Little Endian PR guest. Now to limit the impact and to
> enable Little Endian PR guest, I ended up doing the conditional code
> only for book3s 64 for which we know for sure that we set DAR value.

Only Book3S? Afaik, the kernel align.c unconditionally uses DAR on
every processor type. It's DSISR that may or may not be populated, but
afaik DAR always is.

Cheers,
Ben.
Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote:
> Isn't this a greater problem? We should start swapping before we hit
> the point where non movable kernel allocation fails, no?

Possibly but the fact remains, this can be avoided by making sure that
if we create a CMA reserve for KVM, then it uses it rather than using
the rest of main memory for hash tables.

> The fact that KVM uses a good number of normal kernel pages is maybe
> suboptimal, but shouldn't be a critical problem.

The point is that we explicitly reserve those pages in CMA for use
by KVM for that specific purpose, but the current code tries first
to get them out of the normal pool.

This is not an optimal behaviour and is what Aneesh patches are
trying to fix.

Cheers,
Ben.
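To make the allocation-order point concrete, here is a toy C model of the two policies. It is a sketch under stated assumptions, not the KVM code: the pools are plain page counters, every name (`struct pools`, `alloc_hpt_normal_first`, `alloc_hpt_cma_only`) is hypothetical, and the fallback to CMA in the first function is inferred from "tries first" above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: each pool is just a count of free pages. */
struct pools {
	unsigned int normal_free;	/* ordinary kernel page allocator */
	unsigned int cma_free;		/* CMA region reserved for KVM */
};

/*
 * Behaviour described as current: take hash table pages from the
 * normal pool first, and only then from the CMA reserve.
 */
bool alloc_hpt_normal_first(struct pools *p, unsigned int npages)
{
	assert(p != NULL);
	if (p->normal_free >= npages) {
		p->normal_free -= npages;
		return true;
	}
	if (p->cma_free >= npages) {
		p->cma_free -= npages;
		return true;
	}
	return false;
}

/*
 * Behaviour the patch aims for, as this model reads it: use the
 * reserve that was set aside for this purpose and leave the
 * normal pool alone.
 */
bool alloc_hpt_cma_only(struct pools *p, unsigned int npages)
{
	assert(p != NULL);
	if (p->cma_free >= npages) {
		p->cma_free -= npages;
		return true;
	}
	return false;
}
```

In this model the first policy drains non-movable kernel memory while the reserve sits unused, which is the situation the reply describes.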
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On Mon, 2014-05-05 at 16:43 +0200, Alexander Graf wrote:
> > Paul mentioned that BOOK3S always had DAR value set on alignment
> > interrupt. And the patch is to enable/collect correct DAR value when
> > running with Little Endian PR guest. Now to limit the impact and to
> > enable Little Endian PR guest, I ended up doing the conditional code
> > only for book3s 64 for which we know for sure that we set DAR value.
>
> Yes, and I'm asking whether we know that this statement holds true for
> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is
> at least developed by IBM, I'd assume its semantics here are similar to
> POWER4, but for PA6T I wouldn't be so sure.

I am not aware of any PowerPC processor that does not set DAR on
alignment interrupts. Paul, are you?

Cheers,
Ben.
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On Mon, May 05, 2014 at 01:19:30PM +0200, Alexander Graf wrote:
> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
> >+#ifdef CONFIG_PPC_BOOK3S_64
> >+ return vcpu->arch.fault_dar;
>
> How about PA6T and G5s?

G5 sets DAR on an alignment interrupt.

As for PA6T, I don't know for sure, but if it doesn't, ordinary
alignment interrupts wouldn't be handled properly, since the code in
arch/powerpc/kernel/align.c assumes DAR contains the address being
accessed on all PowerPC CPUs.

Did PA Semi ever publish a user manual for the PA6T, I wonder?

Paul.
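The shape of the hunk under discussion can be modelled in a few lines of C. This is a simplified sketch, not the KVM source: `struct vcpu_model`, the `book3s_64` flag standing in for `CONFIG_PPC_BOOK3S_64`, and the `computed_dar` argument standing in for the value decoded from the instruction are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the part of the vcpu state that matters here. */
struct vcpu_model {
	unsigned long fault_dar;	/* DAR saved at the alignment interrupt */
};

/*
 * On 64-bit Book3S, trust the DAR the hardware saved. Otherwise fall
 * back to a value computed by decoding the faulting instruction
 * (passed in precomputed to keep the model small).
 */
unsigned long alignment_dar(const struct vcpu_model *vcpu, bool book3s_64,
			    unsigned long computed_dar)
{
	assert(vcpu != NULL);
	return book3s_64 ? vcpu->fault_dar : computed_dar;
}
```

If every PowerPC CPU sets DAR on alignment interrupts, as suggested above, the fallback branch would never be needed.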
[PATCH] powerpc/fsl-booke64: Set vmemmap_psize to 4K
The only way Freescale booke chips support mappings larger than 4K is via TLB1. The only way we support (direct) TLB1 entries is via hugetlb, which is not what map_kernel_page() does when given a large page size. Without this, a kernel with CONFIG_SPARSEMEM_VMEMMAP enabled crashes on boot with messages such as: PID hash table entries: 4096 (order: 3, 32768 bytes) Sorting __ex_table... BUG: Bad page state in process swapper pfn:00a2f page:84023a48 count:0 mapcount:0 mapping:04ffce48 index:0x4ffbe50 page flags: 0x4ffda40(active|arch_1|private|private_2|head|tail|swapcache|mappedtodisk|reclaim|swapbacked|unevictable|mlocked) page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set bad because of flags: page flags: 0x311840(active|private|private_2|swapcache|unevictable|mlocked) Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 3.15.0-rc1-3-g7fa250c #299 Call Trace: [c098ba20] [c0008b3c] .show_stack+0x7c/0x1cc (unreliable) [c098baf0] [c060aa50] .dump_stack+0x88/0xb4 [c098bb70] [c00c0468] .bad_page+0x144/0x1a0 [c098bc10] [c00c0628] .free_pages_prepare+0x164/0x17c [c098bcc0] [c00c24cc] .free_hot_cold_page+0x48/0x214 [c098bd60] [c086c318] .free_all_bootmem+0x1fc/0x354 [c098be70] [c085da84] .mem_init+0xac/0xdc [c098bef0] [c08547b0] .start_kernel+0x21c/0x4d4 [c098bf90] [c448] .start_here_common+0x20/0x58 Signed-off-by: Scott Wood --- arch/powerpc/mm/tlb_nohash.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c index ae3d5b7..92cb18d 100644 --- a/arch/powerpc/mm/tlb_nohash.c +++ b/arch/powerpc/mm/tlb_nohash.c @@ -596,8 +596,13 @@ static void __early_init_mmu(int boot_cpu) /* XXX This should be decided at runtime based on supported * page sizes in the TLB, but for now let's assume 16M is * always there and a good fit (which it probably is) +* +* Freescale booke only supports 4K pages in TLB0, so use that. 
*/ - mmu_vmemmap_psize = MMU_PAGE_16M; + if (mmu_has_feature(MMU_FTR_TYPE_FSL_E)) + mmu_vmemmap_psize = MMU_PAGE_4K; + else + mmu_vmemmap_psize = MMU_PAGE_16M; /* XXX This code only checks for TLB 0 capabilities and doesn't * check what page size combos are supported by the HW. It -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
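The decision this patch makes can be restated as a small standalone C sketch. It only mirrors the logic of the hunk above. The enum, the function names, and the boolean parameter standing in for `mmu_has_feature(MMU_FTR_TYPE_FSL_E)` are assumptions for illustration and are not kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MMU_PAGE_4K and MMU_PAGE_16M. */
enum vmemmap_psize {
	VMEMMAP_4K,
	VMEMMAP_16M
};

/*
 * Freescale booke can only map larger pages through TLB1, which
 * map_kernel_page() does not use, so it gets 4K. Everything else
 * keeps the existing 16M assumption.
 */
enum vmemmap_psize pick_vmemmap_psize(bool is_fsl_booke)
{
	return is_fsl_booke ? VMEMMAP_4K : VMEMMAP_16M;
}

/* Size in bytes of the chosen vmemmap page size. */
unsigned long vmemmap_psize_bytes(enum vmemmap_psize psize)
{
	assert(psize == VMEMMAP_4K || psize == VMEMMAP_16M);
	return psize == VMEMMAP_4K ? 4096UL : 16UL * 1024 * 1024;
}
```

As the XXX comment in the patch notes, the page size should ideally be chosen at runtime from what the TLB actually supports. This sketch only captures the stopgap.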
[PATCH v4] powerpc/fsl: Add binding for Freescale CCF
From: Diana Craciun The CoreNet coherency fabric is a fabric-oriented, conectivity infrastructure that enables the implementation of coherent, multicore systems. The CCF acts as a central interconnect for cores, platform-level caches, memory subsystem, peripheral devices and I/O host bridges in the system. Signed-off-by: Diana Craciun [scottw...@freescale.com: formatting and minor changes] Signed-off-by: Scott Wood --- v4: Fixed various formatting issues, minor edits for clarity, and made fsl,portid-mapping an optional property. .../devicetree/bindings/powerpc/fsl/ccf.txt| 46 ++ .../devicetree/bindings/powerpc/fsl/cpus.txt | 11 ++ .../devicetree/bindings/powerpc/fsl/pamu.txt | 10 + 3 files changed, 67 insertions(+) create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/ccf.txt diff --git a/Documentation/devicetree/bindings/powerpc/fsl/ccf.txt b/Documentation/devicetree/bindings/powerpc/fsl/ccf.txt new file mode 100644 index 000..454da7e --- /dev/null +++ b/Documentation/devicetree/bindings/powerpc/fsl/ccf.txt @@ -0,0 +1,46 @@ +Freescale CoreNet Coherency Fabric(CCF) Device Tree Binding + +DESCRIPTION + +The CoreNet coherency fabric is a fabric-oriented, connectivity infrastructure +that enables the implementation of coherent, multicore systems. + +Required properties: + +- compatible: + fsl,corenet1-cf - CoreNet coherency fabric version 1. + Example chips: T4240, B4860 + + fsl,corenet2-cf - CoreNet coherency fabric version 2. + Example chips: P5040, P5020, P4080, P3041, P2041 + + fsl,corenet-cf - Used to represent the common registers + between CCF version 1 and CCF version 2. This compatible + is retained for compatibility reasons, as it was already + used for both CCF version 1 chips and CCF version 2 + chips. It should be specified after either + "fsl,corenet1-cf" or "fsl,corenet2-cf". + +- reg: + A standard property. Represents the CCF registers. + +- interrupts: + Interrupt mapping for CCF error interrupt. 
+ +- fsl,ccf-num-csdids: + Specifies the number of Coherency Subdomain ID Port Mapping + Registers that are supported by the CCF. + +- fsl,ccf-num-snoopids: + Specifies the number of Snoop ID Port Mapping Registers that + are supported by CCF. + +Example: + + corenet-cf@18000 { + compatible = "fsl,corenet2-cf", "fsl,corenet-cf"; + reg = <0x18000 0x1000>; + interrupts = <16 2 1 31>; + fsl,ccf-num-csdids = <32>; + fsl,ccf-num-snoopids = <32>; + }; diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt index 922c30a..f8cd239 100644 --- a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt +++ b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt @@ -20,3 +20,14 @@ PROPERTIES a property named fsl,eref-[CAT], where [CAT] is the abbreviated category name with all uppercase letters converted to lowercase, indicates that the category is supported by the implementation. + +- fsl,portid-mapping + Usage: optional + Value type: + Definition: The Coherency Subdomain ID Port Mapping Registers and + Snoop ID Port Mapping registers, which are part of the CoreNet + Coherency fabric (CCF), provide a CoreNet Coherency Subdomain + ID/CoreNet Snoop ID to cpu mapping functions. Certain bits from + these registers should be set if the coresponding CPU should be + snooped. This property defines a bitmask which selects the bit + that should be set if this cpu should be snooped. diff --git a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt index 1f5e329..c2b2899 100644 --- a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt +++ b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt @@ -34,6 +34,15 @@ Optional properties: for legacy drivers. 
- interrupt-parent : Phandle to interrupt controller +- fsl,portid-mapping : + The Coherency Subdomain ID Port Mapping Registers and + Snoop ID Port Mapping registers, which are part of the + CoreNet Coherency fabric (CCF), provide a CoreNet + Coherency Subdomain ID/CoreNet Snoop ID to pamu mapping + functions. Certain bits from these registers should be + set if PAMUs should be snooped. This property defines + a bitmask which selects the bits that should be set if + PAMUs should be snooped. Child nodes: @@ -88,6 +97,7 @@ Example: compatible = "fsl,pamu-v1.0", "fsl,pamu"; reg =
[PATCH] powerpc/fsl: Add fsl,portid-mapping to corenet1-cf chips
Signed-off-by: Scott Wood
Cc: Diana Craciun
---
 arch/powerpc/boot/dts/fsl/p2041si-post.dtsi | 1 +
 arch/powerpc/boot/dts/fsl/p2041si-pre.dtsi  | 4 ++++
 arch/powerpc/boot/dts/fsl/p3041si-post.dtsi | 1 +
 arch/powerpc/boot/dts/fsl/p3041si-pre.dtsi  | 4 ++++
 arch/powerpc/boot/dts/fsl/p4080si-post.dtsi | 1 +
 arch/powerpc/boot/dts/fsl/p4080si-pre.dtsi  | 8 ++++++++
 arch/powerpc/boot/dts/fsl/p5020si-post.dtsi | 1 +
 arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi  | 2 ++
 arch/powerpc/boot/dts/fsl/p5040si-post.dtsi | 1 +
 arch/powerpc/boot/dts/fsl/p5040si-pre.dtsi  | 4 ++++
 10 files changed, 27 insertions(+)

diff --git a/arch/powerpc/boot/dts/fsl/p2041si-post.dtsi b/arch/powerpc/boot/dts/fsl/p2041si-post.dtsi
index b5daa4c..5290df8 100644
--- a/arch/powerpc/boot/dts/fsl/p2041si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p2041si-post.dtsi
@@ -262,6 +262,7 @@
 interrupts = < 24 2 0 0 16 2 1 30>;
+ fsl,portid-mapping = <0x0f00>;
 pamu0: pamu@0 {
 reg = <0 0x1000>;
diff --git a/arch/powerpc/boot/dts/fsl/p2041si-pre.dtsi b/arch/powerpc/boot/dts/fsl/p2041si-pre.dtsi
index 22f3b14..b1ea147 100644
--- a/arch/powerpc/boot/dts/fsl/p2041si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p2041si-pre.dtsi
@@ -83,6 +83,7 @@
 reg = <0>;
 clocks = <&mux0>;
 next-level-cache = <&L2_0>;
+ fsl,portid-mapping = <0x8000>;
 L2_0: l2-cache {
 next-level-cache = <&cpc>;
 };
@@ -92,6 +93,7 @@
 reg = <1>;
 clocks = <&mux1>;
 next-level-cache = <&L2_1>;
+ fsl,portid-mapping = <0x4000>;
 L2_1: l2-cache {
 next-level-cache = <&cpc>;
 };
@@ -101,6 +103,7 @@
 reg = <2>;
 clocks = <&mux2>;
 next-level-cache = <&L2_2>;
+ fsl,portid-mapping = <0x2000>;
 L2_2: l2-cache {
 next-level-cache = <&cpc>;
 };
@@ -110,6 +113,7 @@
 reg = <3>;
 clocks = <&mux3>;
 next-level-cache = <&L2_3>;
+ fsl,portid-mapping = <0x1000>;
 L2_3: l2-cache {
 next-level-cache = <&cpc>;
 };
diff --git a/arch/powerpc/boot/dts/fsl/p3041si-post.dtsi b/arch/powerpc/boot/dts/fsl/p3041si-post.dtsi
index 5abd1fc..cd63cb1 100644
--- a/arch/powerpc/boot/dts/fsl/p3041si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p3041si-post.dtsi
@@ -289,6 +289,7 @@
 interrupts = < 24 2 0 0 16 2 1 30>;
+ fsl,portid-mapping = <0x0f00>;
 pamu0: pamu@0 {
 reg = <0 0x1000>;
diff --git a/arch/powerpc/boot/dts/fsl/p3041si-pre.dtsi b/arch/powerpc/boot/dts/fsl/p3041si-pre.dtsi
index 468e8be..dc5f4b3 100644
--- a/arch/powerpc/boot/dts/fsl/p3041si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p3041si-pre.dtsi
@@ -84,6 +84,7 @@
 reg = <0>;
 clocks = <&mux0>;
 next-level-cache = <&L2_0>;
+ fsl,portid-mapping = <0x8000>;
 L2_0: l2-cache {
 next-level-cache = <&cpc>;
 };
@@ -93,6 +94,7 @@
 reg = <1>;
 clocks = <&mux1>;
 next-level-cache = <&L2_1>;
+ fsl,portid-mapping = <0x4000>;
 L2_1: l2-cache {
 next-level-cache = <&cpc>;
 };
@@ -102,6 +104,7 @@
 reg = <2>;
 clocks = <&mux2>;
 next-level-cache = <&L2_2>;
+ fsl,portid-mapping = <0x2000>;
 L2_2: l2-cache {
 next-level-cache = <&cpc>;
 };
@@ -111,6 +114,7 @@
 reg = <3>;
 clocks = <&mux3>;
 next-level-cache = <&L2_3>;
+ fsl,portid-mapping = <0x1000>;
 L2_3: l2-cache {
 next-level-cache = <&cpc>;
 };
diff --git a/arch/powerpc/boot/dts/fsl/p4080si-post.dtsi b/arch/powerpc/boot/dts/fsl/p4080si-post.dtsi
index bf0e7c9..12947cc 100644
--- a/arch/powerpc/boot/dts/fsl/p4080si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p4080si-post.dtsi
@@ -297,6 +297,7 @@
 interrupts = <
Re: [PATCH v3] powerpc/fsl: Added binding for Freescale CoreNet coherency fabric (CCF)
On May 5, 2014, at 10:58 AM, Diana Craciun wrote:

> From: Diana Craciun
>
> The CoreNet coherency fabric is a fabric-oriented, connectivity
> infrastructure that enables the implementation of coherent, multicore
> systems. The CCF acts as a central interconnect for cores,
> platform-level caches, memory subsystem, peripheral devices and I/O host
> bridges in the system.
>
> Signed-off-by: Diana Craciun
> ---
> v3:
> - added port ID mapping
> - removed fsl,corenetx-cf
>
> .../devicetree/bindings/powerpc/fsl/ccf.txt  | 42 ++
> .../devicetree/bindings/powerpc/fsl/cpus.txt | 8 +
> .../devicetree/bindings/powerpc/fsl/pamu.txt | 8 +
> 3 files changed, 58 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/ccf.txt

[snip]

> --- a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> +++ b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> @@ -20,3 +20,11 @@ PROPERTIES
> a property named fsl,eref-[CAT], where [CAT] is the abbreviated category
> name with all uppercase letters converted to lowercase, indicates that
> the category is supported by the implementation.
> +
> + - fsl,portid-mapping :
> + The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
> + registers which are part of the CoreNet Coherency fabric (CCF) provide a
> + CoreNet Coherency Subdomain ID/CoreNet Snoop ID to cpu mapping functions.
> + Certain bits from these registers should be set if the corresponding CPU
> + should be snooped. This property defines a bitmask which selects the bit that
> + should be set if this cpu should be snooped.

Under what cases can software not figure out how to set this based on the PAMUs in the DT?

> diff --git a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> index 1f5e329..827c637 100644
> --- a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> +++ b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> @@ -26,6 +26,13 @@ Required properties:
> A standard property.
> - #size-cells :
> A standard property.
> +- fsl,portid-mapping :
> + The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
> + registers which are part of the CoreNet Coherency fabric (CCF) provide a
> + CoreNet Coherency Subdomain ID/CoreNet Snoop ID to pamu mapping functions.
> + Certain bits from these registers should be set if PAMUs should be snooped.
> + This property defines a bitmask which selects the bits that should be set
> + if PAMUs should be snooped.
>
> Optional properties:
> - reg :
> @@ -88,6 +95,7 @@ Example:
> compatible = "fsl,pamu-v1.0", "fsl,pamu";
> reg = <0x2 0x5000>;
> ranges = <0 0x2 0x5000>;
> + fsl,portid-mapping = <0xf8>;
> #address-cells = <1>;
> #size-cells = <1>;
> interrupts = <
> --
> 1.7.11.7
>
> ___
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3] powerpc/fsl: Added binding for Freescale CoreNet coherency fabric (CCF)
On Mon, 2014-05-05 at 21:12 -0500, Kumar Gala wrote: > On May 5, 2014, at 10:58 AM, Diana Craciun > wrote: > > > From: Diana Craciun > > > > The CoreNet coherency fabric is a fabric-oriented, conectivity > > infrastructure that enables the implementation of coherent, multicore > > systems. The CCF acts as a central interconnect for cores, > > platform-level caches, memory subsystem, peripheral devices and I/O host > > bridges in the system. > > > > Signed-off-by: Diana Craciun > > --- > > v3: > > - added port ID mapping > > - removed fsl,corenetx-cf > > > > .../devicetree/bindings/powerpc/fsl/ccf.txt| 42 > > ++ > > .../devicetree/bindings/powerpc/fsl/cpus.txt | 8 + > > .../devicetree/bindings/powerpc/fsl/pamu.txt | 8 + > > 3 files changed, 58 insertions(+) > > create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/ccf.txt > > [snip] > > > --- a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt > > +++ b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt > > @@ -20,3 +20,11 @@ PROPERTIES > > a property named fsl,eref-[CAT], where [CAT] is the abbreviated category > > name with all uppercase letters converted to lowercase, indicates that > > the category is supported by the implementation. > > + > > + - fsl,portid-mapping : > > + The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port > > Mapping > > + registers which are part of the CoreNet Coherency fabric (CCF) provide a > > + CoreNet Coherency Subdomain ID/CoreNet Snoop ID to cpu mapping > > functions. > > + Certain bits from these registers should be set if the coresponding CPU > > + should be snooped. This property defines a bitmask which selects the > > bit that > > + should be set if this cpu should be snooped. > > Under what cases can software not figure out how to set this based on the > PAMUs in the DT? How would it go about doing that? 
Besides the difference between corenet1-cf and corenet2-cf, on corenet1-cf the position of the PAMU bits depends on the number of CPUs that the chip was designed for. This may be different from the number of CPUs that are actually present (e.g. p4040, or AMP). It's also a complication that IMHO is asking for trouble, versus straightforwardly recording information that is present in a table in the manual.

-Scott
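To make the "just record the table" approach concrete, here is a minimal sketch of how a consumer might combine the per-node fsl,portid-mapping values into one register mask. The mask values are copied from the p2041 hunks of the patch at the top of this thread; the helper name build_snoop_mask and the idea of passing a CPU bitmap are assumptions for illustration only, not an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Masks copied from the p2041si-pre.dtsi / p2041si-post.dtsi hunks above. */
static const uint32_t cpu_portid_mapping[] = { 0x8000, 0x4000, 0x2000, 0x1000 };
static const uint32_t pamu_portid_mapping = 0x0f00;

#define NUM_CPUS (sizeof(cpu_portid_mapping) / sizeof(cpu_portid_mapping[0]))

/*
 * Hypothetical helper (not a kernel API): OR together the portid-mapping
 * masks of every agent that should be snooped. Bit N of cpu_bitmap selects
 * the CPU whose reg is N.
 */
static uint32_t build_snoop_mask(uint32_t cpu_bitmap, int snoop_pamus)
{
	uint32_t mask = 0;

	/* Reject CPU numbers that the table does not describe. */
	assert((cpu_bitmap >> NUM_CPUS) == 0);

	for (size_t cpu = 0; cpu < NUM_CPUS; cpu++)
		if (cpu_bitmap & (UINT32_C(1) << cpu))
			mask |= cpu_portid_mapping[cpu];

	if (snoop_pamus)
		mask |= pamu_portid_mapping;

	return mask;
}
```

With these values, build_snoop_mask(0x3, 1) selects CPUs 0 and 1 plus the PAMUs. Nothing in the helper has to know how many CPUs the chip was designed for, because the bit positions come straight from the device tree.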
Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest
On Mon, May 05, 2014 at 08:17:00PM +0530, Aneesh Kumar K.V wrote:
> Alexander Graf writes:
>
> > On 05/04/2014 07:30 PM, Aneesh Kumar K.V wrote:
> >> Signed-off-by: Aneesh Kumar K.V
> >
> > No patch description, no proper explanations anywhere why you're doing
> > what. All of that in a pretty sensitive piece of code. There's no way
> > this patch can go upstream in its current form.
> >
>
> Sorry about being vague. Will add a better commit message. The goal is
> to export MPSS support to guest if the host support the same. MPSS
> support is exported via penc encoding in "ibm,segment-page-sizes". The
> actual format can be found at htab_dt_scan_page_sizes. When the guest
> memory is backed by hugetlbfs we expose the penc encoding the host
> support to guest via kvmppc_add_seg_page_size.

In a case like this it's good to assume the reader doesn't know very much about Power CPUs, and probably isn't familiar with acronyms such as MPSS.

The patch needs an introductory paragraph explaining that on recent IBM Power CPUs, while the hashed page table is looked up using the page size from the segmentation hardware (i.e. the SLB), it is possible to have the HPT entry indicate a larger page size. Thus for example it is possible to put a 16MB page in a 64kB segment, but since the hash lookup is done using a 64kB page size, it may be necessary to put multiple entries in the HPT for a single 16MB page. This capability is called mixed page-size segment (MPSS).

With MPSS, there are two relevant page sizes: the base page size, which is the size used in searching the HPT, and the actual page size, which is the size indicated in the HPT entry. Note that the actual page size is always >= base page size.

> Now the challenge to THP support is to make sure that our henter,
> hremove etc decode base page size and actual page size correctly
> from the hash table entry values. Most of the changes is to do that.
> Rest of the stuff is already handled by kvm.
>
> NOTE: It is much easier to read the code after applying the patch rather
> than reading the diff. I have added comments around each steps in the
> code.

Paul.
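The arithmetic behind "multiple entries in the HPT for a single 16MB page" can be sketched as follows. This is only an illustration under stated assumptions: the helper name mpss_max_hpt_entries and the shift constants are made up for this example, and the result is an upper bound (one entry per base-page-sized slice), not a claim about how many entries KVM or the host actually inserts.

```c
#include <assert.h>
#include <stdint.h>

/* Page-size shifts for the two sizes used in the example above. */
#define SHIFT_64K 16	/* 64kB base page size, used for the hash lookup */
#define SHIFT_16M 24	/* 16MB actual page size, recorded in the HPT entry */

/*
 * Hypothetical helper: number of base-page-sized slices covered by one
 * actual page. Each slice hashes separately, so this is the largest number
 * of HPT entries a single actual page could need.
 */
static uint64_t mpss_max_hpt_entries(unsigned int base_shift,
				     unsigned int actual_shift)
{
	/* The invariant stated above: actual page size >= base page size. */
	assert(actual_shift >= base_shift);

	return UINT64_C(1) << (actual_shift - base_shift);
}
```

For the 16MB-in-64kB case, mpss_max_hpt_entries(SHIFT_64K, SHIFT_16M) gives 256; when the two sizes are equal it gives 1, which is the ordinary non-MPSS situation.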
Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
On Mon, May 05, 2014 at 08:00:12AM -0600, Alex Williamson wrote:
>On Mon, 2014-05-05 at 13:56 +0200, Alexander Graf wrote:
>> On 05/05/2014 03:27 AM, Gavin Shan wrote:
>> > The series of patches intends to support EEH for PCI devices, which have been
>> > passed through to PowerKVM based guest via VFIO. The implementation is
>> > straightforward based on the issues or problems we have to resolve to support
>> > EEH for PowerKVM based guest.
>> >
>> > - Emulation for EEH RTAS requests. Thankfully, we already have infrastructure
>> >   to emulate XICS. Without introducing new mechanism, we just extend that
>> >   existing infrastructure to support EEH RTAS emulation. EEH RTAS requests
>> >   initiated from guest are posted to host where the requests get handled or
>> >   delivered to underlying firmware for further handling. For that, the host kernel
>> >   has to maintain the PCI address (host domain/bus/slot/function to guest's
>> >   PHB BUID/bus/slot/function) mapping via KVM VFIO device. The address mapping
>> >   will be built when initializing VFIO device in QEMU and destroyed when the
>> >   VFIO device in QEMU is going offline, or the VM is destroyed.
>>
>> Do you also expose all those interfaces to user space? VFIO is as much
>> about user space device drivers as it is about device assignment.
>>

Yep, all the interfaces are exported to user space.

>> I would like to first see an implementation that doesn't touch KVM
>> emulation code at all but instead routes everything through QEMU. As a
>> second step we can then accelerate performance critical paths inside of KVM.
>>

Ok. I'll change the implementation. However, QEMU still has to poll/push information from/to the host kernel. So the best place for that would be tce_iommu_driver_ops::ioctl, as EEH is a Power-specific feature.

For the error injection, I guess I have to put the logic token management into QEMU, and the error injection request will be handled by QEMU and then routed to the host kernel via an additional syscall, as we did for pSeries.

>> That way we ensure that user space device drivers have all the power
>> over a device they need to drive it.
>
>+1
>

Thanks,
Gavin
Re: [PATCH 4/6] powerpc/corenet: Create the dts components for the DPAA FMan
Hello Scott, On 05/05/2014 06:25 PM, Scott Wood wrote: > On Sat, 2014-05-03 at 05:02 -0500, Emil Medve wrote: >> Hello Scott, >> >> >> On 04/21/2014 05:11 PM, Scott Wood wrote: >>> On Fri, 2014-04-18 at 07:21 -0500, Shruti Kanetkar wrote: +fman@40 { + mdio@f1000 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "fsl,fman-xmdio"; + reg = <0xf1000 0x1000>; + }; +}; >>> >>> I'd like to see a complete fman binding before we start adding pieces. >> >> The driver for the FMan 10 Gb/s MDIO has upstreamed a couple of years >> ago: '9f35a73 net/fsl: introduce Freescale 10G MDIO driver', granted >> without a binding writeup. > > Pushing driver code through the netdev tree does not establish device > tree ABI. Binding documents and dts files do. Sure, ideally and formally. But upstreaming a driver represents, if nothing else, a statement of intent to observe a device tree ABI. Via the SDK, FSL customers are using the device tree ABI the driver de facto establishes. I guess a driver that makes it upstream can establish an device tree ABI We'll re-spin adding the binding document >> This patch series should probably include a >> binding blurb. However, let's not gate this patchset on a complete >> binding for the FMan > > I at least want to see enough of the FMan binding to have confidence > that what we're adding now is correct. I'm not sure what you're looking for. The nodes we're adding are describing a very common CCSR space interface for quite common device blocks >> As you know we don't own the FMan work and the FMan work is... not ready >> for upstreaming. > > I'm not asking for a driver, just a binding that describes hardware. Is > there any reason why the fman node needs to be anywhere near as > complicated as it is in the SDK, if we're limiting it to actual hardware > description? Is this a trick question? :-) Of course it doesn't need to be more complicated than actual hardware. But, to repeat myself, said description is not... 
ready and I don't know when it will be. Somebody else owns pushing the bulk of FMan upstream and I'd rather not step on their turf quite like this > Do we really need to have nodes for all the sub-blocks? Definitely no, and internally I'm pushing to clean that up. However, you surely remember we've been pushing from the early days of P4080 and it's been, to put it optimistically, slow >> In an attempt to make some sort of progress we've >> decided to upstream the pieces that are less controversial and MDIO is >> an obvious candidate >> +fman@40 { + mdio0: mdio@e1120 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "fsl,fman-mdio"; + reg = <0xe1120 0xee0>; + }; +}; >>> >>> What is the difference between "fsl,fman-mdio" and "fsl,fman-xmdio"? I >>> don't see the latter on the list of compatibles in patch 3/6. >> >> 'fsl,fman-mdio' is the 1 Gb/s MDIO (Clause 22 only). 'fsl,fman-xmdio' is >> the 10 Gb/s MDIO (Clause 45 only). We can respin this patch wi >> > > "respin this patch wi..."? Not sure where the end of that sentence went. I meant we'll re-spin with a binding for the 10 Gb/s MDIO block >> I believe 'fsl,fman-mdio' (and others on that list) was added >> gratuitously as the FMan MDIO is completely compatible with the >> eTSEC/gianfar MDIO driver, but we can deal with that later > > It's still good to identify the specific device, even if it's believed > to be 100% compatible. You suggesting we create new compatibles for every instance/integration of a hardware block even though is identical with an earlier hardware integration? Well, I guess that's been done that and now we have about 8 different compatibles that convey no real difference at all > Plus, IIRC there's been enough badness in the > eTSEC MDIO binding that it'd be good to steer clear of it. Hmm... I guess we can leave things as they are. I wasn't going to touch this just now anyway >>> Within each category, is the exact fman version discoverable from the >>> mdio registers? 
>> >> No, but that's irrelevant as that's not the difference between the two >> compatibles > > It's relevant because it means the compatible string should have a block > version number in it, or at least some other way in the MDIO node to > indicate the block version. The 1 Gb/s MDIO block doesn't track a version of its own and from a programming interface perspective it has no visible difference since eTSEC. The 10 Gb/s MDIO doesn't track a version of its own either and across the existing FMan versions is identical from a programming interface perspective I guess we can append a 'v1.0' to the MDIO compatible(s). However, given the SDK we'll have to support the compatibles the (already upstream) drivers support. Dealing with all that legacy is going to be so tedious +fman@50 { + #address-cells = <1>; + #size-cells
Re: [PATCH 5/6] powerpc/corenet: Add DPAA FMan support to the SoC device tree(s)
Hello Scott, On 05/05/2014 06:34 PM, Scott Wood wrote: > On Sun, 2014-05-04 at 05:59 -0500, Emil Medve wrote: >> Hello Scott, >> >> >> On 04/21/2014 05:14 PM, Scott Wood wrote: >>> On Fri, 2014-04-18 at 07:21 -0500, Shruti Kanetkar wrote: FMan 1 Gb/s MACs (dTSEC and mEMAC) have support for SGMII PHYs. Add support for the internal SerDes TBI PHYs Based on prior work by Andy Fleming Signed-off-by: Shruti Kanetkar --- arch/powerpc/boot/dts/fsl/b4860si-post.dtsi | 28 + arch/powerpc/boot/dts/fsl/b4si-post.dtsi| 51 + arch/powerpc/boot/dts/fsl/p1023si-post.dtsi | 14 +++ arch/powerpc/boot/dts/fsl/p2041si-post.dtsi | 64 arch/powerpc/boot/dts/fsl/p3041si-post.dtsi | 64 arch/powerpc/boot/dts/fsl/p4080si-post.dtsi | 104 +++ arch/powerpc/boot/dts/fsl/p5020si-post.dtsi | 64 arch/powerpc/boot/dts/fsl/p5040si-post.dtsi | 128 +++ arch/powerpc/boot/dts/fsl/t4240si-post.dtsi | 154 9 files changed, 671 insertions(+) diff --git a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi index cbc354b..45b0ff5 100644 --- a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi +++ b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi @@ -172,6 +172,34 @@ compatible = "fsl,b4860-rcpm", "fsl,qoriq-rcpm-2.0"; }; +/include/ "qoriq-fman3-0-1g-4.dtsi" +/include/ "qoriq-fman3-0-1g-5.dtsi" +/include/ "qoriq-fman3-0-10g-0.dtsi" +/include/ "qoriq-fman3-0-10g-1.dtsi" + fman@40 { + ethernet@e8000 { + tbi-handle = <&tbi4>; + }; >>> >>> Binding needed >>> >>> Where is the "reg" for these unit addresses? >> >> As I said, the bulk of the FMan work comes from another team. Here we >> need just enough to hook up the MDIO and PHY nodes. > > Unit addresses must match reg. No reg, no unit address. 
We can add a 'reg' property, but we really don't want to clash with the team that is working on upstreaming the FMan/MAC bindings and drivers >> I'd really like to be able to make progress on this without waiting for that >> moment in time >> we can get the entire FMan binding in place > > Why is the fman binding such a big deal? > + mdio@e9000 { + tbi4: tbi-phy@8 { + reg = <0x8>; + device_type = "tbi-phy"; + }; + }; >>> >>> Binding needed for tbi-phy device_type >> >> I guess that's fair (BTW, you accepted tbi-phy nodes/device-type before >> without a binding) > > It's existing practice on eTSEC. FMan seemed like an opportunity to > avoid carrying cruft forward. The 1 Gb/s MDIO block is not FMan specific. As I said is the same block from eTSEC. That's part of the reason we're trying upstreaming this independent of the FMan stuff. So, don't think FMan, think MDIO >>> Why are we using device_type at all for this? >> >> That's what the upstream driver is looking for. > > Drivers should look for what the binding says -- not the other way > around. Yeah yeah. Nobody likes it, but the driver is/describes the de facto binding On a constructive note, the Ethernet PHY code doesn't do device tree based probing so no compatibles are used at all. So device_type is used to convey a TBI PHY >> Anyway, most days PHYs can be discovered so they don't use/need >> compatible properties. That's I guess part of the reason we don't have >> bindings for them PHY nodes > > I don't see why there couldn't be a compatible that describes the > standard programming interface. Because it can be detected at runtime and I guess stuff like that should stay out of the device tree. I'm using PCI as an analogy here >> However, what you can't discover is how they are wired to the MAC(s) so >> we still need some nodes in the device tree to convey that. 
Also, when >> looking for a specific kind of PHY, such as TBI, device_type works >> easier then parsing compatibles from various vendors or so > > Don't you find the TBI by following the tbi-handle property? When the MAC "attaches" to the PHY the tbi-handle is followed. But the MDIO/PHY code/driver(s) doesn't quite "see" the tbi-handle as it's outside the MDIO/PHY nodes > That said, > I don't object to having a way to label a PHY as attached via TBI if > that's useful. I'm giving a mild, non-nacking (given the history) > objection to using device_type for that (given other history). Personally, I think that TBI PHY support is a bit messy but I don't have bandwidth to deal with that. The TBI PHY should be handled as a regular PHY and right now is a special case Cheers, ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/l
Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
On 06.05.14 06:26, Gavin Shan wrote:
> On Mon, May 05, 2014 at 08:00:12AM -0600, Alex Williamson wrote:
>> On Mon, 2014-05-05 at 13:56 +0200, Alexander Graf wrote:
>>> On 05/05/2014 03:27 AM, Gavin Shan wrote:
>>>> The series of patches intends to support EEH for PCI devices, which have been
>>>> passed through to PowerKVM based guest via VFIO. The implementation is
>>>> straightforward based on the issues or problems we have to resolve to support
>>>> EEH for PowerKVM based guest.
>>>>
>>>> - Emulation for EEH RTAS requests. Thankfully, we already have infrastructure
>>>>   to emulate XICS. Without introducing new mechanism, we just extend that
>>>>   existing infrastructure to support EEH RTAS emulation. EEH RTAS requests
>>>>   initiated from guest are posted to host where the requests get handled or
>>>>   delivered to underlying firmware for further handling. For that, the host kernel
>>>>   has to maintain the PCI address (host domain/bus/slot/function to guest's
>>>>   PHB BUID/bus/slot/function) mapping via KVM VFIO device. The address mapping
>>>>   will be built when initializing VFIO device in QEMU and destroyed when the
>>>>   VFIO device in QEMU is going offline, or the VM is destroyed.
>>>
>>> Do you also expose all those interfaces to user space? VFIO is as much
>>> about user space device drivers as it is about device assignment.
>
> Yep, all the interfaces are exported to user space.
>
>>> I would like to first see an implementation that doesn't touch KVM
>>> emulation code at all but instead routes everything through QEMU. As a
>>> second step we can then accelerate performance critical paths inside of KVM.
>
> Ok. I'll change the implementation. However, QEMU still has to
> poll/push information from/to the host kernel. So the best place for that
> would be tce_iommu_driver_ops::ioctl, as EEH is a Power-specific feature.
>
> For the error injection, I guess I have to put the logic token management
> into QEMU, and the error injection request will be handled by QEMU and then
> routed to the host kernel via an additional syscall, as we did for pSeries.

Yes, start off without in-kernel XICS so everything simply lives in QEMU. Then add callbacks into the in-kernel XICS to inject these interrupts if we don't have wide enough interfaces already.

Alex
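For readers trying to picture the address mapping mentioned in the quoted cover letter (guest PHB BUID/bus/slot/function to host domain/bus/slot/function), here is a rough user-space sketch of the kind of table that could live entirely in QEMU. All type and function names are hypothetical and the field widths are assumptions; this is not code from the series or from QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Guest-visible PCI address: PHB BUID plus bus/slot/function (assumed layout). */
struct guest_pci_addr {
	uint64_t buid;
	uint8_t bus, slot, func;
};

/* Host PCI address: domain plus bus/slot/function (assumed layout). */
struct host_pci_addr {
	uint16_t domain;
	uint8_t bus, slot, func;
};

/* One entry per passed-through device. */
struct eeh_addr_map {
	struct guest_pci_addr guest;
	struct host_pci_addr host;
};

/*
 * Hypothetical lookup: translate the address carried by a guest EEH RTAS
 * request into the host device it refers to. Returns NULL if the guest
 * address does not belong to any passed-through device.
 */
static const struct host_pci_addr *
eeh_lookup_host(const struct eeh_addr_map *map, size_t n,
		const struct guest_pci_addr *g)
{
	assert(map != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct guest_pci_addr *c = &map[i].guest;

		if (c->buid == g->buid && c->bus == g->bus &&
		    c->slot == g->slot && c->func == g->func)
			return &map[i].host;
	}
	return NULL;
}
```

A linear scan is used only to keep the sketch short; entries would be added when a VFIO device is initialized and removed when it goes offline, matching the lifetime described in the cover letter.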
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On 06.05.14 02:41, Paul Mackerras wrote:
> On Mon, May 05, 2014 at 01:19:30PM +0200, Alexander Graf wrote:
>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>> + return vcpu->arch.fault_dar;
>>
>> How about PA6T and G5s?
>
> G5 sets DAR on an alignment interrupt.
>
> As for PA6T, I don't know for sure, but if it doesn't, ordinary
> alignment interrupts wouldn't be handled properly, since the code in
> arch/powerpc/kernel/align.c assumes DAR contains the address being
> accessed on all PowerPC CPUs.

Now that's a good point. If we simply behave like Linux, I'm fine. This definitely deserves a comment on the #ifdef in the code.

Alex