[PATCH V2 0/3] Add new PowerPC specific ELF core notes

2014-05-05 Thread Anshuman Khandual
This patch series adds five new ELF core note sections which can be
used with the existing ptrace requests PTRACE_GETREGSET/PTRACE_SETREGSET
for accessing various transactional memory and miscellaneous register
sets on the PowerPC platform. A test program exercising these new ELF
core note types on a POWER8 system is included below.
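
For readers who want to try the interface from a tracer, a minimal sketch of
such a call follows. It assumes the tracee is already attached and stopped.
read_regset() is a hypothetical helper name, the misc_regs layout mirrors the
one in the test program, and the NT_PPC_MISC value is the one proposed in this
series rather than one taken from a released header.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Value proposed by this series; not taken from a released header. */
#define NT_PPC_MISC 0x107

/* Layout used by the test program: DSCR, PPR, TAR in that order. */
struct misc_regs {
	uint64_t dscr;
	uint64_t ppr;
	uint64_t tar;
};

/*
 * Hypothetical helper: fetch one register set from a stopped tracee.
 * Returns the number of bytes the kernel filled in, or -1 on error
 * (errno is set by ptrace()).
 */
long read_regset(pid_t pid, unsigned int note_type, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	/* The note type selects the register set; iov describes the buffer. */
	if (ptrace(PTRACE_GETREGSET, pid, (void *)(uintptr_t)note_type, &iov) == -1)
		return -1;

	return (long)iov.iov_len;
}
```

A tracer would call read_regset(child, NT_PPC_MISC, &regs, sizeof(regs)) after
waitpid() reports the child as stopped. The other NT_PPC_TM_* note types would
be read the same way with their own buffer layouts.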

RFC: https://lkml.org/lkml/2014/4/1/292
V1:  https://lkml.org/lkml/2014/4/2/43

Changes in V2
=============
(1) Removed all the powerpc specific ptrace requests corresponding to the new
NT_PPC_* ELF core note types. All the register sets can now be accessed from
ptrace through PTRACE_GETREGSET/PTRACE_SETREGSET using the individual NT_PPC_*
core note type instead.
(2) Fixed a couple of attribute values for the REGSET_TM_CGPR register set
(3) Renamed flush_tmreg_to_thread as flush_tmregs_to_thread
(4) Fixed 32 bit checkpointed GPR support
(5) Changed commit messages accordingly

Outstanding Issues
==================
(1) Running DSCR register value inside a transaction does not seem to be saved
at thread.dscr when the process stops for ptrace examination.

Test programs
=============
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

typedef long long u64;
typedef unsigned int u32;
typedef __vector128 vector128;

/* TM CFPR */
struct tm_cfpr {
u64 fpr[32];
u64 fpscr;
};

/* TM CVMX */
struct tm_cvmx {
vector128   vr[32] __attribute__((aligned(16)));
vector128   vscr __attribute__((aligned(16)));
u32 vrsave; 
};

/* TM SPR */
struct tm_spr_regs {
u64 tm_tfhar;
u64 tm_texasr;
u64 tm_tfiar;
u64 tm_orig_msr;
u64 tm_tar;
u64 tm_ppr;
u64 tm_dscr;
};

/* Miscellaneous registers */
struct misc_regs {
u64 dscr;
u64 ppr;
u64 tar;
};

/* TM instructions */
#define TBEGIN  ".long 0x7C00051D ;"
#define TEND    ".long 0x7C00055D ;"

/* SPR number */
#define SPRN_DSCR   0x3
#define SPRN_TAR    815

/* ELF core notes */
#define NT_PPC_TM_SPR   0x103   /* PowerPC transactional memory special registers */
#define NT_PPC_TM_CGPR  0x104   /* PowerPC transactional memory checkpointed GPR */
#define NT_PPC_TM_CFPR  0x105   /* PowerPC transactional memory checkpointed FPR */
#define NT_PPC_TM_CVMX  0x106   /* PowerPC transactional memory checkpointed VMX */
#define NT_PPC_MISC     0x107   /* PowerPC miscellaneous registers */

#define VAL1 1
#define VAL2 2
#define VAL3 3
#define VAL4 4

int main(int argc, char *argv[])
{
struct tm_spr_regs *tmr1;
struct pt_regs *pregs1, *pregs2;
struct tm_cfpr *fpr, *fpr1;
struct misc_regs *dbr1;
struct iovec iov;

pid_t child;
int ret = 0, status = 0, i = 0, flag = 1;

pregs2 = (struct pt_regs *) malloc(sizeof(struct pt_regs));
fpr = (struct tm_cfpr *) malloc(sizeof(struct tm_cfpr));

child = fork();
if (child < 0) {
printf("fork() failed \n");
exit(-1);
}

/* Child code */
if (child == 0) {
asm __volatile__(
"6: ;"                  /* TM checkpointed values */
"li 1, %[val1];"        /* GPR[1] */
".long 0x7C210166;"     /* FPR[1] */
"li 2, %[val2];"        /* GPR[2] */
".long 0x7C420166;"     /* FPR[2] */
"mtspr %[tar], 1;"      /* TAR */
"mtspr %[dscr], 2;"     /* DSCR */
"1: ;"
TBEGIN                  /* TM running values */
"beq 2f ;"
"li 1, %[val3];"        /* GPR[1] */
".long 0x7C210166;"     /* FPR[1] */
"li 2, %[val4];"        /* GPR[2] */
".long 0x7C420166;"     /* FPR[2] */
"mtspr %[tar], 1;"      /* TAR */
"mtspr %[dscr], 2;"     /* DSCR */
"b .;"
TEND
"2: ;"                  /* Abort handler */
"b 1b;"                 /* Start from TBEGIN */

"3: ;"
"b 6b;" /* Start all over again */
:: [dscr]"i"(SPRN_DSCR), [tar]"i"(SPRN_TAR), 
[val1]"i"(VAL1), [val2]"i"(VAL2), [val3]"i"(VAL3), [val4]"i"(VAL4)
: "memory", "r7");
}

/* Parent */
if (child) {
do {
memset(pregs2, 0 , sizeof(struct pt_regs));
memset(fpr, 0 , sizeof(struct tm_cfpr));

/* Wait till child hits "b ." instruction */
 

[PATCH V2 1/3] elf: Add some new PowerPC specific note sections

2014-05-05 Thread Anshuman Khandual
This patch adds four new note sections for transactional memory
and one note section for some miscellaneous registers. Adding these
new ELF note sections extends the existing ELF ABI without affecting
it in any manner.

Signed-off-by: Anshuman Khandual 
---
 include/uapi/linux/elf.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index ef6103b..4040124 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -379,6 +379,11 @@ typedef struct elf64_shdr {
 #define NT_PPC_VMX 0x100   /* PowerPC Altivec/VMX registers */
 #define NT_PPC_SPE 0x101   /* PowerPC SPE/EVR registers */
 #define NT_PPC_VSX 0x102   /* PowerPC VSX registers */
+#define NT_PPC_TM_SPR  0x103   /* PowerPC TM special registers */
+#define NT_PPC_TM_CGPR 0x104   /* PowerPC TM checkpointed GPR */
+#define NT_PPC_TM_CFPR 0x105   /* PowerPC TM checkpointed FPR */
+#define NT_PPC_TM_CVMX 0x106   /* PowerPC TM checkpointed VMX */
+#define NT_PPC_MISC    0x107   /* PowerPC miscellaneous registers */
 #define NT_386_TLS 0x200   /* i386 TLS slots (struct user_desc) */
 #define NT_386_IOPERM  0x201   /* x86 io permission bitmap (1=deny) */
 #define NT_X86_XSTATE  0x202   /* x86 extended state using xsave */
-- 
1.7.11.7

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets

2014-05-05 Thread Anshuman Khandual
This patch enables get and set of transactional memory related register
sets through the PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing
four new powerpc specific register sets, i.e. REGSET_TM_SPR, REGSET_TM_CGPR,
REGSET_TM_CFPR and REGSET_TM_CVMX, corresponding to the following new
ELF core note types added previously in this series.

(1) NT_PPC_TM_SPR
(2) NT_PPC_TM_CGPR
(3) NT_PPC_TM_CFPR
(4) NT_PPC_TM_CVMX
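
As a rough illustration of the running versus checkpointed distinction these
register sets expose, here is a small user-space model. It follows the comment
in the ptrace.c change (while a transaction is active, transact_fp holds the
running FPR values and fp_state holds the checkpointed ones). The struct and
function names are hypothetical, and returning NULL for the checkpointed view
outside a transaction is an assumption, not something this patch states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the FPR register set: 32 FPRs followed by FPSCR. */
struct fp_state_model {
	uint64_t fpr[32];
	uint64_t fpscr;
};

/* Hypothetical stand-in for the relevant part of thread_struct. */
struct thread_model {
	bool tm_active;                    /* models MSR_TM_ACTIVE() */
	struct fp_state_model fp_state;    /* checkpointed while tm_active */
	struct fp_state_model transact_fp; /* running while tm_active */
};

/* What a plain FPR read reports: the values the task is running with. */
const struct fp_state_model *running_fp(const struct thread_model *t)
{
	return t->tm_active ? &t->transact_fp : &t->fp_state;
}

/*
 * What a checkpointed FPR read reports. Outside a transaction there is
 * no checkpoint, so this model returns NULL (an assumption).
 */
const struct fp_state_model *checkpointed_fp(const struct thread_model *t)
{
	return t->tm_active ? &t->fp_state : NULL;
}
```

With the values used by the test program in the cover letter, a tracee stopped
inside the transaction would show 3 in the running FPR[1] and 1 in the
checkpointed FPR[1].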

Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/include/asm/switch_to.h |   8 +
 arch/powerpc/kernel/process.c|  24 ++
 arch/powerpc/kernel/ptrace.c | 683 +--
 3 files changed, 687 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 0e83e7d..2737f46 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -80,6 +80,14 @@ static inline void flush_spe_to_thread(struct task_struct *t)
 }
 #endif
 
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+extern void flush_tmregs_to_thread(struct task_struct *);
+#else
+static inline void flush_tmregs_to_thread(struct task_struct *t)
+{
+}
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+
 static inline void clear_task_ebb(struct task_struct *t)
 {
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 31d0215..e247898 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -695,6 +695,30 @@ static inline void __switch_to_tm(struct task_struct *prev)
}
 }
 
+void flush_tmregs_to_thread(struct task_struct *tsk)
+{
+   /*
+* If task is not current, it should have been flushed
+* already to its thread_struct during __switch_to().
+*/
+   if (tsk != current)
+   return;
+
+   preempt_disable();
+   if (tsk->thread.regs) {
+   /*
+* If we are still current, the TM state needs to
+* be flushed to thread_struct as it will be still
+* present in the current cpu.
+*/
+   if (MSR_TM_ACTIVE(tsk->thread.regs->msr)) {
+   __switch_to_tm(tsk);
+   tm_recheckpoint_new_task(tsk);
+   }
+   }
+   preempt_enable();
+}
+
 /*
  * This is called if we are on the way out to userspace and the
  * TIF_RESTORE_TM flag is set.  It checks if we need to reload
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 2e3d2bf..92faded 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -357,6 +357,17 @@ static int gpr_set(struct task_struct *target, const 
struct user_regset *regset,
return ret;
 }
 
+/*
+ * When any transaction is active, "thread_struct->transact_fp" holds
+ * the current running value of all FPR registers and "thread_struct->
+ * fp_state" holds the last checkpointed FPR registers state for the
+ * current transaction.
+ *
+ * struct data {
+ * u64 fpr[32];
+ * u64 fpscr;
+ * };
+ */
 static int fpr_get(struct task_struct *target, const struct user_regset 
*regset,
   unsigned int pos, unsigned int count,
   void *kbuf, void __user *ubuf)
@@ -365,21 +376,41 @@ static int fpr_get(struct task_struct *target, const 
struct user_regset *regset,
u64 buf[33];
int i;
 #endif
-   flush_fp_to_thread(target);
+   if (MSR_TM_ACTIVE(target->thread.regs->msr)) {
+   flush_fp_to_thread(target);
+   flush_altivec_to_thread(target);
+   flush_tmregs_to_thread(target);
+   } else {
+   flush_fp_to_thread(target);
+   }
 
 #ifdef CONFIG_VSX
/* copy to local buffer then write that out */
-   for (i = 0; i < 32 ; i++)
-   buf[i] = target->thread.TS_FPR(i);
-   buf[32] = target->thread.fp_state.fpscr;
+   if (MSR_TM_ACTIVE(target->thread.regs->msr)) {
+   for (i = 0; i < 32 ; i++)
+   buf[i] = target->thread.TS_TRANS_FPR(i);
+   buf[32] = target->thread.transact_fp.fpscr;
+   } else {
+   for (i = 0; i < 32 ; i++)
+   buf[i] = target->thread.TS_FPR(i);
+   buf[32] = target->thread.fp_state.fpscr;
+   }
return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
 
 #else
-   BUILD_BUG_ON(offsetof(struct thread_fp_state, fpscr) !=
-offsetof(struct thread_fp_state, fpr[32][0]));
+   if (MSR_TM_ACTIVE(target->thread.regs->msr)) {
+   BUILD_BUG_ON(offsetof(struct thread_fp_state, fpscr) !=
+   offsetof(struct thread_fp_state, fpr[32][0]));
 
-   return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+   return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+  &target->thread.transact_fp, 0, -1);
+   

[PATCH V2 3/3] powerpc, ptrace: Enable support for miscellaneous registers

2014-05-05 Thread Anshuman Khandual
This patch enables get and set of miscellaneous registers through ptrace
PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing new powerpc
specific register set REGSET_MISC support corresponding to the new ELF
core note NT_PPC_MISC added previously in this regard.
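
The get path in this patch serves the three registers through consecutive
byte windows: [0, 8) for DSCR, [8, 16) for PPR and [16, 24) for TAR. The
following user-space sketch models that windowing under simplifying
assumptions: copyout_window() is a hypothetical stand-in for
user_regset_copyout() with no user/kernel buffer split, and it silently skips
a window where the kernel helper would assert.

```c
#include <stdint.h>
#include <string.h>

struct misc_regs {
	uint64_t dscr;
	uint64_t ppr;
	uint64_t tar;
};

/*
 * Copy the part of the request [*pos, *pos + *count) that overlaps the
 * window [start, end) from data, then advance the cursor.
 */
static void copyout_window(unsigned int *pos, unsigned int *count,
			   unsigned char **out, const void *data,
			   unsigned int start, unsigned int end)
{
	unsigned int n;

	if (*count == 0 || *pos < start || *pos >= end)
		return;

	n = end - *pos;
	if (n > *count)
		n = *count;

	memcpy(*out, (const unsigned char *)data + (*pos - start), n);
	*out += n;
	*pos += n;
	*count -= n;
}

/* Model of the get path: returns the number of bytes written to buf. */
unsigned int misc_get_model(const struct misc_regs *regs, unsigned int pos,
			    unsigned int count, void *buf)
{
	unsigned char *out = buf;
	const unsigned int w = sizeof(uint64_t);

	copyout_window(&pos, &count, &out, &regs->dscr, 0, w);
	copyout_window(&pos, &count, &out, &regs->ppr, w, 2 * w);
	copyout_window(&pos, &count, &out, &regs->tar, 2 * w, 3 * w);

	return (unsigned int)(out - (unsigned char *)buf);
}
```

A full read (pos 0, count 24) yields all three registers in order, while a
read at pos 8 with count 8 yields only PPR.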

Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/kernel/ptrace.c | 81 
 1 file changed, 81 insertions(+)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 92faded..3332dd8 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -1054,6 +1054,76 @@ static int tm_cvmx_set(struct task_struct *target, const 
struct user_regset *reg
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
 /*
+ * Miscellaneous Registers
+ *
+ * struct {
+ * unsigned long dscr;
+ * unsigned long ppr;
+ * unsigned long tar;
+ * };
+ */
+static int misc_get(struct task_struct *target, const struct user_regset 
*regset,
+  unsigned int pos, unsigned int count,
+  void *kbuf, void __user *ubuf)
+{
+   int ret;
+
+   /* DSCR register */
+   ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+   &target->thread.dscr, 0,
+   sizeof(unsigned long));
+
+   BUILD_BUG_ON(offsetof(struct thread_struct, dscr) + sizeof(unsigned 
long) +
+   sizeof(unsigned long) != offsetof(struct 
thread_struct, ppr));
+
+   /* PPR register */
+   if (!ret)
+   ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ &target->thread.ppr, sizeof(unsigned 
long),
+ 2 * sizeof(unsigned long));
+
+   BUILD_BUG_ON(offsetof(struct thread_struct, ppr) + sizeof(unsigned long)
+   != offsetof(struct 
thread_struct, tar));
+   /* TAR register */
+   if (!ret)
+   ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ &target->thread.tar, 2 * 
sizeof(unsigned long),
+ 3 * sizeof(unsigned long));
+   return ret;
+}
+
+static int misc_set(struct task_struct *target, const struct user_regset 
*regset,
+  unsigned int pos, unsigned int count,
+  const void *kbuf, const void __user *ubuf)
+{
+   int ret;
+
+   /* DSCR register */
+   ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+   &target->thread.dscr, 0,
+   sizeof(unsigned long));
+
+   BUILD_BUG_ON(offsetof(struct thread_struct, dscr) + sizeof(unsigned 
long) +
+   sizeof(unsigned long) != offsetof(struct thread_struct, 
ppr));
+
+   /* PPR register */
+   if (!ret)
+   ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+   &target->thread.ppr, 
sizeof(unsigned long),
+   2 * sizeof(unsigned long));
+
+   BUILD_BUG_ON(offsetof(struct thread_struct, ppr) + sizeof(unsigned long)
+   != offsetof(struct 
thread_struct, tar));
+
+   /* TAR register */
+   if (!ret)
+   ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+   &target->thread.tar, 2 * 
sizeof(unsigned long),
+   3 * sizeof(unsigned long));
+   return ret;
+}
+
+/*
  * These are our native regset flavors.
  */
 enum powerpc_regset {
@@ -1074,6 +1144,7 @@ enum powerpc_regset {
REGSET_TM_CFPR, /* TM checkpointed FPR */
REGSET_TM_CVMX, /* TM checkpointed VMX */
 #endif
+   REGSET_MISC /* Miscellaneous */
 };
 
 static const struct user_regset native_regsets[] = {
@@ -1130,6 +1201,11 @@ static const struct user_regset native_regsets[] = {
.get = tm_cvmx_get, .set = tm_cvmx_set
},
 #endif
+   [REGSET_MISC] = {
+   .core_note_type = NT_PPC_MISC, .n = 3,
+   .size = sizeof(u64), .align = sizeof(u64),
+   .get = misc_get, .set = misc_set
+   },
 };
 
 static const struct user_regset_view user_ppc_native_view = {
@@ -1459,6 +1535,11 @@ static const struct user_regset compat_regsets[] = {
.get = tm_cvmx_get, .set = tm_cvmx_set
},
 #endif
+   [REGSET_MISC] = {
+   .core_note_type = NT_PPC_MISC, .n = 3,
+   .size = sizeof(u64), .align = sizeof(u64),
+   .get = misc_get, .set = misc_set
+   },
 };
 
 static const struct user_regset_view user_ppc_compat_view = {
-- 
1.7.11.7


[V6 08/11] powerpc, lib: Add new branch analysis support functions

2014-05-05 Thread Anshuman Khandual
Add generic powerpc branch analysis support to the code patching
library. The subsequent patch uses it for SW based filtering of
branch records in perf.
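
A user-space sketch of the same classification can be useful for checking
individual encodings by hand. It mirrors the predicates added in this patch
with shortened names. The mask values below are the Power ISA encodings as I
understand them (bclr is 0x4C000020, BO "branch always" is 0x02800000 in
place) and should be checked against the header in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

#define BRANCH_SET_LINK 0x1u
#define XL_FORM_LR      0x4C000020u /* bclr[l], assumed encoding */
#define BO_ALWAYS       0x02800000u /* BO = 1z1zz, assumed encoding */

/* Primary opcode: 18 = I-form, 16 = B-form, 19 = XL-form. */
static unsigned int branch_opcode(uint32_t instr)
{
	return (instr >> 26) & 0x3F;
}

/*
 * Branch to LR without setting the link bit. Note that bcctr
 * (0x4C000420) also contains these mask bits, so this test alone
 * does not tell the two apart.
 */
bool is_return_branch(uint32_t instr)
{
	return (instr & XL_FORM_LR) == XL_FORM_LR &&
	       (instr & BRANCH_SET_LINK) == 0;
}

/* Any branch that sets the link register counts as a call. */
bool is_func_call(uint32_t instr)
{
	return (instr & BRANCH_SET_LINK) != 0;
}

/* XL-form branch with the link bit set. */
bool is_indirect_func_call(uint32_t instr)
{
	return branch_opcode(instr) == 19 && (instr & BRANCH_SET_LINK) != 0;
}

/* B-form or XL-form branch whose BO field is not "branch always". */
bool is_conditional_branch(uint32_t instr)
{
	unsigned int op = branch_opcode(instr);

	if (op != 16 && op != 19)
		return false;

	return (instr & BO_ALWAYS) != BO_ALWAYS;
}
```

For example, blr (0x4E800020) classifies as a return, bl (0x48000001) as a
call, bctrl (0x4E800421) as an indirect call, and beq (0x41820008) as a
conditional branch.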

Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/include/asm/code-patching.h | 16 +++
 arch/powerpc/lib/code-patching.c | 80 
 2 files changed, 96 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index 97e02f9..39919d4 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -22,6 +22,16 @@
 #define BRANCH_SET_LINK 0x1
 #define BRANCH_ABSOLUTE 0x2
 
+#define XL_FORM_LR  0x4C000020
+#define XL_FORM_CTR 0x4C000420
+#define XL_FORM_TAR 0x4C000460
+
+#define BO_ALWAYS    0x02800000
+#define BO_CTR       0x02000000
+#define BO_CRBI_OFF  0x00800000
+#define BO_CRBI_ON   0x01800000
+#define BO_CRBI_HINT 0x00400000
+
 unsigned int create_branch(const unsigned int *addr,
   unsigned long target, int flags);
 unsigned int create_cond_branch(const unsigned int *addr,
@@ -56,4 +66,10 @@ static inline unsigned long ppc_function_entry(void *func)
 #endif
 }
 
+/* Perf branch filters */
+bool instr_is_return_branch(unsigned int instr);
+bool instr_is_conditional_branch(unsigned int instr);
+bool instr_is_func_call(unsigned int instr);
+bool instr_is_indirect_func_call(unsigned int instr);
+
 #endif /* _ASM_POWERPC_CODE_PATCHING_H */
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index d5edbeb..a06f8b3 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -77,6 +77,7 @@ static unsigned int branch_opcode(unsigned int instr)
return (instr >> 26) & 0x3F;
 }
 
+/* Forms of branch instruction */
 static int instr_is_branch_iform(unsigned int instr)
 {
return branch_opcode(instr) == 18;
@@ -87,6 +88,85 @@ static int instr_is_branch_bform(unsigned int instr)
return branch_opcode(instr) == 16;
 }
 
+static int instr_is_branch_xlform(unsigned int instr)
+{
+   return branch_opcode(instr) == 19;
+}
+
+/* Classification of XL-form instruction */
+static int is_xlform_lr(unsigned int instr)
+{
+   return (instr & XL_FORM_LR) == XL_FORM_LR;
+}
+
+/* BO field analysis (B-form or XL-form) */
+static int is_bo_always(unsigned int instr)
+{
+   return (instr & BO_ALWAYS) == BO_ALWAYS;
+}
+
+/* Link bit is set */
+static int is_branch_link_set(unsigned int instr)
+{
+   return (instr & BRANCH_SET_LINK) == BRANCH_SET_LINK;
+}
+
+/* 
+ * Generic software implemented branch filters used
+ * by perf branch stack sampling when PMU does not
+ * process them for some reason.
+ */
+
+/* PERF_SAMPLE_BRANCH_ANY_RETURN */
+bool instr_is_return_branch(unsigned int instr)
+{
+   /*
+* Conditional and unconditional branch to LR register
+* without setting the link register.
+*/
+   if (is_xlform_lr(instr) && !is_branch_link_set(instr))
+   return true;
+
+   return false;
+}
+
+/* PERF_SAMPLE_BRANCH_COND */
+bool instr_is_conditional_branch(unsigned int instr)
+{
+   /* I-form instruction - excluded */
+   if (instr_is_branch_iform(instr))
+   return false;
+
+   /* B-form or XL-form instruction */
+   if (instr_is_branch_bform(instr) || instr_is_branch_xlform(instr))  {
+
+   /* Not branch always */
+   if (!is_bo_always(instr))
+   return true;
+   }
+   return false;
+}
+
+/* PERF_SAMPLE_BRANCH_ANY_CALL */
+bool instr_is_func_call(unsigned int instr)
+{
+   /* LR should be set */
+   if (is_branch_link_set(instr))
+   return true;
+
+   return false;
+}
+
+/* PERF_SAMPLE_BRANCH_IND_CALL */
+bool instr_is_indirect_func_call(unsigned int instr)
+{
+   /* XL-form instruction with LR set */
+   if (instr_is_branch_xlform(instr) && is_branch_link_set(instr))
+   return true;
+
+   return false;
+}
+
 int instr_is_relative_branch(unsigned int instr)
 {
if (instr & BRANCH_ABSOLUTE)
-- 
1.7.11.7


[V6 02/11] perf, tool: Conditional branch filter 'cond' added to perf record

2014-05-05 Thread Anshuman Khandual
Adding perf record support for new branch stack filter criteria
PERF_SAMPLE_BRANCH_COND.
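
To show how a name table like branch_modes turns a "-j" argument into a
filter mask, here is a simplified stand-alone sketch. The BR_* bit values and
parse_branch_filter() are hypothetical, chosen only for this example; the
real tool uses the PERF_SAMPLE_BRANCH_* values from the perf_event UAPI
header and its own parser.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit values for this sketch only. */
enum {
	BR_ANY      = 1 << 0,
	BR_ANY_CALL = 1 << 1,
	BR_ANY_RET  = 1 << 2,
	BR_IND_CALL = 1 << 3,
	BR_COND     = 1 << 4,
};

struct branch_mode {
	const char *name;
	uint64_t mode;
};

/* Name table in the spirit of branch_modes[]; NULL name ends it. */
static const struct branch_mode modes[] = {
	{ "any",      BR_ANY },
	{ "any_call", BR_ANY_CALL },
	{ "any_ret",  BR_ANY_RET },
	{ "ind_call", BR_IND_CALL },
	{ "cond",     BR_COND },
	{ NULL,       0 },
};

/* Parse "a,b,c" into an OR of modes. Returns 0, or -1 on unknown name. */
int parse_branch_filter(const char *str, uint64_t *mask)
{
	*mask = 0;

	while (*str) {
		size_t len = strcspn(str, ",");
		const struct branch_mode *m;

		for (m = modes; m->name; m++)
			if (strlen(m->name) == len &&
			    strncmp(m->name, str, len) == 0)
				break;
		if (!m->name)
			return -1;

		*mask |= m->mode;
		str += len;
		if (*str == ',')
			str++;
	}
	return 0;
}
```

With this table, "any_call,cond" parses to BR_ANY_CALL | BR_COND, and an
unknown name is rejected.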

Signed-off-by: Anshuman Khandual 
Reviewed-by: Stephane Eranian 
Reviewed-by: Andi Kleen 
---
 tools/perf/builtin-record.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8ce62ef..dfe6b9d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -583,6 +583,7 @@ static const struct branch_mode branch_modes[] = {
BRANCH_OPT("abort_tx", PERF_SAMPLE_BRANCH_ABORT_TX),
BRANCH_OPT("in_tx", PERF_SAMPLE_BRANCH_IN_TX),
BRANCH_OPT("no_tx", PERF_SAMPLE_BRANCH_NO_TX),
+   BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
BRANCH_END
 };
 
-- 
1.7.11.7


[V6 03/11] x86, perf: Add conditional branch filtering support

2014-05-05 Thread Anshuman Khandual
This patch adds conditional branch filtering support,
enabling it for PERF_SAMPLE_BRANCH_COND in perf branch
stack sampling framework by utilizing an available
software filter X86_BR_JCC.

Signed-off-by: Anshuman Khandual 
Reviewed-by: Stephane Eranian 
Reviewed-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index d82d155..9dd2459 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -384,6 +384,9 @@ static void intel_pmu_setup_sw_lbr_filter(struct perf_event 
*event)
if (br_type & PERF_SAMPLE_BRANCH_NO_TX)
mask |= X86_BR_NO_TX;
 
+   if (br_type & PERF_SAMPLE_BRANCH_COND)
+   mask |= X86_BR_JCC;
+
/*
 * stash actual user request into reg, it may
 * be used by fixup code for some CPU
@@ -678,6 +681,7 @@ static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 * NHM/WSM erratum: must include IND_JMP to capture IND_CALL
 */
[PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
+   [PERF_SAMPLE_BRANCH_COND] = LBR_JCC,
 };
 
 static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
@@ -689,6 +693,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
[PERF_SAMPLE_BRANCH_ANY_CALL]   = LBR_REL_CALL | LBR_IND_CALL
| LBR_FAR,
[PERF_SAMPLE_BRANCH_IND_CALL]   = LBR_IND_CALL,
+   [PERF_SAMPLE_BRANCH_COND]   = LBR_JCC,
 };
 
 /* core */
-- 
1.7.11.7


[V6 09/11] powerpc, perf: Enable SW filtering in branch stack sampling framework

2014-05-05 Thread Anshuman Khandual
This patch enables SW based post processing of BHRB captured branches
to be able to meet more user defined branch filtration criteria in perf
branch stack sampling framework. These changes increase the number of
branch filters and their valid combinations on any powerpc64 server
platform with BHRB support. Find the summary of code changes here.

(1) struct cpu_hw_events

Introduced two new variables to track the various filter values and masks:

(a) bhrb_sw_filter  Tracks SW implemented branch filter flags
(b) bhrb_filter     Tracks both (SW and HW) branch filter flags

(2) Event creation

The kernel will figure out the supported BHRB branch filters through the
PMU callback 'bhrb_filter_map'. This function will find out how many of
the requested branch filters can be supported in the PMU HW. It will not
try to invalidate any branch filter combinations. Event creation will not
error out because of a lack of HW based branch filters. Meanwhile it will
track the overall supported branch filters in the 'bhrb_filter' variable.

Once the PMU callback returns, the kernel will process the user branch
filter request against the available SW filters (bhrb_sw_filter_map) while
looking at 'bhrb_filter'. During this phase, all the branch filters which
are still pending from the user requested list will have to be supported
in SW, failing which the event creation will error out.

(3) SW branch filter

During the BHRB data capture inside the PMU interrupt context, each
of the captured 'perf_branch_entry.from' will be checked for compliance
with applicable SW branch filters. If the entry does not conform to the
filter requirements, it will be discarded from the final perf branch
stack buffer.

(4) Supported SW based branch filters

(a) PERF_SAMPLE_BRANCH_ANY_RETURN
(b) PERF_SAMPLE_BRANCH_IND_CALL
(c) PERF_SAMPLE_BRANCH_ANY_CALL
(d) PERF_SAMPLE_BRANCH_COND

Please refer to the patch to understand the classification of instructions
into these branch filter categories.

(5) Multiple branch filter semantics

The Book3S server implementation follows the same OR semantics (as
implemented in x86) when dealing with multiple branch filters at any point
in time. SW branch filter analysis is carried out on the data set captured
in the PMU HW, so the resulting set of data (after applying the SW filters)
will inherently be an AND with the HW captured set. Hence any combination
of HW and SW branch filters will be invalid. HW based branch filters are
more efficient and faster than SW implemented branch filters, so the PMU
should first decide whether it can support all the requested branch filters
itself. If it can support all the branch filters in an OR manner, we don't
apply any SW branch filter on top of the HW captured set (which is the
final set). This preserves the OR semantics of multiple branch filters as
required. But if the PMU cannot support all the requested branch filters
in an OR manner, it should not apply any of its filters and should leave it
up to the SW to handle them all. It's the PMU code's responsibility to
uphold this protocol in order to conform to the overall OR semantics of
the perf branch stack sampling framework.
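
The all-or-nothing protocol described in (5) can be sketched as a small
decision function. This is only a model of the description above, not the
code in this patch: plan_branch_filters(), struct filter_plan and the F_*
bit values are hypothetical, and the error return stands in for the failure
of event creation.

```c
#include <stdint.h>

/* Hypothetical filter bits for this sketch only. */
enum {
	F_ANY_CALL = 1 << 0,
	F_COND     = 1 << 1,
	F_IND_CALL = 1 << 2,
};

struct filter_plan {
	uint64_t hw; /* filters handed to the PMU */
	uint64_t sw; /* filters applied in SW post processing */
};

/*
 * Decide where the requested filters run. Either the PMU takes all of
 * them, or it takes none and SW takes all of them. Mixing the two would
 * AND the sets together and break the OR semantics.
 * Returns 0 on success, -1 if neither side can cover the request.
 */
int plan_branch_filters(uint64_t requested, uint64_t hw_capable,
			uint64_t sw_capable, struct filter_plan *plan)
{
	plan->hw = 0;
	plan->sw = 0;

	if ((requested & ~hw_capable) == 0) {
		plan->hw = requested; /* HW covers everything */
		return 0;
	}
	if ((requested & ~sw_capable) == 0) {
		plan->sw = requested; /* HW stays out, SW covers everything */
		return 0;
	}
	return -1; /* event creation would fail */
}
```

For example, with a PMU that only handles F_ANY_CALL in HW, a request for
F_ANY_CALL alone is planned in HW, while a request for F_ANY_CALL | F_COND is
planned entirely in SW.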

Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/include/asm/perf_event_server.h |   6 +-
 arch/powerpc/perf/core-book3s.c  | 188 ++-
 arch/powerpc/perf/power8-pmu.c   |   2 +-
 3 files changed, 187 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h 
b/arch/powerpc/include/asm/perf_event_server.h
index 9ed73714..93a9a8a 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -19,6 +19,10 @@
 #define MAX_EVENT_ALTERNATIVES 8
 #define MAX_LIMITED_HWCOUNTERS 2
 
+#define for_each_branch_sample_type(x) \
+for ((x) = PERF_SAMPLE_BRANCH_USER; \
+ (x) < PERF_SAMPLE_BRANCH_MAX; (x) <<= 1)
+
 /*
  * This struct provides the constants and functions needed to
  * describe the PMU on a particular POWER-family CPU.
@@ -35,7 +39,7 @@ struct power_pmu {
unsigned long *valp);
int (*get_alternatives)(u64 event_id, unsigned int flags,
u64 alt[]);
-   u64 (*bhrb_filter_map)(u64 branch_sample_type);
+   u64 (*bhrb_filter_map)(u64 branch_sample_type, u64 
*bhrb_filter);
void(*config_bhrb)(u64 pmu_bhrb_filter);
void(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
int (*limited_pmc_event)(u64 event_id);
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/pe

[V6 00/11] perf: New conditional branch filter

2014-05-05 Thread Anshuman Khandual
This patchset is a re-spin of the original branch stack sampling patchset
which introduced the new PERF_SAMPLE_BRANCH_COND branch filter. This patchset
also enables SW based branch filtering support for book3s powerpc platforms
which have PMU HW backed branch stack sampling support.

Summary of code changes in this patchset:

(1) Introduces a new PERF_SAMPLE_BRANCH_COND branch filter
(2) Add the "cond" branch filter options in the "perf record" tool
(3) Enable PERF_SAMPLE_BRANCH_COND in X86 platforms
(4) Enable PERF_SAMPLE_BRANCH_COND in POWER8 platform 
(5) Update the documentation regarding "perf record" tool
(6) Add some new powerpc instruction analysis functions in code-patching library
(7) Enable SW based branch filter support for powerpc book3s
(8) Changed BHRB configuration in POWER8 to accommodate SW branch filters 

With this new SW enablement, the branch filter support for book3s platforms has
been extended to include all the combinations discussed below, with a sample
test application program (included here).

Changes in V2
=============
(1) Enabled PPC64 SW branch filtering support
(2) Incorporated changes required for all previous comments

Changes in V3
=============
(1) Split the SW branch filter enablement into multiple patches
(2) Added PMU neutral SW branch filtering code, PMU specific HW branch filtering code
(3) Added new instruction analysis functionality into powerpc code-patching library
(4) Changed name for some of the functions
(5) Fixed a couple of spelling mistakes
(6) Changed code documentation in multiple places

Changes in V4
=============
(1) Changed the commit message for patch (01/10)
(2) Changed the patch (02/10) to accommodate review comments from Michael Ellerman
(3) Rebased the patchset against Linus's latest tree

Changes in V5
=============
(1) Added a precursor patch to clean up the indentation problem in power_pmu_bhrb_read
(2) Added a precursor patch to re-arrange the P8 PMU BHRB filter config, which improved the clarity
(3) Merged the previous 10th patch into the 8th patch
(4) Moved SW based branch analysis code from core perf into the code-patching library as suggested by Michael
(5) Simplified the logic in the branch analysis library
(6) Fixed some ambiguities in documentation at various places
(7) Added some more in-code documentation blocks at various places
(8) Renamed some local variable and function names
(9) Fixed some indentation and white space errors in the code
(10) Implemented almost all the review comments and suggestions made by Michael Ellerman on the V4 patchset
(11) Enabled privilege mode SW branch filter
(12) Simplified and generalized the SW implemented conditional branch filter
(13) PERF_SAMPLE_BRANCH_COND filter is now supported only through SW implementation
(14) Adjusted other patches to deal with the above changes

Changes in V6
=============
(1) Rebased the patchset against the master
(2) Added "Reviewed-by: Andi Kleen" in the first four patches in the series,
which change the generic or X86 perf code. [https://lkml.org/lkml/2014/4/7/130]

HW implemented branch filters
=============================

(1) perf record -j any_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object  Source Symbol  Target Shared Object  Target Symbol
# ........  .......  ....................  .............  ....................  .................
#
     7.85%  cprog    cprog                 [.] sw_3_1     cprog                 [.] success_3_1_2
     5.66%  cprog    cprog                 [.] sw_3_1     cprog                 [.] sw_3_1_2
     5.65%  cprog    cprog                 [.] hw_1_1     cprog                 [.] symbol1
     5.42%  cprog    cprog                 [.] sw_3_1     cprog                 [.] sw_3_1_3
     5.40%  cprog    cprog                 [.] callme     cprog                 [.] hw_1_1
     5.40%  cprog    cprog                 [.] sw_3_1     cprog                 [.] success_3_1_1
     5.40%  cprog    cprog                 [.] sw_3_1     cprog                 [.] sw_3_1_1
     5.39%  cprog    cprog                 [.] sw_4_2     cprog                 [.] lr_addr
     5.39%  cprog    cprog                 [.] callme     cprog                 [.] sw_4_2
     5.39%  cprog    [unknown]             [.]            cprog                 [.] ctr_addr
     5.38%  cprog    cprog                 [.] hw_1_2     cprog                 [.] symbol2
     5.38%  cprog    cprog                 [.] callme     cprog                 [.] hw_1_2
     5.16%  cprog    cprog                 [.] sw_3_1     cprog                 [.] success_3_1_3
     5.15%  cprog    cprog                 [.] callme     cprog                 [.] sw_3_2
 5.14% 

[V6 07/11] powerpc, perf: Change the name of HW PMU branch filter tracking variable

2014-05-05 Thread Anshuman Khandual
This patch simply changes the name of the variable from 'bhrb_filter' to
'bhrb_hw_filter' in order to add one more variable which will track SW
filters in generic powerpc book3s code which will be implemented in the
subsequent patch. This patch does not change any functionality.

Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/perf/core-book3s.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 66bea54..1d7e909 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -47,7 +47,7 @@ struct cpu_hw_events {
int n_txn_start;
 
/* BHRB bits */
-   u64 bhrb_filter;/* BHRB HW branch 
filter */
+   u64 bhrb_hw_filter; /* BHRB HW branch 
filter */
int bhrb_users;
void*bhrb_context;
struct  perf_branch_stack   bhrb_stack;
@@ -1298,7 +1298,7 @@ static void power_pmu_enable(struct pmu *pmu)
 
mb();
if (cpuhw->bhrb_users)
-   ppmu->config_bhrb(cpuhw->bhrb_filter);
+   ppmu->config_bhrb(cpuhw->bhrb_hw_filter);
 
write_mmcr0(cpuhw, mmcr0);
 
@@ -1405,7 +1405,7 @@ nocheck:
  out:
if (has_branch_stack(event)) {
power_pmu_bhrb_enable(event);
-   cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
+   cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
event->attr.branch_sample_type);
}
 
@@ -1788,10 +1788,10 @@ static int power_pmu_event_init(struct perf_event 
*event)
err = power_check_constraints(cpuhw, events, cflags, n + 1);
 
if (has_branch_stack(event)) {
-   cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
+   cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
event->attr.branch_sample_type);
 
-   if(cpuhw->bhrb_filter == -1)
+   if(cpuhw->bhrb_hw_filter == -1)
return -EOPNOTSUPP;
}
 
-- 
1.7.11.7


[V6 04/11] perf, documentation: Description for conditional branch filter

2014-05-05 Thread Anshuman Khandual
Adding documentation support for conditional branch filter.

Signed-off-by: Anshuman Khandual 
Reviewed-by: Stephane Eranian 
Reviewed-by: Andi Kleen 
---
 tools/perf/Documentation/perf-record.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-record.txt 
b/tools/perf/Documentation/perf-record.txt
index c71b0f3..d460049 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -184,9 +184,10 @@ following filters are defined:
- in_tx: only when the target is in a hardware transaction
- no_tx: only when the target is not in a hardware transaction
- abort_tx: only when the target is a hardware transaction abort
+   - cond: conditional branches
 
 +
-The option requires at least one branch type among any, any_call, any_ret, ind_call.
+The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
 The privilege levels may be omitted, in which case, the privilege levels of the associated
 event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege
 levels are subject to permissions.  When sampling on multiple events, branch stack sampling
-- 
1.7.11.7


[V6 11/11] powerpc, perf: Enable privilege mode SW branch filters

2014-05-05 Thread Anshuman Khandual
This patch enables privilege mode SW branch filters. It also modifies the
POWER8 PMU branch filter configuration so that the privilege mode
branch filter implemented as part of the base PMU event configuration
is reflected in the bhrb filter mask. As a result, the SW will skip the
privilege mode branch filters and not try to process them itself.
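
A minimal user-space sketch of the privilege check that keep_branch() applies
to the 'to' address in the diff below. It is an illustration, not the kernel
code: the BR_* macros mirror the perf_branch_sample_type bit positions,
SKETCH_KERNEL_BASE is an assumed stand-in for the kernel's PAGE_OFFSET on a
64-bit Book3S system, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions mirror enum perf_branch_sample_type in the perf UAPI header. */
#define BR_USER    (1U << 0)
#define BR_KERNEL  (1U << 1)
#define BR_HV      (1U << 2)
#define BR_PLM_ALL (BR_USER | BR_KERNEL | BR_HV)

/* Assumption: kernel addresses start here (stand-in for PAGE_OFFSET). */
#define SKETCH_KERNEL_BASE 0xc000000000000000ULL

bool sketch_is_kernel_addr(uint64_t addr)
{
	return addr >= SKETCH_KERNEL_BASE;
}

/* Return true if the branch target passes the requested privilege filters. */
bool sketch_plm_filter_accepts(uint64_t to, uint64_t sw_filter)
{
	/* The hypervisor level is ignored, as in the patch. */
	sw_filter &= ~(uint64_t)BR_HV;

	/* No user/kernel filter requested: nothing to reject at this stage. */
	if (!(sw_filter & BR_PLM_ALL))
		return true;

	/* The requested levels are ORed with each other. */
	if ((sw_filter & BR_USER) && !sketch_is_kernel_addr(to))
		return true;
	if ((sw_filter & BR_KERNEL) && sketch_is_kernel_addr(to))
		return true;

	return false;
}
```

A record that fails this check is dropped before the generic filters (call,
return, conditional, ...) ever look at the 'from' instruction, which is what
gives the AND between the two filter groups.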

Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/perf/core-book3s.c | 53 +++--
 arch/powerpc/perf/power8-pmu.c  | 13 --
 2 files changed, 52 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index a94cc43..297cddb 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -26,6 +26,9 @@
 #define BHRB_PREDICTION 0x0001
 #define BHRB_EA 0xFFFCUL
 
+#define POWER_ADDR_USER    0
+#define POWER_ADDR_KERNEL  1
+
 struct cpu_hw_events {
int n_events;
int n_percpu;
@@ -450,10 +453,10 @@ static bool check_instruction(unsigned int *addr, u64 sw_filter)
  * Access the instruction contained in the address and check
  * whether it complies with the applicable SW branch filters.
  */
-static bool keep_branch(u64 from, u64 sw_filter)
+static bool keep_branch(u64 from, u64 to, u64 sw_filter)
 {
unsigned int instr;
-   bool ret;
+   bool to_plm, ret, flag;
 
/*
 * The "from" branch for every branch record has to go
@@ -463,6 +466,37 @@ static bool keep_branch(u64 from, u64 sw_filter)
if (sw_filter == 0)
return true;
 
+   to_plm = is_kernel_addr(to) ? POWER_ADDR_KERNEL : POWER_ADDR_USER;
+
+   /*
+* Applying privilege mode SW branch filters first on the
+* 'to' address makes an AND semantic with the SW generic
+* branch filters (OR with each other) being applied on the
+* from address there after.
+*/
+
+   /* Ignore PERF_SAMPLE_BRANCH_HV */
+   sw_filter &= ~PERF_SAMPLE_BRANCH_HV;
+
+   /* Privilege mode branch filters for "TO" address */
+   if (sw_filter & PERF_SAMPLE_BRANCH_PLM_ALL) {
+   flag = false;
+
+   if (sw_filter & PERF_SAMPLE_BRANCH_USER) {
+   if(to_plm == POWER_ADDR_USER)
+   flag = true;
+   }
+
+   if (sw_filter & PERF_SAMPLE_BRANCH_KERNEL) {
+   if(to_plm == POWER_ADDR_KERNEL)
+   flag = true;
+   }
+
+   if (!flag)
+   return false;
+   }
+
+   /* Generic branch filters for "FROM" address */
if (is_kernel_addr(from)) {
return check_instruction((unsigned int *) from, sw_filter);
} else {
@@ -501,15 +535,6 @@ static int all_filters_covered(u64 branch_sample_type, u64 bhrb_filter)
if (!(branch_sample_type & x))
continue;
/*
-* Privilege filter requests have been already
-* taken care during the base PMU configuration.
-*/
-   if ((x == PERF_SAMPLE_BRANCH_USER)
-   || (x == PERF_SAMPLE_BRANCH_KERNEL)
-   || (x == PERF_SAMPLE_BRANCH_HV))
-   continue;
-
-   /*
 * Requested filter not available either
 * in PMU or in SW.
 */
@@ -520,7 +545,10 @@ static int all_filters_covered(u64 branch_sample_type, u64 bhrb_filter)
 }
 
 /* SW implemented branch filters */
-static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_ANY_CALL,
+static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_USER,
+ PERF_SAMPLE_BRANCH_KERNEL,
+ PERF_SAMPLE_BRANCH_HV,
+ PERF_SAMPLE_BRANCH_ANY_CALL,
  PERF_SAMPLE_BRANCH_COND,
  PERF_SAMPLE_BRANCH_ANY_RETURN,
  PERF_SAMPLE_BRANCH_IND_CALL };
@@ -624,6 +652,7 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 
/* Apply SW branch filters and drop the entry if required */
if (!keep_branch(cpuhw->bhrb_entries[u_index].from,
+   cpuhw->bhrb_entries[u_index].to,
cpuhw->bhrb_sw_filter))
u_index--;
u_index++;
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 4743bde..b6e21da 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -649,9 +649,19 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
 * filter configuration. BHRB is always recorded along with a
 * r

[V6 10/11] power8, perf: Adapt BHRB PMU configuration to work with SW filters

2014-05-05 Thread Anshuman Khandual
The powerpc kernel now supports SW based branch filters for book3s systems, with some
specific requirements when dealing with HW supported branch filters in order to
achieve the overall OR semantics prevailing in the perf branch stack sampling framework.
This patch adapts the BHRB branch filter configuration to meet those requirements.
The POWER8 PMU can only handle one HW based branch filter request at any point in time.
For all other combinations the PMU will pass the request on to the SW.
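
The HW/SW split described above can be sketched in standalone C as follows.
This is an illustration under stated assumptions, not the patch itself: the
BR_* macros mirror the perf_branch_sample_type bit positions,
SKETCH_HW_FILTER_CALLS is a hypothetical stand-in for POWER8_MMCRA_IFM1, and
sketch_bhrb_filter_map() is an invented name.

```c
#include <stdint.h>

/* Bit positions mirror enum perf_branch_sample_type in the perf UAPI header. */
#define BR_USER       (1U << 0)
#define BR_KERNEL     (1U << 1)
#define BR_HV         (1U << 2)
#define BR_ANY        (1U << 3)
#define BR_ANY_CALL   (1U << 4)
#define BR_ANY_RETURN (1U << 5)
#define BR_IND_CALL   (1U << 6)
#define BR_COND       (1U << 10)
#define BR_PLM_ALL    (BR_USER | BR_KERNEL | BR_HV)

/* Hypothetical stand-in for the "calls only" MMCRA filter value. */
#define SKETCH_HW_FILTER_CALLS 0x40000000ULL

/*
 * Returns the filter value to program into the PMU and stores in
 * *hw_covered the requested filters that the hardware takes care of.
 * Whatever is not covered is left to the software filters.
 */
uint64_t sketch_bhrb_filter_map(uint64_t branch_sample_type,
				uint64_t *hw_covered)
{
	*hw_covered = 0;

	/* "any" means no filtering at all, so nothing remains for SW. */
	if (branch_sample_type & BR_ANY) {
		*hw_covered = BR_ANY;
		return 0;
	}

	/* Privilege levels are handled by the base PMU event configuration. */
	branch_sample_type &= ~(uint64_t)BR_PLM_ALL;

	/* Exactly one request, and it is the one the hardware supports. */
	if (branch_sample_type == BR_ANY_CALL) {
		*hw_covered = BR_ANY_CALL;
		return SKETCH_HW_FILTER_CALLS;
	}

	/* Combination or unsupported filter: leave everything to SW. */
	return 0;
}
```

Requesting any_call together with cond therefore programs no hardware filter
at all; the hardware records every branch and the software filters keep the
union of the two requested types.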

Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/perf/power8-pmu.c | 50 --
 1 file changed, 43 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 699b1dd..4743bde 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -635,6 +635,16 @@ static int power8_generic_events[] = {
 
 static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
 {
+   u64 x, pmu_bhrb_filter;
+   pmu_bhrb_filter = 0;
+   *bhrb_filter = 0;
+
+   /* No branch filter requested */
+   if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) {
+   *bhrb_filter = PERF_SAMPLE_BRANCH_ANY;
+   return pmu_bhrb_filter;
+   }
+
/* BHRB and regular PMU events share the same privilege state
 * filter configuration. BHRB is always recorded along with a
 * regular PMU event. As the privilege state filter is handled
@@ -645,16 +655,42 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
/* Ignore user, kernel, hv bits */
branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
 
-   /* No branch filter requested */
-   if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
-   return 0;
+   /*
+* P8 does not support oring of PMU HW branch filters. Hence
+* if multiple branch filters are requested which includes filters
+* supported in PMU, still go ahead and clear the PMU based HW branch
+* filter component as in this case all the filters will be processed
+* in SW.
+*/
 
-   if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) {
-   return POWER8_MMCRA_IFM1;
+   for_each_branch_sample_type(x) {
+   /* Ignore privilege branch filters */
+   if ((x == PERF_SAMPLE_BRANCH_USER)
+   || (x == PERF_SAMPLE_BRANCH_KERNEL)
+   || (x == PERF_SAMPLE_BRANCH_HV))
+   continue;
+
+   if (!(branch_sample_type & x))
+   continue;
+
+   /* Supported individual PMU branch filters */
+   if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
+   branch_sample_type &= ~PERF_SAMPLE_BRANCH_ANY_CALL;
+   if (branch_sample_type) {
+   /* Multiple branch filters will be processed in SW */
+   pmu_bhrb_filter = 0;
+   *bhrb_filter = 0;
+   return pmu_bhrb_filter;
+   } else {
+   /* Individual branch filter will be processed in PMU */
+   pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
+   *bhrb_filter|= PERF_SAMPLE_BRANCH_ANY_CALL;
+   return pmu_bhrb_filter;
+   }
+   }
}
 
-   /* Every thing else is unsupported */
-   return -1;
+   return pmu_bhrb_filter;
 }
 
 static void power8_config_bhrb(u64 pmu_bhrb_filter)
-- 
1.7.11.7


[V6 01/11] perf: Add PERF_SAMPLE_BRANCH_COND

2014-05-05 Thread Anshuman Khandual
This patch introduces a new branch filter, PERF_SAMPLE_BRANCH_COND, which
extends the existing perf ABI. Architectures can provide this
functionality either with HW filtering support (if present)
or with SW filtering of captured branch instructions.

Signed-off-by: Anshuman Khandual 
Reviewed-by: Stephane Eranian 
Reviewed-by: Andi Kleen 
---
 include/uapi/linux/perf_event.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 853bc1c..696f69b4 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -163,8 +163,9 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_ABORT_TX = 1U << 7, /* transaction aborts */
PERF_SAMPLE_BRANCH_IN_TX= 1U << 8, /* in transaction */
PERF_SAMPLE_BRANCH_NO_TX= 1U << 9, /* not in transaction */
+   PERF_SAMPLE_BRANCH_COND = 1U << 10, /* conditional branches */
 
-   PERF_SAMPLE_BRANCH_MAX  = 1U << 10, /* non-ABI */
+   PERF_SAMPLE_BRANCH_MAX  = 1U << 11, /* non-ABI */
 };
 
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
-- 
1.7.11.7


[V6 05/11] powerpc, perf: Re-arrange BHRB processing

2014-05-05 Thread Anshuman Khandual
This patch cleans up some existing indentation problems and
re-organizes the BHRB processing code with a helper function
named `update_branch_entry`, making it more readable. This patch
does not change any functionality.
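
For orientation, the two-entry BHRB layout that the re-arranged loop walks
(see the long comment in the diff below) can be sketched in user-space C. This
is an illustration only: the flag values are assumptions (the low bit as the
prediction flag, the next bit as the target flag, the rest as the address),
struct sketch_branch and sketch_bhrb_decode() are hypothetical names, and the
immediate-form case leaves `to` at 0 instead of decoding the instruction from
memory as the kernel does.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of a raw BHRB word. */
#define BHRB_PREDICTION 0x1ULL    /* stored by the patch as "mispred" */
#define BHRB_TARGET     0x2ULL    /* entry holds a branch target ("to") */
#define BHRB_EA         (~0x3ULL) /* remaining bits: effective address */

struct sketch_branch {
	uint64_t from;
	uint64_t to;      /* 0 when the target would be decoded from memory */
	int mispred;
};

/*
 * Decode raw BHRB words (most recent first) into branch records.
 * Returns the number of records written to out[] (at most max_out).
 */
size_t sketch_bhrb_decode(const uint64_t *raw, size_t n,
			  struct sketch_branch *out, size_t max_out)
{
	size_t r = 0, u = 0;

	while (r < n && u < max_out) {
		uint64_t val = raw[r++];
		uint64_t addr = val & BHRB_EA;

		if (!val)
			break;          /* terminal marker */
		if (!addr)
			continue;       /* invalid entry */

		out[u].mispred = (int)(val & BHRB_PREDICTION);

		if (val & BHRB_TARGET) {
			/* Target entry: the matching branch is the next word. */
			uint64_t from = 0;

			if (r < n && !(raw[r] & BHRB_TARGET)) {
				from = raw[r] & BHRB_EA;
				r++;
			}
			/* Two targets in a row: from stays 0, next word is re-read. */
			out[u].to = addr;
			out[u].from = from;
		} else {
			/* Immediate-form branch: only "from" is recorded. */
			out[u].from = addr;
			out[u].to = 0;
		}
		u++;
	}
	return u;
}
```

Pulling the record update into one helper, as the patch does with
update_branch_entry(), keeps both branches of this loop down to computing
from, to and pred.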

Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/perf/core-book3s.c | 102 
 1 file changed, 52 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 4520c93..66bea54 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -402,11 +402,21 @@ static __u64 power_pmu_bhrb_to(u64 addr)
return target - (unsigned long)&instr + addr;
 }
 
+/* Update individual branch entry */
+void update_branch_entry(struct cpu_hw_events *cpuhw, int u_index, u64 from, u64 to, int pred)
+{
+   cpuhw->bhrb_entries[u_index].from = from;
+   cpuhw->bhrb_entries[u_index].to = to;
+   cpuhw->bhrb_entries[u_index].mispred = pred;
+   cpuhw->bhrb_entries[u_index].predicted = ~pred;
+   return;
+}
+
 /* Processing BHRB entries */
 void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 {
u64 val;
-   u64 addr;
+   u64 addr, tmp;
int r_index, u_index, pred;
 
r_index = 0;
@@ -417,62 +427,54 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
if (!val)
/* Terminal marker: End of valid BHRB entries */
break;
-   else {
-   addr = val & BHRB_EA;
-   pred = val & BHRB_PREDICTION;
 
-   if (!addr)
-   /* invalid entry */
-   continue;
+   addr = val & BHRB_EA;
+   pred = val & BHRB_PREDICTION;
 
-   /* Branches are read most recent first (ie. mfbhrb 0 is
-* the most recent branch).
-* There are two types of valid entries:
-* 1) a target entry which is the to address of a
-*computed goto like a blr,bctr,btar.  The next
-*entry read from the bhrb will be branch
-*corresponding to this target (ie. the actual
-*blr/bctr/btar instruction).
-* 2) a from address which is an actual branch.  If a
-*target entry proceeds this, then this is the
-*matching branch for that target.  If this is not
-*following a target entry, then this is a branch
-*where the target is given as an immediate field
-*in the instruction (ie. an i or b form branch).
-*In this case we need to read the instruction from
-*memory to determine the target/to address.
+   if (!addr)
+   /* invalid entry */
+   continue;
+
+   /* Branches are read most recent first (ie. mfbhrb 0 is
+* the most recent branch).
+* There are two types of valid entries:
+* 1) a target entry which is the to address of a
+*computed goto like a blr,bctr,btar.  The next
+*entry read from the bhrb will be branch
+*corresponding to this target (ie. the actual
+*blr/bctr/btar instruction).
+* 2) a from address which is an actual branch.  If a
+*target entry proceeds this, then this is the
+*matching branch for that target.  If this is not
+*following a target entry, then this is a branch
+*where the target is given as an immediate field
+*in the instruction (ie. an i or b form branch).
+*In this case we need to read the instruction from
+*memory to determine the target/to address.
+*/
+   if (val & BHRB_TARGET) {
+   /* Target branches use two entries
+* (ie. computed gotos/XL form)
 */
+   tmp = addr;
 
+   /* Get from address in next entry */
+   val = read_bhrb(r_index++);
+   addr = val & BHRB_EA;
if (val & BHRB_TARGET) {
-   /* Target branches use two entries
-* (ie. computed gotos/XL form)
-*/
-   cpuhw->bhrb_entries[u_index].to = addr;
-   cpuhw->bhrb_entries[u_index].mispred = pred;
-   cpuhw->bhrb_entries[u_index].predicted = ~pred;
-
-   

[V6 06/11] powerpc, perf: Re-arrange PMU based branch filter processing in POWER8

2014-05-05 Thread Anshuman Khandual
This patch does some code re-arrangements to make it clear that
it ignores any separate privilege level branch filter request
and does not support any combinations of HW PMU branch filters.

Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/perf/power8-pmu.c | 21 +++--
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index fe2763b..13f47f5 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -635,8 +635,6 @@ static int power8_generic_events[] = {
 
 static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 {
-   u64 pmu_bhrb_filter = 0;
-
/* BHRB and regular PMU events share the same privilege state
 * filter configuration. BHRB is always recorded along with a
 * regular PMU event. As the privilege state filter is handled
@@ -644,20 +642,15 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 * PMU event, we ignore any separate BHRB specific request.
 */
 
-   /* No branch filter requested */
-   if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY)
-   return pmu_bhrb_filter;
-
-   /* Invalid branch filter options - HW does not support */
-   if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
-   return -1;
+   /* Ignore user, kernel, hv bits */
+   branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
 
-   if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL)
-   return -1;
+   /* No branch filter requested */
+   if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
+   return 0;
 
-   if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
-   pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
-   return pmu_bhrb_filter;
+   if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) {
+   return POWER8_MMCRA_IFM1;
}
 
/* Every thing else is unsupported */
-- 
1.7.11.7


Re: Boot problems with a PA6T board

2014-05-05 Thread Christian Zigotzky

Hi Michael,

Thanks a lot for your answer. They reasoned that "starting cpu hw idx 
0... failed" is reported because that core of the CPU is already up and 
running.


I have built a git kernel from 2014-04-02.

-> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux-git

-> git show 3e75c6de1ac33fe3500f44573d9212dc82c99f59
-> git checkout -f 3e75c6de1ac33fe3500f44573d9212dc82c99f59; git clean -fdx

This kernel booted and showed a Kernel Panic with the following error 
message:


Oops: Machine check, sig: 7 [#1]

Rgds,

Christian


On 05.05.2014 07:48, Michael Ellerman wrote:

On Sun, 2014-05-04 at 18:02 +0200, Christian Zigotzky wrote:

Hi All,

The RC 1, 2, and 3 of the kernel 3.15 don't boot on my PA6T board with a
Radeon HD 6870 graphics card.

Screenshot:
http://forum.hyperion-entertainment.biz/download/file.php?id=1060&mode=view

The kernel 3.14 starts without any problems. Has anyone a tip for me,
please?

The line that says "starting cpu hw idx 0... failed" looks a little worrying.
Do you see that on 3.14 as well?

Otherwise bisection is probably your best bet.

cheers






Re: [PATCH V5] KVM: PPC: BOOK3S: PR: Enable Little Endian PR guest

2014-05-05 Thread Alexander Graf

On 05/05/2014 05:09 AM, Aneesh Kumar K.V wrote:

This patch makes sure we inherit the LE bit correctly in the different cases,
so that we can run a Little Endian distro in PR mode.

Signed-off-by: Aneesh Kumar K.V 


Thanks, applied to kvm-ppc-queue.


Alex


Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr

2014-05-05 Thread Alexander Graf

On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:

Although it is optional, IBM POWER CPUs have always set the DAR value on an
alignment interrupt. So don't try to compute these values.
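
For reference, the DSISR construction that this patch moves into
disassemble.h can be exercised as standalone user-space C. This is a sketch:
the masks are written out in full as implied by the bit ranges in the code
comments (IBM bit numbering, bit 0 is the most significant), and
sketch_get_op() and sketch_make_dsisr() are hypothetical names standing in
for get_op() and make_dsisr().

```c
#include <stdint.h>

/* Primary opcode: the top six bits of the instruction word. */
static inline unsigned int sketch_get_op(uint32_t inst)
{
	return inst >> 26;
}

#define SKETCH_IS_XFORM(inst)  (sketch_get_op(inst) == 31)
#define SKETCH_IS_DSFORM(inst) (sketch_get_op(inst) >= 56)

/* Build the alignment-interrupt DSISR value from an instruction word. */
uint32_t sketch_make_dsisr(uint32_t instr)
{
	uint32_t dsisr;

	/* bits  6:15 --> 22:31 (RT/RS and RA fields) */
	dsisr = (instr & 0x03ff0000) >> 16;

	if (SKETCH_IS_XFORM(instr)) {
		/* bits 29:30 --> 15:16 */
		dsisr |= (instr & 0x00000006) << 14;
		/* bit     25 -->    17 */
		dsisr |= (instr & 0x00000040) << 8;
		/* bits 21:24 --> 18:21 */
		dsisr |= (instr & 0x00000780) << 3;
	} else {
		/* bit      5 -->    17 */
		dsisr |= (instr & 0x04000000) >> 12;
		/* bits  1: 4 --> 18:21 */
		dsisr |= (instr & 0x78000000) >> 17;
		/* bits 30:31 --> 12:13 */
		if (SKETCH_IS_DSFORM(instr))
			dsisr |= (instr & 0x00000003) << 18;
	}

	return dsisr;
}
```

Having a single helper means align.c and the KVM emulation derive the value
the same way instead of keeping two hand-written copies of the bit shuffling.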

Signed-off-by: Aneesh Kumar K.V 
---
Changes from V3:
* Use make_dsisr instead of checking feature flag to decide whether to use
   saved dsisr or not

  arch/powerpc/include/asm/disassemble.h | 34 +++
  arch/powerpc/kernel/align.c| 34 +--
  arch/powerpc/kvm/book3s_emulate.c  | 43 --
  3 files changed, 40 insertions(+), 71 deletions(-)

diff --git a/arch/powerpc/include/asm/disassemble.h b/arch/powerpc/include/asm/disassemble.h
index 856f8deb557a..6330a61b875a 100644
--- a/arch/powerpc/include/asm/disassemble.h
+++ b/arch/powerpc/include/asm/disassemble.h
@@ -81,4 +81,38 @@ static inline unsigned int get_oc(u32 inst)
  {
return (inst >> 11) & 0x7fff;
  }
+
+#define IS_XFORM(inst) (get_op(inst)  == 31)
+#define IS_DSFORM(inst) (get_op(inst) >= 56)
+
+/*
+ * Create a DSISR value from the instruction
+ */
+static inline unsigned make_dsisr(unsigned instr)
+{
+   unsigned dsisr;
+
+
+   /* bits  6:15 --> 22:31 */
+   dsisr = (instr & 0x03ff0000) >> 16;
+
+   if (IS_XFORM(instr)) {
+   /* bits 29:30 --> 15:16 */
+   dsisr |= (instr & 0x0006) << 14;
+   /* bit 25 -->17 */
+   dsisr |= (instr & 0x0040) << 8;
+   /* bits 21:24 --> 18:21 */
+   dsisr |= (instr & 0x0780) << 3;
+   } else {
+   /* bit  5 -->17 */
+   dsisr |= (instr & 0x04000000) >> 12;
+   /* bits  1: 4 --> 18:21 */
+   dsisr |= (instr & 0x78000000) >> 17;
+   /* bits 30:31 --> 12:13 */
+   if (IS_DSFORM(instr))
+   dsisr |= (instr & 0x0003) << 18;
+   }
+
+   return dsisr;
+}
  #endif /* __ASM_PPC_DISASSEMBLE_H__ */
diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index 94908af308d8..34f55524d456 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -25,14 +25,13 @@
  #include 
  #include 
  #include 
+#include 
  
  struct aligninfo {

unsigned char len;
unsigned char flags;
  };
  
-#define IS_XFORM(inst)	(((inst) >> 26) == 31)

-#define IS_DSFORM(inst) (((inst) >> 26) >= 56)
  
  #define INVALID	{ 0, 0 }
  
@@ -192,37 +191,6 @@ static struct aligninfo aligninfo[128] = {

  };
  
  /*

- * Create a DSISR value from the instruction
- */
-static inline unsigned make_dsisr(unsigned instr)
-{
-   unsigned dsisr;
-
-
-   /* bits  6:15 --> 22:31 */
-   dsisr = (instr & 0x03ff0000) >> 16;
-
-   if (IS_XFORM(instr)) {
-   /* bits 29:30 --> 15:16 */
-   dsisr |= (instr & 0x0006) << 14;
-   /* bit 25 -->17 */
-   dsisr |= (instr & 0x0040) << 8;
-   /* bits 21:24 --> 18:21 */
-   dsisr |= (instr & 0x0780) << 3;
-   } else {
-   /* bit  5 -->17 */
-   dsisr |= (instr & 0x04000000) >> 12;
-   /* bits  1: 4 --> 18:21 */
-   dsisr |= (instr & 0x78000000) >> 17;
-   /* bits 30:31 --> 12:13 */
-   if (IS_DSFORM(instr))
-   dsisr |= (instr & 0x0003) << 18;
-   }
-
-   return dsisr;
-}
-
-/*
   * The dcbz (data cache block zero) instruction
   * gives an alignment fault if used on non-cacheable
   * memory.  We handle the fault mainly for the
diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
index 99d40f8977e8..04c38f049dfd 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -569,48 +569,14 @@ unprivileged:
  
  u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst)

  {
-   u32 dsisr = 0;
-
-   /*
-* This is what the spec says about DSISR bits (not mentioned = 0):
-*
-* 12:13[DS]Set to bits 30:31
-* 15:16[X] Set to bits 29:30
-* 17   [X] Set to bit 25
-*  [D/DS]  Set to bit 5
-* 18:21[X] Set to bits 21:24
-*  [D/DS]  Set to bits 1:4
-* 22:26Set to bits 6:10 (RT/RS/FRT/FRS)
-* 27:31Set to bits 11:15 (RA)
-*/
-
-   switch (get_op(inst)) {
-   /* D-form */
-   case OP_LFS:
-   case OP_LFD:
-   case OP_STFD:
-   case OP_STFS:
-   dsisr |= (inst >> 12) & 0x4000;   /* bit 17 */
-   dsisr |= (inst >> 17) & 0x3c00; /* bits 18:21 */
-   break;
-   /* X-form */
-   case 31:
-   dsisr |= (inst << 14) & 0x18000; /* bits 15:16 */
-   dsisr |= (inst <

Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.

2014-05-05 Thread Alexander Graf

On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote:

We reserve 5% of total RAM for CMA allocation, and not using that can
result in us running out of NUMA node memory with specific
configurations. One caveat is that we may not have a node-local HPT with a
pinned vcpu configuration. But currently libvirt also pins the vcpu to a
cpuset after creating the hash page table.


I don't understand the problem. Can you please elaborate?


Alex



Signed-off-by: Aneesh Kumar K.V 
---
  arch/powerpc/kvm/book3s_64_mmu_hv.c | 23 ++-
  1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index fb25ebc0af0c..f32896ffd784 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -52,7 +52,7 @@ static void kvmppc_rmap_reset(struct kvm *kvm);
  
  long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)

  {
-   unsigned long hpt;
+   unsigned long hpt = 0;
struct revmap_entry *rev;
struct page *page = NULL;
long order = KVM_DEFAULT_HPT_ORDER;
@@ -64,22 +64,11 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
}
  
  	kvm->arch.hpt_cma_alloc = 0;

-   /*
-* try first to allocate it from the kernel page allocator.
-* We keep the CMA reserved for failed allocation.
-*/
-   hpt = __get_free_pages(GFP_KERNEL | __GFP_ZERO | __GFP_REPEAT |
-  __GFP_NOWARN, order - PAGE_SHIFT);
-
-   /* Next try to allocate from the preallocated pool */
-   if (!hpt) {
-   VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER);
-   page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
-   if (page) {
-   hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
-   kvm->arch.hpt_cma_alloc = 1;
-   } else
-   --order;
+   VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER);
+   page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
+   if (page) {
+   hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
+   kvm->arch.hpt_cma_alloc = 1;
}
  
  	/* Lastly try successively smaller sizes from the page allocator */



Re: [PATCH] KVM: PPC: BOOK3S: PR: Fix WARN_ON with debug options on

2014-05-05 Thread Alexander Graf

On 05/04/2014 07:26 PM, Aneesh Kumar K.V wrote:

With the debug option "sleep inside atomic section checking" enabled, we get
the WARN_ON below during a PR KVM boot. This is because upstream now
has PREEMPT_COUNT enabled even if we have preempt disabled. Fix the
warning by adding preempt_disable/enable around the floating point and
altivec enable.

WARNING: at arch/powerpc/kernel/process.c:156
Modules linked in: kvm_pr kvm
CPU: 1 PID: 3990 Comm: qemu-system-ppc Tainted: GW 3.15.0-rc1+ #4
task: c000eb85b3a0 ti: c000ec59c000 task.ti: c000ec59c000
NIP: c0015c84 LR: d3334644 CTR: c0015c00
REGS: c000ec59f140 TRAP: 0700   Tainted: GW  (3.15.0-rc1+)
MSR: 80029032   CR: 4224  XER: 2000
CFAR: c0015c24 SOFTE: 1
GPR00: d3334644 c000ec59f3c0 c0e2fa40 c000e2f8
GPR04: 0800 2000 0001 8000
GPR08: 0001 0001 2000 c0015c00
GPR12: d333da18 cfb80900  
GPR16:    3fffce4e0fa1
GPR20: 0010 0001 0002 100b9a38
GPR24: 0002   0013
GPR28:  c000eb85b3a0 2000 c000e2f8
NIP [c0015c84] .enable_kernel_fp+0x84/0x90
LR [d3334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr]
Call Trace:
[c000ec59f3c0] [0010] 0x10 (unreliable)
[c000ec59f430] [d3334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr]
[c000ec59f4c0] [d324b380] .kvmppc_set_msr+0x30/0x50 [kvm]
[c000ec59f530] [d3337cac] .kvmppc_core_emulate_op_pr+0x16c/0x5e0 [kvm_pr]
[c000ec59f5f0] [d324a944] .kvmppc_emulate_instruction+0x284/0xa80 [kvm]
[c000ec59f6c0] [d3336888] .kvmppc_handle_exit_pr+0x488/0xb70 [kvm_pr]
[c000ec59f790] [d3338d34] kvm_start_lightweight+0xcc/0xdc [kvm_pr]
[c000ec59f960] [d3336288] .kvmppc_vcpu_run_pr+0xc8/0x190 [kvm_pr]
[c000ec59f9f0] [d324c880] .kvmppc_vcpu_run+0x30/0x50 [kvm]
[c000ec59fa60] [d3249e74] .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [kvm]
[c000ec59faf0] [d3244948] .kvm_vcpu_ioctl+0x478/0x760 [kvm]
[c000ec59fcb0] [c0224e34] .do_vfs_ioctl+0x4d4/0x790
[c000ec59fd90] [c0225148] .SyS_ioctl+0x58/0xb0
[c000ec59fe30] [c000a1e4] syscall_exit+0x0/0x98

Signed-off-by: Aneesh Kumar K.V 


Thanks, applied to kvm-ppc-queue.


Alex


Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest

2014-05-05 Thread Alexander Graf

On 05/04/2014 07:30 PM, Aneesh Kumar K.V wrote:

Signed-off-by: Aneesh Kumar K.V 


No patch description, no proper explanations anywhere why you're doing 
what. All of that in a pretty sensitive piece of code. There's no way 
this patch can go upstream in its current form.



Alex


Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest

2014-05-05 Thread Alexander Graf

On 05/05/2014 03:27 AM, Gavin Shan wrote:

This series of patches intends to support EEH for PCI devices which have been
passed through to a PowerKVM based guest via VFIO. The implementation is
straightforward, based on the issues or problems we have to resolve to support
EEH for a PowerKVM based guest.

- Emulation for EEH RTAS requests. Thankfully, we already have infrastructure
   to emulate XICS. Without introducing a new mechanism, we just extend that
   existing infrastructure to support EEH RTAS emulation. EEH RTAS requests
   initiated from the guest are posted to the host, where the requests get handled
   or delivered to the underlying firmware for further handling. For that, the host
   kernel has to maintain the PCI address (host domain/bus/slot/function to guest's
   PHB BUID/bus/slot/function) mapping via the KVM VFIO device. The address mapping
   will be built when initializing the VFIO device in QEMU and destroyed when the
   VFIO device in QEMU is going offline, or the VM is destroyed.


Do you also expose all those interfaces to user space? VFIO is as much 
about user space device drivers as it is about device assignment.


I would like to first see an implementation that doesn't touch KVM 
emulation code at all but instead routes everything through QEMU. As a 
second step we can then accelerate performance critical paths inside of KVM.


That way we ensure that user space device drivers have all the power 
over a device they need to drive it.



Alex


Re: [PATCH] powerpc: move epapr paravirt init of power_save to an initcall

2014-05-05 Thread Tudor Laurentiu

On 04/30/2014 11:09 PM, Alexander Graf wrote:


On 30.04.14 22:03, Stuart Yoder wrote:



-Original Message-
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Wednesday, April 30, 2014 2:56 PM
To: Yoder Stuart-B08248; b...@kernel.crashing.org; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] powerpc: move epapr paravirt init of power_save to
an initcall


On 30.04.14 21:54, Stuart Yoder wrote:

From: Stuart Yoder 

some restructuring of epapr paravirt init resulted in
ppc_md.power_save being set, and then overwritten to
NULL during machine_init.  This patch splits the
initialization of ppc_md.power_save out into a postcore
init call.

Signed-off-by: Stuart Yoder 
---
   arch/powerpc/kernel/epapr_paravirt.c |   25
-
   1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/epapr_paravirt.c

b/arch/powerpc/kernel/epapr_paravirt.c

index 6300c13..c49b69c 100644
--- a/arch/powerpc/kernel/epapr_paravirt.c
+++ b/arch/powerpc/kernel/epapr_paravirt.c
@@ -52,11 +52,6 @@ static int __init early_init_dt_scan_epapr(unsigned

long node,

   #endif
   }

-#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64)
-if (of_get_flat_dt_prop(node, "has-idle", NULL))
-ppc_md.power_save = epapr_ev_idle;
-#endif
-
   epapr_paravirt_enabled = true;

   return 1;
@@ -69,3 +64,23 @@ int __init epapr_paravirt_early_init(void)
   return 0;
   }

+static int __init epapr_idle_init_dt_scan(unsigned long node,
+   const char *uname,
+   int depth, void *data)
+{
+#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64)
+if (of_get_flat_dt_prop(node, "has-idle", NULL))
+ppc_md.power_save = epapr_ev_idle;
+#endif
+return 0;
+}
+
+static int __init epapr_idle_init(void)
+{
+if (epapr_paravirt_enabled)
+of_scan_flat_dt(epapr_idle_init_dt_scan, NULL);

Doesn't this scan all nodes? We only want to match on
/hypervisor/has-idle, no?

I cut/pasted from  the approach the existing code in that file
took, but yes you're right we just need the one property.
Let me respin that to look at the hypervisor node only.


Yeah, the same commit that introduced the breakage on has-idle also
removed the explicit check for /hypervisor.

Laurentiu, was this change on purpose?



Alex,

IIRC, at that time I had to switch from the normal "of" functions to a 
completely different API that's available in the early init stage. This 
early "of" API is pretty limited (e.g. it doesn't have a way to address a 
specific node) and I had to use the function that scans the whole tree.


---
Best Regards, Laurentiu

Re: [PATCH] powerpc: move epapr paravirt init of power_save to an initcall

2014-05-05 Thread Alexander Graf

On 05/05/2014 02:17 PM, Tudor Laurentiu wrote:

On 04/30/2014 11:09 PM, Alexander Graf wrote:


On 30.04.14 22:03, Stuart Yoder wrote:



-Original Message-
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Wednesday, April 30, 2014 2:56 PM
To: Yoder Stuart-B08248; b...@kernel.crashing.org; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] powerpc: move epapr paravirt init of 
power_save to

an initcall


On 30.04.14 21:54, Stuart Yoder wrote:

From: Stuart Yoder 

some restructuring of epapr paravirt init resulted in
ppc_md.power_save being set, and then overwritten to
NULL during machine_init.  This patch splits the
initialization of ppc_md.power_save out into a postcore
init call.

Signed-off-by: Stuart Yoder 
---
   arch/powerpc/kernel/epapr_paravirt.c |   25
-
   1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/epapr_paravirt.c

b/arch/powerpc/kernel/epapr_paravirt.c

index 6300c13..c49b69c 100644
--- a/arch/powerpc/kernel/epapr_paravirt.c
+++ b/arch/powerpc/kernel/epapr_paravirt.c
@@ -52,11 +52,6 @@ static int __init 
early_init_dt_scan_epapr(unsigned

long node,

   #endif
   }

-#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64)
-if (of_get_flat_dt_prop(node, "has-idle", NULL))
-ppc_md.power_save = epapr_ev_idle;
-#endif
-
   epapr_paravirt_enabled = true;

   return 1;
@@ -69,3 +64,23 @@ int __init epapr_paravirt_early_init(void)
   return 0;
   }

+static int __init epapr_idle_init_dt_scan(unsigned long node,
+   const char *uname,
+   int depth, void *data)
+{
+#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64)
+if (of_get_flat_dt_prop(node, "has-idle", NULL))
+ppc_md.power_save = epapr_ev_idle;
+#endif
+return 0;
+}
+
+static int __init epapr_idle_init(void)
+{
+if (epapr_paravirt_enabled)
+of_scan_flat_dt(epapr_idle_init_dt_scan, NULL);

Doesn't this scan all nodes? We only want to match on
/hypervisor/has-idle, no?

I cut/pasted from  the approach the existing code in that file
took, but yes you're right we just need the one property.
Let me respin that to look at the hypervisor node only.


Yeah, the same commit that introduced the breakage on has-idle also
removed the explicit check for /hypervisor.

Laurentiu, was this change on purpose?



Alex,

IIRC, at that time i had to switch from the normal "of" functions to a 
completely different api that's available in early init stage. This 
early "of" api is pretty limited (e.g. doesn't have a way to address a 
specific node) and i had to use that function that scans the whole tree.


Ok, so it is an accident. Could you please post a patch that checks that 
the node we're looking at is called "hypervisor"? The simple API should 
give you enough information for that at least. Maybe you could even 
check that the parent node is the root node.



Alex
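
For illustration only, the check requested above (node named "hypervisor", parent is the root) could look roughly like the userspace mock below. It is a sketch, not the respun patch: mock_idle_scan() and struct idle_scan_result are hypothetical names, the has_idle_prop argument stands in for the of_get_flat_dt_prop() lookup, and it assumes that children of the root node are reported with depth == 1 and that a non-zero return value stops the scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the state the real callback would update. */
struct idle_scan_result {
	bool has_idle;
};

/*
 * Mock of a flat-DT scan callback, modelled on epapr_idle_init_dt_scan().
 * Assumption: direct children of the root node are seen with depth == 1.
 * has_idle_prop replaces the of_get_flat_dt_prop(node, "has-idle", NULL) test.
 */
int mock_idle_scan(const char *uname, int depth, bool has_idle_prop, void *data)
{
	struct idle_scan_result *res = data;

	assert(uname != NULL && res != NULL);

	/* Anything that is not /hypervisor is skipped; keep scanning. */
	if (depth != 1 || strcmp(uname, "hypervisor") != 0)
		return 0;

	if (has_idle_prop)
		res->has_idle = true;

	/* Found the node we care about; assumed to end the scan. */
	return 1;
}
```

Calling mock_idle_scan("cpus", 1, true, &res) or mock_idle_scan("hypervisor", 2, true, &res) leaves res.has_idle untouched, while mock_idle_scan("hypervisor", 1, true, &res) sets it and returns 1.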

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc: move epapr paravirt init of power_save to an initcall

2014-05-05 Thread Tudor Laurentiu

On 05/05/2014 03:21 PM, Alexander Graf wrote:

On 05/05/2014 02:17 PM, Tudor Laurentiu wrote:

On 04/30/2014 11:09 PM, Alexander Graf wrote:


On 30.04.14 22:03, Stuart Yoder wrote:



-Original Message-
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Wednesday, April 30, 2014 2:56 PM
To: Yoder Stuart-B08248; b...@kernel.crashing.org; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] powerpc: move epapr paravirt init of power_save to an initcall


On 30.04.14 21:54, Stuart Yoder wrote:

From: Stuart Yoder 

some restructuring of epapr paravirt init resulted in
ppc_md.power_save being set, and then overwritten to
NULL during machine_init.  This patch splits the
initialization of ppc_md.power_save out into a postcore
init call.

Signed-off-by: Stuart Yoder 
---
 arch/powerpc/kernel/epapr_paravirt.c |   25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/epapr_paravirt.c b/arch/powerpc/kernel/epapr_paravirt.c
index 6300c13..c49b69c 100644
--- a/arch/powerpc/kernel/epapr_paravirt.c
+++ b/arch/powerpc/kernel/epapr_paravirt.c
@@ -52,11 +52,6 @@ static int __init early_init_dt_scan_epapr(unsigned long node,
   #endif
   }

-#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64)
-	if (of_get_flat_dt_prop(node, "has-idle", NULL))
-		ppc_md.power_save = epapr_ev_idle;
-#endif
-
   epapr_paravirt_enabled = true;

   return 1;
@@ -69,3 +64,23 @@ int __init epapr_paravirt_early_init(void)
   return 0;
   }

+static int __init epapr_idle_init_dt_scan(unsigned long node,
+					   const char *uname,
+					   int depth, void *data)
+{
+#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64)
+	if (of_get_flat_dt_prop(node, "has-idle", NULL))
+		ppc_md.power_save = epapr_ev_idle;
+#endif
+	return 0;
+}
+
+static int __init epapr_idle_init(void)
+{
+	if (epapr_paravirt_enabled)
+		of_scan_flat_dt(epapr_idle_init_dt_scan, NULL);

Doesn't this scan all nodes? We only want to match on
/hypervisor/has-idle, no?

I cut/pasted from  the approach the existing code in that file
took, but yes you're right we just need the one property.
Let me respin that to look at the hypervisor node only.


Yeah, the same commit that introduced the breakage on has-idle also
removed the explicit check for /hypervisor.

Laurentiu, was this change on purpose?



Alex,

IIRC, at that time i had to switch from the normal "of" functions to a
completely different api that's available in early init stage. This
early "of" api is pretty limited (e.g. doesn't have a way to address a
specific node) and i had to use that function that scans the whole tree.


Ok, so it is an accident. Could you please post a patch that checks that
the node we're looking at is called "hypervisor"? The simple API should
give you enough information for that at least. Maybe you could even
check that the parent node is the root node.



Just had a quick look, and it looks like the early fdt api was improved
with a function that allows specifying a starting path for the scan
(of_scan_flat_dt_by_path(), added in commit
57d74bcf3072b65bde5aa540cedc976a75c48e5c). So I think we can simply use
that instead.


---
Best Regards, Laurentiu

Re: [PATCH] powerpc: memcpy optimization for 64bit LE

2014-05-05 Thread Philippe Bergheaud

Anton Blanchard wrote:

Unaligned stores take alignment exceptions on POWER7 running in little-endian.
This is a dumb little-endian base memcpy that prevents unaligned stores.
Once booted the feature fixup code switches over to the VMX copy loops
(which are already endian safe).

The question is what we do before that switch over. The base 64bit
memcpy takes alignment exceptions on POWER7 so we can't use it as is.
Fixing the causes of alignment exception would slow it down, because
we'd need to ensure all loads and stores are aligned either through
rotate tricks or bytewise loads and stores. Either would be bad for
all other 64bit platforms.

[ I simplified the loop a bit - Anton ]

Got it.

The 3 instructions that you have removed were modifying r5 for no reason,
as the last instruction was always resetting r5 to its initial value.

Philippe


Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest

2014-05-05 Thread Alex Williamson
On Mon, 2014-05-05 at 13:56 +0200, Alexander Graf wrote:
> On 05/05/2014 03:27 AM, Gavin Shan wrote:
> > The series of patches intends to support EEH for PCI devices that have been
> > passed through to a PowerKVM-based guest via VFIO. The implementation is
> > straightforward, based on the issues we have to resolve to support EEH for
> > a PowerKVM-based guest.
> >
> > - Emulation for EEH RTAS requests. Thankfully, we already have infrastructure
> >   to emulate XICS. Without introducing a new mechanism, we just extend that
> >   existing infrastructure to support EEH RTAS emulation. EEH RTAS requests
> >   initiated from the guest are posted to the host, where the requests get
> >   handled or delivered to the underlying firmware for further handling. For
> >   that, the host kernel has to maintain the PCI address mapping (host
> >   domain/bus/slot/function to guest's PHB BUID/bus/slot/function) via the KVM
> >   VFIO device. The address mapping will be built when initializing the VFIO
> >   device in QEMU and destroyed when the VFIO device in QEMU goes offline or
> >   the VM is destroyed.
> 
> Do you also expose all those interfaces to user space? VFIO is as much 
> about user space device drivers as it is about device assignment.
> 
> I would like to first see an implementation that doesn't touch KVM 
> emulation code at all but instead routes everything through QEMU. As a 
> second step we can then accelerate performance critical paths inside of KVM.
> 
> That way we ensure that user space device drivers have all the power 
> over a device they need to drive it.

+1




Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr

2014-05-05 Thread Aneesh Kumar K.V
Alexander Graf  writes:

> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>> Although it's optional IBM POWER cpus always had DAR value set on
>> alignment interrupt. So don't try to compute these values.
>>
>> Signed-off-by: Aneesh Kumar K.V 
>> ---
>> Changes from V3:
>> * Use make_dsisr instead of checking feature flag to decide whether to use
>>saved dsisr or not
>>



>>   ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>   {
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +return vcpu->arch.fault_dar;
>
> How about PA6T and G5s?
>
>

Paul mentioned that BOOK3S always had DAR value set on alignment
interrupt. And the patch is to enable/collect correct DAR value when
running with Little Endian PR guest. Now to limit the impact and to
enable Little Endian PR guest, I ended up doing the conditional code
only for book3s 64 for which we know for sure that we set DAR value.

-aneesh


Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.

2014-05-05 Thread Aneesh Kumar K.V
Alexander Graf  writes:

> On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote:
>> We reserve 5% of total ram for CMA allocation and not using that can
>> result in us running out of numa node memory with specific
>> configuration. One caveat is we may not have node local hpt with pinned
>> vcpu configuration. But currently libvirt also pins the vcpu to cpuset
>> after creating hash page table.
>
> I don't understand the problem. Can you please elaborate?
>
>

Let's take a system with 100GB RAM. We reserve around 5GB for htab
allocation. Now if we use the rest of the available memory for hugetlbfs
(because we want all the guests to be backed by huge pages), we would
end up in a situation where we have a few GB of free RAM and 5GB of CMA
reserve area. Now if we allow hash page table allocation to consume the
free space, we would end up hitting page allocation failures for other
non-movable kernel allocations even though we still have 5GB of CMA reserve
space free.

-aneesh
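
The scenario above can be put into numbers with a toy model. This is only a sketch under stated assumptions: struct mem_state, alloc_htab() and alloc_unmovable() are hypothetical names, not kernel interfaces, and the 2GB of free memory and 1GB hash page table size are illustrative values. The only rule the model encodes is that non-movable kernel allocations cannot be served from the CMA reserve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Where the hash page table (htab) is taken from in this toy model. */
enum htab_source { HTAB_FROM_PAGE_ALLOCATOR, HTAB_FROM_CMA };

/* Hypothetical, heavily simplified view of free memory. */
struct mem_state {
	uint64_t free_normal_mb;	/* free memory outside the CMA reserve */
	uint64_t free_cma_mb;		/* free memory inside the CMA reserve */
};

/* Returns 0 on success, -1 if the chosen pool cannot satisfy the request. */
int alloc_htab(struct mem_state *m, uint64_t size_mb, enum htab_source src)
{
	uint64_t *pool;

	assert(m != NULL);
	pool = (src == HTAB_FROM_CMA) ? &m->free_cma_mb : &m->free_normal_mb;
	if (*pool < size_mb)
		return -1;
	*pool -= size_mb;
	return 0;
}

/* Non-movable kernel allocations can only use memory outside the CMA reserve. */
int alloc_unmovable(struct mem_state *m, uint64_t size_mb)
{
	assert(m != NULL);
	if (m->free_normal_mb < size_mb)
		return -1;
	m->free_normal_mb -= size_mb;
	return 0;
}
```

Starting from 2048MB of free normal memory and a 5120MB CMA reserve, two 1024MB htab allocations from the page allocator leave nothing for alloc_unmovable() even though the whole CMA reserve is still free. Taking the same two htabs from CMA leaves the normal pool intact.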


Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr

2014-05-05 Thread Alexander Graf

On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:

Alexander Graf  writes:


On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:

Although it's optional IBM POWER cpus always had DAR value set on
alignment interrupt. So don't try to compute these values.

Signed-off-by: Aneesh Kumar K.V 
---
Changes from V3:
* Use make_dsisr instead of checking feature flag to decide whether to use
saved dsisr or not





   ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
   {
+#ifdef CONFIG_PPC_BOOK3S_64
+   return vcpu->arch.fault_dar;

How about PA6T and G5s?



Paul mentioned that BOOK3S always had DAR value set on alignment
interrupt. And the patch is to enable/collect correct DAR value when
running with Little Endian PR guest. Now to limit the impact and to
enable Little Endian PR guest, I ended up doing the conditional code
only for book3s 64 for which we know for sure that we set DAR value.


Yes, and I'm asking whether we know that this statement holds true for 
PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is 
at least developed by IBM, I'd assume its semantics here are similar to 
POWER4, but for PA6T I wouldn't be so sure.



Alex
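
To make the trade-off in this subthread concrete, here is a userspace sketch of the two ways to obtain the DAR: trust the value saved at interrupt time, or recompute the effective address from the instruction. It is not the kernel's kvmppc_alignment_dar(). struct mock_vcpu and its hw_sets_dar flag are hypothetical, and the fallback only handles D-form loads/stores without update, where the effective address is (RA|0) plus the sign-extended 16-bit displacement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified vcpu state; not the kernel's struct kvm_vcpu. */
struct mock_vcpu {
	uint64_t gpr[32];
	uint64_t fault_dar;	/* DAR value saved when the interrupt was taken */
	bool hw_sets_dar;	/* hypothetical: CPU is known to set DAR itself */
};

/* Effective address of a D-form load/store without update: (RA|0) + D. */
uint64_t dform_ea(const struct mock_vcpu *v, uint32_t inst)
{
	unsigned int ra = (inst >> 16) & 0x1f;
	int16_t d = (int16_t)(inst & 0xffff);	/* sign-extended displacement */
	uint64_t base;

	assert(v != NULL);
	base = ra ? v->gpr[ra] : 0;	/* RA == 0 means a literal zero base */
	return base + (uint64_t)(int64_t)d;
}

/* Use the saved DAR when the hardware is trusted, otherwise recompute it. */
uint64_t mock_alignment_dar(const struct mock_vcpu *v, uint32_t inst)
{
	assert(v != NULL);
	if (v->hw_sets_dar)
		return v->fault_dar;
	return dform_ea(v, inst);
}
```

With gpr[4] = 0x1000 and the instruction word 0x80640008 (lwz r3,8(r4)), the fallback path yields 0x1008. With hw_sets_dar set, the saved fault_dar is returned unchanged. Whether a given chip such as PA6T can take the first path is exactly the open question in this thread.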


Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest

2014-05-05 Thread Aneesh Kumar K.V
Alexander Graf  writes:

> On 05/04/2014 07:30 PM, Aneesh Kumar K.V wrote:
>> Signed-off-by: Aneesh Kumar K.V 
>
> No patch description, no proper explanations anywhere why you're doing 
> what. All of that in a pretty sensitive piece of code. There's no way 
> this patch can go upstream in its current form.
>

Sorry about being vague. Will add a better commit message. The goal is
to export MPSS support to the guest if the host supports it. MPSS
support is exported via the penc encoding in "ibm,segment-page-sizes". The
actual format can be found at htab_dt_scan_page_sizes. When the guest
memory is backed by hugetlbfs, we expose the penc encoding the host
supports to the guest via kvmppc_add_seg_page_size.

Now the challenge for THP support is to make sure that our henter,
hremove etc. decode the base page size and actual page size correctly
from the hash table entry values. Most of the changes are to do that.
The rest is already handled by kvm.

NOTE: It is much easier to read the code after applying the patch rather
than reading the diff. I have added comments around each step in the
code.

-aneesh


Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr

2014-05-05 Thread Aneesh Kumar K.V
Alexander Graf  writes:

> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>> Alexander Graf  writes:
>>
>>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>>> Although it's optional IBM POWER cpus always had DAR value set on
>>>> alignment interrupt. So don't try to compute these values.
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V 
>>>> ---
>>>> Changes from V3:
>>>> * Use make_dsisr instead of checking feature flag to decide whether to use
>>>>   saved dsisr or not
>>>>
>> 
>>
>>>>   ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>>>   {
>>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>>> +  return vcpu->arch.fault_dar;
>>> How about PA6T and G5s?
>>>
>>>
>> Paul mentioned that BOOK3S always had DAR value set on alignment
>> interrupt. And the patch is to enable/collect correct DAR value when
>> running with Little Endian PR guest. Now to limit the impact and to
>> enable Little Endian PR guest, I ended up doing the conditional code
>> only for book3s 64 for which we know for sure that we set DAR value.
>
> Yes, and I'm asking whether we know that this statement holds true for 
> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is 
> at least developed by IBM, I'd assume its semantics here are similar to 
> POWER4, but for PA6T I wouldn't be so sure.

I will have to defer to Paul on that question. But that should not
prevent this patch from going upstream right ?

-aneesh


Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr

2014-05-05 Thread Aneesh Kumar K.V
Olof Johansson  writes:

> 2014-05-05 7:43 GMT-07:00 Alexander Graf :
>
>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>>
>>> Alexander Graf  writes:
>>>
>>>  On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:

> Although it's optional IBM POWER cpus always had DAR value set on
> alignment interrupt. So don't try to compute these values.
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
> Changes from V3:
> * Use make_dsisr instead of checking feature flag to decide whether to
> use
> saved dsisr or not
>
>  
>>>
>>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>{
> +#ifdef CONFIG_PPC_BOOK3S_64
> +   return vcpu->arch.fault_dar;
>
 How about PA6T and G5s?


  Paul mentioned that BOOK3S always had DAR value set on alignment
>>> interrupt. And the patch is to enable/collect correct DAR value when
>>> running with Little Endian PR guest. Now to limit the impact and to
>>> enable Little Endian PR guest, I ended up doing the conditional code
>>> only for book3s 64 for which we know for sure that we set DAR value.
>>>
>>
>> Yes, and I'm asking whether we know that this statement holds true for
>> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at
>> least developed by IBM, I'd assume its semantics here are similar to
>> POWER4, but for PA6T I wouldn't be so sure.
>>
>>
> Thanks for looking out for us, obviously IBM doesn't (based on the reply a
> minute ago).

The reason I deferred the question to Paul is really because I don't
know enough about PA6T and G5 to comment. I intentionally restricted the
changes to BOOK3S_64 because I wanted to make sure I don't break
anything else. It is in no way to hint that others don't care.

-aneesh


Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr

2014-05-05 Thread Olof Johansson
[Now without HTML email -- it's what you get for cc:ing me at work
instead of my upstream email :)]

2014-05-05 7:43 GMT-07:00 Alexander Graf :
>
> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>>
>> Alexander Graf  writes:
>>
>>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:

>>>> Although it's optional IBM POWER cpus always had DAR value set on
>>>> alignment interrupt. So don't try to compute these values.
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V 
>>>> ---
>>>> Changes from V3:
>>>> * Use make_dsisr instead of checking feature flag to decide whether to use
>>>>   saved dsisr or not
>>>>
>> 
>>
>>>>   ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>>>   {
>>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>>> +   return vcpu->arch.fault_dar;
>>>
>>> How about PA6T and G5s?
>>>
>>>
>> Paul mentioned that BOOK3S always had DAR value set on alignment
>> interrupt. And the patch is to enable/collect correct DAR value when
>> running with Little Endian PR guest. Now to limit the impact and to
>> enable Little Endian PR guest, I ended up doing the conditional code
>> only for book3s 64 for which we know for sure that we set DAR value.
>
>
> Yes, and I'm asking whether we know that this statement holds true for PA6T 
> and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least 
> developed by IBM, I'd assume its semantics here are similar to POWER4, but 
> for PA6T I wouldn't be so sure.
>

Thanks for looking out for us, obviously IBM doesn't (based on the
reply a minute ago).

In the end, since there's been no work to enable KVM on PA6T, I'm not
too worried. I guess it's one more thing to sort out (and check for)
whenever someone does that.

I definitely don't have cycles to deal with that myself at this time.
I can help find hardware for someone who wants to, but even then I'm
guessing the interest is pretty limited.


-Olof

Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr

2014-05-05 Thread Alexander Graf


> Am 05.05.2014 um 16:57 schrieb Olof Johansson :
> 
> [Now without HTML email -- it's what you get for cc:ing me at work
> instead of my upstream email :)]
> 
> 2014-05-05 7:43 GMT-07:00 Alexander Graf :
>> 
>>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>>> 
>>> Alexander Graf  writes:
>>> 
> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
> 
> Although it's optional IBM POWER cpus always had DAR value set on
> alignment interrupt. So don't try to compute these values.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
> Changes from V3:
> * Use make_dsisr instead of checking feature flag to decide whether to use
>saved dsisr or not
>>> 
>>> 
>   ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>   {
> +#ifdef CONFIG_PPC_BOOK3S_64
> +   return vcpu->arch.fault_dar;
 
 How about PA6T and G5s?
>>> Paul mentioned that BOOK3S always had DAR value set on alignment
>>> interrupt. And the patch is to enable/collect correct DAR value when
>>> running with Little Endian PR guest. Now to limit the impact and to
>>> enable Little Endian PR guest, I ended up doing the conditional code
>>> only for book3s 64 for which we know for sure that we set DAR value.
>> 
>> 
>> Yes, and I'm asking whether we know that this statement holds true for PA6T 
>> and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least 
>> developed by IBM, I'd assume its semantics here are similar to POWER4, but 
>> for PA6T I wouldn't be so sure.
> 
> Thanks for looking out for us, obviously IBM doesn't (based on the
> reply a minute ago).
> 
> In the end, since there's been no work to enable KVM on PA6T, I'm not
> too worried. I guess it's one more thing to sort out (and check for)
> whenever someone does that.
> 
> I definitely don't have cycles to deal with that myself at this time.
> I can help find hardware for someone who wants to, but even then I'm
> guessing the interest is pretty limited.

I know of at least 1 person who successfully runs PR KVM on a PA6T, so it's 
neither neglected nor non-working.

If you can get me access to a pa6t system I can easily check whether alignment 
interrupts generate dar and dsisr properly :).


Alex


Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr

2014-05-05 Thread Alexander Graf


> Am 05.05.2014 um 16:50 schrieb "Aneesh Kumar K.V" 
> :
> 
> Alexander Graf  writes:
> 
>>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>>> Alexander Graf  writes:
>>> 
> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
> Although it's optional IBM POWER cpus always had DAR value set on
> alignment interrupt. So don't try to compute these values.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
> Changes from V3:
> * Use make_dsisr instead of checking feature flag to decide whether to use
>saved dsisr or not
>>> 
>>> 
>   ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>   {
> +#ifdef CONFIG_PPC_BOOK3S_64
> +return vcpu->arch.fault_dar;
 How about PA6T and G5s?
>>> Paul mentioned that BOOK3S always had DAR value set on alignment
>>> interrupt. And the patch is to enable/collect correct DAR value when
>>> running with Little Endian PR guest. Now to limit the impact and to
>>> enable Little Endian PR guest, I ended up doing the conditional code
>>> only for book3s 64 for which we know for sure that we set DAR value.
>> 
>> Yes, and I'm asking whether we know that this statement holds true for 
>> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is 
>> at least developed by IBM, I'd assume its semantics here are similar to 
>> POWER4, but for PA6T I wouldn't be so sure.
> 
> I will have to defer to Paul on that question. But that should not
> prevent this patch from going upstream right ?

Regressions are big no-gos.

Alex

> 
> -aneesh
> 

Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.

2014-05-05 Thread Alexander Graf


> Am 05.05.2014 um 16:35 schrieb "Aneesh Kumar K.V" 
> :
> 
> Alexander Graf  writes:
> 
>>> On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote:
>>> We reserve 5% of total ram for CMA allocation and not using that can
>>> result in us running out of numa node memory with specific
>>> configuration. One caveat is we may not have node local hpt with pinned
>>> vcpu configuration. But currently libvirt also pins the vcpu to cpuset
>>> after creating hash page table.
>> 
>> I don't understand the problem. Can you please elaborate?
> 
> Lets take a system with 100GB RAM. We reserve around 5GB for htab
> allocation. Now if we use rest of available memory for hugetlbfs
> (because we want all the guest to be backed by huge pages), we would
> end up in a situation where we have a few GB of free RAM and 5GB of CMA
> reserve area. Now if we allow hash page table allocation to consume the
> free space, we would end up hitting page allocation failure for other
> non movable kernel allocation even though we still have 5GB CMA reserve
> space free.

Isn't this a greater problem? We should start swapping before we hit the point 
where non movable kernel allocation fails, no?

The fact that KVM uses a good number of normal kernel pages is maybe 
suboptimal, but shouldn't be a critical problem.


Alex

> 
> -aneesh
> 

Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.

2014-05-05 Thread Aneesh Kumar K.V
Alexander Graf  writes:

>> Am 05.05.2014 um 16:35 schrieb "Aneesh Kumar K.V" 
>> :
>> 
>> Alexander Graf  writes:
>> 
>>>> On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote:
>>>> We reserve 5% of total ram for CMA allocation and not using that can
>>>> result in us running out of numa node memory with specific
>>>> configuration. One caveat is we may not have node local hpt with pinned
>>>> vcpu configuration. But currently libvirt also pins the vcpu to cpuset
>>>> after creating hash page table.
>>> 
>>> I don't understand the problem. Can you please elaborate?
>> 
>> Lets take a system with 100GB RAM. We reserve around 5GB for htab
>> allocation. Now if we use rest of available memory for hugetlbfs
>> (because we want all the guest to be backed by huge pages), we would
>> end up in a situation where we have a few GB of free RAM and 5GB of CMA
>> reserve area. Now if we allow hash page table allocation to consume the
>> free space, we would end up hitting page allocation failure for other
>> non movable kernel allocation even though we still have 5GB CMA reserve
>> space free.
>
> Isn't this a greater problem? We should start swapping before we hit
> the point where non movable kernel allocation fails, no?

But there is nothing much to swap. Because most of the memory is
reserved for guest RAM via hugetlbfs. 

>
> The fact that KVM uses a good number of normal kernel pages is maybe
> suboptimal, but shouldn't be a critical problem.

Yes. But in this case we could do better, couldn't we? We already have
a large part of guest RAM kept aside for htab allocation which cannot be
used for non-movable allocations. And we ignore that reserve space and
use other areas for hash page table allocation with the current code.

We actually hit this case on one of the test boxes.

 KVM guest htab at c01e5000 (order 30), LPID 1
 libvirtd invoked oom-killer: gfp_mask=0x2000d0, order=0,oom_score_adj=0
 libvirtd cpuset=/ mems_allowed=0,16
 CPU: 72 PID: 20044 Comm: libvirtd Not tainted 3.10.23-1401.pkvm2_1.4.ppc64 #1
 Call Trace:
 [c01e3b63f150] [c0017330] .show_stack+0x130/0x200 (unreliable)
 [c01e3b63f220] [c087a888] .dump_stack+0x28/0x3c
 [c01e3b63f290] [c0876a4c] .dump_header+0xbc/0x228
 [c01e3b63f360] [c01dd838] .oom_kill_process+0x318/0x4c0
 [c01e3b63f440] [c01de258] .out_of_memory+0x518/0x550
 [c01e3b63f520] [c01e5aac] .__alloc_pages_nodemask+0xb3c/0xbf0
 [c01e3b63f700] [c0243580] .new_slab+0x440/0x490
 [c01e3b63f7a0] [c08781fc] .__slab_alloc+0x17c/0x618
 [c01e3b63f8d0] [c02467fc] .kmem_cache_alloc_node_trace+0xcc/0x300
 [c01e3b63f990] [c010f62c] .alloc_fair_sched_group+0xfc/0x200
 [c01e3b63fa60] [c0104f00] .sched_create_group+0x50/0xe0
 [c01e3b63fae0] [c0104fc0] .cpu_cgroup_css_alloc+0x30/0x80
 [c01e3b63fb60] [c01513ec] .cgroup_mkdir+0x2bc/0x6e0
 [c01e3b63fc50] [c0275aec] .vfs_mkdir+0x14c/0x220
 [c01e3b63fcf0] [c027a734] .SyS_mkdirat+0x94/0x110
 [c01e3b63fdb0] [c027a7e4] .SyS_mkdir+0x34/0x50
 [c01e3b63fe30] [c0009f54] syscall_exit+0x0/0x98


Node 0 DMA free:23424kB min:23424kB low:29248kB high:35136kB
active_anon:0kB inactive_anon:128kB active_file:256kB inactive_file:384kB
unevictable:9536kB isolated(anon):0kB isolated(file):0kB present:67108864kB
managed:65931776kB mlocked:9536kB dirty:64kB writeback:0kB mapped:5376kB
shmem:0kB slab_reclaimable:23616kB slab_unreclaimable:1237056kB
kernel_stack:18256kB pagetables:1088kB unstable:0kB bounce:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:78 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Node 16 DMA free:5787008kB min:21376kB low:26688kB high:32064kB
active_anon:1984kB inactive_anon:2112kB active_file:896kB inactive_file:64kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:67108864kB
managed:60060032kB mlocked:0kB dirty:128kB writeback:3712kB mapped:0kB
shmem:0kB slab_reclaimable:23424kB slab_unreclaimable:826048kB
kernel_stack:576kB pagetables:1408kB unstable:0kB bounce:0kB free_cma:5767040kB
writeback_tmp:0kB pages_scanned:756 all_unreclaimable? yes


[PATCH v3] powerpc/fsl: Added binding for Freescale CoreNet coherency fabric (CCF)

2014-05-05 Thread Diana Craciun
From: Diana Craciun 

The CoreNet coherency fabric is a fabric-oriented, connectivity
infrastructure that enables the implementation of coherent, multicore
systems. The CCF acts as a central interconnect for cores,
platform-level caches, memory subsystem, peripheral devices and I/O host
bridges in the system.

Signed-off-by: Diana Craciun 
---
v3:
- added port ID mapping
- removed fsl,corenetx-cf

 .../devicetree/bindings/powerpc/fsl/ccf.txt| 42 ++
 .../devicetree/bindings/powerpc/fsl/cpus.txt   |  8 +
 .../devicetree/bindings/powerpc/fsl/pamu.txt   |  8 +
 3 files changed, 58 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/ccf.txt

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/ccf.txt b/Documentation/devicetree/bindings/powerpc/fsl/ccf.txt
new file mode 100644
index 000..1263c29
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/fsl/ccf.txt
@@ -0,0 +1,42 @@
+Freescale CoreNet Coherency Fabric(CCF) Device Tree Binding
+
+DESCRIPTION
+
+The CoreNet coherency fabric is a fabric-oriented, connectivity infrastructure
+that enables the implementation of coherent, multicore systems.
+
+Required properties:
+
+- compatible :
+   fsl,corenet1-cf - CoreNet coherency fabric version 1. Example chips: P5020,
+   P4080, P3041, P2041
+   fsl,corenet2-cf - CoreNet coherency fabric version 2. Example chips: T4240,
+   B4860
+   fsl,corenet-cf - It is used to represent the common registers between
+   CCF version 1 and CCF version 2. This compatible is retained for
+   compatibility reasons as it was already used for both CCF version 1 chips
+   and CCF version 2 chips.
+
+- reg : 
+   A standard property. Represents the CCF registers.
+
+- interrupts : 
+   Interrupt mapping for CCF error interrupt.
+
+- fsl,ccf-num-csdids: 
+   Specifies the number of Coherency Subdomain ID Port Mapping
+   Registers that are supported by the CCF.
+
+- fsl,ccf-num-snoopids: 
+   Specifies the number of Snoop ID Port Mapping Registers that
+   are supported by CCF.
+
+Example:
+
+   corenet-cf@18000 {
+   compatible = "fsl,corenet2-cf", "fsl,corenet-cf";
+   reg = <0x18000 0x1000>;
+   interrupts = <16 2 1 31>;
+   fsl,ccf-num-csdids = <32>;
+   fsl,ccf-num-snoopids = <32>;
+   };
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
index 922c30a..09dbc5f 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
@@ -20,3 +20,11 @@ PROPERTIES
a property named fsl,eref-[CAT], where [CAT] is the abbreviated category
name with all uppercase letters converted to lowercase, indicates that
the category is supported by the implementation.
+
+   - fsl,portid-mapping :
+   The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
+   registers which are part of the CoreNet Coherency fabric (CCF) provide a
+   CoreNet Coherency Subdomain ID/CoreNet Snoop ID to cpu mapping functions.
+   Certain bits from these registers should be set if the corresponding CPU
+   should be snooped. This property defines a bitmask which selects the bit that
+   should be set if this cpu should be snooped.
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
index 1f5e329..827c637 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
@@ -26,6 +26,13 @@ Required properties:
  A standard property.
 - #size-cells  : 
  A standard property.
+- fsl,portid-mapping :
+   The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
+   registers which are part of the CoreNet Coherency fabric (CCF) provide a
+   CoreNet Coherency Subdomain ID/CoreNet Snoop ID to pamu mapping functions.
+   Certain bits from these registers should be set if PAMUs should be snooped.
+   This property defines a bitmask which selects the bits that should be set
+   if PAMUs should be snooped.
 
 Optional properties:
 - reg  : 
@@ -88,6 +95,7 @@ Example:
compatible = "fsl,pamu-v1.0", "fsl,pamu";
reg = <0x2 0x5000>;
ranges = <0 0x2 0x5000>;
+   fsl,portid-mapping = <0xf8>;
#address-cells = <1>;
#size-cells = <1>;
interrupts = <
-- 
1.7.11.7


[PATCH v2] powerpc/fsl: Updated device trees for platforms with corenet version 2

2014-05-05 Thread Diana Craciun
From: Diana Craciun 

Updated the device trees according to the corenet-cf
binding definition.

Signed-off-by: Diana Craciun 
---
 arch/powerpc/boot/dts/b4860emu.dts  |  7 ++-
 arch/powerpc/boot/dts/fsl/b4420si-post.dtsi |  4 
 arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi  |  2 ++
 arch/powerpc/boot/dts/fsl/b4860si-post.dtsi |  4 
 arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi  |  4 
 arch/powerpc/boot/dts/fsl/b4si-post.dtsi|  3 ++-
 arch/powerpc/boot/dts/fsl/t4240si-post.dtsi |  3 ++-
 arch/powerpc/boot/dts/fsl/t4240si-pre.dtsi  | 12 
 arch/powerpc/boot/dts/t4240emu.dts  | 15 ++-
 9 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/boot/dts/b4860emu.dts b/arch/powerpc/boot/dts/b4860emu.dts
index 7290021..85646b4 100644
--- a/arch/powerpc/boot/dts/b4860emu.dts
+++ b/arch/powerpc/boot/dts/b4860emu.dts
@@ -61,21 +61,25 @@
device_type = "cpu";
reg = <0 1>;
next-level-cache = <&L2>;
+   fsl,portid-mapping = <0x8000>;
};
cpu1: PowerPC,e6500@2 {
device_type = "cpu";
reg = <2 3>;
next-level-cache = <&L2>;
+   fsl,portid-mapping = <0x8000>;
};
cpu2: PowerPC,e6500@4 {
device_type = "cpu";
reg = <4 5>;
next-level-cache = <&L2>;
+   fsl,portid-mapping = <0x8000>;
};
cpu3: PowerPC,e6500@6 {
device_type = "cpu";
reg = <6 7>;
next-level-cache = <&L2>;
+   fsl,portid-mapping = <0x8000>;
};
};
 };
@@ -157,7 +161,7 @@
};
 
corenet-cf@18000 {
-   compatible = "fsl,b4-corenet-cf";
+   compatible = "fsl,corenet2-cf", "fsl,corenet-cf";
reg = <0x18000 0x1000>;
interrupts = <16 2 1 0>;
fsl,ccf-num-csdids = <32>;
@@ -167,6 +171,7 @@
iommu@2 {
compatible = "fsl,pamu-v1.0", "fsl,pamu";
reg = <0x2 0x4000>;
+   fsl,portid-mapping = <0x8000>;
#address-cells = <1>;
#size-cells = <1>;
interrupts = <
diff --git a/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi
index 60566f99..d678944 100644
--- a/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi
@@ -76,10 +76,6 @@
compatible = "fsl,b4420-l3-cache-controller", "cache";
};
 
-   corenet-cf@18000 {
-   compatible = "fsl,b4420-corenet-cf";
-   };
-
guts: global-utilities@e {
compatible = "fsl,b4420-device-config", "fsl,qoriq-device-config-2.0";
};
diff --git a/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi b/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi
index 2419731..338af7e 100644
--- a/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi
@@ -66,12 +66,14 @@
reg = <0 1>;
clocks = <&mux0>;
next-level-cache = <&L2>;
+   fsl,portid-mapping = <0x8000>;
};
cpu1: PowerPC,e6500@2 {
device_type = "cpu";
reg = <2 3>;
clocks = <&mux0>;
next-level-cache = <&L2>;
+   fsl,portid-mapping = <0x8000>;
};
};
 };
diff --git a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
index cbc354b..582381d 100644
--- a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
@@ -120,10 +120,6 @@
compatible = "fsl,b4860-l3-cache-controller", "cache";
};
 
-   corenet-cf@18000 {
-   compatible = "fsl,b4860-corenet-cf";
-   };
-
guts: global-utilities@e {
compatible = "fsl,b4860-device-config", "fsl,qoriq-device-config-2.0";
};
diff --git a/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi b/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi
index 142ac86..1948f73 100644
--- a/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi
@@ -66,24 +66,28 @@
reg = <0 1>;
clocks = <&mux0>;
next-level-cache = <&L2>;
+   fsl,portid-mapping = <0x8000>;
};
cpu1: PowerPC,e6500@2 {
device_type = "cpu";
reg = <2 3>;
clocks = <&mu

[PATCH v2] powerpc/mpc85xx: Remove P1023 RDS support

2014-05-05 Thread Lijun Pan
P1023RDS is no longer supported/manufactured by Freescale while P1023RDB is.

Signed-off-by: Lijun Pan 
---
 arch/powerpc/boot/dts/p1023rds.dts | 219 -
 arch/powerpc/configs/mpc85xx_defconfig |   1 -
 arch/powerpc/configs/mpc85xx_smp_defconfig |   1 -
 arch/powerpc/platforms/85xx/Kconfig|   6 +-
 arch/powerpc/platforms/85xx/Makefile   |   2 +-
 .../platforms/85xx/{p1023_rds.c => p1023_rdb.c}|  36 +---
 6 files changed, 10 insertions(+), 255 deletions(-)
 delete mode 100644 arch/powerpc/boot/dts/p1023rds.dts
 rename arch/powerpc/platforms/85xx/{p1023_rds.c => p1023_rdb.c} (75%)

diff --git a/arch/powerpc/boot/dts/p1023rds.dts b/arch/powerpc/boot/dts/p1023rds.dts
deleted file mode 100644
index beb6cb1..000
--- a/arch/powerpc/boot/dts/p1023rds.dts
+++ /dev/null
@@ -1,219 +0,0 @@
-/*
- * P1023 RDS Device Tree Source
- *
- * Copyright 2010-2011 Freescale Semiconductor Inc.
- *
- * Author: Roy Zang 
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions are met:
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in the
- *   documentation and/or other materials provided with the distribution.
- * * Neither the name of Freescale Semiconductor nor the
- *   names of its contributors may be used to endorse or promote products
- *   derived from this software without specific prior written permission.
- *
- *
- * ALTERNATIVELY, this software may be distributed under the terms of the
- * GNU General Public License ("GPL") as published by the Free Software
- * Foundation, either version 2 of that License or (at your option) any
- * later version.
- *
- * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
- * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
- * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
- * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
- * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
- * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
- * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
- * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
- * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-/include/ "fsl/p1023si-pre.dtsi"
-
-/ {
-   model = "fsl,P1023";
-   compatible = "fsl,P1023RDS";
-   #address-cells = <2>;
-   #size-cells = <2>;
-   interrupt-parent = <&mpic>;
-
-   memory {
-   device_type = "memory";
-   };
-
-   soc: soc@ff60 {
-   ranges = <0x0 0x0 0xff60 0x20>;
-
-   i2c@3000 {
-   rtc@68 {
-   compatible = "dallas,ds1374";
-   reg = <0x68>;
-   };
-   };
-
-   spi@7000 {
-   fsl_dataflash@0 {
-   #address-cells = <1>;
-   #size-cells = <1>;
-   compatible = "atmel,at45db081d";
-   reg = <0>;
-   spi-max-frequency = <4000>; /* input clock */
-   partition@u-boot {
-   /* 512KB for u-boot Bootloader Image */
-   label = "u-boot-spi";
-   reg = <0x 0x0008>;
-   read-only;
-   };
-   partition@dtb {
-   /* 512KB for DTB Image */
-   label = "dtb-spi";
-   reg = <0x0008 0x0008>;
-   read-only;
-   };
-   };
-   };
-
-   usb@22000 {
-   dr_mode = "host";
-   phy_type = "ulpi";
-   };
-   };
-
-   lbc: localbus@ff605000 {
-   reg = <0 0xff605000 0 0x1000>;
-
-   /* NOR Flash, BCSR */
-   ranges = <0x0 0x0 0x0 0xee00 0x0200
- 0x1 0x0 0x0 0xe000 0x8000>;
-
-   nor@0,0 {
-   #address-cells = <1>;
-   #size-cells = <1>;
-   compatible = "cfi-fl

Re: [PATCH v2] powerpc/mpc85xx: Remove P1023 RDS support

2014-05-05 Thread Scott Wood
On Mon, 2014-05-05 at 13:23 -0500, Lijun Pan wrote:
> P1023RDS is no longer supported/manufactured by Freescale while P1023RDB is.
> 
> Signed-off-by: Lijun Pan 
> ---
>  arch/powerpc/boot/dts/p1023rds.dts | 219 -
>  arch/powerpc/configs/mpc85xx_defconfig |   1 -
>  arch/powerpc/configs/mpc85xx_smp_defconfig |   1 -
>  arch/powerpc/platforms/85xx/Kconfig|   6 +-
>  arch/powerpc/platforms/85xx/Makefile   |   2 +-
>  .../platforms/85xx/{p1023_rds.c => p1023_rdb.c}|  36 +---
>  6 files changed, 10 insertions(+), 255 deletions(-)
>  delete mode 100644 arch/powerpc/boot/dts/p1023rds.dts
>  rename arch/powerpc/platforms/85xx/{p1023_rds.c => p1023_rdb.c} (75%)

What changed from v1?

If you want this patch merged, please respond to the comments on v1.

-Scott



RE: [PATCH v2] powerpc/mpc85xx: Remove P1023 RDS support

2014-05-05 Thread Lijun Pan


> -Original Message-
> From: Wood Scott-B07421
> Sent: Monday, May 05, 2014 2:05 PM
> To: Pan Lijun-B44306
> Cc: linuxppc-...@ozlabs.org; Medve Emilian-EMMEDVE1
> Subject: Re: [PATCH v2] powerpc/mpc85xx: Remove P1023 RDS support
> 
> On Mon, 2014-05-05 at 13:23 -0500, Lijun Pan wrote:
> > P1023RDS is no longer supported/manufactured by Freescale while P1023RDB is.
> >
> > Signed-off-by: Lijun Pan 
> > ---
> >  arch/powerpc/boot/dts/p1023rds.dts | 219 -
> >  arch/powerpc/configs/mpc85xx_defconfig |   1 -
> >  arch/powerpc/configs/mpc85xx_smp_defconfig |   1 -
> >  arch/powerpc/platforms/85xx/Kconfig|   6 +-
> >  arch/powerpc/platforms/85xx/Makefile   |   2 +-
> >  .../platforms/85xx/{p1023_rds.c => p1023_rdb.c}|  36 +---
> >  6 files changed, 10 insertions(+), 255 deletions(-)
> >  delete mode 100644 arch/powerpc/boot/dts/p1023rds.dts
> >  rename arch/powerpc/platforms/85xx/{p1023_rds.c => p1023_rdb.c} (75%)
> 
> What changed from v1?

"Please wrap changelogs at no more than 75 columns."


> If you want this patch merged, please respond to the comments on v1.
> 
> -Scott
> 


Re: [PATCH v3] powerpc/fsl: Added binding for Freescale CoreNet coherency fabric (CCF)

2014-05-05 Thread Scott Wood
On Mon, 2014-05-05 at 18:58 +0300, Diana Craciun wrote:
> diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> index 922c30a..09dbc5f 100644
> --- a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> +++ b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> @@ -20,3 +20,11 @@ PROPERTIES
>   a property named fsl,eref-[CAT], where [CAT] is the abbreviated category
>   name with all uppercase letters converted to lowercase, indicates that
>   the category is supported by the implementation.
> +
> + - fsl,portid-mapping : 
> + The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
> + registers which are part of the CoreNet Coherency fabric (CCF) provide a
> + CoreNet Coherency Subdomain ID/CoreNet Snoop ID to cpu mapping functions.
> + Certain bits from these registers should be set if the corresponding CPU
> + should be snooped. This property defines a bitmask which selects the bit that
> + should be set if this cpu should be snooped.

Please follow existing formatting in this file.

> diff --git a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> index 1f5e329..827c637 100644
> --- a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> +++ b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> @@ -26,6 +26,13 @@ Required properties:
> A standard property.
>  - #size-cells: 
> A standard property.
> +- fsl,portid-mapping : 
> + The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
> + registers which are part of the CoreNet Coherency fabric (CCF) provide a
> + CoreNet Coherency Subdomain ID/CoreNet Snoop ID to pamu mapping functions.
> + Certain bits from these registers should be set if PAMUs should be snooped.
> + This property defines a bitmask which selects the bits that should be set
> + if PAMUs should be snooped.

This can't be a required property since existing trees don't have it --
in addition to allowing for the possibility of a PAMU where the snoop ID
is not known or where the snoop domain mechanism does not exist.

-Scott
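
A minimal C sketch of what treating fsl,portid-mapping as optional could look like on the consumer side. The struct and function names here are hypothetical illustrations, not kernel APIs; a real driver would read the property from the device tree and simply skip nodes that lack it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a cpu or PAMU node. */
struct snoop_node {
	bool has_portid_mapping;  /* false when the property is absent */
	uint32_t portid_mapping;  /* bitmask taken from fsl,portid-mapping */
};

/*
 * OR together the bitmasks of every node that carries the property.
 * Nodes without it contribute nothing, so existing device trees that
 * predate the property are still accepted.
 */
uint32_t build_snoop_mask(const struct snoop_node *nodes, size_t n)
{
	uint32_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (nodes[i].has_portid_mapping)
			mask |= nodes[i].portid_mapping;
	}
	return mask;
}
```

With this shape, a tree that has no fsl,portid-mapping anywhere just yields an empty mask instead of a probe failure.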



Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr

2014-05-05 Thread Christian Zigotzky

On 05.05.14 16:57, Olof Johansson wrote:
> [Now without HTML email -- it's what you get for cc:ing me at work
> instead of my upstream email :)]
>
> 2014-05-05 7:43 GMT-07:00 Alexander Graf :
>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>>> Alexander Graf writes:
>>>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>>>> Although it's optional IBM POWER cpus always had DAR value set on
>>>>> alignment interrupt. So don't try to compute these values.
>>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V
>>>>> ---
>>>>> Changes from V3:
>>>>> * Use make_dsisr instead of checking feature flag to decide whether to use
>>>>>   saved dsisr or not
>>>>>
>>>>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>>>> {
>>>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>>>> +   return vcpu->arch.fault_dar;
>>>>
>>>> How about PA6T and G5s?
>>>
>>> Paul mentioned that BOOK3S always had DAR value set on alignment
>>> interrupt. And the patch is to enable/collect correct DAR value when
>>> running with Little Endian PR guest. Now to limit the impact and to
>>> enable Little Endian PR guest, I ended up doing the conditional code
>>> only for book3s 64 for which we know for sure that we set DAR value.
>>
>> Yes, and I'm asking whether we know that this statement holds true for
>> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at
>> least developed by IBM, I'd assume its semantics here are similar to
>> POWER4, but for PA6T I wouldn't be so sure.
>
> Thanks for looking out for us, obviously IBM doesn't (based on the
> reply a minute ago).
>
> In the end, since there's been no work to enable KVM on PA6T, I'm not
> too worried. I guess it's one more thing to sort out (and check for)
> whenever someone does that.
>
> I definitely don't have cycles to deal with that myself at this time.
> I can help find hardware for someone who wants to, but even then I'm
> guessing the interest is pretty limited.
>
> -Olof

Just for info: "PR" KVM works great on my PA6T machine. I booted the 
Lubuntu 14.04 PowerPC live DVD on a QEMU virtual machine with "PR" KVM 
successfully. But Mac OS X Jaguar, Panther, and Tiger don't boot with 
KVM on Mac-on-Linux and QEMU. See 
http://forum.hyperion-entertainment.biz/viewtopic.php?f=35&t=1747.


-- Christian

Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr

2014-05-05 Thread Olof Johansson
2014-05-05 8:03 GMT-07:00 Aneesh Kumar K.V :
> Olof Johansson writes:
>
>> 2014-05-05 7:43 GMT-07:00 Alexander Graf :
>>
>>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>>>
>>>> Alexander Graf writes:
>>>>
>>>>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>>>>
>>>>>> Although it's optional IBM POWER cpus always had DAR value set on
>>>>>> alignment interrupt. So don't try to compute these values.
>>>>>>
>>>>>> Signed-off-by: Aneesh Kumar K.V
>>>>>> ---
>>>>>> Changes from V3:
>>>>>> * Use make_dsisr instead of checking feature flag to decide whether to
>>>>>>   use saved dsisr or not
>>>>>>
>>>>>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>>>>> {
>>>>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>>>>> +   return vcpu->arch.fault_dar;
>>>>>
>>>>> How about PA6T and G5s?
>>>>
>>>> Paul mentioned that BOOK3S always had DAR value set on alignment
>>>> interrupt. And the patch is to enable/collect correct DAR value when
>>>> running with Little Endian PR guest. Now to limit the impact and to
>>>> enable Little Endian PR guest, I ended up doing the conditional code
>>>> only for book3s 64 for which we know for sure that we set DAR value.
>>>
>>> Yes, and I'm asking whether we know that this statement holds true for
>>> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at
>>> least developed by IBM, I'd assume its semantics here are similar to
>>> POWER4, but for PA6T I wouldn't be so sure.
>>>
>> Thanks for looking out for us, obviously IBM doesn't (based on the reply a
>> minute ago).
>
> The reason I deferred the question to Paul is really because I don't
> know enough about PA6T and G5 to comment. I intentionally restricted the
> changes to BOOK3S_64 because I wanted to make sure I don't break
> anything else. It is in no way to hint that others don't care.

Ah, I see -- the disconnect is that you don't think PA6T and 970 are
64-bit book3s CPUs. They are.


-Olof

RE: [PATCH] powerpc: Fix comment around arch specific definition of RECLAIM_DISTANCE

2014-05-05 Thread Motohiro Kosaki
> -Original Message-
> From: Preeti U Murthy [mailto:pre...@linux.vnet.ibm.com]
> Sent: Monday, May 05, 2014 1:17 AM
> To: linuxppc-dev@lists.ozlabs.org
> Cc: b...@kernel.crashing.org; an...@samba.org; Motohiro Kosaki JP
> Subject: [PATCH] powerpc: Fix comment around arch specific definition of RECLAIM_DISTANCE
> 
> Commit 32e45ff43eaf5c17f changed the default value of RECLAIM_DISTANCE to 30.
> However, the comment around the arch-specific definition of RECLAIM_DISTANCE
> was not updated to reflect this. Correct the value mentioned in the comment.
> 
> Signed-off-by: Preeti U Murthy 
> Cc: Anton Blanchard 
> Cc: Benjamin Herrenschmidt 
> Cc: KOSAKI Motohiro 

Acked-by: KOSAKI Motohiro 


Re: [PATCH 4/6] powerpc/corenet: Create the dts components for the DPAA FMan

2014-05-05 Thread Scott Wood
On Sat, 2014-05-03 at 05:02 -0500, Emil Medve wrote:
> Hello Scott,
> 
> 
> On 04/21/2014 05:11 PM, Scott Wood wrote:
> > On Fri, 2014-04-18 at 07:21 -0500, Shruti Kanetkar wrote:
> >> +fman@40 {
> >> +  mdio@f1000 {
> >> +  #address-cells = <1>;
> >> +  #size-cells = <0>;
> >> +  compatible = "fsl,fman-xmdio";
> >> +  reg = <0xf1000 0x1000>;
> >> +  };
> >> +};
> > 
> > I'd like to see a complete fman binding before we start adding pieces.
> 
> The driver for the FMan 10 Gb/s MDIO has upstreamed a couple of years
> ago: '9f35a73 net/fsl: introduce Freescale 10G MDIO driver', granted
> without a binding writeup.

Pushing driver code through the netdev tree does not establish device
tree ABI.  Binding documents and dts files do.

> This patch series should probably include a
> binding blurb. However, let's not gate this patchset on a complete
> binding for the FMan

I at least want to see enough of the FMan binding to have confidence
that what we're adding now is correct.

> As you know we don't own the FMan work and the FMan work is... not ready
> for upstreaming.

I'm not asking for a driver, just a binding that describes hardware.  Is
there any reason why the fman node needs to be anywhere near as
complicated as it is in the SDK, if we're limiting it to actual hardware
description?  Do we really need to have nodes for all the sub-blocks?

> In an attempt to make some sort of progress we've
> decided to upstream the pieces that are less controversial and MDIO is
> an obvious candidate
> 
> >> +fman@40 {
> >> +  mdio0: mdio@e1120 {
> >> +  #address-cells = <1>;
> >> +  #size-cells = <0>;
> >> +  compatible = "fsl,fman-mdio";
> >> +  reg = <0xe1120 0xee0>;
> >> +  };
> >> +};
> > 
> > What is the difference between "fsl,fman-mdio" and "fsl,fman-xmdio"?  I
> > don't see the latter on the list of compatibles in patch 3/6.
> 
> 'fsl,fman-mdio' is the 1 Gb/s MDIO (Clause 22 only). 'fsl,fman-xmdio' is
> the 10 Gb/s MDIO (Clause 45 only). We can respin this patch wi
> 

"respin this patch wi..."?

> I believe 'fsl,fman-mdio' (and others on that list) was added
> gratuitously as the FMan MDIO is completely compatible with the
> eTSEC/gianfar MDIO driver, but we can deal with that later

It's still good to identify the specific device, even if it's believed
to be 100% compatible.  Plus, IIRC there's been enough badness in the
eTSEC MDIO binding that it'd be good to steer clear of it.

> > Within each category, is the exact fman version discoverable from the
> > mdio registers?
> 
> No, but that's irrelevant as that's not the difference between the two
> compatibles

It's relevant because it means the compatible string should have a block
version number in it, or at least some other way in the MDIO node to
indicate the block version.

> >> +fman@50 {
> >> +  #address-cells = <1>;
> >> +  #size-cells = <1>;
> >> +  compatible = "simple-bus";
> > 
> > Why is this simple-bus?
> 
> Because that's the translation type for the FMan sub-nodes.

What do you mean by "translation type"?

> We need it now to get the MDIO nodes probed

No.  "simple-bus" is stating an attribute of the hardware, that the
child nodes represent simple memory-mapped devices that can be used
without special bus knowledge.  I don't think that applies here.

You can get the MDIO node probed without misusing simple-bus by adding
the fman node's compatible to the probe list in the kernel code.

This sort of thing is why I want to see what the rest of the fman
binding will look like.

> and we'll need it later to probe other nodes/devices that will have
> standalone drivers: MAC, MURAM, etc.

How are they truly standalone?  They exist in service to the greater
entity that is fman.  They presumably work together in some fashion.

> >> +  /* mdio nodes for fman v3 @ 0x50 */
> >> +  mdio@fc000 {
> >> +  #address-cells = <1>;
> >> +  #size-cells = <0>;
> >> +  reg = <0xfc000 0x1000>;
> >> +  };
> >> +
> >> +  mdio@fd000 {
> >> +  #address-cells = <1>;
> >> +  #size-cells = <0>;
> >> +  reg = <0xfd000 0x1000>;
> >> +  };
> >> +};
> > 
> > Where's the compatible?  Why is this file different from all the others?
> 
> The FMan v3 MDIO block (supports both Clause 22/45) is compatible with
> the FMan v2 10 Gb/s MDIO (the xgmac-mdio driver). However, the driver
> needs a small clean-up patch (still in internal review) that will get it
> working for FMan v3 MDIO.

This suggests that it is not 100% backwards compatible.

> With that patch we'll add the compatible to these nodes. However, we
> need these nodes now for the board level MDIO bus muxing support
> (included in this patchset)

If you need these nodes now then add the compatible property now.

-Scott
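
To make the compatible-string discussion above concrete, here is a rough C sketch of a match table keyed on the two strings mentioned in the thread. The table, enum, and function names are hypothetical and are not taken from the xgmac-mdio or gianfar drivers; the Clause 22/45 assignments follow Emil's description.

```c
#include <stddef.h>
#include <string.h>

enum mdio_clause { MDIO_CLAUSE_22 = 22, MDIO_CLAUSE_45 = 45 };

/* Hypothetical match entry: one compatible string, one MDIO flavour. */
struct mdio_match {
	const char *compatible;
	enum mdio_clause clause;
};

static const struct mdio_match mdio_matches[] = {
	{ "fsl,fman-mdio",  MDIO_CLAUSE_22 },  /* 1 Gb/s MDIO */
	{ "fsl,fman-xmdio", MDIO_CLAUSE_45 },  /* 10 Gb/s MDIO */
};

/*
 * Look up a node's compatible string. A node without a compatible
 * property (NULL here) matches nothing, which is why the FMan v3 mdio
 * nodes in the patch could not be bound to any driver as posted.
 */
const struct mdio_match *mdio_lookup(const char *compatible)
{
	size_t i;

	if (!compatible)
		return NULL;
	for (i = 0; i < sizeof(mdio_matches) / sizeof(mdio_matches[0]); i++) {
		if (strcmp(mdio_matches[i].compatible, compatible) == 0)
			return &mdio_matches[i];
	}
	return NULL;
}
```

A block version suffix in the compatible string, as requested in the review, would simply add more rows to such a table.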



Re: [PATCH 5/6] powerpc/corenet: Add DPAA FMan support to the SoC device tree(s)

2014-05-05 Thread Scott Wood
On Sun, 2014-05-04 at 05:59 -0500, Emil Medve wrote:
> Hello Scott,
> 
> 
> On 04/21/2014 05:14 PM, Scott Wood wrote:
> > On Fri, 2014-04-18 at 07:21 -0500, Shruti Kanetkar wrote:
> >> FMan 1 Gb/s MACs (dTSEC and mEMAC) have support for SGMII PHYs.
> >> Add support for the internal SerDes TBI PHYs
> >>
> >> Based on prior work by Andy Fleming 
> >>
> >> Signed-off-by: Shruti Kanetkar 
> >> ---
> >>  arch/powerpc/boot/dts/fsl/b4860si-post.dtsi |  28 +
> >>  arch/powerpc/boot/dts/fsl/b4si-post.dtsi|  51 +
> >>  arch/powerpc/boot/dts/fsl/p1023si-post.dtsi |  14 +++
> >>  arch/powerpc/boot/dts/fsl/p2041si-post.dtsi |  64 
> >>  arch/powerpc/boot/dts/fsl/p3041si-post.dtsi |  64 
> >>  arch/powerpc/boot/dts/fsl/p4080si-post.dtsi | 104 +++
> >>  arch/powerpc/boot/dts/fsl/p5020si-post.dtsi |  64 
> >>  arch/powerpc/boot/dts/fsl/p5040si-post.dtsi | 128 +++
> >>  arch/powerpc/boot/dts/fsl/t4240si-post.dtsi | 154 
> >>  9 files changed, 671 insertions(+)
> >>
> >> diff --git a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
> >> index cbc354b..45b0ff5 100644
> >> --- a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
> >> +++ b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
> >> @@ -172,6 +172,34 @@
> >>compatible = "fsl,b4860-rcpm", "fsl,qoriq-rcpm-2.0";
> >>};
> >>  
> >> +/include/ "qoriq-fman3-0-1g-4.dtsi"
> >> +/include/ "qoriq-fman3-0-1g-5.dtsi"
> >> +/include/ "qoriq-fman3-0-10g-0.dtsi"
> >> +/include/ "qoriq-fman3-0-10g-1.dtsi"
> >> +  fman@40 {
> >> +  ethernet@e8000 {
> >> +  tbi-handle = <&tbi4>;
> >> +  };
> > 
> > Binding needed
> > 
> > Where is the "reg" for these unit addresses?
> 
> As I said, the bulk of the FMan work comes from another team. Here we
> need just enough to hook up the MDIO and PHY nodes.

Unit addresses must match reg.  No reg, no unit address.

> I'd really like to be able to make progress on this without waiting for
> that moment in time we can get the entire FMan binding in place

Why is the fman binding such a big deal?

> >> +  mdio@e9000 {
> >> +  tbi4: tbi-phy@8 {
> >> +  reg = <0x8>;
> >> +  device_type = "tbi-phy";
> >> +  };
> >> +  };
> > 
> > Binding needed for tbi-phy device_type
> 
> I guess that's fair (BTW, you accepted tbi-phy nodes/device-type before
> without a binding)

It's existing practice on eTSEC.  FMan seemed like an opportunity to
avoid carrying cruft forward.

> > Why are we using device_type at all for this?
> 
> That's what the upstream driver is looking for.

Drivers should look for what the binding says -- not the other way
around.

>  Anyway, most days PHYs can be discovered so they don't use/need
> compatible properties. That's I guess part of the reason we don't have
> bindings for them PHY nodes

I don't see why there couldn't be a compatible that describes the
standard programming interface.

> However, what you can't discover is how they are wired to the MAC(s) so
> we still need some nodes in the device tree to convey that. Also, when
> looking for a specific kind of PHY, such as TBI, device_type works
> easier then parsing compatibles from various vendors or so

Don't you find the TBI by following the tbi-handle property?  That said,
I don't object to having a way to label a PHY as attached via TBI if
that's useful.  I'm giving a mild, non-nacking (given the history)
objection to using device_type for that (given other history).

-Scott
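
The "unit addresses must match reg" rule from this review can be sketched as a small C check. This is an illustrative helper with a hypothetical name, not code from dtc or the kernel, and it only handles a single hexadecimal unit address (names such as nor@0,0 are out of scope here).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return true when a node name and its reg property agree:
 *  - a name with "@unit" needs a reg whose first address equals unit;
 *  - a name without "@" is accepted only when there is no reg.
 */
bool unit_address_ok(const char *node_name, bool has_reg, uint64_t reg_addr)
{
	const char *at = strchr(node_name, '@');
	char *end;
	unsigned long long unit;

	if (!at)
		return !has_reg;   /* no unit address: fine only without reg */
	if (!has_reg)
		return false;      /* "No reg, no unit address." */

	unit = strtoull(at + 1, &end, 16);
	if (end == at + 1 || *end != '\0')
		return false;      /* not a plain hex unit address */
	return unit == reg_addr;
}
```

Applied to the patch under review, ethernet@e8000 without a reg property would be rejected, while mdio@f1000 with reg = <0xf1000 0x1000> would pass.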



Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr

2014-05-05 Thread Benjamin Herrenschmidt
On Mon, 2014-05-05 at 19:56 +0530, Aneesh Kumar K.V wrote:
> 
> Paul mentioned that BOOK3S always had DAR value set on alignment
> interrupt. And the patch is to enable/collect correct DAR value when
> running with Little Endian PR guest. Now to limit the impact and to
> enable Little Endian PR guest, I ended up doing the conditional code
> only for book3s 64 for which we know for sure that we set DAR value.

Only Book3S? Afaik, the kernel align.c unconditionally uses DAR on
every processor type. It's DSISR that may or may not be populated
but afaik DAR always is.

Cheers,
Ben.



Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.

2014-05-05 Thread Benjamin Herrenschmidt
On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote:
> Isn't this a greater problem? We should start swapping before we hit
> the point where non movable kernel allocation fails, no?

Possibly but the fact remains, this can be avoided by making sure that
if we create a CMA reserve for KVM, then it uses it rather than using
the rest of main memory for hash tables.

> The fact that KVM uses a good number of normal kernel pages is maybe
> suboptimal, but shouldn't be a critical problem.

The point is that we explicitly reserve those pages in CMA for use
by KVM for that specific purpose, but the current code tries first
to get them out of the normal pool.

This is not optimal behaviour, and it is what Aneesh's patches are
trying to fix.

Cheers,
Ben.



Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr

2014-05-05 Thread Benjamin Herrenschmidt
On Mon, 2014-05-05 at 16:43 +0200, Alexander Graf wrote:
> > Paul mentioned that BOOK3S always had DAR value set on alignment
> > interrupt. And the patch is to enable/collect correct DAR value when
> > running with Little Endian PR guest. Now to limit the impact and to
> > enable Little Endian PR guest, I ended up doing the conditional code
> > only for book3s 64 for which we know for sure that we set DAR value.
> 
> Yes, and I'm asking whether we know that this statement holds true for 
> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is 
> at least developed by IBM, I'd assume its semantics here are similar to 
> POWER4, but for PA6T I wouldn't be so sure.

I am not aware of any PowerPC processor that does not set DAR on
alignment interrupts. Paul, are you ?

Cheers,
Ben.



Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr

2014-05-05 Thread Paul Mackerras
On Mon, May 05, 2014 at 01:19:30PM +0200, Alexander Graf wrote:
> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
> >+#ifdef CONFIG_PPC_BOOK3S_64
> >+return vcpu->arch.fault_dar;
> 
> How about PA6T and G5s?

G5 sets DAR on an alignment interrupt.

As for PA6T, I don't know for sure, but if it doesn't, ordinary
alignment interrupts wouldn't be handled properly, since the code in
arch/powerpc/kernel/align.c assumes DAR contains the address being
accessed on all PowerPC CPUs.

Did PA Semi ever publish a user manual for the PA6T, I wonder?

Paul.

[PATCH] powerpc/fsl-booke64: Set vmemmap_psize to 4K

2014-05-05 Thread Scott Wood
The only way Freescale booke chips support mappings larger than 4K
is via TLB1.  The only way we support (direct) TLB1 entries is via
hugetlb, which is not what map_kernel_page() does when given a large
page size.

Without this, a kernel with CONFIG_SPARSEMEM_VMEMMAP enabled crashes on
boot with messages such as:

PID hash table entries: 4096 (order: 3, 32768 bytes)
Sorting __ex_table...
BUG: Bad page state in process swapper  pfn:00a2f
page:84023a48 count:0 mapcount:0 mapping:04ffce48 index:0x4ffbe50
page flags: 0x4ffda40(active|arch_1|private|private_2|head|tail|swapcache|mappedtodisk|reclaim|swapbacked|unevictable|mlocked)
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags:
page flags: 0x311840(active|private|private_2|swapcache|unevictable|mlocked)
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.15.0-rc1-3-g7fa250c #299
Call Trace:
[c098ba20] [c0008b3c] .show_stack+0x7c/0x1cc (unreliable)
[c098baf0] [c060aa50] .dump_stack+0x88/0xb4
[c098bb70] [c00c0468] .bad_page+0x144/0x1a0
[c098bc10] [c00c0628] .free_pages_prepare+0x164/0x17c
[c098bcc0] [c00c24cc] .free_hot_cold_page+0x48/0x214
[c098bd60] [c086c318] .free_all_bootmem+0x1fc/0x354
[c098be70] [c085da84] .mem_init+0xac/0xdc
[c098bef0] [c08547b0] .start_kernel+0x21c/0x4d4
[c098bf90] [c448] .start_here_common+0x20/0x58

Signed-off-by: Scott Wood 
---
 arch/powerpc/mm/tlb_nohash.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index ae3d5b7..92cb18d 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -596,8 +596,13 @@ static void __early_init_mmu(int boot_cpu)
/* XXX This should be decided at runtime based on supported
 * page sizes in the TLB, but for now let's assume 16M is
 * always there and a good fit (which it probably is)
+*
+* Freescale booke only supports 4K pages in TLB0, so use that.
 */
-   mmu_vmemmap_psize = MMU_PAGE_16M;
+   if (mmu_has_feature(MMU_FTR_TYPE_FSL_E))
+   mmu_vmemmap_psize = MMU_PAGE_4K;
+   else
+   mmu_vmemmap_psize = MMU_PAGE_16M;
 
/* XXX This code only checks for TLB 0 capabilities and doesn't
 * check what page size combos are supported by the HW. It
-- 
1.9.1
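
For readers who want the decision in isolation, here is a minimal
standalone sketch of the selection the hunk above makes. The enum and
function names are hypothetical stand-ins, not the kernel's MMU_PAGE_*
definitions or mmu_has_feature(); only the 4K-versus-16M choice is
taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's MMU_PAGE_* indices. */
enum page_size_index { PSIZE_4K, PSIZE_16M };

/*
 * Pick the page size used to map the vmemmap.
 * Freescale Book-E only gets 4K pages through TLB0, so it must not be
 * given the 16M default that the other Book-E implementations use.
 */
enum page_size_index pick_vmemmap_psize(bool is_fsl_booke)
{
	return is_fsl_booke ? PSIZE_4K : PSIZE_16M;
}
```

Passing true models a core with MMU_FTR_TYPE_FSL_E set; every other
core keeps the existing 16M assumption.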


[PATCH v4] powerpc/fsl: Add binding for Freescale CCF

2014-05-05 Thread Scott Wood
From: Diana Craciun 

The CoreNet coherency fabric is a fabric-oriented, connectivity
infrastructure that enables the implementation of coherent, multicore
systems. The CCF acts as a central interconnect for cores,
platform-level caches, memory subsystem, peripheral devices and I/O host
bridges in the system.

Signed-off-by: Diana Craciun 
[scottw...@freescale.com: formatting and minor changes]
Signed-off-by: Scott Wood 
---
v4: Fixed various formatting issues, minor edits for clarity, and
made fsl,portid-mapping an optional property.

 .../devicetree/bindings/powerpc/fsl/ccf.txt| 46 ++
 .../devicetree/bindings/powerpc/fsl/cpus.txt   | 11 ++
 .../devicetree/bindings/powerpc/fsl/pamu.txt   | 10 +
 3 files changed, 67 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/ccf.txt

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/ccf.txt b/Documentation/devicetree/bindings/powerpc/fsl/ccf.txt
new file mode 100644
index 000..454da7e
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/fsl/ccf.txt
@@ -0,0 +1,46 @@
+Freescale CoreNet Coherency Fabric(CCF) Device Tree Binding
+
+DESCRIPTION
+
+The CoreNet coherency fabric is a fabric-oriented, connectivity infrastructure
+that enables the implementation of coherent, multicore systems.
+
+Required properties:
+
+- compatible: 
+   fsl,corenet1-cf - CoreNet coherency fabric version 1.
+   Example chips: P5040, P5020, P4080, P3041, P2041
+
+   fsl,corenet2-cf - CoreNet coherency fabric version 2.
+   Example chips: T4240, B4860
+
+   fsl,corenet-cf - Used to represent the common registers
+   between CCF version 1 and CCF version 2.  This compatible
+   is retained for compatibility reasons, as it was already
+   used for both CCF version 1 chips and CCF version 2
+   chips.  It should be specified after either
+   "fsl,corenet1-cf" or "fsl,corenet2-cf".
+
+- reg: 
+   A standard property. Represents the CCF registers.
+
+- interrupts: 
+   Interrupt mapping for CCF error interrupt.
+
+- fsl,ccf-num-csdids: 
+   Specifies the number of Coherency Subdomain ID Port Mapping
+   Registers that are supported by the CCF.
+
+- fsl,ccf-num-snoopids: 
+   Specifies the number of Snoop ID Port Mapping Registers that
+   are supported by CCF.
+
+Example:
+
+   corenet-cf@18000 {
+   compatible = "fsl,corenet2-cf", "fsl,corenet-cf";
+   reg = <0x18000 0x1000>;
+   interrupts = <16 2 1 31>;
+   fsl,ccf-num-csdids = <32>;
+   fsl,ccf-num-snoopids = <32>;
+   };
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
index 922c30a..f8cd239 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
@@ -20,3 +20,14 @@ PROPERTIES
a property named fsl,eref-[CAT], where [CAT] is the abbreviated category
name with all uppercase letters converted to lowercase, indicates that
the category is supported by the implementation.
+
+- fsl,portid-mapping
+   Usage: optional
+   Value type: 
+   Definition: The Coherency Subdomain ID Port Mapping Registers and
+   Snoop ID Port Mapping registers, which are part of the CoreNet
+   Coherency fabric (CCF), provide a CoreNet Coherency Subdomain
+   ID/CoreNet Snoop ID to cpu mapping functions.  Certain bits from
+   these registers should be set if the corresponding CPU should be
+   snooped.  This property defines a bitmask which selects the bit
+   that should be set if this cpu should be snooped.
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
index 1f5e329..c2b2899 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
@@ -34,6 +34,15 @@ Optional properties:
  for legacy drivers.
 - interrupt-parent : 
  Phandle to interrupt controller
+- fsl,portid-mapping : 
+ The Coherency Subdomain ID Port Mapping Registers and
+ Snoop ID Port Mapping registers, which are part of the
+ CoreNet Coherency fabric (CCF), provide a CoreNet
+ Coherency Subdomain ID/CoreNet Snoop ID to pamu mapping
+ functions.  Certain bits from these registers should be
+ set if PAMUs should be snooped.  This property defines
+ a bitmask which selects the bits that should be set if
+ PAMUs should be snooped.
 
 Child nodes:
 
@@ -88,6 +97,7 @@ Example:
compatible = "fsl,pamu-v1.0", "fsl,pamu";
reg =

[PATCH] powerpc/fsl: Add fsl,portid-mapping to corenet1-cf chips

2014-05-05 Thread Scott Wood
Signed-off-by: Scott Wood 
Cc: Diana Craciun 
---
 arch/powerpc/boot/dts/fsl/p2041si-post.dtsi | 1 +
 arch/powerpc/boot/dts/fsl/p2041si-pre.dtsi  | 4 
 arch/powerpc/boot/dts/fsl/p3041si-post.dtsi | 1 +
 arch/powerpc/boot/dts/fsl/p3041si-pre.dtsi  | 4 
 arch/powerpc/boot/dts/fsl/p4080si-post.dtsi | 1 +
 arch/powerpc/boot/dts/fsl/p4080si-pre.dtsi  | 8 
 arch/powerpc/boot/dts/fsl/p5020si-post.dtsi | 1 +
 arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi  | 2 ++
 arch/powerpc/boot/dts/fsl/p5040si-post.dtsi | 1 +
 arch/powerpc/boot/dts/fsl/p5040si-pre.dtsi  | 4 
 10 files changed, 27 insertions(+)

diff --git a/arch/powerpc/boot/dts/fsl/p2041si-post.dtsi b/arch/powerpc/boot/dts/fsl/p2041si-post.dtsi
index b5daa4c..5290df8 100644
--- a/arch/powerpc/boot/dts/fsl/p2041si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p2041si-post.dtsi
@@ -262,6 +262,7 @@
interrupts = <
24 2 0 0
16 2 1 30>;
+   fsl,portid-mapping = <0x0f00>;
 
pamu0: pamu@0 {
reg = <0 0x1000>;
diff --git a/arch/powerpc/boot/dts/fsl/p2041si-pre.dtsi b/arch/powerpc/boot/dts/fsl/p2041si-pre.dtsi
index 22f3b14..b1ea147 100644
--- a/arch/powerpc/boot/dts/fsl/p2041si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p2041si-pre.dtsi
@@ -83,6 +83,7 @@
reg = <0>;
clocks = <&mux0>;
next-level-cache = <&L2_0>;
+   fsl,portid-mapping = <0x8000>;
L2_0: l2-cache {
next-level-cache = <&cpc>;
};
@@ -92,6 +93,7 @@
reg = <1>;
clocks = <&mux1>;
next-level-cache = <&L2_1>;
+   fsl,portid-mapping = <0x4000>;
L2_1: l2-cache {
next-level-cache = <&cpc>;
};
@@ -101,6 +103,7 @@
reg = <2>;
clocks = <&mux2>;
next-level-cache = <&L2_2>;
+   fsl,portid-mapping = <0x2000>;
L2_2: l2-cache {
next-level-cache = <&cpc>;
};
@@ -110,6 +113,7 @@
reg = <3>;
clocks = <&mux3>;
next-level-cache = <&L2_3>;
+   fsl,portid-mapping = <0x1000>;
L2_3: l2-cache {
next-level-cache = <&cpc>;
};
diff --git a/arch/powerpc/boot/dts/fsl/p3041si-post.dtsi b/arch/powerpc/boot/dts/fsl/p3041si-post.dtsi
index 5abd1fc..cd63cb1 100644
--- a/arch/powerpc/boot/dts/fsl/p3041si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p3041si-post.dtsi
@@ -289,6 +289,7 @@
interrupts = <
24 2 0 0
16 2 1 30>;
+   fsl,portid-mapping = <0x0f00>;
 
pamu0: pamu@0 {
reg = <0 0x1000>;
diff --git a/arch/powerpc/boot/dts/fsl/p3041si-pre.dtsi b/arch/powerpc/boot/dts/fsl/p3041si-pre.dtsi
index 468e8be..dc5f4b3 100644
--- a/arch/powerpc/boot/dts/fsl/p3041si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p3041si-pre.dtsi
@@ -84,6 +84,7 @@
reg = <0>;
clocks = <&mux0>;
next-level-cache = <&L2_0>;
+   fsl,portid-mapping = <0x8000>;
L2_0: l2-cache {
next-level-cache = <&cpc>;
};
@@ -93,6 +94,7 @@
reg = <1>;
clocks = <&mux1>;
next-level-cache = <&L2_1>;
+   fsl,portid-mapping = <0x4000>;
L2_1: l2-cache {
next-level-cache = <&cpc>;
};
@@ -102,6 +104,7 @@
reg = <2>;
clocks = <&mux2>;
next-level-cache = <&L2_2>;
+   fsl,portid-mapping = <0x2000>;
L2_2: l2-cache {
next-level-cache = <&cpc>;
};
@@ -111,6 +114,7 @@
reg = <3>;
clocks = <&mux3>;
next-level-cache = <&L2_3>;
+   fsl,portid-mapping = <0x1000>;
L2_3: l2-cache {
next-level-cache = <&cpc>;
};
diff --git a/arch/powerpc/boot/dts/fsl/p4080si-post.dtsi b/arch/powerpc/boot/dts/fsl/p4080si-post.dtsi
index bf0e7c9..12947cc 100644
--- a/arch/powerpc/boot/dts/fsl/p4080si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p4080si-post.dtsi
@@ -297,6 +297,7 @@
interrupts = <
  

Re: [PATCH v3] powerpc/fsl: Added binding for Freescale CoreNet coherency fabric (CCF)

2014-05-05 Thread Kumar Gala

On May 5, 2014, at 10:58 AM, Diana Craciun  wrote:

> From: Diana Craciun 
> 
> The CoreNet coherency fabric is a fabric-oriented, conectivity
> infrastructure that enables the implementation of coherent, multicore
> systems. The CCF acts as a central interconnect for cores,
> platform-level caches, memory subsystem, peripheral devices and I/O host
> bridges in the system.
> 
> Signed-off-by: Diana Craciun 
> ---
> v3:
>   - added port ID mapping
>   - removed fsl,corenetx-cf
> 
> .../devicetree/bindings/powerpc/fsl/ccf.txt| 42 ++
> .../devicetree/bindings/powerpc/fsl/cpus.txt   |  8 +
> .../devicetree/bindings/powerpc/fsl/pamu.txt   |  8 +
> 3 files changed, 58 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/ccf.txt

[snip]

> --- a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> +++ b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> @@ -20,3 +20,11 @@ PROPERTIES
>   a property named fsl,eref-[CAT], where [CAT] is the abbreviated category
>   name with all uppercase letters converted to lowercase, indicates that
>   the category is supported by the implementation.
> +
> + - fsl,portid-mapping : 
> + The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
> + registers which are part of the CoreNet Coherency fabric (CCF) provide a
> + CoreNet Coherency Subdomain ID/CoreNet Snoop ID to cpu mapping functions.
> + Certain bits from these registers should be set if the corresponding CPU
> + should be snooped. This property defines a bitmask which selects the bit that
> + should be set if this cpu should be snooped.

Under what cases can software not figure out how to set this based on the
PAMUs in the DT?

> diff --git a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> index 1f5e329..827c637 100644
> --- a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> +++ b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> @@ -26,6 +26,13 @@ Required properties:
> A standard property.
> - #size-cells : 
> A standard property.
> +- fsl,portid-mapping : 
> + The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
> + registers which are part of the CoreNet Coherency fabric (CCF) provide a
> + CoreNet Coherency Subdomain ID/CoreNet Snoop ID to pamu mapping functions.
> + Certain bits from these registers should be set if PAMUs should be snooped.
> + This property defines a bitmask which selects the bits that should be set
> + if PAMUs should be snooped.
> 
> Optional properties:
> - reg : 
> @@ -88,6 +95,7 @@ Example:
>   compatible = "fsl,pamu-v1.0", "fsl,pamu";
>   reg = <0x2 0x5000>;
>   ranges = <0 0x2 0x5000>;
> + fsl,portid-mapping = <0xf8>;
>   #address-cells = <1>;
>   #size-cells = <1>;
>   interrupts = <
> -- 
> 1.7.11.7
> 

Re: [PATCH v3] powerpc/fsl: Added binding for Freescale CoreNet coherency fabric (CCF)

2014-05-05 Thread Scott Wood
On Mon, 2014-05-05 at 21:12 -0500, Kumar Gala wrote:
> On May 5, 2014, at 10:58 AM, Diana Craciun wrote:
> 
> > From: Diana Craciun 
> > 
> > The CoreNet coherency fabric is a fabric-oriented, conectivity
> > infrastructure that enables the implementation of coherent, multicore
> > systems. The CCF acts as a central interconnect for cores,
> > platform-level caches, memory subsystem, peripheral devices and I/O host
> > bridges in the system.
> > 
> > Signed-off-by: Diana Craciun 
> > ---
> > v3:
> > - added port ID mapping
> > - removed fsl,corenetx-cf
> > 
> > .../devicetree/bindings/powerpc/fsl/ccf.txt| 42 ++
> > .../devicetree/bindings/powerpc/fsl/cpus.txt   |  8 +
> > .../devicetree/bindings/powerpc/fsl/pamu.txt   |  8 +
> > 3 files changed, 58 insertions(+)
> > create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/ccf.txt
> 
> [snip]
> 
> > --- a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> > +++ b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> > @@ -20,3 +20,11 @@ PROPERTIES
> > a property named fsl,eref-[CAT], where [CAT] is the abbreviated category
> > name with all uppercase letters converted to lowercase, indicates that
> > the category is supported by the implementation.
> > +
> > +   - fsl,portid-mapping : 
> > +   The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
> > +   registers which are part of the CoreNet Coherency fabric (CCF) provide a
> > +   CoreNet Coherency Subdomain ID/CoreNet Snoop ID to cpu mapping functions.
> > +   Certain bits from these registers should be set if the corresponding CPU
> > +   should be snooped. This property defines a bitmask which selects the bit that
> > +   should be set if this cpu should be snooped.
> 
> Under what cases can software not figure out how to set this based on the
> PAMUs in the DT?

How would it go about doing that?

Besides the difference between corenet1-cf and corenet2-cf, on
corenet1-cf the position of the PAMU bits depends on the number of CPUs
that the chip was designed for.  This may be different from the number
of CPUs that are actually present (e.g. p4040, or AMP).  It's also a
complication that IMHO is asking for trouble, versus straightforwardly
recording information that is present in a table in the manual.

-Scott
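
To make the "straightforwardly recording" point concrete, here is a
rough sketch of what a consumer of the property would do under that
scheme: OR together the fsl,portid-mapping masks of the agents it wants
snooped and write the result into the relevant port mapping register.
The function name is hypothetical and the register write itself is left
out. No knowledge of CCF version or designed-for CPU count is needed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * OR together the fsl,portid-mapping bitmasks of every agent (CPU or
 * PAMU) that should be snooped. The result is the set of bits to set
 * in a CCF port mapping register.
 */
uint32_t ccf_snoop_bits(const uint32_t *portid_masks, size_t count)
{
	uint32_t bits = 0;

	for (size_t i = 0; i < count; i++)
		bits |= portid_masks[i];
	return bits;
}
```

The masks passed in would simply be whatever each cpu or pamu node
carries in the device tree; any particular values used to try this out
are made up for illustration, not taken from a manual.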



Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest

2014-05-05 Thread Paul Mackerras
On Mon, May 05, 2014 at 08:17:00PM +0530, Aneesh Kumar K.V wrote:
> Alexander Graf  writes:
> 
> > On 05/04/2014 07:30 PM, Aneesh Kumar K.V wrote:
> >> Signed-off-by: Aneesh Kumar K.V 
> >
> > No patch description, no proper explanations anywhere why you're doing 
> > what. All of that in a pretty sensitive piece of code. There's no way 
> > this patch can go upstream in its current form.
> >
> 
> Sorry about being vague. Will add a better commit message. The goal is
> to export MPSS support to guest if the host support the same. MPSS
> support is exported via penc encoding in "ibm,segment-page-sizes". The
> actual format can be found at htab_dt_scan_page_sizes. When the guest
> memory is backed by hugetlbfs we expose the penc encoding the host
> support to guest via kvmppc_add_seg_page_size. 

In a case like this it's good to assume the reader doesn't know very
much about Power CPUs, and probably isn't familiar with acronyms such
as MPSS.  The patch needs an introductory paragraph explaining that on
recent IBM Power CPUs, while the hashed page table is looked up using
the page size from the segmentation hardware (i.e. the SLB), it is
possible to have the HPT entry indicate a larger page size.  Thus for
example it is possible to put a 16MB page in a 64kB segment, but since
the hash lookup is done using a 64kB page size, it may be necessary to
put multiple entries in the HPT for a single 16MB page.  This
capability is called mixed page-size segment (MPSS).  With MPSS,
there are two relevant page sizes: the base page size, which is the
size used in searching the HPT, and the actual page size, which is the
size indicated in the HPT entry.  Note that the actual page size is
always >= base page size.
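
As a worked example of the arithmetic (a sketch only, with a
hypothetical helper name): the number of base-page-sized pieces inside
one actual page is 2^(actual_shift - base_shift). For a 16MB page in a
64kB segment that is 2^(24 - 16) = 256, which is presumably the upper
bound on how many HPT entries a single 16MB page could need.

```c
#include <assert.h>

/*
 * Number of base-page-sized pieces covered by one actual page.
 * Shifts are log2 of the page size in bytes (16 for 64kB, 24 for 16MB).
 */
unsigned long base_pages_per_actual_page(unsigned int base_shift,
					 unsigned int actual_shift)
{
	/* MPSS invariant: actual page size >= base page size. */
	assert(actual_shift >= base_shift);
	return 1UL << (actual_shift - base_shift);
}
```

When base and actual sizes are equal the result is 1, which is the
ordinary non-MPSS case.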

> Now the challenge to THP support is to make sure that our henter,
> hremove etc decode base page size and actual page size correctly
> from the hash table entry values. Most of the changes is to do that.
> Rest of the stuff is already handled by kvm. 
> 
> NOTE: It is much easier to read the code after applying the patch rather
> than reading the diff. I have added comments around each steps in the
> code.

Paul.

Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest

2014-05-05 Thread Gavin Shan
On Mon, May 05, 2014 at 08:00:12AM -0600, Alex Williamson wrote:
>On Mon, 2014-05-05 at 13:56 +0200, Alexander Graf wrote:
>> On 05/05/2014 03:27 AM, Gavin Shan wrote:
>> > The series of patches intends to support EEH for PCI devices, which have been
>> > passed through to PowerKVM based guest via VFIO. The implementation is
>> > straightforward based on the issues or problems we have to resolve to support
>> > EEH for PowerKVM based guest.
>> >
>> > - Emulation for EEH RTAS requests. Thankfully, we already have infrastructure
>> >   to emulate XICS. Without introducing new mechanism, we just extend that
>> >   existing infrastructure to support EEH RTAS emulation. EEH RTAS requests
>> >   initiated from guest are posted to host where the requests get handled or
>> >   delivered to underlying firmware for further handling. For that, the host kernel
>> >   has to maintain the PCI address (host domain/bus/slot/function to guest's
>> >   PHB BUID/bus/slot/function) mapping via KVM VFIO device. The address mapping
>> >   will be built when initializing VFIO device in QEMU and destroyed when the
>> >   VFIO device in QEMU is going offline, or the VM is destroyed.
>> 
>> Do you also expose all those interfaces to user space? VFIO is as much 
>> about user space device drivers as it is about device assignment.
>> 

Yep, all the interfaces are exported to user space. 

>> I would like to first see an implementation that doesn't touch KVM 
>> emulation code at all but instead routes everything through QEMU. As a 
>> second step we can then accelerate performance critical paths inside of KVM.
>> 

Ok. I'll change the implementation. However, QEMU still has to
poll/push information from/to the host kernel. So the best place for that
would be tce_iommu_driver_ops::ioctl, as EEH is a Power-specific feature.

For the error injection, I guess I have to put the logic token management
into QEMU and error injection request will be handled by QEMU and then
routed to host kernel via additional syscall as we did for pSeries.

>> That way we ensure that user space device drivers have all the power 
>> over a device they need to drive it.
>
>+1
>

Thanks,
Gavin
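
For reference, the address mapping mentioned in the cover letter (guest
PHB BUID/bus/slot/function to host domain/bus/slot/function) could be
pictured roughly as below. This is only an illustrative sketch: the
struct and function names are hypothetical and do not correspond to
anything in the posted series, and a real implementation would need
locking and add/remove operations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side PCI address. */
struct host_pci_addr {
	uint16_t domain;
	uint8_t bus, slot, func;
};

/* Hypothetical guest-side PCI address, keyed by PHB BUID. */
struct guest_pci_addr {
	uint64_t buid;
	uint8_t bus, slot, func;
};

/* One mapping entry, created when the VFIO device is initialized. */
struct eeh_addr_map {
	struct guest_pci_addr guest;
	struct host_pci_addr host;
};

/* Return the host address mapped to a guest address, or NULL if none. */
const struct host_pci_addr *
eeh_guest_to_host(const struct eeh_addr_map *map, size_t count,
		  const struct guest_pci_addr *g)
{
	for (size_t i = 0; i < count; i++) {
		const struct guest_pci_addr *c = &map[i].guest;

		if (c->buid == g->buid && c->bus == g->bus &&
		    c->slot == g->slot && c->func == g->func)
			return &map[i].host;
	}
	return NULL;
}
```

A linear search is used purely for brevity; whichever side ends up
owning the table (QEMU or the host kernel) is free to pick a better
structure.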


Re: [PATCH 4/6] powerpc/corenet: Create the dts components for the DPAA FMan

2014-05-05 Thread Emil Medve
Hello Scott,


On 05/05/2014 06:25 PM, Scott Wood wrote:
> On Sat, 2014-05-03 at 05:02 -0500, Emil Medve wrote:
>> Hello Scott,
>>
>>
>> On 04/21/2014 05:11 PM, Scott Wood wrote:
>>> On Fri, 2014-04-18 at 07:21 -0500, Shruti Kanetkar wrote:
>>>> +fman@40 {
>>>> +  mdio@f1000 {
>>>> +  #address-cells = <1>;
>>>> +  #size-cells = <0>;
>>>> +  compatible = "fsl,fman-xmdio";
>>>> +  reg = <0xf1000 0x1000>;
>>>> +  };
>>>> +};
>>>
>>> I'd like to see a complete fman binding before we start adding pieces.
>>
>> The driver for the FMan 10 Gb/s MDIO has upstreamed a couple of years
>> ago: '9f35a73 net/fsl: introduce Freescale 10G MDIO driver', granted
>> without a binding writeup.
> 
> Pushing driver code through the netdev tree does not establish device
> tree ABI.  Binding documents and dts files do.

Sure, ideally and formally. But upstreaming a driver represents, if
nothing else, a statement of intent to observe a device tree ABI. Via
the SDK, FSL customers are using the device tree ABI the driver de facto
establishes. I guess a driver that makes it upstream can establish a
device tree ABI.

We'll re-spin, adding the binding document.

>> This patch series should probably include a
>> binding blurb. However, let's not gate this patchset on a complete
>> binding for the FMan
> 
> I at least want to see enough of the FMan binding to have confidence
> that what we're adding now is correct.

I'm not sure what you're looking for. The nodes we're adding are
describing a very common CCSR space interface for quite common device blocks

>> As you know we don't own the FMan work and the FMan work is... not ready
>> for upstreaming.
> 
> I'm not asking for a driver, just a binding that describes hardware.  Is
> there any reason why the fman node needs to be anywhere near as
> complicated as it is in the SDK, if we're limiting it to actual hardware
> description?

Is this a trick question? :-) Of course it doesn't need to be more
complicated than actual hardware. But, to repeat myself, said
description is not... ready and I don't know when it will be. Somebody
else owns pushing the bulk of FMan upstream and I'd rather not step on
their turf quite like this

> Do we really need to have nodes for all the sub-blocks?

Definitely no, and internally I'm pushing to clean that up. However, you
surely remember we've been pushing from the early days of P4080 and it's
been, to put it optimistically, slow

>> In an attempt to make some sort of progress we've
>> decided to upstream the pieces that are less controversial and MDIO is
>> an obvious candidate
>>
>>>> +fman@40 {
>>>> +  mdio0: mdio@e1120 {
>>>> +  #address-cells = <1>;
>>>> +  #size-cells = <0>;
>>>> +  compatible = "fsl,fman-mdio";
>>>> +  reg = <0xe1120 0xee0>;
>>>> +  };
>>>> +};
>>>
>>> What is the difference between "fsl,fman-mdio" and "fsl,fman-xmdio"?  I
>>> don't see the latter on the list of compatibles in patch 3/6.
>>
>> 'fsl,fman-mdio' is the 1 Gb/s MDIO (Clause 22 only). 'fsl,fman-xmdio' is
>> the 10 Gb/s MDIO (Clause 45 only). We can respin this patch wi
>>
> 
> "respin this patch wi..."?

Not sure where the end of that sentence went. I meant we'll re-spin with
a binding for the 10 Gb/s MDIO block
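
To spell out the distinction between the two compatibles as described
above, here is a small sketch. The enum and function are hypothetical
and are not how the existing drivers are organized; only the two
compatible strings and their clause assignments come from this thread.

```c
#include <string.h>

enum mdio_clause {
	MDIO_CLAUSE_UNKNOWN = 0,
	MDIO_CLAUSE_22 = 22,
	MDIO_CLAUSE_45 = 45,
};

/* Map an FMan MDIO compatible string to the MDIO clause it speaks. */
enum mdio_clause fman_mdio_clause(const char *compatible)
{
	if (strcmp(compatible, "fsl,fman-mdio") == 0)
		return MDIO_CLAUSE_22;	/* 1 Gb/s MDIO block */
	if (strcmp(compatible, "fsl,fman-xmdio") == 0)
		return MDIO_CLAUSE_45;	/* 10 Gb/s MDIO block */
	return MDIO_CLAUSE_UNKNOWN;
}
```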

>> I believe 'fsl,fman-mdio' (and others on that list) was added
>> gratuitously as the FMan MDIO is completely compatible with the
>> eTSEC/gianfar MDIO driver, but we can deal with that later
> 
> It's still good to identify the specific device, even if it's believed
> to be 100% compatible.

Are you suggesting we create new compatibles for every instance/integration
of a hardware block even though it is identical to an earlier hardware
integration? Well, I guess that's been done, and now we have about 8
different compatibles that convey no real difference at all.

> Plus, IIRC there's been enough badness in the
> eTSEC MDIO binding that it'd be good to steer clear of it.

Hmm... I guess we can leave things as they are. I wasn't going to touch
this just now anyway

>>> Within each category, is the exact fman version discoverable from the
>>> mdio registers?
>>
>> No, but that's irrelevant as that's not the difference between the two
>> compatibles
> 
> It's relevant because it means the compatible string should have a block
> version number in it, or at least some other way in the MDIO node to
> indicate the block version.

The 1 Gb/s MDIO block doesn't track a version of its own and from a
programming interface perspective it has no visible difference since
eTSEC. The 10 Gb/s MDIO doesn't track a version of its own either and
across the existing FMan versions is identical from a programming
interface perspective

I guess we can append a 'v1.0' to the MDIO compatible(s). However, given
the SDK we'll have to support the compatibles the (already upstream)
drivers support. Dealing with all that legacy is going to be so tedious

>>>> +fman@50 {
>>>> +  #address-cells = <1>;
>>>> +  #size-cells 

Re: [PATCH 5/6] powerpc/corenet: Add DPAA FMan support to the SoC device tree(s)

2014-05-05 Thread Emil Medve
Hello Scott,


On 05/05/2014 06:34 PM, Scott Wood wrote:
> On Sun, 2014-05-04 at 05:59 -0500, Emil Medve wrote:
>> Hello Scott,
>>
>>
>> On 04/21/2014 05:14 PM, Scott Wood wrote:
>>> On Fri, 2014-04-18 at 07:21 -0500, Shruti Kanetkar wrote:
>>>> FMan 1 Gb/s MACs (dTSEC and mEMAC) have support for SGMII PHYs.
>>>> Add support for the internal SerDes TBI PHYs
>>>>
>>>> Based on prior work by Andy Fleming
>>>>
>>>> Signed-off-by: Shruti Kanetkar
>>>> ---
>>>>  arch/powerpc/boot/dts/fsl/b4860si-post.dtsi |  28 +
>>>>  arch/powerpc/boot/dts/fsl/b4si-post.dtsi|  51 +
>>>>  arch/powerpc/boot/dts/fsl/p1023si-post.dtsi |  14 +++
>>>>  arch/powerpc/boot/dts/fsl/p2041si-post.dtsi |  64
>>>>  arch/powerpc/boot/dts/fsl/p3041si-post.dtsi |  64
>>>>  arch/powerpc/boot/dts/fsl/p4080si-post.dtsi | 104 +++
>>>>  arch/powerpc/boot/dts/fsl/p5020si-post.dtsi |  64
>>>>  arch/powerpc/boot/dts/fsl/p5040si-post.dtsi | 128 +++
>>>>  arch/powerpc/boot/dts/fsl/t4240si-post.dtsi | 154
>>>>  9 files changed, 671 insertions(+)
>>>>
>>>> diff --git a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
>>>> index cbc354b..45b0ff5 100644
>>>> --- a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
>>>> +++ b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
>>>> @@ -172,6 +172,34 @@
>>>>    compatible = "fsl,b4860-rcpm", "fsl,qoriq-rcpm-2.0";
>>>>    };
>>>>
>>>> +/include/ "qoriq-fman3-0-1g-4.dtsi"
>>>> +/include/ "qoriq-fman3-0-1g-5.dtsi"
>>>> +/include/ "qoriq-fman3-0-10g-0.dtsi"
>>>> +/include/ "qoriq-fman3-0-10g-1.dtsi"
>>>> +  fman@40 {
>>>> +  ethernet@e8000 {
>>>> +  tbi-handle = <&tbi4>;
>>>> +  };
>>>
>>> Binding needed
>>>
>>> Where is the "reg" for these unit addresses?
>>
>> As I said, the bulk of the FMan work comes from another team. Here we
>> need just enough to hook up the MDIO and PHY nodes.
> 
> Unit addresses must match reg.  No reg, no unit address.

We can add a 'reg' property, but we really don't want to clash with the
team that is working on upstreaming the FMan/MAC bindings and drivers
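
For what it's worth, the rule being asked for ("Unit addresses must
match reg. No reg, no unit address.") is simple to state as a check.
The sketch below uses a hypothetical function name and assumes a single
hexadecimal unit address with no comma-separated parts, which is enough
for names like mdio@f1000 or ethernet@e8000 in this series.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Check that the unit address in a node name ("mdio@f1000") equals the
 * first address in its reg property. A node without reg must not carry
 * a unit address at all.
 */
bool unit_address_matches_reg(const char *node_name, bool has_reg,
			      unsigned long reg_addr)
{
	const char *at = strchr(node_name, '@');

	if (!has_reg)
		return at == NULL;	/* no reg, no unit address */
	if (at == NULL)
		return false;		/* reg present, unit address missing */

	char *end;
	unsigned long unit = strtoul(at + 1, &end, 16);

	/* Reject an empty or malformed unit address. */
	if (end == at + 1 || *end != '\0')
		return false;
	return unit == reg_addr;
}
```

Under this check, mdio@f1000 with reg = <0xf1000 0x1000> passes, while
ethernet@e8000 without any reg property fails.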

>> I'd really like to be able to make progress on this without waiting for that
>> moment in time we can get the entire FMan binding in place
> 
> Why is the fman binding such a big deal?
> 
>>>> +  mdio@e9000 {
>>>> +  tbi4: tbi-phy@8 {
>>>> +  reg = <0x8>;
>>>> +  device_type = "tbi-phy";
>>>> +  };
>>>> +  };
>>>
>>> Binding needed for tbi-phy device_type
>>
>> I guess that's fair (BTW, you accepted tbi-phy nodes/device-type before
>> without a binding)
> 
> It's existing practice on eTSEC.  FMan seemed like an opportunity to
> avoid carrying cruft forward.

The 1 Gb/s MDIO block is not FMan specific. As I said, it is the same block
as in eTSEC. That's part of the reason we're trying to upstream this
independently of the FMan stuff. So, don't think FMan, think MDIO.

>>> Why are we using device_type at all for this?
>>
>> That's what the upstream driver is looking for.
> 
> Drivers should look for what the binding says -- not the other way
> around.

Yeah yeah. Nobody likes it, but the driver is/describes the de facto binding

On a constructive note, the Ethernet PHY code doesn't do device tree
based probing so no compatibles are used at all. So device_type is used
to convey a TBI PHY

>>  Anyway, most days PHYs can be discovered so they don't use/need
>> compatible properties. That's I guess part of the reason we don't have
>> bindings for them PHY nodes
> 
> I don't see why there couldn't be a compatible that describes the
> standard programming interface.

Because it can be detected at runtime and I guess stuff like that should
stay out of the device tree. I'm using PCI as an analogy here

>> However, what you can't discover is how they are wired to the MAC(s) so
>> we still need some nodes in the device tree to convey that. Also, when
>> looking for a specific kind of PHY, such as TBI, device_type works
>> easier then parsing compatibles from various vendors or so
> 
> Don't you find the TBI by following the tbi-handle property?

When the MAC "attaches" to the PHY the tbi-handle is followed. But the
MDIO/PHY code/driver(s) doesn't quite "see" the tbi-handle as it's
outside the MDIO/PHY nodes

> That said,
> I don't object to having a way to label a PHY as attached via TBI if
> that's useful.  I'm giving a mild, non-nacking (given the history)
> objection to using device_type for that (given other history).

Personally, I think that TBI PHY support is a bit messy but I don't have
bandwidth to deal with that. The TBI PHY should be handled as a regular
PHY and right now it is a special case.


Cheers,

Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest

2014-05-05 Thread Alexander Graf


On 06.05.14 06:26, Gavin Shan wrote:

> On Mon, May 05, 2014 at 08:00:12AM -0600, Alex Williamson wrote:
>> On Mon, 2014-05-05 at 13:56 +0200, Alexander Graf wrote:
>>> On 05/05/2014 03:27 AM, Gavin Shan wrote:
>>>> The series of patches intends to support EEH for PCI devices, which have been
>>>> passed through to PowerKVM based guest via VFIO. The implementation is
>>>> straightforward based on the issues or problems we have to resolve to support
>>>> EEH for PowerKVM based guest.
>>>>
>>>> - Emulation for EEH RTAS requests. Thankfully, we already have infrastructure
>>>>   to emulate XICS. Without introducing new mechanism, we just extend that
>>>>   existing infrastructure to support EEH RTAS emulation. EEH RTAS requests
>>>>   initiated from guest are posted to host where the requests get handled or
>>>>   delivered to underlying firmware for further handling. For that, the host kernel
>>>>   has to maintain the PCI address (host domain/bus/slot/function to guest's
>>>>   PHB BUID/bus/slot/function) mapping via KVM VFIO device. The address mapping
>>>>   will be built when initializing VFIO device in QEMU and destroyed when the
>>>>   VFIO device in QEMU is going offline, or the VM is destroyed.
>>>
>>> Do you also expose all those interfaces to user space? VFIO is as much
>>> about user space device drivers as it is about device assignment.
>
> Yep, all the interfaces are exported to user space.
>
>>> I would like to first see an implementation that doesn't touch KVM
>>> emulation code at all but instead routes everything through QEMU. As a
>>> second step we can then accelerate performance critical paths inside of KVM.
>
> Ok. I'll change the implementation. However, the QEMU still has to
> poll/push information from/to host kernel. So the best place for that
> would be tce_iommu_driver_ops::ioctl as EEH is Power specific feature.
>
> For the error injection, I guess I have to put the logic token management
> into QEMU and error injection request will be handled by QEMU and then
> routed to host kernel via additional syscall as we did for pSeries.


Yes, start off without in-kernel XICS so everything simply lives in 
QEMU. Then add callbacks into the in-kernel XICS to inject these 
interrupts if we don't have wide enough interfaces already.




Alex


Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr

2014-05-05 Thread Alexander Graf


On 06.05.14 02:41, Paul Mackerras wrote:

> On Mon, May 05, 2014 at 01:19:30PM +0200, Alexander Graf wrote:
>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>> +   return vcpu->arch.fault_dar;
>>
>> How about PA6T and G5s?
>
> G5 sets DAR on an alignment interrupt.
>
> As for PA6T, I don't know for sure, but if it doesn't, ordinary
> alignment interrupts wouldn't be handled properly, since the code in
> arch/powerpc/kernel/align.c assumes DAR contains the address being
> accessed on all PowerPC CPUs.


Now that's a good point. If we simply behave like Linux, I'm fine. This 
definitely deserves a comment on the #ifdef in the code.



Alex
