Re: [PATCH] cxl: Fix number of allocated pages in SPA

2015-10-06 Thread christophe lombard

The field 'num_procs' of the structure cxl_afu is not yet updated to the
right value (the maximum number of processes that can be supported by
the AFU) when the pages are allocated (i.e. when cxl_alloc_spa() is called).
The number of allocated pages depends on that maximum number of processes.
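
For illustration, a sketch of the ordering problem in activate_afu_directed(),
reconstructed from the v2 patch quoted later in this digest (the comment on
how cxl_alloc_spa() uses num_procs follows the changelog, not the driver
source):

    /* before the fix: cxl_alloc_spa() runs while afu->num_procs still
     * holds a stale value, so the SPA is sized for the wrong count */
    if (afu->spa == NULL) {
            if (cxl_alloc_spa(afu))         /* sizes pages from afu->num_procs */
                    return -ENOMEM;
    }
    ...
    afu->current_mode = CXL_MODE_DIRECTED;
    afu->num_procs = afu->max_procs_virtualised;    /* too late */

The fix simply hoists the num_procs assignment above the cxl_alloc_spa() call.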

Thanks


On 06/10/2015 08:19, Michael Ellerman wrote:

On Fri, 2015-10-02 at 16:01 +0200, Christophe Lombard wrote:

This moves the initialisation of num_procs to before the SPA
allocation.

Why? What does it fix? I can't tell from the diff or the change log.

cheers



Re: Who uses CROSS32_COMPILE ?

2015-10-06 Thread Denis Kirjanov
On 10/6/15, Michael Ellerman  wrote:
> Does anyone build their kernels using CROSS32_COMPILE ?

I didn't even know that such a macro exists...

>
> cheers
>
>

Re: powerpc: Fix _ALIGN_* errors due to type difference.

2015-10-06 Thread Michael Ellerman
On Fri, 2015-02-10 at 14:33:48 UTC, "Aneesh Kumar K.V" wrote:
> This avoids errors like
> 
> unsigned int usize = 1 << 30;
> int size = 1 << 30;
> unsigned long addr = 64UL << 30 ;
> 
> value = _ALIGN_DOWN(addr, usize); -> 0
> value = _ALIGN_DOWN(addr, size);  -> 0x10

Are you actually seeing that anywhere? I assume not.

> diff --git a/arch/powerpc/boot/page.h b/arch/powerpc/boot/page.h
> index 14eca30fef64..87c42d7d283d 100644
> --- a/arch/powerpc/boot/page.h
> +++ b/arch/powerpc/boot/page.h
> @@ -22,8 +22,8 @@
>  #define PAGE_MASK(~(PAGE_SIZE-1))
>  
>  /* align addr on a size boundary - adjust address up/down if needed */
> -#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1)))
> -#define _ALIGN_DOWN(addr,size)   ((addr)&(~((size)-1)))
> +#define _ALIGN_UP(addr, size)    (((addr)+((size)-1))&(~((typeof(addr))(size)-1)))
> +#define _ALIGN_DOWN(addr, size)  ((addr)&(~((typeof(addr))(size)-1)))
>  
>  /* align addr on a size boundary - adjust address up if needed */
>  #define _ALIGN(addr,size) _ALIGN_UP(addr,size)
> diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
> index 71294a6e976e..1dd69774a31c 100644
> --- a/arch/powerpc/include/asm/page.h
> +++ b/arch/powerpc/include/asm/page.h
> @@ -240,8 +240,8 @@ extern long long virt_phys_offset;
>  #endif
>  
>  /* align addr on a size boundary - adjust address up/down if needed */
> -#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1)))
> -#define _ALIGN_DOWN(addr,size)   ((addr)&(~((size)-1)))
> +#define _ALIGN_UP(addr, size)    (((addr)+((size)-1))&(~((typeof(addr))(size)-1)))
> +#define _ALIGN_DOWN(addr, size)  ((addr)&(~((typeof(addr))(size)-1)))


It looks like ALIGN() in kernel.h already does this right, so can we just use
that instead, at least for _ALIGN_UP()?
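
For reference, a small standalone demo (mine, not from the patch) showing the
difference on an LP64 system: the unsigned 32-bit mask is zero-extended when
widened to 64 bits, while the signed one is sign-extended:

    #include <stdio.h>

    #define _ALIGN_DOWN(addr, size) ((addr)&(~((size)-1)))

    int main(void)
    {
            unsigned int usize = 1U << 30;
            int size = 1 << 30;
            unsigned long addr = 64UL << 30;

            /* ~(usize-1) widens to 0x00000000c0000000: upper bits lost */
            printf("%#lx\n", _ALIGN_DOWN(addr, usize));     /* 0 */
            /* ~(size-1) widens to 0xffffffffc0000000: upper bits kept */
            printf("%#lx\n", _ALIGN_DOWN(addr, size));      /* 0x1000000000 */
            return 0;
    }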

cheers

Re: [PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API

2015-10-06 Thread Christoph Hellwig
Do we need a function here or can we just have an IOMMU_PAGE_SHIFT define
with an #ifndef in common code?

Also not all architectures use dma-mapping-common.h yet, so you either
need to update all of those as well, or just add the #ifndef directly
to linux/dma-mapping.h.
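
A minimal sketch of the #ifndef approach being suggested (the PAGE_SHIFT
fallback is my assumption, not something stated in the thread):

    /* in common code, e.g. linux/dma-mapping.h */
    #ifndef IOMMU_PAGE_SHIFT
    #define IOMMU_PAGE_SHIFT PAGE_SHIFT     /* arch headers may override */
    #endif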

Re: [PATCH] powerpc: Kconfig.cputype: Disallow TUNE_CELL on LE systems

2015-10-06 Thread Michael Ellerman
On Mon, 2015-09-21 at 12:07 +0200, Thomas Huth wrote:
> On 21/09/15 09:18, Michael Ellerman wrote:
> > On Fri, 2015-09-18 at 16:17 +0200, Thomas Huth wrote:
> >> It looks somewhat weird that you can enable TUNE_CELL on little
> >> endian systems, so let's disable this option with CPU_LITTLE_ENDIAN.
> >>
> >> Signed-off-by: Thomas Huth 
> >> ---
> >>  I first thought that it might be better to make this option depend
> >>  on PPC_CELL instead ... but I guess it's a bad idea to make a
> >>  CPU option depend on a platform option? Alternatively, would it make
> >>  sense to make it depend on (GENERIC_CPU || CELL_CPU) instead?
> > 
> > Hmm, it's a little backward, but I think it would be fine, and less 
> > confusing
> > for users. Both PS3 and Cell select PPC_CELL, so it would work in both those
> > cases.
> 
> It's just that when you step through the kernel config (e.g. with "make
> menuconfig"), you normally step through the "Processor support" first,
> and then later do the "Platform support". I think most users won't look
> back into "Processor support" again once they already reached the
> "Platform support" section, so this TUNE_CELL option then might appear
> unnoticed when you enable a Cell platform under "Platform support".

Ah OK. Personally I almost never use menuconfig, but I guess some folks do.

That actually seems like we should reorder those sections, ie. put platform
support first, and then processor support. After all there's not much point
agonising over whether to tune for CELL cpus if you then don't enable a Cell
platform.

I'm not sure if it's that simple in practice ... :)

cheers



Re: [RFC, 1/5] powerpc:numa Add numa_cpu_lookup function to update lookup table

2015-10-06 Thread Michael Ellerman
On Sun, 2015-27-09 at 18:29:09 UTC, Raghavendra K T wrote:
> We access numa_cpu_lookup_table array directly in all the places
> to read/update numa cpu lookup information. Instead use a helper
> function to update.
> 
> This is helpful in changing the way the numa<-->cpu mapping is done in a
> single place when needed.
> 
> This is a cosmetic change, no change in functionality.
> 
> Signed-off-by: Raghavendra K T 
> Signed-off-by: Raghavendra K T 
> ---
>  arch/powerpc/include/asm/mmzone.h |  2 +-
>  arch/powerpc/kernel/smp.c | 10 +-
>  arch/powerpc/mm/numa.c| 28 +---
>  3 files changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmzone.h b/arch/powerpc/include/asm/mmzone.h
> index 7b58917..c24a5f4 100644
> --- a/arch/powerpc/include/asm/mmzone.h
> +++ b/arch/powerpc/include/asm/mmzone.h
> @@ -29,7 +29,7 @@ extern struct pglist_data *node_data[];
>   * Following are specific to this numa platform.
>   */
>  
> -extern int numa_cpu_lookup_table[];
> +extern int numa_cpu_lookup(int cpu);

Can you rename it better :)

Something like cpu_to_nid().

Although maybe nid is wrong given the rest of the series.

> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 8b9502a..d5e6eee 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -52,7 +52,6 @@ int numa_cpu_lookup_table[NR_CPUS];
>  cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
>  struct pglist_data *node_data[MAX_NUMNODES];
>  
> -EXPORT_SYMBOL(numa_cpu_lookup_table);
>  EXPORT_SYMBOL(node_to_cpumask_map);
>  EXPORT_SYMBOL(node_data);
>  
> @@ -134,19 +133,25 @@ static int __init fake_numa_create_new_node(unsigned long end_pfn,
>   return 0;
>  }
>  
> -static void reset_numa_cpu_lookup_table(void)
> +int numa_cpu_lookup(int cpu)
>  {
> - unsigned int cpu;
> -
> - for_each_possible_cpu(cpu)
> - numa_cpu_lookup_table[cpu] = -1;
> + return numa_cpu_lookup_table[cpu];
>  }
> +EXPORT_SYMBOL(numa_cpu_lookup);

I don't see you changing any modular code that uses this, or any macros that
might be used by modules, so I don't see why this needs to be exported?

I think you just added it because numa_cpu_lookup_table was exported?

cheers

Re: [PATCH RFC 0/5] powerpc:numa Add serial nid support

2015-10-06 Thread Michael Ellerman
On Sun, 2015-09-27 at 23:59 +0530, Raghavendra K T wrote:
> Problem description:
> Powerpc has sparse node numbering, i.e. on a 4 node system nodes are
> numbered (possibly) as 0,1,16,17. At a lower level, the chipid obtained
> from the device tree is mapped (directly) to the nid.
> 
> Potential side effect of that is:
> 
> 1) There are several places in the kernel that assume serial node numbering,
> and memory allocations assume that all the nodes from 0 to (highest nid)
> exist, in turn ending up allocating memory for nodes that do not exist.

Is it several? Or lots?

If it's several, ie. more than two but not lots, then we should probably just
fix those places. Or is that /really/ hard for some reason?

Do we ever get whole nodes hotplugged in under PowerVM? I don't think so, but I
don't remember for sure.

> 2) For virtualization use cases (such as qemu, libvirt, openstack), mapping
> the sparse nids of the host system to the contiguous nids of the guest (numa
> affinity, placement) could be a challenge.

Can you elaborate? That's a bit vague.

> Possible Solutions:
> 1) Handling the memory allocations in the kernel case by case: though in some
> cases it is easy to achieve, some cases may be intrusive/not trivial.
> In the end it does not handle side effect (2) above.
> 
> 2) Map the sparse chipid obtained from the device tree to a serial nid at
> kernel level (the idea proposed in this series).
> Pro: It is more natural to handle at kernel level than at the lower (OPAL) layer.
> con: The chipid in the device tree is no longer the same as the nid in the kernel.
> 
> 3) Let the lower layer (OPAL) give the serial node ids after parsing the
> chipid and the associativity etc [ either as a separate item in device tree
> or by compacting the chipid numbers ]
> Pros: kernel, device tree are on same page and less change in kernel
> Con: is this the functionality expected of the lower layer?

...

> 3) Numactl tests from
> ftp://oss.sgi.com/www/projects/libnuma/download/numactl-2.0.10.tar.gz
> 
> (in fact there was more breakage before the patch because of the sparse nid
> and memoryless node cases of powerpc)

This is probably the best argument for your series. ie. userspace is dumb and
fixing every broken app that assumes linear node numbering is not feasible.


So on the whole I think the concept is good. This series though is a bit
confusing because of all the renaming etc. etc. Nish made lots of good comments
so I'll wait for a v2 based on those.

cheers




Re: [RFC, 1/5] powerpc:numa Add numa_cpu_lookup function to update lookup table

2015-10-06 Thread Raghavendra K T

On 10/06/2015 03:47 PM, Michael Ellerman wrote:

On Sun, 2015-27-09 at 18:29:09 UTC, Raghavendra K T wrote:

We access numa_cpu_lookup_table array directly in all the places
to read/update numa cpu lookup information. Instead use a helper
function to update.

This is helpful in changing the way the numa<-->cpu mapping is done in a
single place when needed.

This is a cosmetic change, no change in functionality.

Signed-off-by: Raghavendra K T 
Signed-off-by: Raghavendra K T 
---
  arch/powerpc/include/asm/mmzone.h |  2 +-
  arch/powerpc/kernel/smp.c | 10 +-
  arch/powerpc/mm/numa.c| 28 +---
  3 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/mmzone.h b/arch/powerpc/include/asm/mmzone.h
index 7b58917..c24a5f4 100644
--- a/arch/powerpc/include/asm/mmzone.h
+++ b/arch/powerpc/include/asm/mmzone.h
@@ -29,7 +29,7 @@ extern struct pglist_data *node_data[];
   * Following are specific to this numa platform.
   */

-extern int numa_cpu_lookup_table[];
+extern int numa_cpu_lookup(int cpu);


Can you rename it better :)

Something like cpu_to_nid().


Good name, sure.



Although maybe nid is wrong given the rest of the series.


Maybe not. The current plan is to rename (after discussing with Nish)

chipid to pnid (physical nid)
and nid to vnid (virtual nid)

within powerpc numa.c
[reasoning: chipid is applicable only to OPAL; since we want to handle
powerkvm, powervm and bare metal we need a generic name]

But the 'nid' naming will be retained, which is applicable for generic
kernel interactions.




diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 8b9502a..d5e6eee 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -52,7 +52,6 @@ int numa_cpu_lookup_table[NR_CPUS];
  cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
  struct pglist_data *node_data[MAX_NUMNODES];

-EXPORT_SYMBOL(numa_cpu_lookup_table);
  EXPORT_SYMBOL(node_to_cpumask_map);
  EXPORT_SYMBOL(node_data);

@@ -134,19 +133,25 @@ static int __init fake_numa_create_new_node(unsigned long end_pfn,
return 0;
  }

-static void reset_numa_cpu_lookup_table(void)
+int numa_cpu_lookup(int cpu)
  {
-   unsigned int cpu;
-
-   for_each_possible_cpu(cpu)
-   numa_cpu_lookup_table[cpu] = -1;
+   return numa_cpu_lookup_table[cpu];
  }
+EXPORT_SYMBOL(numa_cpu_lookup);


I don't see you changing any modular code that uses this, or any macros that
might be used by modules, so I don't see why this needs to be exported?

I think you just added it because numa_cpu_lookup_table was exported?



arch/powerpc/kernel/smp.c uses it.


Re: [PATCH] powerpc: Kconfig.cputype: Disallow TUNE_CELL on LE systems

2015-10-06 Thread Thomas Huth
On 06/10/15 12:05, Michael Ellerman wrote:
> On Mon, 2015-09-21 at 12:07 +0200, Thomas Huth wrote:
>> On 21/09/15 09:18, Michael Ellerman wrote:
>>> On Fri, 2015-09-18 at 16:17 +0200, Thomas Huth wrote:
 It looks somewhat weird that you can enable TUNE_CELL on little
 endian systems, so let's disable this option with CPU_LITTLE_ENDIAN.

 Signed-off-by: Thomas Huth 
 ---
  I first thought that it might be better to make this option depend
 on PPC_CELL instead ... but I guess it's a bad idea to make a
 CPU option depend on a platform option? Alternatively, would it make
  sense to make it depend on (GENERIC_CPU || CELL_CPU) instead?
>>>
>>> Hmm, it's a little backward, but I think it would be fine, and less 
>>> confusing
>>> for users. Both PS3 and Cell select PPC_CELL, so it would work in both those
>>> cases.
>>
>> It's just that when you step through the kernel config (e.g. with "make
>> menuconfig"), you normally step through the "Processor support" first,
>> and then later do the "Platform support". I think most users won't look
>> back into "Processor support" again once they already reached the
>> "Platform support" section, so this TUNE_CELL option then might appear
>> unnoticed when you enable a Cell platform under "Platform support".
> 
> Ah OK. Personally I almost never use menuconfig, but I guess some folks do.
> 
> That actually seems like we should reorder those sections, ie. put platform
> support first, and then processor support. After all there's not much point
> agonising over whether to tune for CELL cpus if you then don't enable a Cell
> platform.

Not sure whether reordering the sections makes much sense - others might
think "I want to support Cell chips with my distro, so let's enable that
first, then let's see which platforms I can select next..." - so I'd
rather not do that.

> I'm not sure if it's that simple in practice ... :)

Maybe we could also simply remove the TUNE_CELL option nowadays? I think
this was used for building generic Linux distros, which are just
optimized for Cell ... but who is still doing that nowadays?

Alternatively, if that is not an option and if you don't like my patch
with CPU_LITTLE_ENDIAN, what about changing it to check "depends on
(GENERIC_CPU || CELL_CPU)" instead?

 Thomas


Re: [v2,1/2] powerpc/xmon: Paged output for paca display

2015-10-06 Thread Michael Ellerman
On Fri, 2015-21-08 at 04:24:27 UTC, Sam bobroff wrote:
> The paca display is already more than 24 lines, which can be problematic
> if you have an old school 80x24 terminal, or more likely you are on a
> virtual terminal which does not scroll for whatever reason.
> 
> This patch adds a new command ".", which takes a single (hex) numeric
> argument: lines per page. It will cause the output of "dp" and "dpa"
> to be broken into pages, if necessary.
> 
> This is implemented by running over the entire output both for the
> initial command and for each subsequent page: the visible part is
> clipped out by checking line numbers. This is a simplistic approach
> but minimally invasive; it is intended to be easily reusable for other
> commands.
> 
> Sample output:
> 
> 0:mon> .10
> 0:mon> dp1
> paca for cpu 0x1 @ cfdc0480:
>  possible = yes
>  present  = yes
>  online   = yes
>  lock_token   = 0x8000(0x8)
>  paca_index   = 0x1   (0xa)
>  kernel_toc   = 0xc0eb2400(0x10)
>  kernelbase   = 0xc000(0x18)
>  kernel_msr   = 0xb0001032(0x20)
>  emergency_sp = 0xc0003ffe8000(0x28)
>  mc_emergency_sp  = 0xc0003ffe4000(0x2e0)
>  in_mce   = 0x0   (0x2e8)
>  data_offset  = 0x7f17(0x30)
>  hw_cpu_id= 0x8   (0x38)
>  cpu_start= 0x1   (0x3a)
>  kexec_state  = 0x0   (0x3b)
> [Enter for next page]
> 0:mon>
>  __current= 0xc0007e696620(0x290)
>  kstack   = 0xc0007e6ebe30(0x298)
>  stab_rr  = 0xb   (0x2a0)
>  saved_r1 = 0xc0007ef37860(0x2a8)
>  trap_save= 0x0   (0x2b8)
>  soft_enabled = 0x0   (0x2ba)
>  irq_happened = 0x1   (0x2bb)
>  io_sync  = 0x0   (0x2bc)
>  irq_work_pending = 0x0   (0x2bd)
>  nap_state_lost   = 0x0   (0x2be)
> 0:mon>
> 
> (Based on a similar patch by Michael Ellerman 
> "[v2] powerpc/xmon: Allow limiting the size of the paca display".
> This patch is an alternative and cannot coexist with the original.)
> 
> Signed-off-by: Sam Bobroff 
> ---
>  arch/powerpc/xmon/xmon.c | 86 +++-
>  1 file changed, 71 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index e599259..9ce9e7d 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -72,6 +72,12 @@ static int xmon_gate;
>  
>  static unsigned long in_xmon __read_mostly = 0;
>  
> +#define XMON_PRINTF(...) do { if (paged_vis()) printf(__VA_ARGS__); } while (0)

Can you do this as a proper function. I know it will need to be varargs, but
that shouldn't be too ugly.
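
Something like this sketch, perhaps (paged_vis() is from the patch;
xmon_vprintf() is an assumed vprintf-style backend, not something xmon
provides today):

    /* varargs replacement for the XMON_PRINTF macro -- a sketch only */
    static void paged_printf(const char *fmt, ...)
    {
            va_list args;

            if (!paged_vis())               /* clip lines outside the page */
                    return;

            va_start(args, fmt);
            xmon_vprintf(fmt, args);        /* assumed helper */
            va_end(args);
    }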

> +#define MAX_PAGED_SIZE 1024

Why do we need a max at all?

> +static unsigned long paged_size = 0, paged_pos, paged_cur_page;

> +#ifdef CONFIG_PPC64
> +static unsigned long paca_cpu;
> +#endif

That can just be static in dump_pacas() by the looks.

>  static unsigned long adrs;
>  static int size = 1;
>  #define MAX_DUMP (128 * 1024)
> @@ -242,6 +248,9 @@ Commands:\n\
>  "  u dump TLB\n"
>  #endif
>  "  ? help\n"
> +#ifdef CONFIG_PPC64
> +"  .#limit output to # lines per page (dump paca only)\n"
> +#endif

Don't make it 64-bit only.

>  "  zrreboot\n\
>zh halt\n"
>  ;
> @@ -833,6 +842,19 @@ static void remove_cpu_bpts(void)
>   write_ciabr(0);
>  }
>  
> +static void paged_set_size(void)

"paged" isn't reading very well for me. Can we use "pagination" instead? I know
it's longer but monitors are wide these days.

Also I prefer verb first usually, so set_pagination_size() etc.

> +{
> + if (!scanhex(&paged_size) || (paged_size > MAX_PAGED_SIZE)) {
> + printf("Invalid number of lines per page (max: %d).\n",
> +MAX_PAGED_SIZE);
> + paged_size = 0;
> + }
> +}
> +static void paged_reset(void)
> +{
> + paged_cur_page = 0;
> +}

You only call that once so a function seems like overkill.

>  /* Command interpreting routine */
>  static char *last_cmd;
>  
> @@ -863,7 +885,8 @@ cmds(struct pt_regs *excp)
>   take_input(last_cmd);
>   last_cmd = NULL;
>   cmd = inchar();
> - }
> + } else
> + paged_reset();
>   switch (cmd) {
>   case 'm':
>   cmd = inchar();
> @@ -924,6 +947,9 @@ cmds(struct pt_regs *excp)
>   case '?':
>   xmon_puts(help_string);
>   break;
> + case '.':
> + paged_set_size();
> + break;
>   case 'b':
> 

Re: [PATCH RFC 0/5] powerpc:numa Add serial nid support

2015-10-06 Thread Raghavendra K T

On 10/06/2015 03:55 PM, Michael Ellerman wrote:

On Sun, 2015-09-27 at 23:59 +0530, Raghavendra K T wrote:

Problem description:
Powerpc has sparse node numbering, i.e. on a 4 node system nodes are
numbered (possibly) as 0,1,16,17. At a lower level, the chipid obtained
from the device tree is mapped (directly) to the nid.

Potential side effect of that is:

1) There are several places in the kernel that assume serial node numbering,
and memory allocations assume that all the nodes from 0 to (highest nid)
exist, in turn ending up allocating memory for nodes that do not exist.


Is it several? Or lots?

If it's several, ie. more than two but not lots, then we should probably just
fix those places. Or is that /really/ hard for some reason?



It is several, and I did attempt to fix them. But the rest of the places
(like memcg, work queue, scheduler and so on) are tricky to fix because
the memory allocations are glued to other things, and similar fixes may
be expected in the future too.



Do we ever get whole nodes hotplugged in under PowerVM? I don't think so, but I
don't remember for sure.



Even on powervm we do have discontiguous numa nodes. [Adding to that, we
could even end up creating a dummy node 0 just to make the kernel happy.]
For e.g.,
available: 2 nodes (0,7)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 7 cpus: 0 1 2 3 4 5 6 7
node 7 size: 10240 MB
node 7 free: 8174 MB
node distances:
node   0   7
  0:  10  40
  7:  40  10

Note that node zero has neither any cpu nor memory.


2) For virtualization use cases (such as qemu, libvirt, openstack), mapping
the sparse nids of the host system to the contiguous nids of the guest (numa
affinity, placement) could be a challenge.


Can you elaborate? That's a bit vague.


One e.g. I can think of (though libvirt/openstack people will know
more about it): suppose one wishes to have half of the vcpus bound to one
physical node and the rest of the vcpus to a second numa node; we can't say
whether the second node is 1, 8, or 16, and the same libvirt XML on a two-node
system may not be valid for another two-numa-node system.
[I believe it may cause some migration problems too.]




Possible Solutions:
1) Handling the memory allocations in the kernel case by case: though in some
cases it is easy to achieve, some cases may be intrusive/not trivial.
In the end it does not handle side effect (2) above.

2) Map the sparse chipid obtained from the device tree to a serial nid at
kernel level (the idea proposed in this series).
Pro: It is more natural to handle at kernel level than at the lower (OPAL) layer.
con: The chipid in the device tree is no longer the same as the nid in the kernel.

3) Let the lower layer (OPAL) give the serial node ids after parsing the
chipid and the associativity etc [ either as a separate item in device tree
or by compacting the chipid numbers ]
Pros: kernel, device tree are on same page and less change in kernel
Con: is this the functionality expected of the lower layer?


...


3) Numactl tests from
ftp://oss.sgi.com/www/projects/libnuma/download/numactl-2.0.10.tar.gz

(in fact there was more breakage before the patch because of the sparse nid
and memoryless node cases of powerpc)


This is probably the best argument for your series. ie. userspace is dumb and
fixing every broken app that assumes linear node numbering is not feasible.


So on the whole I think the concept is good. This series though is a bit
confusing because of all the renaming etc. etc. Nish made lots of good comments
so I'll wait for a v2 based on those.



Yes, I will be sending v2 soon, extending my patch to fix the powervm case too.


Re: Who uses CROSS32_COMPILE ?

2015-10-06 Thread Michael Ellerman
On Tue, 2015-10-06 at 12:40 +0300, Denis Kirjanov wrote:
> On 10/6/15, Michael Ellerman  wrote:
> > Does anyone build their kernels using CROSS32_COMPILE ?
> 
> I didn't even know that such a macro exists...

Good, I want to remove it :)

cheers



Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler

2015-10-06 Thread Christophe Leroy



Le 29/09/2015 00:07, Scott Wood a écrit :

On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:

We are spending between 40 and 160 cycles with a mean of 65 cycles in
the TLB handling routines (measured with mftbl), so make it simpler
although it adds one instruction.

Signed-off-by: Christophe Leroy 

Does this just make it simpler or does it make it faster?  What is the
performance impact?  Is the performance impact seen with or without
CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
mtspr/mfspr combo in order to replace one mfspr.


The performance impact is not noticeable. Theoretically it adds 1 cycle
on a mean of 65 cycles, that is 1.5%. Even in the worst case where we
spend around 10% of the time in TLB handling exceptions, that represents
only 0.15% of the total CPU time. So that's almost nothing.
Besides making it simpler, the main reason is that I need a
third register for the following patch in the set, otherwise I would
spend more time saving and restoring CR several times.


Christophe


Re: [PATCH v2 06/25] powerpc32: iounmap() cannot vunmap() area mapped by TLBCAMs either

2015-10-06 Thread Christophe Leroy


Le 29/09/2015 01:41, Scott Wood a écrit :

On Tue, Sep 22, 2015 at 06:50:40PM +0200, Christophe Leroy wrote:

iounmap() cannot vunmap() area mapped by TLBCAMs either

Signed-off-by: Christophe Leroy 
---
No change in v2

  arch/powerpc/mm/pgtable_32.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 7692d1b..03a073a 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -278,7 +278,9 @@ void iounmap(volatile void __iomem *addr)
 * If mapped by BATs then there is nothing to do.
 * Calling vfree() generates a benign warning.
 */
-   if (v_mapped_by_bats((unsigned long)addr)) return;
+   if (v_mapped_by_bats((unsigned long)addr) ||
+   v_mapped_by_tlbcam((unsigned long)addr))
+   return;

This is pretty pointless given that the next patch replaces both with
v_mapped_by_other().


I thought it was cleaner to first fix the bug, in order to make the
following patch straightforward, but I can skip it, no problem.


Christophe

Re: [PATCH v2 07/25] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together

2015-10-06 Thread Christophe Leroy



Le 29/09/2015 01:47, Scott Wood a écrit :

On Tue, Sep 22, 2015 at 06:50:42PM +0200, Christophe Leroy wrote:

x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of
purpose, so let's group them into a single function.

Signed-off-by: Christophe Leroy 
---
No change in v2

  arch/powerpc/mm/pgtable_32.c | 33 ++---
  1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 03a073a..3fd9083 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -67,6 +67,28 @@ extern unsigned long p_mapped_by_tlbcam(phys_addr_t pa);
  #define p_mapped_by_tlbcam(x) (0UL)
  #endif /* HAVE_TLBCAM */
  
+static inline unsigned long p_mapped_by_other(phys_addr_t pa)

+{
+   unsigned long v;
+
+   v = p_mapped_by_bats(pa);
+   if (v /*&& p_mapped_by_bats(p+size-1)*/)
+   return v;
+
+   return p_mapped_by_tlbcam(pa);
+}

Did you forget to remove that comment?


No I didn't; I thought it was there for a reason, it has been there since
2005.

Do you think I should remove it ?

Christophe

Re: [PATCH v2 13/25] powerpc/8xx: also use r3 in the ITLB miss in all situations

2015-10-06 Thread Christophe Leroy



Le 29/09/2015 02:00, Scott Wood a écrit :

On Tue, Sep 22, 2015 at 06:50:54PM +0200, Christophe Leroy wrote:

We are spending between 40 and 160 cycles with a mean of 65 cycles
in the TLB handling routines (measured with mftbl), so make it simpler
although it adds one instruction

Signed-off-by: Christophe Leroy 
---
No change in v2

  arch/powerpc/kernel/head_8xx.S | 15 ---
  1 file changed, 4 insertions(+), 11 deletions(-)

Why is this a separate patch from 1/25?

Same comments as on that patch.


Just because here there is no real need for the simplification of the
code, whereas the first one was a prerequisite for the following patch.

Should I merge them together anyway ?

Christophe

[PATCH v2] cxl: Fix number of allocated pages in SPA

2015-10-06 Thread Christophe Lombard
This moves the initialisation of num_procs to before the SPA
allocation.
The field 'num_procs' of the structure cxl_afu is not yet updated to the
right value (the maximum number of processes that can be supported by
the AFU) when the pages are allocated (i.e. when cxl_alloc_spa() is called).
The number of allocated pages depends on that maximum number of processes.

Signed-off-by: Christophe Lombard 
---
 drivers/misc/cxl/native.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index b37f2e8..d2e75c8 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -457,6 +457,7 @@ static int activate_afu_directed(struct cxl_afu *afu)
 
dev_info(&afu->dev, "Activating AFU directed mode\n");
 
+   afu->num_procs = afu->max_procs_virtualised;
if (afu->spa == NULL) {
if (cxl_alloc_spa(afu))
return -ENOMEM;
@@ -468,7 +469,6 @@ static int activate_afu_directed(struct cxl_afu *afu)
cxl_p1n_write(afu, CXL_PSL_ID_An, CXL_PSL_ID_An_F | CXL_PSL_ID_An_L);
 
afu->current_mode = CXL_MODE_DIRECTED;
-   afu->num_procs = afu->max_procs_virtualised;
 
if ((rc = cxl_chardev_m_afu_add(afu)))
return rc;
-- 
1.9.1


Re: [PATCH v2 15/25] powerpc/8xx: move 8xx SPRN defines into reg_8xx.h and add some missing ones

2015-10-06 Thread Christophe Leroy



Le 29/09/2015 02:03, Scott Wood a écrit :

On Tue, Sep 22, 2015 at 06:50:58PM +0200, Christophe Leroy wrote:

Move 8xx SPRN defines into reg_8xx.h and add some missing ones

Signed-off-by: Christophe Leroy 
---
No change in v2

Why are they being moved?  Why are they being separated from the bit
definitions?



It was to keep asm/reg_8xx.h self-sufficient for the following patch.

Also because including asm/mmu-8xx.h creates a circular inclusion issue
(mmu-8xx.h needs page.h which includes page-32.h, page-32.h includes
cache.h, cache.h includes reg.h which includes reg_8xx.h). The circle
starts with an inclusion of asm/cache.h by linux/cache.h, itself
included by linux/printk.h, and I end up with 'implicit declaration' issues.

How can I fix that ?

Christophe

Re: [PATCH v2 11/25] powerpc/8xx: map 16M RAM at startup

2015-10-06 Thread Christophe Leroy



Le 29/09/2015 01:58, Scott Wood a écrit :

On Tue, Sep 22, 2015 at 06:50:50PM +0200, Christophe Leroy wrote:

On recent kernels, with some debug options like for instance
CONFIG_LOCKDEP, the BSS requires more than 8M memory, although
the kernel code fits in the first 8M.
Today, it is necessary to activate CONFIG_PIN_TLB to get more than 8M
at startup, although pinning the TLB is not necessary for that.

This patch adds a second 8M page to the initial mapping in order to
have 16M mapped regardless of CONFIG_PIN_TLB, like several other
32-bit PPCs (40x, 601, ...)

Signed-off-by: Christophe Leroy 
---

Is the assumption that nobody is still running 8xx systems with only 8
MiB RAM on current kernels?


No, setup_initial_memory_limit() limits the memory to the minimum 
between 16M and the real memory size, so if a platform has only 8M, it 
will still be limited to 8M even with 16M mapped.


Christophe

Re: [PATCH v2 07/25] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together

2015-10-06 Thread Scott Wood
On Tue, 2015-10-06 at 16:02 +0200, Christophe Leroy wrote:
> Le 29/09/2015 01:47, Scott Wood a écrit :
> > On Tue, Sep 22, 2015 at 06:50:42PM +0200, Christophe Leroy wrote:
> > > x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of
> > > purpose, so let's group them into a single function.
> > > 
> > > Signed-off-by: Christophe Leroy 
> > > ---
> > > No change in v2
> > > 
> > >   arch/powerpc/mm/pgtable_32.c | 33 ++---
> > >   1 file changed, 26 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> > > index 03a073a..3fd9083 100644
> > > --- a/arch/powerpc/mm/pgtable_32.c
> > > +++ b/arch/powerpc/mm/pgtable_32.c
> > > @@ -67,6 +67,28 @@ extern unsigned long p_mapped_by_tlbcam(phys_addr_t pa);
> > >   #define p_mapped_by_tlbcam(x)   (0UL)
> > >   #endif /* HAVE_TLBCAM */
> > >   
> > > +static inline unsigned long p_mapped_by_other(phys_addr_t pa)
> > > +{
> > > + unsigned long v;
> > > +
> > > + v = p_mapped_by_bats(pa);
> > > + if (v /*&& p_mapped_by_bats(p+size-1)*/)
> > > + return v;
> > > +
> > > + return p_mapped_by_tlbcam(pa);
> > > +}
> > Did you forget to remove that comment?
> > 
> > 
> No I didn't; I thought it was there for a reason, it has been there since
> 2005.
> Do you think I should remove it ?

Oh, you took it from __ioremap_caller.  Commented-out code is generally 
frowned upon, and it makes even less sense now because there's no "size" in 
p_mapped_by_other.
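
For reference, a sketch of the helper with the stale comment simply dropped
(same body as in the patch, minus the commented-out size check):

    static inline unsigned long p_mapped_by_other(phys_addr_t pa)
    {
            unsigned long v = p_mapped_by_bats(pa);

            if (v)
                    return v;

            return p_mapped_by_tlbcam(pa);
    }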

-Scott


Re: [PATCH v2 11/25] powerpc/8xx: map 16M RAM at startup

2015-10-06 Thread Scott Wood
On Tue, 2015-10-06 at 16:10 +0200, Christophe Leroy wrote:
> Le 29/09/2015 01:58, Scott Wood a écrit :
> > On Tue, Sep 22, 2015 at 06:50:50PM +0200, Christophe Leroy wrote:
> > > On recent kernels, with some debug options like for instance
> > > CONFIG_LOCKDEP, the BSS requires more than 8M memory, although
> > > the kernel code fits in the first 8M.
> > > Today, it is necessary to activate CONFIG_PIN_TLB to get more than 8M
> > > at startup, although pinning the TLB is not necessary for that.
> > > 
> > > This patch adds a second 8M page to the initial mapping in order to
> > > have 16M mapped regardless of CONFIG_PIN_TLB, like several other
> > > 32-bit PPCs (40x, 601, ...)
> > > 
> > > Signed-off-by: Christophe Leroy 
> > > ---
> > Is the assumption that nobody is still running 8xx systems with only 8
> > MiB RAM on current kernels?
> > 
> > 
> No, setup_initial_memory_limit() limits the memory to the minimum 
> between 16M and the real memory size, so if a platform has only 8M, it 
> will still be limited to 8M even with 16M mapped.

And you just hope you don't get a speculative fetch from the second 8M?

-Scott


[PATCH v2 0/6] kernel/cpu.c: eliminate some indirection

2015-10-06 Thread Rasmus Villemoes
v2: fix build failure on ppc, add acks.

The four cpumasks cpu_{possible,online,present,active}_bits are
exposed readonly via the corresponding const variables
cpu_xyz_mask. But they are also accessible for arbitrary writing via
the exposed functions set_cpu_xyz. There's quite a bit of code
throughout the kernel which iterates over or otherwise accesses these
bitmaps, and having the access go via the cpu_xyz_mask variables is
nowadays [1] simply a useless indirection.

It may be that any problem in CS can be solved by an extra level of
indirection, but that doesn't mean every extra indirection solves a
problem. In this case, it even necessitates some minor ugliness (see
4/6).

Patch 1/6 is new in v2, and fixes a build failure on ppc by renaming a
struct member, to avoid problems when the identifier cpu_online_mask
becomes a macro later in the series. The next four patches eliminate
the cpu_xyz_mask variables by simply exposing the actual bitmaps,
after renaming them to discourage direct access - that still happens
through cpu_xyz_mask, which are now simply macros with the same type
and value as they used to have.

After that, there's no longer any reason to have the setter functions
be out-of-line: The boolean parameter is almost always a literal true
or false, so by making them static inlines they will usually compile
to one or two instructions.

For a defconfig build on x86_64, bloat-o-meter says we save ~3000
bytes. We also save a little stack (stackdelta says 127 functions have
a 16 byte smaller stack frame, while two grow by that amount). Mostly
because, when iterating over the mask, gcc typically loads the value
of cpu_xyz_mask into a callee-saved register and from there into %rdi
before each find_next_bit call - now it can just load the appropriate
immediate address into %rdi before each call.
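
For concreteness, a sketch of the shape of the change described above
(simplified; the exact kernel declarations may differ):

    /* before: a const pointer variable every access must load first */
    extern const struct cpumask *const cpu_online_mask;

    /* after: the bitmap itself is exposed (renamed to discourage direct
     * use) and cpu_online_mask becomes a macro of the same type/value */
    extern struct cpumask __cpu_online_mask;
    #define cpu_online_mask ((const struct cpumask *)&__cpu_online_mask)

    /* and the setters can become static inlines */
    static inline void set_cpu_online(unsigned int cpu, bool online)
    {
            if (online)
                    cpumask_set_cpu(cpu, &__cpu_online_mask);
            else
                    cpumask_clear_cpu(cpu, &__cpu_online_mask);
    }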

[1] See Rusty's kind explanation
http://thread.gmane.org/gmane.linux.kernel/2047078/focus=2047722 for
some historic context.

Rasmus Villemoes (6):
  powerpc/fadump: rename cpu_online_mask member of struct
fadump_crash_info_header
  kernel/cpu.c: change type of cpu_possible_bits and friends
  kernel/cpu.c: export __cpu_*_mask
  drivers/base/cpu.c: use __cpu_*_mask directly
  kernel/cpu.c: eliminate cpu_*_mask
  kernel/cpu.c: make set_cpu_* static inlines

 arch/powerpc/include/asm/fadump.h |  2 +-
 arch/powerpc/kernel/fadump.c  |  4 +--
 drivers/base/cpu.c| 10 +++---
 include/linux/cpumask.h   | 55 -
 kernel/cpu.c  | 64 ---
 5 files changed, 68 insertions(+), 67 deletions(-)

-- 
2.1.3


[PATCH v2 1/6] powerpc/fadump: rename cpu_online_mask member of struct fadump_crash_info_header

2015-10-06 Thread Rasmus Villemoes
As preparation for eliminating the indirect access to the various
global cpu_*_bits bitmaps via the pointer variables cpu_*_mask, rename
the cpu_online_mask member of struct fadump_crash_info_header to
simply online_mask, thus allowing cpu_online_mask to become a macro.

Acked-by: Michael Ellerman 
Signed-off-by: Rasmus Villemoes 
---
 arch/powerpc/include/asm/fadump.h | 2 +-
 arch/powerpc/kernel/fadump.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 493e72f64b35..b4407d0add27 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -191,7 +191,7 @@ struct fadump_crash_info_header {
u64 elfcorehdr_addr;
u32 crashing_cpu;
struct pt_regs  regs;
-   struct cpumask  cpu_online_mask;
+   struct cpumask  online_mask;
 };
 
 /* Crash memory ranges */
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 26d091a1a54c..3cb3b02a13dd 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -415,7 +415,7 @@ void crash_fadump(struct pt_regs *regs, const char *str)
else
ppc_save_regs(&fdh->regs);
 
-   fdh->cpu_online_mask = *cpu_online_mask;
+   fdh->online_mask = *cpu_online_mask;
 
/* Call ibm,os-term rtas call to trigger firmware assisted dump */
rtas_os_term((char *)str);
@@ -646,7 +646,7 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm)
}
/* Lower 4 bytes of reg_value contains logical cpu id */
cpu = be64_to_cpu(reg_entry->reg_value) & FADUMP_CPU_ID_MASK;
-   if (fdh && !cpumask_test_cpu(cpu, &fdh->cpu_online_mask)) {
+   if (fdh && !cpumask_test_cpu(cpu, &fdh->online_mask)) {
SKIP_TO_NEXT_CPU(reg_entry);
continue;
}
-- 
2.1.3


Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler

2015-10-06 Thread Scott Wood
On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote:
> Le 29/09/2015 00:07, Scott Wood a écrit :
> > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
> > > We are spending between 40 and 160 cycles with a mean of 65 cycles in
> > > the TLB handling routines (measured with mftbl), so make it simpler
> > > although it adds one instruction.
> > > 
> > > Signed-off-by: Christophe Leroy 
> > Does this just make it simpler or does it make it faster?  What is the
> > performance impact?  Is the performance impact seen with or without
> > CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
> > mtspr/mfspr combo in order to replace one mfspr.
> > 
> > 
> The performance impact is not noticeable. Theoretically it adds 1 cycle
> on a mean of 65 cycles, that is 1.5%. Even in the worst case where we
> spend around 10% of the time in TLB handling exceptions, that represents
> only 0.15% of the total CPU time. So that's almost nothing.
> Besides making it simpler, the main reason is that I need a
> third register for the following patch in the set, otherwise I would
> spend more time saving and restoring CR several times.

If you had said in the changelog that it was because future patches would 
need the register to be saved, we could have avoided this exchange...  
Especially with large patchsets, I review the patches one at a time.  Don't 
assume I know what's coming in patch n+1 (and especially not n+m) when I 
review patch n.

-Scott


Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler

2015-10-06 Thread Scott Wood
On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote:
> Le 29/09/2015 00:07, Scott Wood a écrit :
> > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
> > > We are spending between 40 and 160 cycles with a mean of 65 cycles in
> > > the TLB handling routines (measured with mftbl), so make it simpler
> > > although it adds one instruction.
> > > 
> > > Signed-off-by: Christophe Leroy 
> > Does this just make it simpler or does it make it faster?  What is the
> > performance impact?  Is the performance impact seen with or without
> > CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
> > mtspr/mfspr combo in order to replace one mfspr.
> > 
> > 
> The performance impact is not noticeable. Theoretically it adds 1 cycle
> on a mean of 65 cycles, that is 1.5%. Even in the worst case where we
> spend around 10% of the time in TLB handling exceptions, that represents
> only 0.15% of the total CPU time. So that's almost nothing.
> Besides making it simpler, the main reason is that I need a
> third register for the following patch in the set, otherwise I would
> spend more time saving and restoring CR several times.

FWIW, the added instruction is an SPR access and I doubt that's only one 
cycle.

-Scott



Re: [PATCH v2 13/25] powerpc/8xx: also use r3 in the ITLB miss in all situations

2015-10-06 Thread Scott Wood
On Tue, 2015-10-06 at 16:12 +0200, Christophe Leroy wrote:
> Le 29/09/2015 02:00, Scott Wood a écrit :
> > On Tue, Sep 22, 2015 at 06:50:54PM +0200, Christophe Leroy wrote:
> > > We are spending between 40 and 160 cycles with a mean of 65 cycles
> > > in the TLB handling routines (measured with mftbl), so make it simpler
> > > although it adds one instruction
> > > 
> > > Signed-off-by: Christophe Leroy 
> > > ---
> > > No change in v2
> > > 
> > >   arch/powerpc/kernel/head_8xx.S | 15 ---
> > >   1 file changed, 4 insertions(+), 11 deletions(-)
> > Why is this a separate patch from 1/25?
> > 
> > Same comments as on that patch.
> > 
> > 
> Just because here there is no real need for the simplification of the
> code, whereas the first one was a prerequisite for the following patch.
> Should I merge them together anyway ?

If there's no real need, why do it?  It's not really a major readability 
enhancement...

-Scott


Re: [PATCH v2 15/25] powerpc/8xx: move 8xx SPRN defines into reg_8xx.h and add some missing ones

2015-10-06 Thread Scott Wood
On Tue, 2015-10-06 at 16:35 +0200, Christophe Leroy wrote:
> Le 29/09/2015 02:03, Scott Wood a écrit :
> > On Tue, Sep 22, 2015 at 06:50:58PM +0200, Christophe Leroy wrote:
> > > Move 8xx SPRN defines into reg_8xx.h and add some missing ones
> > > 
> > > Signed-off-by: Christophe Leroy 
> > > ---
> > > No change in v2
> > Why are they being moved?  Why are they being separated from the bit
> > definitions?
> > 
> > 
> It was to keep asm/reg_8xx.h self-sufficient for the following patch.

Again, it would have been nice if this were in the commit message.

> Also because including asm/mmu-8xx.h creates a circular inclusion issue
> (mmu-8xx.h needs page.h which includes page-32.h, page-32.h includes
> cache.h, cache.h includes reg.h which includes reg_8xx.h). The circle
> starts with an inclusion of asm/cache.h by linux/cache.h, itself
> included by linux/printk.h, and I end up with 'implicit declaration' issues.
> 
> How can I fix that ?

mmu-8xx.h should have been including page.h instead of assuming the caller
has done so...  but another option is to do what mmu-book3e.h does, and use
the kconfig symbols instead of PAGE_SHIFT.
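
A sketch of that Kconfig-based approach (illustrative only; the macro name
and the exact symbols used by mmu-book3e.h are assumptions):

    /* derive the page shift from Kconfig so the header no longer
     * needs asm/page.h at all */
    #if defined(CONFIG_PPC_16K_PAGES)
    #define MMU_PAGE_SHIFT  14
    #else
    #define MMU_PAGE_SHIFT  12      /* 4K pages */
    #endif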

-Scott


Re: Missing operand for tlbie instruction on Power7

2015-10-06 Thread Laura Abbott

On 10/05/2015 08:35 PM, Michael Ellerman wrote:

On Fri, 2015-10-02 at 08:43 -0700, Laura Abbott wrote:

Hi,

We received a report (https://bugzilla.redhat.com/show_bug.cgi?id=1267395) of 
bad assembly
when compiling on powerpc with little endian


...


After some discussion with the binutils folks, it turns out that the tlbie
instruction actually requires another operand and binutils was updated to
check for this https://sourceware.org/ml/binutils/2015-05/msg00133.html .

The code sequence in arch/powerpc/include/asm/ppc_asm.h now needs to be updated:

#if !defined(CONFIG_4xx) && !defined(CONFIG_8xx)
#define tlbia   \
  li  r4,1024;\
  mtctr   r4; \
  lis r4,KERNELBASE@h;\
0:  tlbie   r4; \
  addir4,r4,0x1000;   \
  bdnz0b
#endif

I don't know enough ppc assembly to properly fix this but I can test.


How are you testing? This code is fairly old and I'm dubious if it still works.

These days we have a ppc_md hook for flushing the TLB, ppc_md.flush_tlb().
Ideally the swsusp code would use that.

cheers




Testing would probably just be compile and maybe boot. I don't have regular
access to the hardware. This problem just showed up for me when someone
tried to compile Fedora rawhide with the latest binutils.

From what I can tell, it looks like the .flush_tlb of the cpu_spec is only
defined for power7 and power8 and I don't see a ppc_md.flush_tlb on the
master branch. It's not clear what to do for the case where there is no
flush_tlb function. Would filling in a .flush_tlb for all the PPC_BOOK3S_64
with the existing tlbia sequence work? It's also worth noting that the
__flush_power7 uses tlbiel instead of tlbie.

Thanks,
Laura

[PATCH-RFC 3/7] powerpc: convert to generic builtin command line

2015-10-06 Thread Daniel Walker
This updates the powerpc code to use the CONFIG_GENERIC_CMDLINE
option.

Cc: xe-ker...@external.cisco.com
Cc: Daniel Walker 
Signed-off-by: Daniel Walker 
---
 arch/powerpc/Kconfig| 23 +--
 arch/powerpc/kernel/prom.c  |  4 
 arch/powerpc/kernel/prom_init.c |  8 
 3 files changed, 9 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9a7057e..26252dc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -160,6 +160,7 @@ config PPC
select EDAC_ATOMIC_SCRUB
select ARCH_HAS_DMA_SET_COHERENT_MASK
select HAVE_ARCH_SECCOMP_FILTER
+   select GENERIC_CMDLINE
 
 config GENERIC_CSUM
def_bool CPU_LITTLE_ENDIAN
@@ -640,28 +641,6 @@ config PPC_DENORMALISATION
  Add support for handling denormalisation of single precision
  values.  Useful for bare metal only.  If unsure say Y here.
 
-config CMDLINE_BOOL
-   bool "Default bootloader kernel arguments"
-
-config CMDLINE
-   string "Initial kernel command string"
-   depends on CMDLINE_BOOL
-   default "console=ttyS0,9600 console=tty0 root=/dev/sda2"
-   help
- On some platforms, there is currently no way for the boot loader to
- pass arguments to the kernel. For these platforms, you can supply
- some command-line options at build time by entering them here.  In
- most cases you will need to specify the root device here.
-
-config CMDLINE_FORCE
-   bool "Always use the default kernel command string"
-   depends on CMDLINE_BOOL
-   help
- Always use the default kernel command string, even if the boot
- loader passes other arguments to the kernel.
- This is useful if you cannot or don't want to change the
- command-line options your boot loader passes to the kernel.
-
 config EXTRA_TARGETS
string "Additional default image types"
help
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index bef76c5..3281d5a 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -670,6 +671,9 @@ void __init early_init_devtree(void *params)
 */
of_scan_flat_dt(early_init_dt_scan_chosen_ppc, boot_command_line);
 
+   /* append and prepend any arguments built into the kernel. */
+   cmdline_add_builtin(boot_command_line, NULL, COMMAND_LINE_SIZE);
+
/* Scan memory nodes and rebuild MEMBLOCKs */
of_scan_flat_dt(early_init_dt_scan_root, NULL);
of_scan_flat_dt(early_init_dt_scan_memory_ppc, NULL);
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 15099c4..2dd2608 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -595,11 +596,10 @@ static void __init early_cmdline_parse(void)
p = prom_cmd_line;
if ((long)prom.chosen > 0)
l = prom_getprop(prom.chosen, "bootargs", p, 
COMMAND_LINE_SIZE-1);
-#ifdef CONFIG_CMDLINE
+
if (l <= 0 || p[0] == '\0') /* dbl check */
-   strlcpy(prom_cmd_line,
-   CONFIG_CMDLINE, sizeof(prom_cmd_line));
-#endif /* CONFIG_CMDLINE */
+   cmdline_add_builtin(prom_cmd_line, NULL, sizeof(prom_cmd_line));
+
prom_printf("command line: %s\n", prom_cmd_line);
 
 #ifdef CONFIG_PPC64
-- 
2.1.4


Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler

2015-10-06 Thread christophe leroy



Le 06/10/2015 18:46, Scott Wood a écrit :

On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote:

Le 29/09/2015 00:07, Scott Wood a écrit :

On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:

We are spending between 40 and 160 cycles with a mean of 65 cycles in
the TLB handling routines (measured with mftbl), so make it simpler
although it adds one instruction.

Signed-off-by: Christophe Leroy 

Does this just make it simpler or does it make it faster?  What is the
performance impact?  Is the performance impact seen with or without
CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
mtspr/mfspr combo in order to replace one mfspr.



The performance impact is not noticeable. Theoretically it adds 1 cycle
on a mean of 65 cycles, that is 1.5%. Even in the worst case where we
spend around 10% of the time in TLB handling exceptions, that represents
only 0.15% of the total CPU time. So that's almost nothing.
Besides making it simpler, the main reason is that I need a
third register for the following patch in the set, otherwise I would
spend more time saving and restoring CR several times.

FWIW, the added instruction is an SPR access and I doubt that's only one
cycle.


According to the mpc885 reference manual (table 9-1), Instruction
Execution Timing for "Move to: mtspr, mtcrf, mtmsr, mcrxr except mtspr to LR
and CTR and to SPRs external to the core" is "serialize + 1 cycle".
Taking into account that the preceding instructions are also 'mtspr', we are
already serialized, so it is only one cycle, I believe.

Am I interpreting it wrong ?

Christophe


Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler

2015-10-06 Thread Scott Wood
On Tue, 2015-10-06 at 22:30 +0200, christophe leroy wrote:
> Le 06/10/2015 18:46, Scott Wood a écrit :
> > On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote:
> > > Le 29/09/2015 00:07, Scott Wood a écrit :
> > > > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
> > > > > We are spending between 40 and 160 cycles with a mean of 65 cycles in
> > > > > the TLB handling routines (measured with mftbl), so make it simpler
> > > > > although it adds one instruction.
> > > > > 
> > > > > Signed-off-by: Christophe Leroy 
> > > > Does this just make it simpler or does it make it faster?  What is the
> > > > performance impact?  Is the performance impact seen with or without
> > > > CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
> > > > mtspr/mfspr combo in order to replace one mfspr.
> > > > 
> > > > 
> > > The performance impact is not noticeable. Theoretically it adds 1 cycle
> > > on a mean of 65 cycles, that is 1.5%. Even in the worst case where we
> > > spend around 10% of the time in TLB handling exceptions, that represents
> > > only 0.15% of the total CPU time. So that's almost nothing.
> > > Besides making it simpler, the main reason is that I need a
> > > third register for the following patch in the set, otherwise I would
> > > spend more time saving and restoring CR several times.
> > FWIW, the added instruction is an SPR access and I doubt that's only one
> > cycle.
> > 
> > 
> According to the mpc885 reference manual (table 9-1), Instruction 
> Execution Timing for "Move to: mtspr, mtcrf, mtmsr, mcrxr except mtspr to LR
> and CTR and to SPRs external to the core" is "serialize + 1 cycle".
> Taking into account that the preceding instructions are also 'mtspr', we are
> already serialized, so it is only one cycle, I believe.
> Am I interpreting it wrong ?

I don't know.  The manual doesn't go into much detail about the mechanics of 
serialization.  If it's just about "block[ing] all execution units" without 
any effect on fetching, decoding, etc. then maybe you're right.

-Scott


[PATCH v2 1/3] ppc64: Fix warnings

2015-10-06 Thread Scott Wood
Produce a warning-free build on ppc64 (at least, when built as 64-bit
userspace -- if a 64-bit binary for ppc64 is a requirement, why is -m64
set only on purgatory?).  Mostly unused (or write-only) variable
warnings, but also one nasty one where reserve() was used without a
prototype, causing long long arguments to be passed as int.

Signed-off-by: Scott Wood 
---
v2: no change

 kexec/arch/ppc64/crashdump-ppc64.c | 3 ++-
 kexec/arch/ppc64/kexec-elf-ppc64.c | 9 +
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/kexec/arch/ppc64/crashdump-ppc64.c b/kexec/arch/ppc64/crashdump-ppc64.c
index 6214b83..b3c8928 100644
--- a/kexec/arch/ppc64/crashdump-ppc64.c
+++ b/kexec/arch/ppc64/crashdump-ppc64.c
@@ -33,6 +33,7 @@
 #include "../../kexec-syscall.h"
 #include "../../crashdump.h"
 #include "kexec-ppc64.h"
+#include "../../fs2dt.h"
 #include "crashdump-ppc64.h"
 
 static struct crash_elf_info elf_info64 =
@@ -187,7 +188,7 @@ static int get_crash_memory_ranges(struct memory_range **range, int *ranges)
DIR *dir, *dmem;
FILE *file;
struct dirent *dentry, *mentry;
-   int i, n, crash_rng_len = 0;
+   int n, crash_rng_len = 0;
unsigned long long start, end;
int page_size;
 
diff --git a/kexec/arch/ppc64/kexec-elf-ppc64.c 
b/kexec/arch/ppc64/kexec-elf-ppc64.c
index 4a1540e..adcee4c 100644
--- a/kexec/arch/ppc64/kexec-elf-ppc64.c
+++ b/kexec/arch/ppc64/kexec-elf-ppc64.c
@@ -97,7 +97,6 @@ int elf_ppc64_load(int argc, char **argv, const char *buf, 
off_t len,
struct mem_ehdr ehdr;
char *cmdline, *modified_cmdline = NULL;
const char *devicetreeblob;
-   int cmdline_len, modified_cmdline_len;
uint64_t max_addr, hole_addr;
char *seg_buf = NULL;
off_t seg_size = 0;
@@ -107,7 +106,6 @@ int elf_ppc64_load(int argc, char **argv, const char *buf, 
off_t len,
uint64_t *rsvmap_ptr;
struct bootblock *bb_ptr;
 #endif
-   int i;
int result, opt;
uint64_t my_kernel, my_dt_offset;
uint64_t my_opal_base = 0, my_opal_entry = 0;
@@ -162,10 +160,7 @@ int elf_ppc64_load(int argc, char **argv, const char *buf, 
off_t len,
}
}
 
-   cmdline_len = 0;
-   if (cmdline)
-   cmdline_len = strlen(cmdline) + 1;
-   else
+   if (!cmdline)
fprintf(stdout, "Warning: append= option is not passed. Using 
the first kernel root partition\n");
 
if (ramdisk && reuse_initrd)
@@ -181,7 +176,6 @@ int elf_ppc64_load(int argc, char **argv, const char *buf, 
off_t len,
strncpy(modified_cmdline, cmdline, COMMAND_LINE_SIZE);
modified_cmdline[COMMAND_LINE_SIZE - 1] = '\0';
}
-   modified_cmdline_len = strlen(modified_cmdline);
}
 
/* Parse the Elf file */
@@ -219,7 +213,6 @@ int elf_ppc64_load(int argc, char **argv, const char *buf, 
off_t len,
return -1;
/* Use new command line. */
cmdline = modified_cmdline;
-   cmdline_len = strlen(modified_cmdline) + 1;
}
 
/* Add v2wrap to the current image */
-- 
2.1.4


[PATCH v2 2/3] ppc64: Avoid rfid if no need to clear MSR_LE

2015-10-06 Thread Scott Wood
Commit a304e2d82a8c3 ("ppc64: purgatory: Reset primary cpu endian to
big-endian) changed bctr to rfid.  rfid is book3s-only and will cause a
fatal exception on book3e.

Purgatory is an isolated environment which makes importing information
about the subarch awkward, so instead rely on the fact that MSR_LE
should never be set on book3e, and the rfid is only needed if MSR_LE is
set (and thus needs to be cleared).  In theory that MSR bit is reserved
on book3e, rather than zero, but in practice I have not seen it set.

Signed-off-by: Scott Wood 
Cc: Samuel Mendoza-Jonas 
---
v2: new patch

 purgatory/arch/ppc64/v2wrap.S | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/purgatory/arch/ppc64/v2wrap.S b/purgatory/arch/ppc64/v2wrap.S
index 179ade9..3534080 100644
--- a/purgatory/arch/ppc64/v2wrap.S
+++ b/purgatory/arch/ppc64/v2wrap.S
@@ -116,9 +116,17 @@ master:
stw 7,0x5c(4)   # and patch it into the kernel
mr  3,16# restore dt address
 
+   mfmsr   5
+   andi.   10,5,1  # test MSR_LE
+   bne little_endian
+
+   li  5,0 # r5 will be 0 for kernel
+   mtctr   4   # prepare branch to
+   bctr# start kernel
+   
+little_endian: # book3s-only
mtsrr0  4   # prepare branch to
 
-   mfmsr   5
clrrdi  5,5,1   # clear MSR_LE
mtsrr1  5
 
-- 
2.1.4


[PATCH v2 3/3] ppc64: Add a flag to tell the kernel it's booting from kexec

2015-10-06 Thread Scott Wood
It needs to know this because the SMP release mechanism for Freescale
book3e is different from when booting with normal hardware.  In theory
we could simulate the normal spin table mechanism, but not (easily) at
the addresses U-Boot put in the device tree -- so there'd need to be
even more communication between the kernel and kexec to set that up.
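
For reference, the kernel side can then test for this with something
like the following sketch (booted_from_kexec() is a made-up name;
of_property_read_bool() and of_chosen are standard kernel APIs):

	#include <linux/of.h>

	static bool booted_from_kexec(void)
	{
		return of_property_read_bool(of_chosen,
					     "linux,booted-from-kexec");
	}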

Signed-off-by: Scott Wood 
---
v2: Use a device tree property rather than setting a flag in the kernel
image, as requested by Michael Ellerman.
---
 kexec/arch/ppc64/Makefile   |  6 +++
 kexec/arch/ppc64/fdt.c  | 78 +
 kexec/arch/ppc64/include/arch/fdt.h |  9 +
 kexec/arch/ppc64/kexec-elf-ppc64.c  |  7 
 4 files changed, 100 insertions(+)
 create mode 100644 kexec/arch/ppc64/fdt.c
 create mode 100644 kexec/arch/ppc64/include/arch/fdt.h

diff --git a/kexec/arch/ppc64/Makefile b/kexec/arch/ppc64/Makefile
index 9a6e475..37cd233 100644
--- a/kexec/arch/ppc64/Makefile
+++ b/kexec/arch/ppc64/Makefile
@@ -1,11 +1,15 @@
 #
 # kexec ppc64 (linux booting linux)
 #
+include $(srcdir)/kexec/libfdt/Makefile.libfdt
+
 ppc64_KEXEC_SRCS =  kexec/arch/ppc64/kexec-elf-rel-ppc64.c
 ppc64_KEXEC_SRCS += kexec/arch/ppc64/kexec-zImage-ppc64.c
 ppc64_KEXEC_SRCS += kexec/arch/ppc64/kexec-elf-ppc64.c
 ppc64_KEXEC_SRCS += kexec/arch/ppc64/kexec-ppc64.c
 ppc64_KEXEC_SRCS += kexec/arch/ppc64/crashdump-ppc64.c
+ppc64_KEXEC_SRCS += kexec/arch/ppc64/fdt.c
+ppc64_KEXEC_SRCS += $(LIBFDT_SRCS:%=kexec/libfdt/%)
 
 ppc64_ARCH_REUSE_INITRD =
 
@@ -13,6 +17,8 @@ ppc64_FS2DT   = kexec/fs2dt.c
 ppc64_FS2DT_INCLUDE = -include $(srcdir)/kexec/arch/ppc64/crashdump-ppc64.h \
   -include $(srcdir)/kexec/arch/ppc64/kexec-ppc64.h
 
+ppc64_CPPFLAGS = -I$(srcdir)/kexec/libfdt
+
 dist += kexec/arch/ppc64/Makefile $(ppc64_KEXEC_SRCS)  \
kexec/arch/ppc64/kexec-ppc64.h kexec/arch/ppc64/crashdump-ppc64.h \
kexec/arch/ppc64/include/arch/options.h
diff --git a/kexec/arch/ppc64/fdt.c b/kexec/arch/ppc64/fdt.c
new file mode 100644
index 000..8bc6d2d
--- /dev/null
+++ b/kexec/arch/ppc64/fdt.c
@@ -0,0 +1,78 @@
+/*
+ * ppc64 fdt fixups
+ *
+ * Copyright 2015 Freescale Semiconductor, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation (version 2 of the License).
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Let the kernel know it booted from kexec, as some things (e.g.
+ * secondary CPU release) may work differently.
+ */
+static int fixup_kexec_prop(void *fdt)
+{
+   int err, nodeoffset;
+
+   nodeoffset = fdt_subnode_offset(fdt, 0, "chosen");
+   if (nodeoffset < 0)
+   nodeoffset = fdt_add_subnode(fdt, 0, "chosen");
+   if (nodeoffset < 0) {
+   printf("%s: add /chosen %s\n", __func__,
+  fdt_strerror(nodeoffset));
+   return -1;
+   }
+
+   err = fdt_setprop(fdt, nodeoffset, "linux,booted-from-kexec",
+ NULL, 0);
+   if (err < 0) {
+   printf("%s: couldn't write linux,booted-from-kexec: %s\n",
+  __func__, fdt_strerror(err));
+   return -1;
+   }
+
+   return 0;
+}
+
+
+/*
+ * For now, assume that the added content fits in the file.
+ * This should be the case when flattening from /proc/device-tree,
+ * and when passing in a dtb, dtc can be told to add padding.
+ */
+int fixup_dt(char **fdt, off_t *size)
+{
+   int ret;
+
+   *size += 4096;
+   *fdt = realloc(*fdt, *size);
+   if (!*fdt) {
+   fprintf(stderr, "%s: out of memory\n", __func__);
+   return -1;
+   }
+
+   ret = fdt_open_into(*fdt, *fdt, *size);
+   if (ret < 0) {
+   fprintf(stderr, "%s: fdt_open_into: %s\n", __func__,
+   fdt_strerror(ret));
+   return -1;
+   }
+
+   ret = fixup_kexec_prop(*fdt);
+   if (ret < 0)
+   return ret;
+
+   return 0;
+}
diff --git a/kexec/arch/ppc64/include/arch/fdt.h 
b/kexec/arch/ppc64/include/arch/fdt.h
new file mode 100644
index 000..14f8be2
--- /dev/null
+++ b/kexec/arch/ppc64/include/arch/fdt.h
@@ -0,0 +1,9 @@
+#ifndef KEXEC_ARCH_PPC64_FDT
+#define KEXEC_ARCH_PPC64_FDT
+
+#include 
+
+int fixup_dt(char **fdt, off_t *size);
+
+#endif
+
diff --git a/kexec/arch/ppc64/kexec-elf-ppc64.c 
b/kexec/arch/ppc64/kexec-elf-ppc64.c
index adcee4c..ddd3de8 100644
--- a/kexec/arch/ppc64/kexec-elf-ppc64.c
+++ b/kexec/arch/ppc64/kexec-elf-ppc64.c
@@ -37,6 +37,8 @@
 #include "kexec-ppc64.h"
 #include "../../fs2dt.h"
 #include "crashdump-pp

Re: [kexec-lite PATCH V2] trampoline: Reset primary cpu endian to big-endian

2015-10-06 Thread Scott Wood
On Wed, 2015-07-08 at 13:49 +1000, Samuel Mendoza-Jonas wrote:
> On 08/07/15 13:37, Scott Wood wrote:
> > On Wed, 2015-07-08 at 13:29 +1000, Samuel Mendoza-Jonas wrote:
> > > Older big-endian ppc64 kernels don't include the FIXUP_ENDIAN check,
> > > meaning if we kexec from a little-endian kernel the target kernel will
> > > fail to boot.
> > > Returning to big-endian before we enter the target kernel ensures that
> > > the target kernel can boot whether or not it includes FIXUP_ENDIAN.
> > > 
> > > Signed-off-by: Samuel Mendoza-Jonas 
> > > ---
> > > V2: As suggested by Anton take advantage of the rfid call and switch off
> > > MSR_LE and branch to the target kernel in the same step.
> > > 
> > >  kexec_trampoline.S | 11 +--
> > >  1 file changed, 9 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kexec_trampoline.S b/kexec_trampoline.S
> > > index a3eb314..3751112 100644
> > > --- a/kexec_trampoline.S
> > > +++ b/kexec_trampoline.S
> > > @@ -88,8 +88,15 @@ start:
> > >  
> > >   li  r5,0
> > >  
> > > - mtctr   r4
> > > - bctr
> > > + mtsrr0  r4
> > > +
> > > + mfmsr   r5
> > > + clrrdi  r5,r5,1 /* Clear MSR_LE */
> > > + mtsrr1  r5
> > > +
> > > + li  r5,0
> > > +
> > > + rfid
> > 
> > Is kexec-lite meant to be specific to book3s-64?  The README just says "A 
> > simple kexec for flattened device tree platforms" and I see a 
> > __powerpc64__ 
> > ifdef in kexec_trampoline.S (but not in the above patch)...
> > 
> > -Scott
> > 
> 
> I believe that particular ifdef is to check if we're little-endian when 
> reading
> the device tree, but that's still a good point - I'll check with Anton.

It looks like this ended up going into main kexec, which means I get to find 
some way to distinguish book3s from book3e to avoid that rfid.  Yay.

-Scott


Re: [PATCH v2] perf: Fix build break on powerpc due to sample_reg_masks

2015-10-06 Thread Michael Ellerman
On Wed, 2015-09-30 at 16:45 -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Sep 30, 2015 at 09:09:09PM +0200, Jiri Olsa escreveu:
> > On Wed, Sep 30, 2015 at 11:28:36AM -0700, Sukadev Bhattiprolu wrote:
> > > From e29a7236122c4d807ec9ebc721b5d7d75c8d Mon Sep 17 00:00:00 2001
> > > From: Sukadev Bhattiprolu 
> > > Date: Thu, 24 Sep 2015 17:53:49 -0400
> > > Subject: [PATCH v2] perf: Fix build break on powerpc due to 
> > > sample_reg_masks
> > > 
> > > perf_regs.c does not get built on Powerpc as CONFIG_PERF_REGS is false.
> > > So the weak definition for 'sample_reg_masks' doesn't get picked up.
> > > 
> > > Adding perf_regs.o to util/Build unconditionally, exposes a redefinition
> > > error for 'perf_reg_value()' function (due to the static inline version
> > > in util/perf_regs.h). So use #ifdef HAVE_PERF_REGS_SUPPORT' around that
> > > function.
> > > 
> > > Signed-off-by: Sukadev Bhattiprolu 
> > 
> > Acked-by: Jiri Olsa 
> 
> Thanks, applied.

Is this going to Linus' tree any time soon?

I have folks pinging me to say that perf is broken on powerpc.

cheers



[PATCH v9 1/4] perf, kvm/{x86, s390}: Remove dependency on uapi/kvm_perf.h

2015-10-06 Thread Hemant Kumar
It's better to remove the dependency on uapi/kvm_perf.h to allow dynamic
discovery of kvm events (if it's needed). To do this, some extern
variables have been introduced with which we can keep the generic
functions generic.

Signed-off-by: Hemant Kumar 
---
Changelog:
v8 to v9:
- Removed the macro definitions.
- Changed the access of kvm_entry_trace and kvm_exit_trace
- Removed unnecessary formatting.
v7 to v8:
- Removed unnecessary __unused_parameter modifiers.

 tools/perf/arch/s390/util/kvm-stat.c |  8 +++-
 tools/perf/arch/x86/util/kvm-stat.c  | 14 +++---
 tools/perf/builtin-kvm.c | 32 ++--
 tools/perf/util/kvm-stat.h   |  5 +
 4 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/tools/perf/arch/s390/util/kvm-stat.c 
b/tools/perf/arch/s390/util/kvm-stat.c
index a5dbc07..b85a94b 100644
--- a/tools/perf/arch/s390/util/kvm-stat.c
+++ b/tools/perf/arch/s390/util/kvm-stat.c
@@ -10,7 +10,7 @@
  */
 
 #include "../../util/kvm-stat.h"
-#include 
+#include 
 
 define_exit_reasons_table(sie_exit_reasons, sie_intercept_code);
 define_exit_reasons_table(sie_icpt_insn_codes, icpt_insn_codes);
@@ -18,6 +18,12 @@ define_exit_reasons_table(sie_sigp_order_codes, 
sigp_order_codes);
 define_exit_reasons_table(sie_diagnose_codes, diagnose_codes);
 define_exit_reasons_table(sie_icpt_prog_codes, icpt_prog_codes);
 
+const char *vcpu_id_str = "id";
+const int decode_str_len = 40;
+const char *kvm_exit_reason = "icptcode";
+const char *kvm_entry_trace = "kvm:kvm_s390_sie_enter";
+const char *kvm_exit_trace = "kvm:kvm_s390_sie_exit";
+
 static void event_icpt_insn_get_key(struct perf_evsel *evsel,
struct perf_sample *sample,
struct event_key *key)
diff --git a/tools/perf/arch/x86/util/kvm-stat.c 
b/tools/perf/arch/x86/util/kvm-stat.c
index 14e4e66..babefda 100644
--- a/tools/perf/arch/x86/util/kvm-stat.c
+++ b/tools/perf/arch/x86/util/kvm-stat.c
@@ -1,5 +1,7 @@
 #include "../../util/kvm-stat.h"
-#include 
+#include 
+#include 
+#include 
 
 define_exit_reasons_table(vmx_exit_reasons, VMX_EXIT_REASONS);
 define_exit_reasons_table(svm_exit_reasons, SVM_EXIT_REASONS);
@@ -11,6 +13,12 @@ static struct kvm_events_ops exit_events = {
.name = "VM-EXIT"
 };
 
+const char *vcpu_id_str = "vcpu_id";
+const int decode_str_len = 20;
+const char *kvm_exit_reason = "exit_reason";
+const char *kvm_entry_trace = "kvm:kvm_entry";
+const char *kvm_exit_trace = "kvm:kvm_exit";
+
 /*
  * For the mmio events, we treat:
  * the time of MMIO write: kvm_mmio(KVM_TRACE_MMIO_WRITE...) -> kvm_entry
@@ -65,7 +73,7 @@ static void mmio_event_decode_key(struct perf_kvm_stat *kvm 
__maybe_unused,
  struct event_key *key,
  char *decode)
 {
-   scnprintf(decode, DECODE_STR_LEN, "%#lx:%s",
+   scnprintf(decode, decode_str_len, "%#lx:%s",
  (unsigned long)key->key,
  key->info == KVM_TRACE_MMIO_WRITE ? "W" : "R");
 }
@@ -109,7 +117,7 @@ static void ioport_event_decode_key(struct perf_kvm_stat 
*kvm __maybe_unused,
struct event_key *key,
char *decode)
 {
-   scnprintf(decode, DECODE_STR_LEN, "%#llx:%s",
+   scnprintf(decode, decode_str_len, "%#llx:%s",
  (unsigned long long)key->key,
  key->info ? "POUT" : "PIN");
 }
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index fc1cffb..5104c7e 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -31,7 +31,6 @@
 #include 
 
 #ifdef HAVE_KVM_STAT_SUPPORT
-#include 
 #include "util/kvm-stat.h"
 
 void exit_event_get_key(struct perf_evsel *evsel,
@@ -39,12 +38,12 @@ void exit_event_get_key(struct perf_evsel *evsel,
struct event_key *key)
 {
key->info = 0;
-   key->key = perf_evsel__intval(evsel, sample, KVM_EXIT_REASON);
+   key->key = perf_evsel__intval(evsel, sample, kvm_exit_reason);
 }
 
 bool kvm_exit_event(struct perf_evsel *evsel)
 {
-   return !strcmp(evsel->name, KVM_EXIT_TRACE);
+   return !strcmp(evsel->name, kvm_exit_trace);
 }
 
 bool exit_event_begin(struct perf_evsel *evsel,
@@ -60,7 +59,7 @@ bool exit_event_begin(struct perf_evsel *evsel,
 
 bool kvm_entry_event(struct perf_evsel *evsel)
 {
-   return !strcmp(evsel->name, KVM_ENTRY_TRACE);
+   return !strcmp(evsel->name, kvm_entry_trace);
 }
 
 bool exit_event_end(struct perf_evsel *evsel,
@@ -92,7 +91,7 @@ void exit_event_decode_key(struct perf_kvm_stat *kvm,
const char *exit_reason = get_exit_reason(kvm, key->exit_reasons,
  key->key);
 
-   scnprintf(decode, DECODE_STR_LEN, "%s", exit_reason);
+   scnprintf(decode, decode_str_len, "%s", exit_reason);
 }
 
 static bool register_kvm_events_ops(struct perf_kvm_stat *kvm)
@@ -358,7 +357,12 @@

[PATCH v9 2/4] perf,kvm/{x86,s390}: Remove const from kvm_events_tp

2015-10-06 Thread Hemant Kumar
This patch removes the "const" qualifier from the kvm_events_tp declaration
to account for the fact that some architectures may need to update this
variable dynamically. For instance, powerpc will need to update this
variable dynamically depending on the machine type.
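
For example, an architecture setup routine can then plug in the right
tracepoint names at runtime, roughly like this (a sketch; the function
name is hypothetical, and the tracepoint names are the book3s_hv ones
used later in this series):

	const char *kvm_events_tp[3];

	static void setup_book3s_hv_events(void)
	{
		kvm_events_tp[0] = "kvm_hv:kvm_guest_enter";
		kvm_events_tp[1] = "kvm_hv:kvm_guest_exit";
		kvm_events_tp[2] = NULL;	/* keep the table NULL-terminated */
	}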

Signed-off-by: Hemant Kumar 
---
 tools/perf/arch/s390/util/kvm-stat.c | 2 +-
 tools/perf/arch/x86/util/kvm-stat.c  | 2 +-
 tools/perf/util/kvm-stat.h   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/arch/s390/util/kvm-stat.c 
b/tools/perf/arch/s390/util/kvm-stat.c
index b85a94b..ed57df2 100644
--- a/tools/perf/arch/s390/util/kvm-stat.c
+++ b/tools/perf/arch/s390/util/kvm-stat.c
@@ -79,7 +79,7 @@ static struct kvm_events_ops exit_events = {
.name = "VM-EXIT"
 };
 
-const char * const kvm_events_tp[] = {
+const char *kvm_events_tp[] = {
"kvm:kvm_s390_sie_enter",
"kvm:kvm_s390_sie_exit",
"kvm:kvm_s390_intercept_instruction",
diff --git a/tools/perf/arch/x86/util/kvm-stat.c 
b/tools/perf/arch/x86/util/kvm-stat.c
index babefda..b63d4be 100644
--- a/tools/perf/arch/x86/util/kvm-stat.c
+++ b/tools/perf/arch/x86/util/kvm-stat.c
@@ -129,7 +129,7 @@ static struct kvm_events_ops ioport_events = {
.name = "IO Port Access"
 };
 
-const char * const kvm_events_tp[] = {
+const char *kvm_events_tp[] = {
"kvm:kvm_entry",
"kvm:kvm_exit",
"kvm:kvm_mmio",
diff --git a/tools/perf/util/kvm-stat.h b/tools/perf/util/kvm-stat.h
index dd55548..c965dc8 100644
--- a/tools/perf/util/kvm-stat.h
+++ b/tools/perf/util/kvm-stat.h
@@ -133,7 +133,7 @@ bool kvm_entry_event(struct perf_evsel *evsel);
  */
 int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid);
 
-extern const char * const kvm_events_tp[];
+extern const char *kvm_events_tp[];
 extern struct kvm_reg_events_ops kvm_reg_events_ops[];
 extern const char * const kvm_skip_events[];
 extern const char *vcpu_id_str;
-- 
1.9.3


[PATCH v9 3/4] perf,kvm/powerpc: Port perf kvm stat to powerpc

2015-10-06 Thread Hemant Kumar
perf kvm can be used to analyze guest exit reasons. This support already
exists on x86, so port it to powerpc.

 - To trace KVM events :
  perf kvm stat record
  If many guests are running, we can track for a specific guest by using
  --pid as in : perf kvm stat record --pid 

 - To see the results :
  perf kvm stat report

The result shows the number of exits (from the guest context to
host/hypervisor context) grouped by their respective exit reasons with
their frequency.

Since different powerpc machines have different KVM tracepoints, this
patch discovers the available tracepoints dynamically and accordingly
looks for them. If any single tracepoint is not present, this support
won't be enabled for reporting. Recording will fail if any of the
events we are looking to record isn't available.
Right now, it's only supported on PowerPC Book3S_HV architectures.

To analyze the different exits, group them, and present them (in a
somewhat descriptive way) to the user, we need a mapping between the "exit
code" (dumped in the kvm_guest_exit tracepoint data) and its related
Interrupt vector description (exit reason). This patch adds this mapping
in book3s_hv_exits.h.
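
The arch code then instantiates the table with the generic helper from
util/kvm-stat.h, along these lines (sketch):

	#include "../../util/kvm-stat.h"
	#include "book3s_hv_exits.h"

	define_exit_reasons_table(hv_exit_reasons, kvm_trace_symbol_exit);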

It records on two available KVM tracepoints for book3s_hv:
"kvm_hv:kvm_guest_exit" and "kvm_hv:kvm_guest_enter".

Here is a sample output:
 # pgrep qemu
19378
60515

2 Guests are running on the host.

 # perf kvm stat record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.153 MB perf.data.guest (39624
samples) ]

 # perf kvm stat report -p 60515

Analyze events for pid(s) 60515, all VCPUs:

            VM-EXIT    Samples  Samples%   Time%   Min Time     Max Time       Avg time

            SYSCALL       9141    63.67%   7.49%     1.26us    5782.39us      9.87us ( +-   6.46% )
     H_DATA_STORAGE       4114    28.66%   5.07%     1.72us    4597.68us     14.84us ( +-  20.06% )
     HV_DECREMENTER        418     2.91%   4.26%     0.70us   30002.22us    122.58us ( +-  70.29% )
           EXTERNAL        392     2.73%   0.06%     0.64us     104.10us      1.94us ( +-  18.83% )
     RETURN_TO_HOST        287     2.00%  83.11%     1.53us  124240.15us   3486.52us ( +-  16.81% )
     H_INST_STORAGE          5     0.03%   0.00%     1.88us       3.73us      2.39us ( +-  14.20% )

Total Samples:14357, Total events handled time:1203918.42us.

Signed-off-by: Srikar Dronamraju 
Signed-off-by: Hemant Kumar 
---
Changelog:
v8 to v9:
- Moved the book3s specific setup into one function.
- Removed the macros (which were being used only once).
- Formatting changes.
v7 to v8:
- Fixed a perf kvm stat live bug.
v6 to v7:
- Removed dependency on uapi.
v4 to v5:
- Removed dependency on arch/powerpc/kvm/trace_book3s.h and added them in
the userspace side.
- No more arch side dependency.
v1 to v3:
- Split the patches for powerpc and perf

 tools/perf/arch/powerpc/Makefile   |   2 +
 tools/perf/arch/powerpc/util/Build |   1 +
 tools/perf/arch/powerpc/util/book3s_hv_exits.h |  33 
 tools/perf/arch/powerpc/util/kvm-stat.c| 100 +
 tools/perf/builtin-kvm.c   |  18 +
 tools/perf/util/kvm-stat.h |   1 +
 6 files changed, 155 insertions(+)
 create mode 100644 tools/perf/arch/powerpc/util/book3s_hv_exits.h
 create mode 100644 tools/perf/arch/powerpc/util/kvm-stat.c

diff --git a/tools/perf/arch/powerpc/Makefile b/tools/perf/arch/powerpc/Makefile
index 7fbca17..9f9cea3 100644
--- a/tools/perf/arch/powerpc/Makefile
+++ b/tools/perf/arch/powerpc/Makefile
@@ -1,3 +1,5 @@
 ifndef NO_DWARF
 PERF_HAVE_DWARF_REGS := 1
 endif
+
+HAVE_KVM_STAT_SUPPORT := 1
diff --git a/tools/perf/arch/powerpc/util/Build 
b/tools/perf/arch/powerpc/util/Build
index 7b8b0d1..c8fe207 100644
--- a/tools/perf/arch/powerpc/util/Build
+++ b/tools/perf/arch/powerpc/util/Build
@@ -1,5 +1,6 @@
 libperf-y += header.o
 libperf-y += sym-handling.o
+libperf-y += kvm-stat.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_DWARF) += skip-callchain-idx.o
diff --git a/tools/perf/arch/powerpc/util/book3s_hv_exits.h 
b/tools/perf/arch/powerpc/util/book3s_hv_exits.h
new file mode 100644
index 000..e68ba2d
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/book3s_hv_exits.h
@@ -0,0 +1,33 @@
+#ifndef ARCH_PERF_BOOK3S_HV_EXITS_H
+#define ARCH_PERF_BOOK3S_HV_EXITS_H
+
+/*
+ * PowerPC Interrupt vectors : exit code to name mapping
+ */
+
+#define kvm_trace_symbol_exit \
+   {0x0,   "RETURN_TO_HOST"}, \
+   {0x100, "SYSTEM_RESET"}, \
+   {0x200, "MACHINE_CHECK"}, \
+   {0x300, "DATA_STORAGE"}, \
+   {0x380, "DATA_SEGMENT"}, \
+   {0x400, "INST_STORAGE"}, \
+   {0x480, "INST_SEGMENT"}, \
+   {0x500, "EXTERNAL"}, \
+   {0x501, "EXTERNAL_LEVEL"}, \
+   {0x502, "EXTERNAL_HV"}, \
+   {0x600, "ALIGNMENT"}, \
+   {0x700, "PROGRAM"}, \
+   {0x800, "FP_UNAVAIL"}, \
+   {0x900, "DECREMENTER"}, \
+   {0x980, "HV_DECREMENTER"}, \
+   

[PATCH v9 4/4] perf,kvm/powerpc: Add support for HCALL reasons

2015-10-06 Thread Hemant Kumar
Powerpc provides hcall events that also provide insights into guest
behaviour. Enhance perf kvm stat to record and analyze hcall events.

 - To trace hcall events :
  perf kvm stat record

 - To show the results :
  perf kvm stat report --event=hcall

The result shows the number of hypervisor calls from the guest, grouped
by their respective reasons and displayed with their frequency.

This patch makes use of two additional tracepoints
"kvm_hv:kvm_hcall_enter" and "kvm_hv:kvm_hcall_exit". To map the hcall
codes to their respective names, it needs a mapping. Such a mapping is
added in this patch in book3s_hcalls.h.
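
As with the exit reasons, the table is instantiated via the generic
helper, and a minimal lookup over it could look like this (a sketch;
hcall_name() is illustrative only):

	#include "../../util/kvm-stat.h"
	#include "book3s_hcalls.h"

	define_exit_reasons_table(hv_hcall_reasons, kvm_trace_symbol_hcall);

	static const char *hcall_name(unsigned long long code)
	{
		unsigned int i;

		for (i = 0; i < sizeof(hv_hcall_reasons) /
				sizeof(hv_hcall_reasons[0]); i++)
			if (hv_hcall_reasons[i].exit_code == code)
				return hv_hcall_reasons[i].reason;

		return "UNKNOWN";
	}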

A sample output:

 # pgrep qemu
19378
60515

2 VMs running.

 # perf kvm stat record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.153 MB perf.data.guest (39624
samples) ]

 # perf kvm stat report -p 60515 --event=hcall

Analyze events for all VMs, all VCPUs:

    HCALL-EVENT    Samples  Samples%   Time%  Min Time  Max Time       Avg time

          H_IPI        822    66.08%  88.10%    0.63us   11.38us    2.05us ( +-   1.42% )
     H_SEND_CRQ        144    11.58%   3.77%    0.41us    0.88us    0.50us ( +-   1.47% )
   H_VIO_SIGNAL        118     9.49%   2.86%    0.37us    0.83us    0.47us ( +-   1.43% )
H_PUT_TERM_CHAR         76     6.11%   2.07%    0.37us    0.90us    0.52us ( +-   2.43% )
H_GET_TERM_CHAR         74     5.95%   2.23%    0.37us    1.70us    0.58us ( +-   4.77% )
         H_RTAS          6     0.48%   0.85%    1.10us    9.25us    2.70us ( +-  48.57% )
      H_PERFMON          4     0.32%   0.12%    0.41us    0.96us    0.59us ( +-  20.92% )

Total Samples:1244, Total events handled time:1916.69us.

Signed-off-by: Hemant Kumar 
---
Changelog:
v8 to v9:
- Removed the macros (which were being used only once).
v6 to v7:
- Removed dependency on uapi.
v4 to v5:
- Removed dependency on arch/powerpc/include/asm/hvall.h and added them
in userspace side.
- No more arch side dependency.
v1 to v2:
- Split the patches for powerpc and perf.

 tools/perf/arch/powerpc/util/book3s_hcalls.h | 123 +++
 tools/perf/arch/powerpc/util/kvm-stat.c  |  65 +-
 2 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/arch/powerpc/util/book3s_hcalls.h

diff --git a/tools/perf/arch/powerpc/util/book3s_hcalls.h 
b/tools/perf/arch/powerpc/util/book3s_hcalls.h
new file mode 100644
index 000..0dd6b7f
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/book3s_hcalls.h
@@ -0,0 +1,123 @@
+#ifndef ARCH_PERF_BOOK3S_HV_HCALLS_H
+#define ARCH_PERF_BOOK3S_HV_HCALLS_H
+
+/*
+ * PowerPC HCALL codes : hcall code to name mapping
+ */
+#define kvm_trace_symbol_hcall \
+   {0x4, "H_REMOVE"},  \
+   {0x8, "H_ENTER"},   \
+   {0xc, "H_READ"},\
+   {0x10, "H_CLEAR_MOD"},  \
+   {0x14, "H_CLEAR_REF"},  \
+   {0x18, "H_PROTECT"},\
+   {0x1c, "H_GET_TCE"},\
+   {0x20, "H_PUT_TCE"},\
+   {0x24, "H_SET_SPRG0"},  \
+   {0x28, "H_SET_DABR"},   \
+   {0x2c, "H_PAGE_INIT"},  \
+   {0x30, "H_SET_ASR"},\
+   {0x34, "H_ASR_ON"}, \
+   {0x38, "H_ASR_OFF"},\
+   {0x3c, "H_LOGICAL_CI_LOAD"},\
+   {0x40, "H_LOGICAL_CI_STORE"},   \
+   {0x44, "H_LOGICAL_CACHE_LOAD"}, \
+   {0x48, "H_LOGICAL_CACHE_STORE"},\
+   {0x4c, "H_LOGICAL_ICBI"},   \
+   {0x50, "H_LOGICAL_DCBF"},   \
+   {0x54, "H_GET_TERM_CHAR"},  \
+   {0x58, "H_PUT_TERM_CHAR"},  \
+   {0x5c, "H_REAL_TO_LOGICAL"},\
+   {0x60, "H_HYPERVISOR_DATA"},\
+   {0x64, "H_EOI"},\
+   {0x68, "H_CPPR"},   \
+   {0x6c, "H_IPI"},\
+   {0x70, "H_IPOLL"},  \
+   {0x74, "H_XIRR"},   \
+   {0x78, "H_MIGRATE_DMA"},\
+   {0x7c, "H_PERFMON"},\
+   {0xdc, "H_REGISTER_VPA"},   \
+   {0xe0, "H_CEDE"},   \
+   {0xe4, "H_CONFER"},  

Re: [PATCH V4 0/6] Redesign SR-IOV on PowerNV

2015-10-06 Thread Michael Ellerman
On Fri, 2015-10-02 at 20:07 +1000, Alexey Kardashevskiy wrote:
> On 08/19/2015 12:01 PM, Wei Yang wrote:
> > In the original design, it tries to group VFs to enable a larger number of
> > VFs in the system when a VF BAR is bigger than 64MB. This design has a flaw
> > in which one error on a VF will interfere with other VFs in the same group.
> >
> > This patch series changes this design by using an M64 BAR in Single PE mode to
> > cover only one VF BAR. By doing so, it gives absolute isolation between VFs.
> >
> > Wei Yang (6):
> >powerpc/powernv: don't enable SRIOV when VF BAR has non
> >  64bit-prefetchable BAR
> >powerpc/powernv: simplify the calculation of iov resource alignment
> >powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
> >powerpc/powernv: replace the hard coded boundary with gate
> >powerpc/powernv: boundary the total VF BAR size instead of the
> >  individual one
> >powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE
> >  mode
> >
> >   arch/powerpc/include/asm/pci-bridge.h |7 +-
> >   arch/powerpc/platforms/powernv/pci-ioda.c |  328 
> > +++--
> >   2 files changed, 175 insertions(+), 160 deletions(-)
> 
> I have posted a few comments, but in general the patchset makes things simpler
> by removing a compound PE and does not seem to make things worse so:
> 
> Acked-by: Alexey Kardashevskiy 

Thanks for reviewing it.

I'll wait for a v5 that incorporates your comments.

cheers



[PATCH 1/5] powerpc/eeh: Don't unfreeze PHB PE after reset

2015-10-06 Thread Gavin Shan
On the PowerNV platform, the PE is kept in the frozen state until the PE
reset is completed, to avoid recursive EEH errors caused by MMIO
access during the EEH reset. The PE's frozen state is
cleared after the BARs of the PCI devices included in the PE are restored
and enabled. However, we needn't clear the frozen state for a PHB PE
explicitly at this point, as there is no real PE for a PHB PE. As the
PHB PE is always bound to PE#0, we actually clear PE#0, which
is wrong. It doesn't cause any problems, though.

This patch checks if the PE is a PHB PE and doesn't clear the frozen state
if it is.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/kernel/eeh_driver.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 89eb4bc..3a626ed 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -587,10 +587,16 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus)
eeh_ops->configure_bridge(pe);
eeh_pe_restore_bars(pe);
 
-   /* Clear frozen state */
-   rc = eeh_clear_pe_frozen_state(pe, false);
-   if (rc)
-   return rc;
+   /*
+* If it's PHB PE, the frozen state on all available PEs should have
+* been cleared by the PHB reset. Otherwise, we unfreeze the PE and its
+* child PEs because they might be in frozen state.
+*/
+   if (!(pe->type & EEH_PE_PHB)) {
+   rc = eeh_clear_pe_frozen_state(pe, false);
+   if (rc)
+   return rc;
+   }
 
/* Give the system 5 seconds to finish running the user-space
 * hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes,
-- 
2.1.0


[PATCH 3/5] powerpc/eeh: Force reset on fenced PHB

2015-10-06 Thread Gavin Shan
On a fenced PHB, the error handlers in the drivers of its subordinate
devices could return PCI_ERS_RESULT_CAN_RECOVER, indicating no reset
will be issued during the recovery. This conflicts with the fact
that a fenced PHB won't be recovered without a reset.

This patch limits the return value from the error handlers in the drivers
of the fenced PHB's subordinate devices to PCI_ERS_RESULT_NEED_NONE
or PCI_ERS_RESULT_NEED_RESET, to ensure a reset will be issued during
recovery.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/kernel/eeh_driver.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 32178a4..76d918b 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -664,9 +664,17 @@ static void eeh_handle_normal_event(struct eeh_pe *pe)
 * to accomplish the reset.  Each child gets a report of the
 * status ... if any child can't handle the reset, then the entire
 * slot is dlpar removed and added.
+*
+* When the PHB is fenced, we have to issue a reset to recover from
+* the error. Override the result if necessary so that a reset
+* (with partial hotplug) is done for this case.
 */
pr_info("EEH: Notify device drivers to shutdown\n");
eeh_pe_dev_traverse(pe, eeh_report_error, &result);
+   if ((pe->type & EEH_PE_PHB) &&
+   result != PCI_ERS_RESULT_NEED_NONE &&
+   result != PCI_ERS_RESULT_NEED_RESET)
+   result = PCI_ERS_RESULT_NEED_RESET;
 
/* Get the current PCI slot state. This can take a long time,
 * sometimes over 300 seconds for certain systems.
-- 
2.1.0


[PATCH 5/5] powerpc/pseries: Cleanup on pseries_eeh_get_state()

2015-10-06 Thread Gavin Shan
This cleans up pseries_eeh_get_state(), no functional changes:

   * Return EEH_STATE_NOT_SUPPORT early when the 2nd RTAS output
 argument is zero to avoid nested if statements.
   * Skip clearing bits in the PE state represented by variable
 "result" to simplify the code.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 60 
 1 file changed, 26 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 1ba55d0..ac3ffd9 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -433,42 +433,34 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int 
*state)
return ret;
 
/* Parse the result out */
-   result = 0;
-   if (rets[1]) {
-   switch(rets[0]) {
-   case 0:
-   result &= ~EEH_STATE_RESET_ACTIVE;
-   result |= EEH_STATE_MMIO_ACTIVE;
-   result |= EEH_STATE_DMA_ACTIVE;
-   break;
-   case 1:
-   result |= EEH_STATE_RESET_ACTIVE;
-   result |= EEH_STATE_MMIO_ACTIVE;
-   result |= EEH_STATE_DMA_ACTIVE;
-   break;
-   case 2:
-   result &= ~EEH_STATE_RESET_ACTIVE;
-   result &= ~EEH_STATE_MMIO_ACTIVE;
-   result &= ~EEH_STATE_DMA_ACTIVE;
-   break;
-   case 4:
-   result &= ~EEH_STATE_RESET_ACTIVE;
-   result &= ~EEH_STATE_MMIO_ACTIVE;
-   result &= ~EEH_STATE_DMA_ACTIVE;
-   result |= EEH_STATE_MMIO_ENABLED;
-   break;
-   case 5:
-   if (rets[2]) {
-   if (state) *state = rets[2];
-   result = EEH_STATE_UNAVAILABLE;
-   } else {
-   result = EEH_STATE_NOT_SUPPORT;
-   }
-   break;
-   default:
+   if (!rets[1])
+   return EEH_STATE_NOT_SUPPORT;
+
+   switch(rets[0]) {
+   case 0:
+   result = EEH_STATE_MMIO_ACTIVE |
+EEH_STATE_DMA_ACTIVE;
+   break;
+   case 1:
+   result = EEH_STATE_RESET_ACTIVE |
+EEH_STATE_MMIO_ACTIVE  |
+EEH_STATE_DMA_ACTIVE;
+   break;
+   case 2:
+   result = 0;
+   break;
+   case 4:
+   result = EEH_STATE_MMIO_ENABLED;
+   break;
+   case 5:
+   if (rets[2]) {
+   if (state) *state = rets[2];
+   result = EEH_STATE_UNAVAILABLE;
+   } else {
result = EEH_STATE_NOT_SUPPORT;
}
-   } else {
+   break;
+   default:
result = EEH_STATE_NOT_SUPPORT;
}
 
-- 
2.1.0


[PATCH 4/5] powerpc/eeh: More relaxed condition for enabled IO path

2015-10-06 Thread Gavin Shan
When one or both of the two flags below are marked in the PE
state, the PE's IO path is regarded as enabled: EEH_STATE_MMIO_ACTIVE
or EEH_STATE_MMIO_ENABLED.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/kernel/eeh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index e968533..ddbf406 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -630,7 +630,7 @@ int eeh_pci_enable(struct eeh_pe *pe, int function)
 */
switch (function) {
case EEH_OPT_THAW_MMIO:
-   active_flag = EEH_STATE_MMIO_ACTIVE;
+   active_flag = EEH_STATE_MMIO_ACTIVE | EEH_STATE_MMIO_ENABLED;
break;
case EEH_OPT_THAW_DMA:
active_flag = EEH_STATE_DMA_ACTIVE;
-- 
2.1.0


[PATCH 2/5] powerpc/eeh: More relaxed hotplug criterion

2015-10-06 Thread Gavin Shan
Currently, we rely on the existence of struct pci_driver::err_handler
to judge if the corresponding PCI device should be unplugged during
EEH recovery (the partial hotplug case). However, that test isn't
precise enough: some device drivers implement only part of the EEH
error handlers, merely to collect diag-data, and such a driver still
expects a hotplug to recover from the EEH error.

This makes the hotplug criterion more relaxed: if the device driver
doesn't provide all the necessary EEH error handlers, it will experience
a hotplug during EEH recovery, as illustrated below.
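
To illustrate with a hypothetical driver (not real code): after this
change, a driver must provide all three of the callbacks below to avoid
the partial hotplug; supplying only error_detected() to collect
diag-data is no longer enough.

	#include <linux/pci.h>

	static pci_ers_result_t foo_error_detected(struct pci_dev *pdev,
						   enum pci_channel_state state)
	{
		return PCI_ERS_RESULT_NEED_RESET;	/* ask for a reset */
	}

	static pci_ers_result_t foo_slot_reset(struct pci_dev *pdev)
	{
		return PCI_ERS_RESULT_RECOVERED;	/* reinitialize after reset */
	}

	static void foo_resume(struct pci_dev *pdev)
	{
		/* restart I/O */
	}

	static const struct pci_error_handlers foo_err_handlers = {
		.error_detected	= foo_error_detected,
		.slot_reset	= foo_slot_reset,
		.resume		= foo_resume,
	};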

Signed-off-by: Gavin Shan 
---
 arch/powerpc/kernel/eeh_driver.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 3a626ed..32178a4 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -416,7 +416,10 @@ static void *eeh_rmv_device(void *data, void *userdata)
driver = eeh_pcid_get(dev);
if (driver) {
eeh_pcid_put(dev);
-   if (driver->err_handler)
+   if (driver->err_handler &&
+   driver->err_handler->error_detected &&
+   driver->err_handler->slot_reset &&
+   driver->err_handler->resume)
return NULL;
}
 
-- 
2.1.0


Re: [PATCH] cxl: Fix number of allocated pages in SPA

2015-10-06 Thread Ian Munsie
Excerpts from Michael Ellerman's message of 2015-10-06 17:19:02 +1100:
> On Fri, 2015-10-02 at 16:01 +0200, Christophe Lombard wrote:
> > This moves the initialisation of the num_procs to before the SPA
> > allocation.
> 
> Why? What does it fix? I can't tell from the diff or the change log.

This will mean we only ever allocate a fixed number of pages for the
scheduled process area (which in itself looks like it has a minor bug as
it will start trying at two pages instead of one), which will limit us
to 958 processes with 2 x 64K pages. If we actually try to use more
processes than that we'd probably overrun the buffer and corrupt memory
or crash.
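
For reference, the 958 figure comes from the SPA sizing calculation in
the driver, which solves the CAIA size formula
(sizeof(SPA) ~= ((n + 4) * 128) + (n * 8) + 256) for n; a sketch of
the arithmetic:

	static int spa_max_procs(int spa_size)
	{
		return ((spa_size / 8) - 96) / 17;
	}

	/* spa_max_procs(2 * 65536) == 958 */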

The only reason we haven't hit this in the field so far is that any AFU
that requires at least three interrupts per process is already limited to
fewer processes than that anyway (e.g. a minimum of 4 interrupts limits it to
509 processes, and all the AFUs I'm aware of require at least that many
interrupts), but we could hit it on an AFU that requires 0, 1 or 2
interrupts per process, or when using 4K pages.

This fix should go to stable.

@Christophe, can you resend with this info in the commit message?

Cheers,
-Ian


[PATCH v2 00/18] powerpc/fsl-book3e-64: kexec/kdump support

2015-10-06 Thread Scott Wood
This patchset adds support for kexec and kdump to e5500 and e6500 based
systems running 64-bit kernels.  It depends on the kexec-tools patch
http://patchwork.ozlabs.org/patch/527050/ ("ppc64: Add a flag to tell the
kernel it's booting from kexec").

Scott Wood (12):
  powerpc/fsl-booke-64: Allow booting from the secondary thread
  powerpc/fsl-corenet: Disable coreint if kexec is enabled
  powerpc/85xx: Don't use generic timebase sync on 64-bit
  powerpc/fsl_pci: Don't set up inbound windows in kdump crash kernel
  powerpc/85xx: Load all early TLB entries at once
  powerpc/fsl-booke-64: Don't limit ppc64_rma_size to one TLB entry
  powerpc/e6500: kexec: Handle hardware threads
  powerpc/book3e/kdump: Enable crash_kexec_wait_realmode
  powerpc/book3e-64: Don't limit paca to 256 MiB
  powerpc/book3e-64/kexec: Enable SMP release
  powerpc/booke: Only use VIRT_PHYS_OFFSET on booke32
  powerpc/book3e-64/kexec: Set "r4 = 0" when entering spinloop

Tiejun Chen (6):
  powerpc/85xx: Implement 64-bit kexec support
  powerpc/book3e-64: rename interrupt_end_book3e with __end_interrupts
  powerpc/booke64: Fix args to copy_and_flush
  powerpc/book3e: support CONFIG_RELOCATABLE
  powerpc/book3e-64/kexec: create an identity TLB mapping
  powerpc/book3e-64: Enable kexec

 Documentation/devicetree/bindings/chosen.txt  |  8 +++
 arch/powerpc/Kconfig  |  2 +-
 arch/powerpc/include/asm/exception-64e.h  |  4 +-
 arch/powerpc/include/asm/page.h   |  7 ++-
 arch/powerpc/kernel/crash.c   |  6 +-
 arch/powerpc/kernel/exceptions-64e.S  | 17 --
 arch/powerpc/kernel/head_64.S | 43 --
 arch/powerpc/kernel/machine_kexec_64.c| 18 ++
 arch/powerpc/kernel/misc_64.S | 60 ++-
 arch/powerpc/kernel/paca.c|  6 +-
 arch/powerpc/kernel/setup_64.c| 25 +++-
 arch/powerpc/mm/fsl_booke_mmu.c   | 35 ---
 arch/powerpc/mm/mmu_decl.h|  4 +-
 arch/powerpc/mm/tlb_nohash.c  | 41 ++---
 arch/powerpc/mm/tlb_nohash_low.S  | 63 
 arch/powerpc/platforms/85xx/corenet_generic.c |  4 ++
 arch/powerpc/platforms/85xx/smp.c | 86 ---
 arch/powerpc/sysdev/fsl_pci.c | 84 +++---
 18 files changed, 443 insertions(+), 70 deletions(-)

-- 
2.1.4


[PATCH v2 01/18] powerpc/fsl-booke-64: Allow booting from the secondary thread

2015-10-06 Thread Scott Wood
This allows SMP kernels to work as kdump crash kernels.  While crash
kernels don't really need to be SMP, this prevents things from breaking
if a user does it anyway (which is not something you want to only find
out once the main kernel has crashed in the field, especially if
whether it works or not depends on which cpu crashed).

Signed-off-by: Scott Wood 
---
 arch/powerpc/platforms/85xx/smp.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/smp.c 
b/arch/powerpc/platforms/85xx/smp.c
index b8b8216..c2ded03 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -173,15 +173,22 @@ static inline u32 read_spin_table_addr_l(void *spin_table)
 static void wake_hw_thread(void *info)
 {
void fsl_secondary_thread_init(void);
-   unsigned long imsr1, inia1;
+   unsigned long imsr, inia;
int nr = *(const int *)info;
 
-   imsr1 = MSR_KERNEL;
-   inia1 = *(unsigned long *)fsl_secondary_thread_init;
-
-   mttmr(TMRN_IMSR1, imsr1);
-   mttmr(TMRN_INIA1, inia1);
-   mtspr(SPRN_TENS, TEN_THREAD(1));
+   imsr = MSR_KERNEL;
+   inia = *(unsigned long *)fsl_secondary_thread_init;
+
+   if (cpu_thread_in_core(nr) == 0) {
+   /* For when we boot on a secondary thread with kdump */
+   mttmr(TMRN_IMSR0, imsr);
+   mttmr(TMRN_INIA0, inia);
+   mtspr(SPRN_TENS, TEN_THREAD(0));
+   } else {
+   mttmr(TMRN_IMSR1, imsr);
+   mttmr(TMRN_INIA1, inia);
+   mtspr(SPRN_TENS, TEN_THREAD(1));
+   }
 
smp_generic_kick_cpu(nr);
 }
@@ -224,6 +231,12 @@ static int smp_85xx_kick_cpu(int nr)
 
smp_call_function_single(primary, wake_hw_thread, &nr, 0);
return 0;
+   } else if (cpu_thread_in_core(boot_cpuid) != 0 &&
+  cpu_first_thread_sibling(boot_cpuid) == nr) {
+   if (WARN_ON_ONCE(!cpu_has_feature(CPU_FTR_SMT)))
+   return -ENOENT;
+
+   smp_call_function_single(boot_cpuid, wake_hw_thread, &nr, 0);
}
 #endif
 
-- 
2.1.4


[PATCH v2 02/18] powerpc/fsl-corenet: Disable coreint if kexec is enabled

2015-10-06 Thread Scott Wood
Problems have been observed in coreint (EPR) mode if interrupts are
left pending (due to the lack of device quiescence with kdump) after
having tried to deliver to a CPU but been unable to, due to MSR[EE]
-- interrupts no longer get reliably delivered in the new kernel.  I
tried various ways of fixing it up inside the crash kernel itself, and
none worked (including resetting the entire mpic).  Masking all
interrupts and issuing EOIs in the crashing kernel did help a lot of
the time, but the behavior was not consistent.

Thus, stick to standard IACK mode when kdump is a possibility.

Signed-off-by: Scott Wood 
---
Previously I discussed the possibility of removing coreint entirely,
but I think we want to keep it for virtualized guests.
---
 arch/powerpc/platforms/85xx/corenet_generic.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/platforms/85xx/corenet_generic.c 
b/arch/powerpc/platforms/85xx/corenet_generic.c
index b395571..04ffbcb 100644
--- a/arch/powerpc/platforms/85xx/corenet_generic.c
+++ b/arch/powerpc/platforms/85xx/corenet_generic.c
@@ -214,7 +214,11 @@ define_machine(corenet_generic) {
.pcibios_fixup_bus  = fsl_pcibios_fixup_bus,
.pcibios_fixup_phb  = fsl_pcibios_fixup_phb,
 #endif
+#ifdef CONFIG_KEXEC
+   .get_irq= mpic_get_irq,
+#else
.get_irq= mpic_get_coreint_irq,
+#endif
.restart= fsl_rstcr_restart,
.calibrate_decr = generic_calibrate_decr,
.progress   = udbg_progress,
-- 
2.1.4


[PATCH v2 03/18] powerpc/85xx: Don't use generic timebase sync on 64-bit

2015-10-06 Thread Scott Wood
85xx currently uses the generic timebase sync mechanism when
CONFIG_KEXEC is enabled, because 32-bit 85xx kexec support does a hard
reset of each core.  64-bit 85xx kexec does not do this, so we neither
need nor want this (nor is the generic timebase sync code built on
ppc64).

FWIW, I don't like the fact that the hard reset is done on 32-bit
kexec, and I especially don't like the timebase sync being triggered
only on the presence of CONFIG_KEXEC rather than actually booting in
that environment, but that's beyond the scope of this patch...

Signed-off-by: Scott Wood 
---
 arch/powerpc/platforms/85xx/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/85xx/smp.c 
b/arch/powerpc/platforms/85xx/smp.c
index c2ded03..a0763be 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -344,7 +344,7 @@ struct smp_ops_t smp_85xx_ops = {
.cpu_disable= generic_cpu_disable,
.cpu_die= generic_cpu_die,
 #endif
-#ifdef CONFIG_KEXEC
+#if defined(CONFIG_KEXEC) && !defined(CONFIG_PPC64)
.give_timebase  = smp_generic_give_timebase,
.take_timebase  = smp_generic_take_timebase,
 #endif
-- 
2.1.4


[PATCH v2 04/18] powerpc/fsl_pci: Don't set up inbound windows in kdump crash kernel

2015-10-06 Thread Scott Wood
Otherwise, because the top end of the crash kernel is treated as the
absolute top of memory rather than the beginning of a reserved region,
in-flight DMA from the previous kernel that targets areas above the
crash kernel can trigger a storm of PCI errors.  We only do this for
kdump, not normal kexec, in case kexec is being used to upgrade to a
kernel that wants a different inbound memory map.

Signed-off-by: Scott Wood 
Cc: Mingkai Hu 
---
v2: new patch

 arch/powerpc/sysdev/fsl_pci.c | 84 +++
 1 file changed, 61 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index ebc1f412..98d671c 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -179,6 +179,19 @@ static int setup_one_atmu(struct ccsr_pci __iomem *pci,
return i;
 }
 
+static bool is_kdump(void)
+{
+   struct device_node *node;
+
+   node = of_find_node_by_type(NULL, "memory");
+   if (!node) {
+   WARN_ON_ONCE(1);
+   return false;
+   }
+
+   return of_property_read_bool(node, "linux,usable-memory");
+}
+
 /* atmu setup for fsl pci/pcie controller */
 static void setup_pci_atmu(struct pci_controller *hose)
 {
@@ -192,6 +205,16 @@ static void setup_pci_atmu(struct pci_controller *hose)
const char *name = hose->dn->full_name;
const u64 *reg;
int len;
+   bool setup_inbound;
+
+   /*
+* If this is kdump, we don't want to trigger a bunch of PCI
+* errors by closing the window on in-flight DMA.
+*
+* We still run most of the function's logic so that things like
+* hose->dma_window_size still get set.
+*/
+   setup_inbound = !is_kdump();
 
if (early_find_capability(hose, 0, 0, PCI_CAP_ID_EXP)) {
if (in_be32(&pci->block_rev1) >= PCIE_IP_REV_2_2) {
@@ -204,8 +227,11 @@ static void setup_pci_atmu(struct pci_controller *hose)
/* Disable all windows (except powar0 since it's ignored) */
for(i = 1; i < 5; i++)
out_be32(&pci->pow[i].powar, 0);
-   for (i = start_idx; i < end_idx; i++)
-   out_be32(&pci->piw[i].piwar, 0);
+
+   if (setup_inbound) {
+   for (i = start_idx; i < end_idx; i++)
+   out_be32(&pci->piw[i].piwar, 0);
+   }
 
/* Setup outbound MEM window */
for(i = 0, j = 1; i < 3; i++) {
@@ -278,6 +304,7 @@ static void setup_pci_atmu(struct pci_controller *hose)
 
/* Setup inbound mem window */
mem = memblock_end_of_DRAM();
+   pr_info("%s: end of DRAM %llx\n", __func__, mem);
 
/*
 * The msi-address-64 property, if it exists, indicates the physical
@@ -320,12 +347,14 @@ static void setup_pci_atmu(struct pci_controller *hose)
 
piwar |= ((mem_log - 1) & PIWAR_SZ_MASK);
 
-   /* Setup inbound memory window */
-   out_be32(&pci->piw[win_idx].pitar,  0x);
-   out_be32(&pci->piw[win_idx].piwbar, 0x);
-   out_be32(&pci->piw[win_idx].piwar,  piwar);
-   win_idx--;
+   if (setup_inbound) {
+   /* Setup inbound memory window */
+   out_be32(&pci->piw[win_idx].pitar,  0x);
+   out_be32(&pci->piw[win_idx].piwbar, 0x);
+   out_be32(&pci->piw[win_idx].piwar,  piwar);
+   }
 
+   win_idx--;
hose->dma_window_base_cur = 0x;
hose->dma_window_size = (resource_size_t)sz;
 
@@ -343,13 +372,15 @@ static void setup_pci_atmu(struct pci_controller *hose)
 
piwar = (piwar & ~PIWAR_SZ_MASK) | (mem_log - 1);
 
-   /* Setup inbound memory window */
-   out_be32(&pci->piw[win_idx].pitar,  0x);
-   out_be32(&pci->piw[win_idx].piwbear,
-   pci64_dma_offset >> 44);
-   out_be32(&pci->piw[win_idx].piwbar,
-   pci64_dma_offset >> 12);
-   out_be32(&pci->piw[win_idx].piwar,  piwar);
+   if (setup_inbound) {
+   /* Setup inbound memory window */
+   out_be32(&pci->piw[win_idx].pitar,  0x);
+   out_be32(&pci->piw[win_idx].piwbear,
+   pci64_dma_offset >> 44);
+   out_be32(&pci->piw[win_idx].piwbar,
+   pci64_dma_offset >> 12);
+   out_be32(&pci->piw[win_idx].piwar,  piwar);
+   }
 
/*
 * install our own dma_set_mask handler to fixup dma_ops
@@ -362,12 +393,15 @@ static void setup_pci_atmu(stru

[PATCH v2 05/18] powerpc/85xx: Load all early TLB entries at once

2015-10-06 Thread Scott Wood
Use an AS=1 trampoline TLB entry to allow all normal TLB1 entries to
be loaded at once.  This avoids the need to keep the translation that
code is executing from in the same TLB entry in the final TLB
configuration as during early boot, which in turn is helpful for
relocatable kernels (e.g. kdump) where the kernel is not running from
what would be the first TLB entry.

On e6500, we limit map_mem_in_cams() to the primary hwthread of a
core (the boot cpu is always considered primary, as a kdump kernel
can be entered on any cpu).  Each TLB only needs to be set up once,
and when we do, we don't want another thread to be running when we
create a temporary trampoline TLB1 entry.
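
In rough C-level pseudocode, the sequence loadcam_multi() implements
(the real implementation is asm in tlb_nohash_low.S) is:

	void loadcam_multi(int first_idx, int num, int tmp_idx)
	{
		/* 1. Write a temporary AS=1 TLB1 entry at tmp_idx that
		 *    duplicates the mapping we are currently executing
		 *    from.
		 * 2. rfi with MSR[IS|DS] = 1 so we run from the AS=1
		 *    entry and none of the AS=0 entries being replaced
		 *    are live.
		 * 3. Load all 'num' final AS=0 entries starting at
		 *    first_idx (the per-entry work loadcam_entry() does).
		 * 4. rfi back to AS=0 and invalidate the tmp_idx entry.
		 */
	}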

Signed-off-by: Scott Wood 
---
 arch/powerpc/kernel/setup_64.c   |  8 +
 arch/powerpc/mm/fsl_booke_mmu.c  | 15 --
 arch/powerpc/mm/mmu_decl.h   |  1 +
 arch/powerpc/mm/tlb_nohash.c | 19 +++-
 arch/powerpc/mm/tlb_nohash_low.S | 63 
 5 files changed, 102 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index bdcbb71..505ec2c 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -108,6 +108,14 @@ static void setup_tlb_core_data(void)
for_each_possible_cpu(cpu) {
int first = cpu_first_thread_sibling(cpu);
 
+   /*
+* If we boot via kdump on a non-primary thread,
+* make sure we point at the thread that actually
+* set up this TLB.
+*/
+   if (cpu_first_thread_sibling(boot_cpuid) == first)
+   first = boot_cpuid;
+
paca[cpu].tcd_ptr = &paca[first].tcd;
 
/*
diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index 354ba3c..36d3c55 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -105,8 +105,9 @@ unsigned long p_mapped_by_tlbcam(phys_addr_t pa)
  * an unsigned long (for example, 32-bit implementations cannot support a 4GB
  * size).
  */
-static void settlbcam(int index, unsigned long virt, phys_addr_t phys,
-   unsigned long size, unsigned long flags, unsigned int pid)
+static void preptlbcam(int index, unsigned long virt, phys_addr_t phys,
+  unsigned long size, unsigned long flags,
+  unsigned int pid)
 {
unsigned int tsize;
 
@@ -141,7 +142,13 @@ static void settlbcam(int index, unsigned long virt, 
phys_addr_t phys,
tlbcam_addrs[index].start = virt;
tlbcam_addrs[index].limit = virt + size - 1;
tlbcam_addrs[index].phys = phys;
+}
 
+void settlbcam(int index, unsigned long virt, phys_addr_t phys,
+  unsigned long size, unsigned long flags,
+  unsigned int pid)
+{
+   preptlbcam(index, virt, phys, size, flags, pid);
loadcam_entry(index);
 }
 
@@ -181,13 +188,15 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t 
phys, unsigned long virt,
unsigned long cam_sz;
 
cam_sz = calc_cam_sz(ram, virt, phys);
-   settlbcam(i, virt, phys, cam_sz, pgprot_val(PAGE_KERNEL_X), 0);
+   preptlbcam(i, virt, phys, cam_sz, pgprot_val(PAGE_KERNEL_X), 0);
 
ram -= cam_sz;
amount_mapped += cam_sz;
virt += cam_sz;
phys += cam_sz;
}
+
+   loadcam_multi(0, i, max_cam_idx);
tlbcam_index = i;
 
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 085b66b..27c3a2d 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -152,6 +152,7 @@ extern int switch_to_as1(void);
 extern void restore_to_as0(int esel, int offset, void *dt_ptr, int bootcpu);
 #endif
 extern void loadcam_entry(unsigned int index);
+extern void loadcam_multi(int first_idx, int num, int tmp_idx);
 
 struct tlbcam {
u32 MAS0;
diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index 723a099..a7381fb 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -628,10 +629,26 @@ static void early_init_this_mmu(void)
 #ifdef CONFIG_PPC_FSL_BOOK3E
if (mmu_has_feature(MMU_FTR_TYPE_FSL_E)) {
unsigned int num_cams;
+   int __maybe_unused cpu = smp_processor_id();
+   bool map = true;
 
/* use a quarter of the TLBCAM for bolted linear map */
num_cams = (mfspr(SPRN_TLB1CFG) & TLBnCFG_N_ENTRY) / 4;
-   linear_map_top = map_mem_in_cams(linear_map_top, num_cams);
+
+   /*
+* Only do the mapping once per core, or else the
+* transient mapping would cause problems.
+*/
+#ifdef CONFIG_SMP
+   if (cpu != boot_cpuid &&
+   

[PATCH v2 06/18] powerpc/fsl-booke-64: Don't limit ppc64_rma_size to one TLB entry

2015-10-06 Thread Scott Wood
This is required for kdump to work when loaded at an address that
does not fall within the first TLB entry -- which can easily happen
because while the lower limit is enforced via reserved memory, which
doesn't affect how much is mapped, the upper limit is enforced via a
different mechanism that does.  Thus, more TLB entries are needed than
would normally be used, as the total memory to be mapped might not be a
power of two.
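
As a worked example: covering 768 MiB takes two power-of-two entries
(512 MiB + 256 MiB) rather than one.  A greedy sketch of the sizing
(illustrative; the real calc_cam_sz() also considers virtual/physical
alignment and the supported entry sizes):

	static int entries_needed(unsigned long ram)
	{
		int n = 0;

		while (ram) {
			/* largest power-of-two entry that fits the
			 * remainder (assumes 64-bit long) */
			unsigned long sz = 1UL << (63 - __builtin_clzl(ram));

			ram -= sz;
			n++;
		}

		return n;
	}

	/* entries_needed(768UL << 20) == 2 */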

Signed-off-by: Scott Wood 
---
 arch/powerpc/mm/fsl_booke_mmu.c | 22 +++---
 arch/powerpc/mm/mmu_decl.h  |  3 ++-
 arch/powerpc/mm/tlb_nohash.c| 24 +---
 3 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index 36d3c55..5eef7d7 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -178,7 +178,8 @@ unsigned long calc_cam_sz(unsigned long ram, unsigned long 
virt,
 }
 
 static unsigned long map_mem_in_cams_addr(phys_addr_t phys, unsigned long virt,
-   unsigned long ram, int max_cam_idx)
+   unsigned long ram, int max_cam_idx,
+   bool dryrun)
 {
int i;
unsigned long amount_mapped = 0;
@@ -188,7 +189,9 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t phys, 
unsigned long virt,
unsigned long cam_sz;
 
cam_sz = calc_cam_sz(ram, virt, phys);
-   preptlbcam(i, virt, phys, cam_sz, pgprot_val(PAGE_KERNEL_X), 0);
+   if (!dryrun)
+   preptlbcam(i, virt, phys, cam_sz,
+  pgprot_val(PAGE_KERNEL_X), 0);
 
ram -= cam_sz;
amount_mapped += cam_sz;
@@ -196,6 +199,9 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t phys, 
unsigned long virt,
phys += cam_sz;
}
 
+   if (dryrun)
+   return amount_mapped;
+
loadcam_multi(0, i, max_cam_idx);
tlbcam_index = i;
 
@@ -208,12 +214,12 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t 
phys, unsigned long virt,
return amount_mapped;
 }
 
-unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx)
+unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx, bool dryrun)
 {
unsigned long virt = PAGE_OFFSET;
phys_addr_t phys = memstart_addr;
 
-   return map_mem_in_cams_addr(phys, virt, ram, max_cam_idx);
+   return map_mem_in_cams_addr(phys, virt, ram, max_cam_idx, dryrun);
 }
 
 #ifdef CONFIG_PPC32
@@ -244,7 +250,7 @@ void __init adjust_total_lowmem(void)
ram = min((phys_addr_t)__max_low_memory, (phys_addr_t)total_lowmem);
 
i = switch_to_as1();
-   __max_low_memory = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM);
+   __max_low_memory = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM, false);
restore_to_as0(i, 0, 0, 1);
 
pr_info("Memory CAM mapping: ");
@@ -312,10 +318,12 @@ notrace void __init relocate_init(u64 dt_ptr, phys_addr_t 
start)
n = switch_to_as1();
/* map a 64M area for the second relocation */
if (memstart_addr > start)
-   map_mem_in_cams(0x400, CONFIG_LOWMEM_CAM_NUM);
+   map_mem_in_cams(0x400, CONFIG_LOWMEM_CAM_NUM,
+   false);
else
map_mem_in_cams_addr(start, PAGE_OFFSET + offset,
-   0x400, CONFIG_LOWMEM_CAM_NUM);
+   0x400, CONFIG_LOWMEM_CAM_NUM,
+   false);
restore_to_as0(n, offset, __va(dt_ptr), 1);
/* We should never reach here */
panic("Relocation error");
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 27c3a2d..9f58ff4 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -141,7 +141,8 @@ extern void MMU_init_hw(void);
 extern unsigned long mmu_mapin_ram(unsigned long top);
 
 #elif defined(CONFIG_PPC_FSL_BOOK3E)
-extern unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx);
+extern unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx,
+bool dryrun);
 extern unsigned long calc_cam_sz(unsigned long ram, unsigned long virt,
 phys_addr_t phys);
 #ifdef CONFIG_PPC32
diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index a7381fb..bb04e4d 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -648,7 +648,7 @@ static void early_init_this_mmu(void)
 
if (map)
linear_map_top = map_mem_in_cams(linear_map_top,
-num_cams);
+num_cams, false);
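
For context, the point of the new dryrun flag is to let early setup code
ask how much memory a given number of CAM entries could cover before
actually programming any of them.  A minimal C sketch of such a caller
(names match the patch, but this exact usage is an illustrative
assumption, not quoted from it):

	unsigned long linear_sz;

	/* Dry run: how much could num_cams entries map, starting here? */
	linear_sz = map_mem_in_cams(first_memblock_size, num_cams, true);

	/* Size the RMA from that, instead of from a single TLB entry. */
	ppc64_rma_size = min_t(u64, linear_sz, 0x40000000);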

[PATCH v2 07/18] powerpc/85xx: Implement 64-bit kexec support

2015-10-06 Thread Scott Wood
From: Tiejun Chen 

Unlike 32-bit 85xx kexec, we don't do a core reset.

Signed-off-by: Tiejun Chen 
[scottwood: edit changelog, and cleanup]
Signed-off-by: Scott Wood 
---
 arch/powerpc/platforms/85xx/smp.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index a0763be..2e46684 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -351,6 +351,7 @@ struct smp_ops_t smp_85xx_ops = {
 };
 
 #ifdef CONFIG_KEXEC
+#ifdef CONFIG_PPC32
 atomic_t kexec_down_cpus = ATOMIC_INIT(0);
 
 void mpc85xx_smp_kexec_cpu_down(int crash_shutdown, int secondary)
@@ -370,9 +371,18 @@ static void mpc85xx_smp_kexec_down(void *arg)
if (ppc_md.kexec_cpu_down)
ppc_md.kexec_cpu_down(0,1);
 }
+#else
+void mpc85xx_smp_kexec_cpu_down(int crash_shutdown, int secondary)
+{
+   local_irq_disable();
+   hard_irq_disable();
+   mpic_teardown_this_cpu(secondary);
+}
+#endif
 
 static void mpc85xx_smp_machine_kexec(struct kimage *image)
 {
+#ifdef CONFIG_PPC32
int timeout = INT_MAX;
int i, num_cpus = num_present_cpus();
 
@@ -393,6 +403,7 @@ static void mpc85xx_smp_machine_kexec(struct kimage *image)
if ( i == smp_processor_id() ) continue;
mpic_reset_core(i);
}
+#endif
 
default_machine_kexec(image);
 }
-- 
2.1.4


[PATCH v2 08/18] powerpc/e6500: kexec: Handle hardware threads

2015-10-06 Thread Scott Wood
The new kernel will be expecting secondary threads to be disabled,
not spinning.

Signed-off-by: Scott Wood 
---
v2: minor cleanup

 arch/powerpc/kernel/head_64.S | 16 ++
 arch/powerpc/platforms/85xx/smp.c | 46 +++
 2 files changed, 62 insertions(+)

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index d48125d..8b2bf0d 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -182,6 +182,8 @@ exception_marker:
 
 #ifdef CONFIG_PPC_BOOK3E
 _GLOBAL(fsl_secondary_thread_init)
+   mfspr   r4,SPRN_BUCSR
+
/* Enable branch prediction */
lis r3,BUCSR_INIT@h
ori r3,r3,BUCSR_INIT@l
@@ -196,10 +198,24 @@ _GLOBAL(fsl_secondary_thread_init)
 * number.  There are two threads per core, so shift everything
 * but the low bit right by two bits so that the cpu numbering is
 * continuous.
+*
+* If the old value of BUCSR is non-zero, this thread has run
+* before.  Thus, we assume we are coming from kexec or a similar
+* scenario, and PIR is already set to the correct value.  This
+* is a bit of a hack, but there are limited opportunities for
+* getting information into the thread and the alternatives
+* seemed like they'd be overkill.  We can't tell just by looking
+* at the old PIR value which state it's in, since the same value
+* could be valid for one thread out of reset and for a different
+* thread in Linux.
 */
+
mfspr   r3, SPRN_PIR
+   cmpwi   r4,0
+   bne 1f
rlwimi  r3, r3, 30, 2, 30
mtspr   SPRN_PIR, r3
+1:
 #endif
 
 _GLOBAL(generic_secondary_thread_init)
diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index 2e46684..712764f 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -374,9 +374,55 @@ static void mpc85xx_smp_kexec_down(void *arg)
 #else
 void mpc85xx_smp_kexec_cpu_down(int crash_shutdown, int secondary)
 {
+   int cpu = smp_processor_id();
+   int sibling = cpu_last_thread_sibling(cpu);
+   bool notified = false;
+   int disable_cpu;
+   int disable_threadbit = 0;
+   long start = mftb();
+   long now;
+
local_irq_disable();
hard_irq_disable();
mpic_teardown_this_cpu(secondary);
+
+   if (cpu == crashing_cpu && cpu_thread_in_core(cpu) != 0) {
+   /*
+* We enter the crash kernel on whatever cpu crashed,
+* even if it's a secondary thread.  If that's the case,
+* disable the corresponding primary thread.
+*/
+   disable_threadbit = 1;
+   disable_cpu = cpu_first_thread_sibling(cpu);
+   } else if (sibling != crashing_cpu &&
+  cpu_thread_in_core(cpu) == 0 &&
+  cpu_thread_in_core(sibling) != 0) {
+   disable_threadbit = 2;
+   disable_cpu = sibling;
+   }
+
+   if (disable_threadbit) {
+   while (paca[disable_cpu].kexec_state < KEXEC_STATE_REAL_MODE) {
+   barrier();
+   now = mftb();
+   if (!notified && now - start > 100) {
+   pr_info("%s/%d: waiting for cpu %d to enter 
KEXEC_STATE_REAL_MODE (%d)\n",
+   __func__, smp_processor_id(),
+   disable_cpu,
+   paca[disable_cpu].kexec_state);
+   notified = true;
+   }
+   }
+
+   if (notified) {
+   pr_info("%s: cpu %d done waiting\n",
+   __func__, disable_cpu);
+   }
+
+   mtspr(SPRN_TENC, disable_threadbit);
+   while (mfspr(SPRN_TENSR) & disable_threadbit)
+   cpu_relax();
+   }
 }
 #endif
 
-- 
2.1.4
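
As an aside for readers not fluent in rlwimi: the renumbering described
in the comment above ("shift everything but the low bit right by two
bits") is, as a C sketch (my paraphrase of rlwimi r3,r3,30,2,30, not
code from the patch):

	/* Keep the low (thread) bit in place, shift the rest right by two. */
	static inline unsigned int pir_to_cpu(unsigned int pir)
	{
		return ((pir >> 2) & ~1u) | (pir & 1);
	}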


[PATCH v2 09/18] powerpc/book3e-64: rename interrupt_end_book3e with __end_interrupts

2015-10-06 Thread Scott Wood
From: Tiejun Chen 

Rename 'interrupt_end_book3e' to '__end_interrupts' so that the symbol
can be used by both book3s and book3e.

Signed-off-by: Tiejun Chen 
[scottwood: edit changelog]
Signed-off-by: Scott Wood 
---
 arch/powerpc/kernel/exceptions-64e.S | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index f3bd5e7..9d4a006 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -542,8 +542,8 @@ interrupt_base_book3e:  
/* fake trap */
EXCEPTION_STUB(0x320, ehpriv)
EXCEPTION_STUB(0x340, lrat_error)
 
-   .globl interrupt_end_book3e
-interrupt_end_book3e:
+   .globl __end_interrupts
+__end_interrupts:
 
 /* Critical Input Interrupt */
START_EXCEPTION(critical_input);
@@ -736,7 +736,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
beq+1f
 
LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e)
-   LOAD_REG_IMMEDIATE(r15,interrupt_end_book3e)
+   LOAD_REG_IMMEDIATE(r15,__end_interrupts)
cmpld   cr0,r10,r14
cmpld   cr1,r10,r15
blt+cr0,1f
@@ -800,7 +800,7 @@ kernel_dbg_exc:
beq+1f
 
LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e)
-   LOAD_REG_IMMEDIATE(r15,interrupt_end_book3e)
+   LOAD_REG_IMMEDIATE(r15,__end_interrupts)
cmpld   cr0,r10,r14
cmpld   cr1,r10,r15
blt+cr0,1f
-- 
2.1.4


[PATCH v2 10/18] powerpc/booke64: Fix args to copy_and_flush

2015-10-06 Thread Scott Wood
From: Tiejun Chen 

Convert r4/r5, not r6, to a virtual address when calling
copy_and_flush.  Otherwise, since r3 is already virtual and
copy_and_flush accesses r3+r6, PAGE_OFFSET gets added twice.

This isn't normally seen because on book3e we normally enter with
the kernel at zero and thus skip copy_to_flush -- but it will be
needed for kexec support.

Signed-off-by: Tiejun Chen 
[scottwood: split patch and rewrote changelog]
Signed-off-by: Scott Wood 
---
 arch/powerpc/kernel/head_64.S | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 8b2bf0d..a1e85ca 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -474,15 +474,15 @@ __after_prom_start:
  */
li  r3,0/* target addr */
 #ifdef CONFIG_PPC_BOOK3E
-   tovirt(r3,r3)   /* on booke, we already run at PAGE_OFFSET */
+   tovirt(r3,r3)   /* on booke, we already run at PAGE_OFFSET */
 #endif
mr. r4,r26  /* In some cases the loader may  */
+#if defined(CONFIG_PPC_BOOK3E)
+   tovirt(r4,r4)
+#endif
beq 9f  /* have already put us at zero */
li  r6,0x100/* Start offset, the first 0x100 */
/* bytes were copied earlier.*/
-#ifdef CONFIG_PPC_BOOK3E
-   tovirt(r6,r6)   /* on booke, we already run at PAGE_OFFSET */
-#endif
 
 #ifdef CONFIG_RELOCATABLE
 /*
@@ -514,6 +514,9 @@ __after_prom_start:
 p_end: .llong  _end - _stext
 
 4: /* Now copy the rest of the kernel up to _end */
+#if defined(CONFIG_PPC_BOOK3E)
+   tovirt(r26,r26)
+#endif
addis   r5,r26,(p_end - _stext)@ha
ld  r5,(p_end - _stext)@l(r5)   /* get _end */
 5: bl  copy_and_flush  /* copy the rest */
-- 
2.1.4
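
To make the double-add concrete, with PAGE_OFFSET = 0xc000000000000000
(a worked example, not taken from the patch):

	dest   = tovirt(0)     = 0xc000000000000000
	offset = tovirt(0x100) = 0xc000000000000100   /* old code: wrong */
	dest + offset          = 0x8000000000000100   /* PAGE_OFFSET twice,
	                                                 wrapped mod 2^64 */

With the fix, only the source/destination are converted and the offset
stays a plain 0x100, so the copy accesses dest + 0x100 as intended.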


[PATCH v2 11/18] powerpc/book3e: support CONFIG_RELOCATABLE

2015-10-06 Thread Scott Wood
From: Tiejun Chen 

book3e differs from book3s here: book3s includes the exception
vectors code in head_64.S, as it relies on absolute addressing,
which is only possible within that compilation unit.  So we have
to get the label address via the GOT.

And when booting a relocated kernel, we should reset IVPR properly
again after relocation.

Signed-off-by: Tiejun Chen 
[scottwood: cleanup and ifdef removal]
Signed-off-by: Scott Wood 
---
 arch/powerpc/include/asm/exception-64e.h |  4 ++--
 arch/powerpc/kernel/exceptions-64e.S |  9 +++--
 arch/powerpc/kernel/head_64.S| 22 +++---
 3 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h
index a8b52b6..344fc43 100644
--- a/arch/powerpc/include/asm/exception-64e.h
+++ b/arch/powerpc/include/asm/exception-64e.h
@@ -204,8 +204,8 @@ exc_##label##_book3e:
 #endif
 
 #define SET_IVOR(vector_number, vector_offset) \
-   li  r3,vector_offset@l; \
-   ori r3,r3,interrupt_base_book3e@l;  \
+   LOAD_REG_ADDR(r3,interrupt_base_book3e);\
+   ori r3,r3,vector_offset@l;  \
mtspr   SPRN_IVOR##vector_number,r3;
 
 #endif /* _ASM_POWERPC_EXCEPTION_64E_H */
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 9d4a006..488e631 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -1351,7 +1351,10 @@ skpinv:  addi r6,r6,1 /* Increment */
  * r4 = MAS0 w/TLBSEL & ESEL for the temp mapping
  */
/* Now we branch the new virtual address mapped by this entry */
-   LOAD_REG_IMMEDIATE(r6,2f)
+   bl  1f  /* Find our address */
+1: mflr r6
+   addi r6,r6,(2f - 1b)
+   tovirt(r6,r6)
lis r7,MSR_KERNEL@h
ori r7,r7,MSR_KERNEL@l
mtspr   SPRN_SRR0,r6
@@ -1583,9 +1586,11 @@ _GLOBAL(book3e_secondary_thread_init)
mflr r28
b   3b
 
+   .globl init_core_book3e
 init_core_book3e:
/* Establish the interrupt vector base */
-   LOAD_REG_IMMEDIATE(r3, interrupt_base_book3e)
+   tovirt(r2,r2)
+   LOAD_REG_ADDR(r3, interrupt_base_book3e)
mtspr   SPRN_IVPR,r3
sync
blr
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index a1e85ca..1b77956 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -457,12 +457,22 @@ __after_prom_start:
/* process relocations for the final address of the kernel */
lis r25,PAGE_OFFSET@highest /* compute virtual base of kernel */
sldir25,r25,32
+#if defined(CONFIG_PPC_BOOK3E)
+   tovirt(r26,r26) /* on booke, we already run at PAGE_OFFSET */
+#endif
lwz r7,__run_at_load-_stext(r26)
+#if defined(CONFIG_PPC_BOOK3E)
+   tophys(r26,r26)
+#endif
cmplwi  cr0,r7,1/* flagged to stay where we are ? */
bne 1f
add r25,r25,r26
 1: mr  r3,r25
bl  relocate
+#if defined(CONFIG_PPC_BOOK3E)
+   /* IVPR needs to be set after relocation. */
+   bl  init_core_book3e
+#endif
 #endif
 
 /*
@@ -490,12 +500,21 @@ __after_prom_start:
  * variable __run_at_load, if it is set the kernel is treated as relocatable
  * kernel, otherwise it will be moved to PHYSICAL_START
  */
+#if defined(CONFIG_PPC_BOOK3E)
+   tovirt(r26,r26) /* on booke, we already run at PAGE_OFFSET */
+#endif
lwz r7,__run_at_load-_stext(r26)
cmplwi  cr0,r7,1
bne 3f
 
+#ifdef CONFIG_PPC_BOOK3E
+   LOAD_REG_ADDR(r5, __end_interrupts)
+   LOAD_REG_ADDR(r11, _stext)
+   sub r5,r5,r11
+#else
/* just copy interrupts */
LOAD_REG_IMMEDIATE(r5, __end_interrupts - _stext)
+#endif
b   5f
 3:
 #endif
@@ -514,9 +533,6 @@ __after_prom_start:
 p_end: .llong  _end - _stext
 
 4: /* Now copy the rest of the kernel up to _end */
-#if defined(CONFIG_PPC_BOOK3E)
-   tovirt(r26,r26)
-#endif
addis   r5,r26,(p_end - _stext)@ha
ld  r5,(p_end - _stext)@l(r5)   /* get _end */
 5: bl  copy_and_flush  /* copy the rest */
-- 
2.1.4
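
The bl/mflr sequence introduced above is the usual trick for computing
an address PC-relatively, so it is correct wherever the relocatable
kernel actually ends up running; schematically:

	bl	1f			/* LR <- runtime address of 1: */
1:	mflr	r6			/* r6 = runtime address of 1b  */
	addi	r6,r6,(2f - 1b)		/* add the fixed offset to 2:  */

LOAD_REG_IMMEDIATE, by contrast, bakes in the link-time address, which
is only valid when the kernel runs at the address it was linked for.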


[PATCH v2 12/18] powerpc/book3e/kdump: Enable crash_kexec_wait_realmode

2015-10-06 Thread Scott Wood
While book3e doesn't have "real mode", we still want to wait for
all the non-crash cpus to complete their shutdown.

Signed-off-by: Scott Wood 
---
 arch/powerpc/kernel/crash.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
index 51dbace..2bb252c 100644
--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -221,8 +221,8 @@ void crash_kexec_secondary(struct pt_regs *regs)
 #endif /* CONFIG_SMP */
 
 /* wait for all the CPUs to hit real mode but timeout if they don't come in */
-#if defined(CONFIG_SMP) && defined(CONFIG_PPC_STD_MMU_64)
-static void crash_kexec_wait_realmode(int cpu)
+#if defined(CONFIG_SMP) && defined(CONFIG_PPC64)
+static void __maybe_unused crash_kexec_wait_realmode(int cpu)
 {
unsigned int msecs;
int i;
@@ -244,7 +244,7 @@ static void crash_kexec_wait_realmode(int cpu)
 }
 #else
 static inline void crash_kexec_wait_realmode(int cpu) {}
-#endif /* CONFIG_SMP && CONFIG_PPC_STD_MMU_64 */
+#endif /* CONFIG_SMP && CONFIG_PPC64 */
 
 /*
  * Register a function to be called on shutdown.  Only use this if you
-- 
2.1.4
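
For reference, the loop being enabled for book3e has roughly this shape
(a hedged paraphrase of the existing code in crash.c, not a verbatim
quote):

	static void crash_kexec_wait_realmode(int cpu)
	{
		unsigned int msecs = 10000;	/* total timeout budget, ms */
		int i;

		for (i = 0; i < nr_cpu_ids && msecs > 0; i++) {
			if (i == cpu || !cpu_possible(i))
				continue;
			/* Wait for cpu i to report KEXEC_STATE_REAL_MODE. */
			while (paca[i].kexec_state < KEXEC_STATE_REAL_MODE &&
			       msecs > 0) {
				mdelay(1);
				msecs--;
				barrier();
			}
		}
		mb();
	}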


[PATCH v2 13/18] powerpc/book3e-64: Don't limit paca to 256 MiB

2015-10-06 Thread Scott Wood
This limit only makes sense on book3s, and on book3e it can cause
problems with kdump if we don't have any memory under 256 MiB.

Signed-off-by: Scott Wood 
---
 arch/powerpc/kernel/paca.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 5a23b69..7fdff63 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -206,12 +206,16 @@ void __init allocate_pacas(void)
 {
int cpu, limit;
 
+   limit = ppc64_rma_size;
+
+#ifdef CONFIG_PPC_BOOK3S_64
/*
 * We can't take SLB misses on the paca, and we want to access them
 * in real mode, so allocate them within the RMA and also within
 * the first segment.
 */
-   limit = min(0x10000000ULL, ppc64_rma_size);
+   limit = min(0x10000000ULL, limit);
+#endif
 
paca_size = PAGE_ALIGN(sizeof(struct paca_struct) * nr_cpu_ids);
 
-- 
2.1.4


[PATCH v2 14/18] powerpc/book3e-64/kexec: create an identity TLB mapping

2015-10-06 Thread Scott Wood
From: Tiejun Chen 

book3e has no real mode (the MMU is always on), so we have to create an
identity TLB mapping to make sure we can access the real physical
address.

Signed-off-by: Tiejun Chen 
[scottwood: cleanup, and split off some changes]
Signed-off-by: Scott Wood 
---
 arch/powerpc/kernel/misc_64.S | 52 ++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 6e4168c..246ad8c 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
.text
 
@@ -496,6 +497,51 @@ kexec_flag:
 
 
 #ifdef CONFIG_KEXEC
+#ifdef CONFIG_PPC_BOOK3E
+/*
+ * BOOK3E has no real MMU mode, so we have to setup the initial TLB
+ * for a core to identity map v:0 to p:0.  This current implementation
+ * assumes that 1G is enough for kexec.
+ */
+kexec_create_tlb:
+   /*
+* Invalidate all non-IPROT TLB entries to avoid any TLB conflict.
+* IPROT TLB entries should be >= PAGE_OFFSET and thus not conflict.
+*/
+   PPC_TLBILX_ALL(0,R0)
+   sync
+   isync
+
+   mfspr   r10,SPRN_TLB1CFG
+   andi.   r10,r10,TLBnCFG_N_ENTRY /* Extract # entries */
subi r10,r10,1   /* Last entry: no conflict with kernel text */
+   lis r9,MAS0_TLBSEL(1)@h
+   rlwimi  r9,r10,16,4,15  /* Setup MAS0 = TLBSEL | ESEL(r9) */
+
+/* Set up a temp identity mapping v:0 to p:0 and return to it. */
+#if defined(CONFIG_SMP) || defined(CONFIG_PPC_E500MC)
+#define M_IF_NEEDEDMAS2_M
+#else
+#define M_IF_NEEDED0
+#endif
+   mtspr   SPRN_MAS0,r9
+
+   lis r9,(MAS1_VALID|MAS1_IPROT)@h
+   ori r9,r9,(MAS1_TSIZE(BOOK3E_PAGESZ_1GB))@l
+   mtspr   SPRN_MAS1,r9
+
+   LOAD_REG_IMMEDIATE(r9, 0x0 | M_IF_NEEDED)
+   mtspr   SPRN_MAS2,r9
+
+   LOAD_REG_IMMEDIATE(r9, 0x0 | MAS3_SR | MAS3_SW | MAS3_SX)
+   mtspr   SPRN_MAS3,r9
+   li  r9,0
+   mtspr   SPRN_MAS7,r9
+
+   tlbwe
+   isync
+   blr
+#endif
 
 /* kexec_smp_wait(void)
  *
@@ -525,6 +571,10 @@ _GLOBAL(kexec_smp_wait)
  * don't overwrite r3 here, it is live for kexec_wait above.
  */
 real_mode: /* assume normal blr return */
+#ifdef CONFIG_PPC_BOOK3E
+   /* Create an identity mapping. */
+   b   kexec_create_tlb
+#else
 1: li  r9,MSR_RI
li  r10,MSR_DR|MSR_IR
mflr r11 /* return address to SRR0 */
@@ -536,7 +586,7 @@ real_mode:  /* assume normal blr return */
mtspr   SPRN_SRR1,r10
mtspr   SPRN_SRR0,r11
rfid
-
+#endif
 
 /*
  * kexec_sequence(newstack, start, image, control, clear_all())
-- 
2.1.4
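
Read field by field, the temporary TLB1 entry programmed above is, as a
C-level paraphrase of the MAS values in the patch:

	mas0 = MAS0_TLBSEL(1) | MAS0_ESEL(last);   /* last TLB1 slot, no
	                                              conflict w/ kernel  */
	mas1 = MAS1_VALID | MAS1_IPROT | MAS1_TSIZE(BOOK3E_PAGESZ_1GB);
	mas2 = 0x0 | M_IF_NEEDED;                  /* EA = 0; M on SMP    */
	mas3 = 0x0 | MAS3_SR | MAS3_SW | MAS3_SX;  /* RPN = 0; kernel RWX */
	mas7 = 0;                                  /* upper physical bits */

i.e. a valid, protected, 1 GiB entry mapping virtual 0 to physical 0.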


[PATCH v2 15/18] powerpc/book3e-64/kexec: Enable SMP release

2015-10-06 Thread Scott Wood
The SMP release mechanism for FSL book3e is different from the one used
when booting normally.  In theory we could simulate the normal spin
table mechanism, but not at the addresses U-Boot put in the device tree
-- so there'd need to be even more communication between the kernel and
kexec to set that up.  Instead, kexec-tools will set a boolean property
linux,booted-from-kexec in the /chosen node.

Signed-off-by: Scott Wood 
Cc: devicet...@vger.kernel.org
---
v2: Use a device tree property instead of a flag in the kernel image

This depends on the kexec-tools patch v2 "ppc64: Add a flag to tell the
kernel it's booting from kexec":
http://patchwork.ozlabs.org/patch/527050/
---
 Documentation/devicetree/bindings/chosen.txt |  8 
 arch/powerpc/kernel/setup_64.c   | 17 -
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
index ed838f4..6ae9d82 100644
--- a/Documentation/devicetree/bindings/chosen.txt
+++ b/Documentation/devicetree/bindings/chosen.txt
@@ -44,3 +44,11 @@ Implementation note: Linux will look for the property "linux,stdout-path" or
 on PowerPC "stdout" if "stdout-path" is not found.  However, the
 "linux,stdout-path" and "stdout" properties are deprecated. New platforms
 should only use the "stdout-path" property.
+
+linux,booted-from-kexec
+---
+
This property is set (currently only on PowerPC, and only needed on
book3e) by some versions of kexec-tools to tell the new kernel that it
is being booted by kexec, as the booting environment may differ (e.g.
a different secondary CPU release mechanism).
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 505ec2c..5c03a6a 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -340,11 +340,26 @@ void early_setup_secondary(void)
 #endif /* CONFIG_SMP */
 
 #if defined(CONFIG_SMP) || defined(CONFIG_KEXEC)
+static bool use_spinloop(void)
+{
+   if (!IS_ENABLED(CONFIG_PPC_BOOK3E))
+   return true;
+
+   /*
+* When book3e boots from kexec, the ePAPR spin table does
+* not get used.
+*/
+   return of_property_read_bool(of_chosen, "linux,booted-from-kexec");
+}
+
 void smp_release_cpus(void)
 {
unsigned long *ptr;
int i;
 
+   if (!use_spinloop())
+   return;
+
DBG(" -> smp_release_cpus()\n");
 
/* All secondary cpus are spinning on a common spinloop, release them
@@ -524,7 +539,7 @@ void __init setup_system(void)
 * Freescale Book3e parts spin in a loop provided by firmware,
 * so smp_release_cpus() does nothing for them
 */
-#if defined(CONFIG_SMP) && !defined(CONFIG_PPC_FSL_BOOK3E)
+#if defined(CONFIG_SMP)
/* Release secondary cpus out of their spinloops at 0x60 now that
 * we can map physical -> logical CPU ids
 */
-- 
2.1.4
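
For reference, the property as it would appear in the device tree
passed in by kexec-tools (a minimal illustrative snippet):

	/ {
		chosen {
			linux,booted-from-kexec;
		};
	};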


[PATCH v2 16/18] powerpc/booke: Only use VIRT_PHYS_OFFSET on booke32

2015-10-06 Thread Scott Wood
VIRT_PHYS_OFFSET is not correct on book3e-64, because it does not
account for CONFIG_RELOCATABLE other than via the 32-bit-only
virt_phys_offset.

book3e-64 can (and if the comment about a GCC miscompilation is still
relevant, should) use the normal ppc64 __va/__pa.

At this point, only booke-32 will use VIRT_PHYS_OFFSET, so given the
issues with its calculation, restrict its definition to booke-32.

Signed-off-by: Scott Wood 
---
 arch/powerpc/include/asm/page.h | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 168ca67..6b67239 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -107,12 +107,13 @@ extern long long virt_phys_offset;
 #endif
 
 /* See Description below for VIRT_PHYS_OFFSET */
-#ifdef CONFIG_RELOCATABLE_PPC32
+#if defined(CONFIG_PPC32) && defined(CONFIG_BOOKE)
+#ifdef CONFIG_RELOCATABLE
 #define VIRT_PHYS_OFFSET virt_phys_offset
 #else
 #define VIRT_PHYS_OFFSET (KERNELBASE - PHYSICAL_START)
 #endif
-
+#endif
 
 #ifdef CONFIG_PPC64
 #define MEMORY_START   0UL
@@ -205,7 +206,7 @@ extern long long virt_phys_offset;
  * On non-Book-E PPC64 PAGE_OFFSET and MEMORY_START are constants so use
  * the other definitions for __va & __pa.
  */
-#ifdef CONFIG_BOOKE
+#if defined(CONFIG_PPC32) && defined(CONFIG_BOOKE)
 #define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) + VIRT_PHYS_OFFSET))
 #define __pa(x) ((unsigned long)(x) - VIRT_PHYS_OFFSET)
 #else
-- 
2.1.4
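
A quick worked example of the booke32 definitions being kept, using
illustrative values (KERNELBASE = 0xc0000000, PHYSICAL_START = 0,
non-relocatable):

	VIRT_PHYS_OFFSET = 0xc0000000 - 0x0        = 0xc0000000
	__va(0x01000000) = 0x01000000 + 0xc0000000 = 0xc1000000
	__pa(0xc1000000) = 0xc1000000 - 0xc0000000 = 0x01000000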


[PATCH v2 17/18] powerpc/book3e-64/kexec: Set "r4 = 0" when entering spinloop

2015-10-06 Thread Scott Wood
book3e_secondary_core_init will only create a TLB entry if r4 = 0,
so do so.

Signed-off-by: Scott Wood 
---
 arch/powerpc/kernel/misc_64.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 246ad8c..ddbc535 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -485,6 +485,8 @@ _GLOBAL(kexec_wait)
mtsrr1  r11
rfid
 #else
+   /* Create TLB entry in book3e_secondary_core_init */
+   li  r4,0
ba  0x60
 #endif
 #endif
-- 
2.1.4


[PATCH v2 18/18] powerpc/book3e-64: Enable kexec

2015-10-06 Thread Scott Wood
From: Tiejun Chen 

Allow KEXEC for book3e, and bypass or convert non-book3e stuff
in kexec code.

Signed-off-by: Tiejun Chen 
[scottw...@freescale.com: move code to minimize diff, and cleanup]
Signed-off-by: Scott Wood 
---
 arch/powerpc/Kconfig   |  2 +-
 arch/powerpc/kernel/machine_kexec_64.c | 18 ++
 arch/powerpc/kernel/misc_64.S  |  6 ++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9a7057e..db49e0d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -419,7 +419,7 @@ config PPC64_SUPPORTS_MEMORY_FAILURE
 
 config KEXEC
bool "kexec system call"
-   depends on (PPC_BOOK3S || FSL_BOOKE || (44x && !SMP))
+   depends on (PPC_BOOK3S || FSL_BOOKE || (44x && !SMP)) || PPC_BOOK3E
select KEXEC_CORE
help
  kexec is a system call that implements the ability to shutdown your
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 1a74446..0fbd75d 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -30,6 +30,21 @@
 #include 
 #include 
 
+#ifdef CONFIG_PPC_BOOK3E
+int default_machine_kexec_prepare(struct kimage *image)
+{
+   int i;
+   /*
+* Since we use the kernel fault handlers and paging code to
+* handle the virtual mode, we must make sure no destination
+* overlaps kernel static data or bss.
+*/
+   for (i = 0; i < image->nr_segments; i++)
+   if (image->segment[i].mem < __pa(_end))
+   return -ETXTBSY;
+   return 0;
+}
+#else
 int default_machine_kexec_prepare(struct kimage *image)
 {
int i;
@@ -95,6 +110,7 @@ int default_machine_kexec_prepare(struct kimage *image)
 
return 0;
 }
+#endif /* !CONFIG_PPC_BOOK3E */
 
 static void copy_segments(unsigned long ind)
 {
@@ -365,6 +381,7 @@ void default_machine_kexec(struct kimage *image)
/* NOTREACHED */
 }
 
+#ifndef CONFIG_PPC_BOOK3E
 /* Values we need to export to the second kernel via the device tree. */
 static unsigned long htab_base;
 static unsigned long htab_size;
@@ -411,3 +428,4 @@ static int __init export_htab_values(void)
return 0;
 }
 late_initcall(export_htab_values);
+#endif /* !CONFIG_PPC_BOOK3E */
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index ddbc535..db475d4 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -631,9 +631,13 @@ _GLOBAL(kexec_sequence)
lhz r25,PACAHWCPUID(r13)/* get our phys cpu from paca */
 
/* disable interrupts, we are overwriting kernel data next */
+#ifdef CONFIG_PPC_BOOK3E
+   wrteei  0
+#else
mfmsr   r3
rlwinm  r3,r3,0,17,15
mtmsrd  r3,1
+#endif
 
/* copy dest pages, flush whole dest image */
mr  r3,r29
@@ -655,6 +659,7 @@ _GLOBAL(kexec_sequence)
li  r6,1
stw r6,kexec_flag-1b(5)
 
+#ifndef CONFIG_PPC_BOOK3E
/* clear out hardware hash page table and tlb */
 #if !defined(_CALL_ELF) || _CALL_ELF != 2
ld  r12,0(r27)  /* deref function descriptor */
@@ -663,6 +668,7 @@ _GLOBAL(kexec_sequence)
 #endif
mtctr   r12
bctrl   /* ppc_md.hpte_clear_all(void); */
+#endif /* !CONFIG_PPC_BOOK3E */
 
 /*
  *   kexec image calling is:
-- 
2.1.4


Re: [PATCH v2] cxl: Fix number of allocated pages in SPA

2015-10-06 Thread Ian Munsie
The explanation probably still needs to be expanded more (e.g. this
could cause a crash for an AFU that supports more than about a thousand
processes) - see my other email in reply to v1 for more, but I'm happy
for this to go in as is (but ultimately that's mpe's call).

It should also be CCd to stable, this bug was introduced before the
driver was originally upstreamed, we just never hit it because all our
AFUs are limited to fewer processes by their interrupt requirements.

Cc: stable 
Acked-by: Ian Munsie 

Excerpts from Christophe Lombard's message of 2015-10-07 01:19:49 +1100:
> This moves the initialisation of the num_procs to before the SPA
> allocation.
> The field 'num_procs' of the structure cxl_afu is not updated to the
> right value (maximum number of processes that can be supported by
> the AFU) when the pages are allocated (i.e. when cxl_alloc_spa() is called).
> The number of allocated pages depends on the max number of processes.
> 
> Signed-off-by: Christophe Lombard 
> ---
>  drivers/misc/cxl/native.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
> index b37f2e8..d2e75c8 100644
> --- a/drivers/misc/cxl/native.c
> +++ b/drivers/misc/cxl/native.c
> @@ -457,6 +457,7 @@ static int activate_afu_directed(struct cxl_afu *afu)
>  
>  dev_info(&afu->dev, "Activating AFU directed mode\n");
>  
> +afu->num_procs = afu->max_procs_virtualised;
>  if (afu->spa == NULL) {
>  if (cxl_alloc_spa(afu))
>  return -ENOMEM;
> @@ -468,7 +469,6 @@ static int activate_afu_directed(struct cxl_afu *afu)
>  cxl_p1n_write(afu, CXL_PSL_ID_An, CXL_PSL_ID_An_F | CXL_PSL_ID_An_L);
>  
>  afu->current_mode = CXL_MODE_DIRECTED;
> -afu->num_procs = afu->max_procs_virtualised;
>  
>  if ((rc = cxl_chardev_m_afu_add(afu)))
>  return rc;


Re: [PATCH v2] cxl: Fix number of allocated pages in SPA

2015-10-06 Thread Michael Ellerman
On Wed, 2015-10-07 at 14:51 +1100, Ian Munsie wrote:
> The explanation probably still needs to be expanded more (e.g. this
> could cause a crash for an AFU that supports more than about a thousand
> processes) - see my other email in reply to v1 for more, but I'm happy
> for this to go in as is (but ultimately that's mpe's call).
> 
> It should also be CCd to stable, this bug was introduced before the
> driver was originally upstreamed, we just never hit it because all our
> AFUs are limited to fewer processes by their interrupt requirements.
> 
> Cc: stable 

So the driver went into 3.18, so this should be:

Cc: stable  # 3.18+

One of you please resend with a coherent change log with all the details
included.

cheers



[PATCH] cxl: Fix number of allocated pages in SPA

2015-10-06 Thread Ian Munsie
From: Christophe Lombard 

The scheduled process area is currently allocated before assigning the
correct maximum processes to the AFU, which will mean we only ever
allocate a fixed number of pages for the scheduled process area. This
will limit us to 958 processes with 2 x 64K pages. If we try to use more
processes than that we'd probably overrun the buffer and corrupt memory
or crash.

AFUs that require three or more interrupts per process will not be
affected as they are already limited to fewer processes than that, but we
could hit it on an AFU that requires 0, 1 or 2 interrupts per process,
or when using 4K pages.

This patch moves the initialisation of the num_procs to before the SPA
allocation so that enough pages will be allocated for the number of
processes that the AFU supports.

Signed-off-by: Christophe Lombard 
Signed-off-by: Ian Munsie 
Cc: stable  # 3.18+
---

Changes since v2:
 - Expanded commit message
Changes since v1:
 - Expanded commit message

 drivers/misc/cxl/native.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index b37f2e8..d2e75c8 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -457,6 +457,7 @@ static int activate_afu_directed(struct cxl_afu *afu)
 
dev_info(&afu->dev, "Activating AFU directed mode\n");
 
+   afu->num_procs = afu->max_procs_virtualised;
if (afu->spa == NULL) {
if (cxl_alloc_spa(afu))
return -ENOMEM;
@@ -468,7 +469,6 @@ static int activate_afu_directed(struct cxl_afu *afu)
cxl_p1n_write(afu, CXL_PSL_ID_An, CXL_PSL_ID_An_F | CXL_PSL_ID_An_L);
 
afu->current_mode = CXL_MODE_DIRECTED;
-   afu->num_procs = afu->max_procs_virtualised;
 
if ((rc = cxl_chardev_m_afu_add(afu)))
return rc;
-- 
2.1.4
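
For the record, the 958 figure falls out of the SPA sizing arithmetic
in native.c; taking the CAIA-derived formula in spa_max_procs() at face
value, max processes = ((spa_size / 8) - 96) / 17:

	spa_size      = 2 pages * 64 KiB = 131072 bytes
	max processes = ((131072 / 8) - 96) / 17
	              = (16384 - 96) / 17 = 958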


Re: Missing operand for tlbie instruction on Power7

2015-10-06 Thread Michael Ellerman
On Tue, 2015-10-06 at 11:25 -0700, Laura Abbott wrote:
> On 10/05/2015 08:35 PM, Michael Ellerman wrote:
> > On Fri, 2015-10-02 at 08:43 -0700, Laura Abbott wrote:
> >> Hi,
> >>
> >> We received a report (https://bugzilla.redhat.com/show_bug.cgi?id=1267395) 
> >> of bad assembly
> >> when compiling on powerpc with little endian
> >
> > ...
> >
> >> After some discussion with the binutils folks, it turns out that the tlbie
> >> instruction actually requires another operand and binutils was updated to
> >> check for this https://sourceware.org/ml/binutils/2015-05/msg00133.html .
> >>
> >> The code sequence in arch/powerpc/include/asm/ppc_asm.h now needs to be 
> >> updated:
> >>
> >> #if !defined(CONFIG_4xx) && !defined(CONFIG_8xx)
> >> #define tlbia   \
> >>   li  r4,1024;\
> >>   mtctr   r4; \
> >>   lis r4,KERNELBASE@h;\
> >> 0:  tlbie   r4; \
> >>   addir4,r4,0x1000;   \
> >>   bdnz0b
> >> #endif
> >>
> >> I don't know enough ppc assembly to properly fix this but I can test.
> >
> > How are you testing? This code is fairly old and I'm dubious if it still 
> > works.
> >
> > These days we have a ppc_md hook for flushing the TLB, ppc_md.flush_tlb().
> > Ideally the swsusp code would use that.
> 
> Testing would probably just be compile and maybe boot. I don't have regular
> access to the hardware. This problem just showed up for me when someone
> tried to compile Fedora rawhide with the latest binutils.

Right. The code in question is for software suspend, ie. hibernation, so that's
what needs testing if the code is going to change.

It was mostly written for G5 (543b9fd3528f6), though it later gained support
for 64-bit BookE (5a31057fc06c3).

I just tested it on a G5 here and amazingly it worked.

So it is working code, even if it is old and crufty.

>  From what I can tell, it looks like the .flush_tlb of the cpu_spec is only
> defined for power7 and power8 and I don't see a ppc_md.flush_tlb on the
> master branch.

Yes it's only defined for Power7 and Power8 at the moment. It definitely does
exist in Linus' master branch, but I'm not sure if that's the master branch
you're referring to.

> It's not clear what to do for the case where there is no
> flush_tlb function. Would filling in a .flush_tlb for all the PPC_BOOK3S_64
> with the existing tlbia sequence work?

It might, but it's not much of an improvement. Ideally we'd have an actually
correct sequence for each cpu type.

> It's also worth noting that the __flush_power7 uses tlbiel instead of tlbie.

Yeah that's a good point. It's not clear if the swsusp code wants a local or
a global invalidate.


As an alternative, can you try adding a .machine push / .machine "power4" /
.machine pop, around the tlbie. That should tell the assembler to drop back to
power4 mode for that instruction, which should then do the right thing. There
are some examples in that file.
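
A sketch of what that might look like against the tlbia sequence quoted
earlier (untested, just illustrating the .machine push/pop suggestion):

#if !defined(CONFIG_4xx) && !defined(CONFIG_8xx)
#define tlbia					\
	li	r4,1024;			\
	mtctr	r4;				\
	lis	r4,KERNELBASE@h;		\
0:	.machine push;				\
	.machine "power4";			\
	tlbie	r4;				\
	.machine pop;				\
	addi	r4,r4,0x1000;			\
	bdnz	0b
#endif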

cheers

