Re: [PATCH] cxl: Fix number of allocated pages in SPA
The field 'num_procs' of the structure cxl_afu is not updated to the right
value (the maximum number of processes that can be supported by the AFU)
when the pages are allocated (i.e. when cxl_alloc_spa() is called). The
number of allocated pages depends on the max number of processes.

Thanks

On 06/10/2015 08:19, Michael Ellerman wrote:
> On Fri, 2015-10-02 at 16:01 +0200, Christophe Lombard wrote:
>> This moves the initialisation of the num_procs to before the SPA
>> allocation.
>
> Why? What does it fix? I can't tell from the diff or the change log.
>
> cheers
Re: Who uses CROSS32_COMPILE ?
On 10/6/15, Michael Ellerman wrote:
> Does anyone build their kernels using CROSS32_COMPILE ?

I didn't even know that such a macro exists..

> cheers
Re: powerpc: Fix _ALIGN_* errors due to type difference.
On Fri, 2015-02-10 at 14:33:48 UTC, "Aneesh Kumar K.V" wrote:
> This avoids errors like
>
>   unsigned int usize = 1 << 30;
>   int size = 1 << 30;
>   unsigned long addr = 64UL << 30;
>
>   value = _ALIGN_DOWN(addr, usize); -> 0
>   value = _ALIGN_DOWN(addr, size);  -> 0x10

Are you actually seeing that anywhere? I assume not.

> diff --git a/arch/powerpc/boot/page.h b/arch/powerpc/boot/page.h
> index 14eca30fef64..87c42d7d283d 100644
> --- a/arch/powerpc/boot/page.h
> +++ b/arch/powerpc/boot/page.h
> @@ -22,8 +22,8 @@
>  #define PAGE_MASK	(~(PAGE_SIZE-1))
>
>  /* align addr on a size boundary - adjust address up/down if needed */
> -#define _ALIGN_UP(addr,size)	(((addr)+((size)-1))&(~((size)-1)))
> -#define _ALIGN_DOWN(addr,size)	((addr)&(~((size)-1)))
> +#define _ALIGN_UP(addr, size)	(((addr)+((size)-1))&(~((typeof(addr))(size)-1)))
> +#define _ALIGN_DOWN(addr, size)	((addr)&(~((typeof(addr))(size)-1)))
>
>  /* align addr on a size boundary - adjust address up if needed */
>  #define _ALIGN(addr,size)	_ALIGN_UP(addr,size)
> diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
> index 71294a6e976e..1dd69774a31c 100644
> --- a/arch/powerpc/include/asm/page.h
> +++ b/arch/powerpc/include/asm/page.h
> @@ -240,8 +240,8 @@ extern long long virt_phys_offset;
>  #endif
>
>  /* align addr on a size boundary - adjust address up/down if needed */
> -#define _ALIGN_UP(addr,size)	(((addr)+((size)-1))&(~((size)-1)))
> -#define _ALIGN_DOWN(addr,size)	((addr)&(~((size)-1)))
> +#define _ALIGN_UP(addr, size)	(((addr)+((size)-1))&(~((typeof(addr))(size)-1)))
> +#define _ALIGN_DOWN(addr, size)	((addr)&(~((typeof(addr))(size)-1)))

It looks like ALIGN() in kernel.h already does this right, so can we just
use that instead, for _ALIGN_UP() at least.

cheers
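[Editorial note: the promotion hazard being discussed is easy to reproduce
outside the kernel. The following standalone C program (a sketch, not kernel
code) uses the pre-patch macro to show the two behaviours:]

#include <stdio.h>

/* The pre-patch macro, as in arch/powerpc/include/asm/page.h */
#define _ALIGN_DOWN(addr, size) ((addr) & (~((size) - 1)))

int main(void)
{
	unsigned int usize = 1U << 30;
	int size = 1 << 30;
	unsigned long addr = 64UL << 30;

	/* ~(usize-1) is unsigned int: zero-extended to 64 bits, so the
	 * high bits of addr are silently masked off -> prints 0 */
	printf("%#lx\n", _ALIGN_DOWN(addr, usize));

	/* ~(size-1) is int: sign-extended to 64 bits, so the high bits
	 * survive -> prints 0x1000000000 */
	printf("%#lx\n", _ALIGN_DOWN(addr, size));

	return 0;
}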
Re: [PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API
Do we need a function here or can we just have an IOMMU_PAGE_SHIFT define
with an #ifndef in common code?

Also not all architectures use dma-mapping-common.h yet, so you either need
to update all of those as well, or just add the #ifndef directly to
linux/dma-mapping.h.
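[Editorial note: for readers unfamiliar with the #ifndef pattern being
suggested, it amounts to something like the following sketch. The macro name
is taken from the suggestion; the fallback value and exact header placement
are assumptions, not part of the thread:]

/* In linux/dma-mapping.h (common code), a weak default: */
#ifndef IOMMU_PAGE_SHIFT
#define IOMMU_PAGE_SHIFT	PAGE_SHIFT	/* assumed default */
#endif

/* An architecture needing a different value defines it first, e.g. in
 * its asm/dma-mapping.h, before the common header is pulled in: */
#define IOMMU_PAGE_SHIFT	16	/* hypothetical 64K IOMMU pages */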
Re: [PATCH] powerpc: Kconfig.cputype: Disallow TUNE_CELL on LE systems
On Mon, 2015-09-21 at 12:07 +0200, Thomas Huth wrote:
> On 21/09/15 09:18, Michael Ellerman wrote:
>> On Fri, 2015-09-18 at 16:17 +0200, Thomas Huth wrote:
>>> It looks somewhat weird that you can enable TUNE_CELL on little
>>> endian systems, so let's disable this option with CPU_LITTLE_ENDIAN.
>>>
>>> Signed-off-by: Thomas Huth
>>> ---
>>> I first thought that it might be better to make this option depend
>>> on PPC_CELL instead ... but I guess it's a bad idea to depend a
>>> CPU option on a platform option? Alternatively, would it make
>>> sense to make it depend on (GENERIC_CPU || CELL_CPU) instead?
>>
>> Hmm, it's a little backward, but I think it would be fine, and less
>> confusing for users. Both PS3 and Cell select PPC_CELL, so it would
>> work in both those cases.
>
> It's just that when you step through the kernel config (e.g. with "make
> menuconfig"), you normally step through the "Processor support" first,
> and then later do the "Platform support". I think most users won't look
> back into "Processor support" again once they already reached the
> "Platform support" section, so this TUNE_CELL option then might appear
> unnoticed when you enable a Cell platform under "Platform support".

Ah OK. Personally I almost never use menuconfig, but I guess some folks do.

That actually seems like we should reorder those sections, ie. put platform
support first, and then processor support. After all there's not much point
agonising over whether to tune for CELL cpus if you then don't enable a Cell
platform.

I'm not sure if it's that simple in practice ... :)

cheers
Re: [RFC, 1/5] powerpc:numa Add numa_cpu_lookup function to update lookup table
On Sun, 2015-27-09 at 18:29:09 UTC, Raghavendra K T wrote:
> We access numa_cpu_lookup_table array directly in all the places
> to read/update numa cpu lookup information. Instead use a helper
> function to update.
>
> This is helpful in changing the way numa<-->cpu mapping in single
> place when needed.
>
> This is a cosmetic change, no change in functionality.
>
> Signed-off-by: Raghavendra K T
> Signed-off-by: Raghavendra K T
> ---
>  arch/powerpc/include/asm/mmzone.h |  2 +-
>  arch/powerpc/kernel/smp.c         | 10 +-
>  arch/powerpc/mm/numa.c            | 28 +---
>  3 files changed, 23 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmzone.h b/arch/powerpc/include/asm/mmzone.h
> index 7b58917..c24a5f4 100644
> --- a/arch/powerpc/include/asm/mmzone.h
> +++ b/arch/powerpc/include/asm/mmzone.h
> @@ -29,7 +29,7 @@ extern struct pglist_data *node_data[];
>   * Following are specific to this numa platform.
>   */
>
> -extern int numa_cpu_lookup_table[];
> +extern int numa_cpu_lookup(int cpu);

Can you rename it better :) Something like cpu_to_nid(). Although maybe nid
is wrong given the rest of the series.

> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 8b9502a..d5e6eee 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -52,7 +52,6 @@ int numa_cpu_lookup_table[NR_CPUS];
>  cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
>  struct pglist_data *node_data[MAX_NUMNODES];
>
> -EXPORT_SYMBOL(numa_cpu_lookup_table);
>  EXPORT_SYMBOL(node_to_cpumask_map);
>  EXPORT_SYMBOL(node_data);
>
> @@ -134,19 +133,25 @@ static int __init fake_numa_create_new_node(unsigned long end_pfn,
>  	return 0;
>  }
>
> -static void reset_numa_cpu_lookup_table(void)
> +int numa_cpu_lookup(int cpu)
>  {
> -	unsigned int cpu;
> -
> -	for_each_possible_cpu(cpu)
> -		numa_cpu_lookup_table[cpu] = -1;
> +	return numa_cpu_lookup_table[cpu];
>  }
> +EXPORT_SYMBOL(numa_cpu_lookup);

I don't see you changing any modular code that uses this, or any macros that
might be used by modules, so I don't see why this needs to be exported? I
think you just added it because numa_cpu_lookup_table was exported?

cheers
Re: [PATCH RFC 0/5] powerpc:numa Add serial nid support
On Sun, 2015-09-27 at 23:59 +0530, Raghavendra K T wrote:
> Problem description:
> Powerpc has sparse node numbering, i.e. on a 4 node system nodes are
> numbered (possibly) as 0,1,16,17. At a lower level, the chipid got from
> the device tree is naturally mapped (directly) to the nid.
>
> Potential side effects of that are:
>
> 1) There are several places in the kernel that assume serial node
> numbering, and memory allocations assume that all the nodes from
> 0-(highest nid) exist, in turn ending up allocating memory for nodes
> that do not exist.

Is it several? Or lots?

If it's several, ie. more than two but not lots, then we should probably
just fix those places. Or is that /really/ hard for some reason?

Do we ever get whole nodes hotplugged in under PowerVM? I don't think so,
but I don't remember for sure.

> 2) For virtualization use cases (such as qemu, libvirt, openstack),
> mapping sparse nids of the host system to contiguous nids of the guest
> (numa affinity, placement) could be a challenge.

Can you elaborate? That's a bit vague.

> Possible Solutions:
> 1) Handling the memory allocations in the kernel case by case: Though in
> some cases it is easy to achieve, some cases may be intrusive/not trivial.
> In the end it does not handle side effect (2) above.
>
> 2) Map the sparse chipid got from the device tree to a serial nid at
> kernel level (the idea proposed in this series).
> Pro: It is more natural to handle at kernel level than at the lower
> (OPAL) layer.
> Con: The chipid in the device tree is no longer the same as the nid in
> the kernel.
>
> 3) Let the lower layer (OPAL) give the serial node ids after parsing the
> chipid and the associativity etc [ either as a separate item in the
> device tree or by compacting the chipid numbers ]
> Pros: kernel and device tree are on the same page, and less change in
> the kernel.
> Con: is it the functionality expected in the lower layer
...
> 3) Numactl tests from
> ftp://oss.sgi.com/www/projects/libnuma/download/numactl-2.0.10.tar.gz
>
> (in fact there were more breakages before the patch because of the sparse
> nid and memoryless node cases of powerpc)

This is probably the best argument for your series. ie. userspace is dumb
and fixing every broken app that assumes linear node numbering is not
feasible.

So on the whole I think the concept is good.

This series though is a bit confusing because of all the renaming etc. etc.
Nish made lots of good comments so I'll wait for a v2 based on those.

cheers
Re: [RFC, 1/5] powerpc:numa Add numa_cpu_lookup function to update lookup table
On 10/06/2015 03:47 PM, Michael Ellerman wrote:
> On Sun, 2015-27-09 at 18:29:09 UTC, Raghavendra K T wrote:
>> We access numa_cpu_lookup_table array directly in all the places
>> to read/update numa cpu lookup information. Instead use a helper
>> function to update.
>>
>> This is helpful in changing the way numa<-->cpu mapping in single
>> place when needed.
>>
>> This is a cosmetic change, no change in functionality.
>>
>> Signed-off-by: Raghavendra K T
>> ---
>>  arch/powerpc/include/asm/mmzone.h |  2 +-
>>  arch/powerpc/kernel/smp.c         | 10 +-
>>  arch/powerpc/mm/numa.c            | 28 +---
>>  3 files changed, 23 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/mmzone.h b/arch/powerpc/include/asm/mmzone.h
>> index 7b58917..c24a5f4 100644
>> --- a/arch/powerpc/include/asm/mmzone.h
>> +++ b/arch/powerpc/include/asm/mmzone.h
>> @@ -29,7 +29,7 @@ extern struct pglist_data *node_data[];
>>   * Following are specific to this numa platform.
>>   */
>>
>> -extern int numa_cpu_lookup_table[];
>> +extern int numa_cpu_lookup(int cpu);
>
> Can you rename it better :) Something like cpu_to_nid().

Good name, sure.

> Although maybe nid is wrong given the rest of the series.

Maybe not. The current plan is to rename (after discussing with Nish)
chipid to pnid (physical nid) and nid to vnid (virtual nid) within powerpc
numa.c [reasoning: chipid is applicable only to OPAL; since we want to
handle PowerKVM, PowerVM and baremetal we need a generic name].
But the 'nid' naming will be retained, which is applicable for generic
kernel interactions.

>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index 8b9502a..d5e6eee 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -52,7 +52,6 @@ int numa_cpu_lookup_table[NR_CPUS];
>>  cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
>>  struct pglist_data *node_data[MAX_NUMNODES];
>>
>> -EXPORT_SYMBOL(numa_cpu_lookup_table);
>>  EXPORT_SYMBOL(node_to_cpumask_map);
>>  EXPORT_SYMBOL(node_data);
>>
>> @@ -134,19 +133,25 @@ static int __init fake_numa_create_new_node(unsigned long end_pfn,
>>  	return 0;
>>  }
>>
>> -static void reset_numa_cpu_lookup_table(void)
>> +int numa_cpu_lookup(int cpu)
>>  {
>> -	unsigned int cpu;
>> -
>> -	for_each_possible_cpu(cpu)
>> -		numa_cpu_lookup_table[cpu] = -1;
>> +	return numa_cpu_lookup_table[cpu];
>>  }
>> +EXPORT_SYMBOL(numa_cpu_lookup);
>
> I don't see you changing any modular code that uses this, or any macros
> that might be used by modules, so I don't see why this needs to be
> exported? I think you just added it because numa_cpu_lookup_table was
> exported?

arch/powerpc/kernel/smp.c uses it.
Re: [PATCH] powerpc: Kconfig.cputype: Disallow TUNE_CELL on LE systems
On 06/10/15 12:05, Michael Ellerman wrote:
> On Mon, 2015-09-21 at 12:07 +0200, Thomas Huth wrote:
>> On 21/09/15 09:18, Michael Ellerman wrote:
>>> On Fri, 2015-09-18 at 16:17 +0200, Thomas Huth wrote:
>>>> It looks somewhat weird that you can enable TUNE_CELL on little
>>>> endian systems, so let's disable this option with CPU_LITTLE_ENDIAN.
>>>>
>>>> Signed-off-by: Thomas Huth
>>>> ---
>>>> I first thought that it might be better to make this option depend
>>>> on PPC_CELL instead ... but I guess it's a bad idea to depend a
>>>> CPU option on a platform option? Alternatively, would it make
>>>> sense to make it depend on (GENERIC_CPU || CELL_CPU) instead?
>>>
>>> Hmm, it's a little backward, but I think it would be fine, and less
>>> confusing for users. Both PS3 and Cell select PPC_CELL, so it would
>>> work in both those cases.
>>
>> It's just that when you step through the kernel config (e.g. with "make
>> menuconfig"), you normally step through the "Processor support" first,
>> and then later do the "Platform support". I think most users won't look
>> back into "Processor support" again once they already reached the
>> "Platform support" section, so this TUNE_CELL option then might appear
>> unnoticed when you enable a Cell platform under "Platform support".
>
> Ah OK. Personally I almost never use menuconfig, but I guess some folks do.
>
> That actually seems like we should reorder those sections, ie. put platform
> support first, and then processor support. After all there's not much point
> agonising over whether to tune for CELL cpus if you then don't enable a Cell
> platform.

Not sure whether reordering the sections makes much sense - others might
think "I want to support Cell chips with my distro, so let's enable that
first, then let's see which platforms I can select next..." - so I'd rather
not do that.

> I'm not sure if it's that simple in practice ... :)

Maybe we could also simply remove the TUNE_CELL option nowadays? I think
this was used for building generic Linux distros, which are just optimized
for Cell ... but who is still doing that nowadays?

Alternatively, if that is not an option and if you don't like my patch with
CPU_LITTLE_ENDIAN, what about changing it to check "depends on
(GENERIC_CPU || CELL_CPU)" instead?

Thomas
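[Editorial note: for concreteness, the alternative Thomas proposes would
amount to one extra dependency line in arch/powerpc/platforms/Kconfig.cputype,
roughly as below. This is a sketch of the suggestion only; the existing
option text and dependencies are abbreviated and not verified against the
tree of that era:]

config TUNE_CELL
	bool "Optimize for Cell Broadband Engine"
	depends on PPC64 && PPC_BOOK3S
	depends on GENERIC_CPU || CELL_CPU
	help
	  ...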
Re: [v2,1/2] powerpc/xmon: Paged output for paca display
On Fri, 2015-21-08 at 04:24:27 UTC, Sam Bobroff wrote:
> The paca display is already more than 24 lines, which can be problematic
> if you have an old school 80x24 terminal, or more likely you are on a
> virtual terminal which does not scroll for whatever reason.
>
> This patch adds a new command ".", which takes a single (hex) numeric
> argument: lines per page. It will cause the output of "dp" and "dpa"
> to be broken into pages, if necessary.
>
> This is implemented by running over the entire output both for the
> initial command and for each subsequent page: the visible part is
> clipped out by checking line numbers. This is a simplistic approach
> but minimally invasive; it is intended to be easily reusable for other
> commands.
>
> Sample output:
>
> 0:mon> .10
> 0:mon> dp1
> paca for cpu 0x1 @ cfdc0480:
>  possible         = yes
>  present          = yes
>  online           = yes
>  lock_token       = 0x8000          (0x8)
>  paca_index       = 0x1             (0xa)
>  kernel_toc       = 0xc0eb2400      (0x10)
>  kernelbase       = 0xc000          (0x18)
>  kernel_msr       = 0xb0001032      (0x20)
>  emergency_sp     = 0xc0003ffe8000  (0x28)
>  mc_emergency_sp  = 0xc0003ffe4000  (0x2e0)
>  in_mce           = 0x0             (0x2e8)
>  data_offset      = 0x7f17          (0x30)
>  hw_cpu_id        = 0x8             (0x38)
>  cpu_start        = 0x1             (0x3a)
>  kexec_state      = 0x0             (0x3b)
> [Enter for next page]
> 0:mon>
>  __current        = 0xc0007e696620  (0x290)
>  kstack           = 0xc0007e6ebe30  (0x298)
>  stab_rr          = 0xb             (0x2a0)
>  saved_r1         = 0xc0007ef37860  (0x2a8)
>  trap_save        = 0x0             (0x2b8)
>  soft_enabled     = 0x0             (0x2ba)
>  irq_happened     = 0x1             (0x2bb)
>  io_sync          = 0x0             (0x2bc)
>  irq_work_pending = 0x0             (0x2bd)
>  nap_state_lost   = 0x0             (0x2be)
> 0:mon>
>
> (Based on a similar patch by Michael Ellerman
> "[v2] powerpc/xmon: Allow limiting the size of the paca display".
> This patch is an alternative and cannot coexist with the original.)
>
> Signed-off-by: Sam Bobroff
> ---
>  arch/powerpc/xmon/xmon.c | 86 +++-
>  1 file changed, 71 insertions(+), 15 deletions(-)
>
> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index e599259..9ce9e7d 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -72,6 +72,12 @@ static int xmon_gate;
>
>  static unsigned long in_xmon __read_mostly = 0;
>
> +#define XMON_PRINTF(...) do { if (paged_vis()) printf(__VA_ARGS__); } while (0)

Can you do this as a proper function? I know it will need to be varargs, but
that shouldn't be too ugly.

> +#define MAX_PAGED_SIZE 1024

Why do we need a max at all?

> +static unsigned long paged_size = 0, paged_pos, paged_cur_page;
> +#ifdef CONFIG_PPC64
> +static unsigned long paca_cpu;
> +#endif

That can just be static in dump_pacas() by the looks.

>  static unsigned long adrs;
>  static int size = 1;
>  #define MAX_DUMP (128 * 1024)
> @@ -242,6 +248,9 @@ Commands:\n\
>  "  u	dump TLB\n"
>  #endif
>  "  ?	help\n"
> +#ifdef CONFIG_PPC64
> +"  .#	limit output to # lines per page (dump paca only)\n"
> +#endif

Don't make it 64-bit only.

>  "  zr	reboot\n\
>     zh	halt\n"
>  ;
> @@ -833,6 +842,19 @@ static void remove_cpu_bpts(void)
>  	write_ciabr(0);
>  }
>
> +static void paged_set_size(void)

"paged" isn't reading very well for me. Can we use "pagination" instead? I
know it's longer but monitors are wide these days.

Also I prefer verb first usually, so set_pagination_size() etc.

> +{
> +	if (!scanhex(&paged_size) || (paged_size > MAX_PAGED_SIZE)) {
> +		printf("Invalid number of lines per page (max: %d).\n",
> +		       MAX_PAGED_SIZE);
> +		paged_size = 0;
> +	}
> +}
> +static void paged_reset(void)
> +{
> +	paged_cur_page = 0;
> +}

You only call that once so a function seems like overkill.

>  /* Command interpreting routine */
>  static char *last_cmd;
>
> @@ -863,7 +885,8 @@ cmds(struct pt_regs *excp)
>  			take_input(last_cmd);
>  			last_cmd = NULL;
>  		cmd = inchar();
> -	}
> +	} else
> +		paged_reset();
>  	switch (cmd) {
>  	case 'm':
>  		cmd = inchar();
> @@ -924,6 +947,9 @@ cmds(struct pt_regs *excp)
>  	case '?':
>  		xmon_puts(help_string);
>  		break;
> +	case '.':
> +		paged_set_size();
> +		break;
>  	case 'b':
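[Editorial note: the "proper function" being requested is a small amount of
code. A sketch of what it might look like, reusing the patch's own
paged_vis() helper and assuming a vprintf-style primitive is available to
xmon (not shown in these hunks); this is not the version that was merged:]

#include <stdarg.h>

/* Varargs replacement for the XMON_PRINTF macro: print only while the
 * current output line falls inside the visible page. */
static void paged_printf(const char *fmt, ...)
{
	va_list args;

	if (!paged_vis())
		return;

	va_start(args, fmt);
	vprintf(fmt, args);
	va_end(args);
}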
Re: [PATCH RFC 0/5] powerpc:numa Add serial nid support
On 10/06/2015 03:55 PM, Michael Ellerman wrote:
> On Sun, 2015-09-27 at 23:59 +0530, Raghavendra K T wrote:
>> Problem description:
>> Powerpc has sparse node numbering, i.e. on a 4 node system nodes are
>> numbered (possibly) as 0,1,16,17. At a lower level, the chipid got from
>> the device tree is naturally mapped (directly) to the nid.
>>
>> Potential side effects of that are:
>>
>> 1) There are several places in the kernel that assume serial node
>> numbering, and memory allocations assume that all the nodes from
>> 0-(highest nid) exist, in turn ending up allocating memory for nodes
>> that do not exist.
>
> Is it several? Or lots?
>
> If it's several, ie. more than two but not lots, then we should probably
> just fix those places. Or is that /really/ hard for some reason?

It is several, and I did attempt to fix them. But the rest of the places
(like memcg, workqueue, scheduler and so on) are tricky to fix because the
memory allocations are glued to other things, and similar fixes may be
expected in the future too..

> Do we ever get whole nodes hotplugged in under PowerVM? I don't think so,
> but I don't remember for sure.

Even on PowerVM we do have discontiguous NUMA nodes. [Adding more to it, we
could even end up creating a dummy node 0 just to make the kernel happy.]
For e.g.:

available: 2 nodes (0,7)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 7 cpus: 0 1 2 3 4 5 6 7
node 7 size: 10240 MB
node 7 free: 8174 MB
node distances:
node   0   7
  0:  10  40
  7:  40  10

Note that node zero has neither any CPU nor memory.

>> 2) For virtualization use cases (such as qemu, libvirt, openstack),
>> mapping sparse nids of the host system to contiguous nids of the guest
>> (numa affinity, placement) could be a challenge.
>
> Can you elaborate? That's a bit vague.

One e.g. I can think of (though libvirt/openstack people will know more
about it): suppose one wishes to bind half of the vcpus to one physical
node and the rest of the vcpus to the second NUMA node; we can't say
whether the second node is 1, 8, or 16, and the same libvirt XML on one
two-node system may not be valid for another two-node system. [I believe it
may cause some migration problems too.]

>> Possible Solutions:
>> 1) Handling the memory allocations in the kernel case by case: Though in
>> some cases it is easy to achieve, some cases may be intrusive/not
>> trivial. In the end it does not handle side effect (2) above.
>>
>> 2) Map the sparse chipid got from the device tree to a serial nid at
>> kernel level (the idea proposed in this series).
>> Pro: It is more natural to handle at kernel level than at the lower
>> (OPAL) layer.
>> Con: The chipid in the device tree is no longer the same as the nid in
>> the kernel.
>>
>> 3) Let the lower layer (OPAL) give the serial node ids after parsing the
>> chipid and the associativity etc [ either as a separate item in the
>> device tree or by compacting the chipid numbers ]
>> Pros: kernel and device tree are on the same page, and less change in
>> the kernel.
>> Con: is it the functionality expected in the lower layer
...
>> 3) Numactl tests from
>> ftp://oss.sgi.com/www/projects/libnuma/download/numactl-2.0.10.tar.gz
>>
>> (in fact there were more breakages before the patch because of the
>> sparse nid and memoryless node cases of powerpc)
>
> This is probably the best argument for your series. ie. userspace is dumb
> and fixing every broken app that assumes linear node numbering is not
> feasible.
>
> So on the whole I think the concept is good.
>
> This series though is a bit confusing because of all the renaming etc.
> etc. Nish made lots of good comments so I'll wait for a v2 based on those.

Yes, will be sending v2 soon, extending my patch to fix the PowerVM case
too.
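[Editorial note: the core of the kernel-level mapping proposed as option 2
can be sketched in a few lines. All names here (pnid, vnid, MAX_PNID,
get_vnid) are hypothetical, purely to illustrate compacting sparse physical
ids into serial Linux nids; this is not code from the series:]

/* Hypothetical sketch: compact sparse physical node ids (pnid), e.g.
 * 0,1,16,17, into serial virtual nids (vnid) 0,1,2,3 in discovery order.
 * pnid_to_vnid[] is assumed to be initialised to -1 at boot. */
static int pnid_to_vnid[MAX_PNID];
static int next_vnid;

static int get_vnid(int pnid)
{
	if (pnid_to_vnid[pnid] == -1)
		pnid_to_vnid[pnid] = next_vnid++;

	return pnid_to_vnid[pnid];
}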
Re: Who uses CROSS32_COMPILE ?
On Tue, 2015-10-06 at 12:40 +0300, Denis Kirjanov wrote:
> On 10/6/15, Michael Ellerman wrote:
> > Does anyone build their kernels using CROSS32_COMPILE ?
>
> I didn't even know that such a macro exists..

Good, I want to remove it :)

cheers
Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler
On 29/09/2015 00:07, Scott Wood wrote:
> On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
> > We are spending between 40 and 160 cycles with a mean of 65 cycles in
> > the TLB handling routines (measured with mftbl) so make it more
> > simple although it adds one instruction.
> >
> > Signed-off-by: Christophe Leroy
>
> Does this just make it simpler or does it make it faster? What is the
> performance impact? Is the performance impact seen with or without
> CONFIG_8xx_CPU6 enabled? Without it, it looks like you're adding an
> mtspr/mfspr combo in order to replace one mfspr.

The performance impact is not noticeable. Theoretically it adds 1 cycle
on a mean of 65 cycles, that is 1.5%. Even in the worst case where we
spend around 10% of the time in TLB handling exceptions, that represents
only 0.15% of the total CPU time. So that's almost nothing.

Beyond the fact that it gets simpler, the main reason is that I need a
third register for the following patch in the set, otherwise I would
spend more time saving and restoring CR several times.

Christophe
Re: [PATCH v2 06/25] powerpc32: iounmap() cannot vunmap() area mapped by TLBCAMs either
On 29/09/2015 01:41, Scott Wood wrote:
> On Tue, Sep 22, 2015 at 06:50:40PM +0200, Christophe Leroy wrote:
> > iounmap() cannot vunmap() area mapped by TLBCAMs either
> >
> > Signed-off-by: Christophe Leroy
> > ---
> > No change in v2
> >
> >  arch/powerpc/mm/pgtable_32.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> > index 7692d1b..03a073a 100644
> > --- a/arch/powerpc/mm/pgtable_32.c
> > +++ b/arch/powerpc/mm/pgtable_32.c
> > @@ -278,7 +278,9 @@ void iounmap(volatile void __iomem *addr)
> >  	 * If mapped by BATs then there is nothing to do.
> >  	 * Calling vfree() generates a benign warning.
> >  	 */
> > -	if (v_mapped_by_bats((unsigned long)addr)) return;
> > +	if (v_mapped_by_bats((unsigned long)addr) ||
> > +	    v_mapped_by_tlbcam((unsigned long)addr))
> > +		return;
>
> This is pretty pointless given that the next patch replaces both with
> v_mapped_by_other().

I thought it was cleaner to first fix the bug, in order to keep the
following patch straightforward, but I can skip it, no problem.

Christophe
Re: [PATCH v2 07/25] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together
On 29/09/2015 01:47, Scott Wood wrote:
> On Tue, Sep 22, 2015 at 06:50:42PM +0200, Christophe Leroy wrote:
> > x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of
> > purpose, so lets group them into a single function.
> >
> > Signed-off-by: Christophe Leroy
> > ---
> > No change in v2
> >
> >  arch/powerpc/mm/pgtable_32.c | 33 ++---
> >  1 file changed, 26 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> > index 03a073a..3fd9083 100644
> > --- a/arch/powerpc/mm/pgtable_32.c
> > +++ b/arch/powerpc/mm/pgtable_32.c
> > @@ -67,6 +67,28 @@ extern unsigned long p_mapped_by_tlbcam(phys_addr_t pa);
> >  #define p_mapped_by_tlbcam(x)	(0UL)
> >  #endif /* HAVE_TLBCAM */
> >
> > +static inline unsigned long p_mapped_by_other(phys_addr_t pa)
> > +{
> > +	unsigned long v;
> > +
> > +	v = p_mapped_by_bats(pa);
> > +	if (v /*&& p_mapped_by_bats(p+size-1)*/)
> > +		return v;
> > +
> > +	return p_mapped_by_tlbcam(pa);
> > +}
>
> Did you forget to remove that comment?

No I didn't, I thought it was there for a reason, it has been there since
2005. Do you think I should remove it?

Christophe
Re: [PATCH v2 13/25] powerpc/8xx: also use r3 in the ITLB miss in all situations
On 29/09/2015 02:00, Scott Wood wrote:
> On Tue, Sep 22, 2015 at 06:50:54PM +0200, Christophe Leroy wrote:
> > We are spending between 40 and 160 cycles with a mean of 65 cycles
> > in the TLB handling routines (measured with mftbl) so make it more
> > simple although it adds one instruction
> >
> > Signed-off-by: Christophe Leroy
> > ---
> > No change in v2
> >
> >  arch/powerpc/kernel/head_8xx.S | 15 ---
> >  1 file changed, 4 insertions(+), 11 deletions(-)
>
> Why is this a separate patch from 1/25?
>
> Same comments as on that patch.

Just because here there is no real need behind the simplification of the
code, whereas the first one was a pre-requisite for the following patch.
Should I merge them together anyway?

Christophe
[PATCH v2] cxl: Fix number of allocated pages in SPA
This moves the initialisation of num_procs to before the SPA allocation.

The field 'num_procs' of the structure cxl_afu is not updated to the right
value (the maximum number of processes that can be supported by the AFU)
when the pages are allocated (i.e. when cxl_alloc_spa() is called). The
number of allocated pages depends on the max number of processes.

Signed-off-by: Christophe Lombard
---
 drivers/misc/cxl/native.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index b37f2e8..d2e75c8 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -457,6 +457,7 @@ static int activate_afu_directed(struct cxl_afu *afu)

 	dev_info(&afu->dev, "Activating AFU directed mode\n");

+	afu->num_procs = afu->max_procs_virtualised;
 	if (afu->spa == NULL) {
 		if (cxl_alloc_spa(afu))
 			return -ENOMEM;
@@ -468,7 +469,6 @@ static int activate_afu_directed(struct cxl_afu *afu)
 	cxl_p1n_write(afu, CXL_PSL_ID_An, CXL_PSL_ID_An_F | CXL_PSL_ID_An_L);

 	afu->current_mode = CXL_MODE_DIRECTED;
-	afu->num_procs = afu->max_procs_virtualised;

 	if ((rc = cxl_chardev_m_afu_add(afu)))
 		return rc;
--
1.9.1
Re: [PATCH v2 15/25] powerpc/8xx: move 8xx SPRN defines into reg_8xx.h and add some missing ones
On 29/09/2015 02:03, Scott Wood wrote:
> On Tue, Sep 22, 2015 at 06:50:58PM +0200, Christophe Leroy wrote:
> > Move 8xx SPRN defines into reg_8xx.h and add some missing ones
> >
> > Signed-off-by: Christophe Leroy
> > ---
> > No change in v2
>
> Why are they being moved? Why are they being separated from the bit
> definitions?

It was to keep asm/reg_8xx.h self-sufficient for the following patch.

Also because including asm/mmu-8xx.h creates a circular inclusion issue
(mmu-8xx.h needs page.h which includes page-32.h, page-32.h includes
cache.h, cache.h includes reg.h which includes reg_8xx.h). The circle
starts with an inclusion of asm/cache.h by linux/cache.h, itself included
by linux/printk.h, and I end up with 'implicit declaration' issues.

How can I fix that?

Christophe
Re: [PATCH v2 11/25] powerpc/8xx: map 16M RAM at startup
On 29/09/2015 01:58, Scott Wood wrote:
> On Tue, Sep 22, 2015 at 06:50:50PM +0200, Christophe Leroy wrote:
> > On recent kernels, with some debug options like for instance
> > CONFIG_LOCKDEP, the BSS requires more than 8M memory, although
> > the kernel code fits in the first 8M.
> > Today, it is necessary to activate CONFIG_PIN_TLB to get more than 8M
> > at startup, although pinning the TLB is not necessary for that.
> >
> > This patch adds a second 8M page to the initial mapping in order to
> > have 16M mapped regardless of CONFIG_PIN_TLB, like several other
> > 32 bit PPCs (40x, 601, ...)
> >
> > Signed-off-by: Christophe Leroy
> > ---
>
> Is the assumption that nobody is still running 8xx systems with only 8
> MiB RAM on current kernels?

No, setup_initial_memory_limit() limits the memory to the minimum between
16M and the real memory size, so if a platform has only 8M, it will still
be limited to 8M even with 16M mapped.

Christophe
Re: [PATCH v2 07/25] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together
On Tue, 2015-10-06 at 16:02 +0200, Christophe Leroy wrote:
> On 29/09/2015 01:47, Scott Wood wrote:
> > On Tue, Sep 22, 2015 at 06:50:42PM +0200, Christophe Leroy wrote:
> > > x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of
> > > purpose, so lets group them into a single function.
> > >
> > > Signed-off-by: Christophe Leroy
> > > ---
> > > No change in v2
> > >
> > >  arch/powerpc/mm/pgtable_32.c | 33 ++---
> > >  1 file changed, 26 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> > > index 03a073a..3fd9083 100644
> > > --- a/arch/powerpc/mm/pgtable_32.c
> > > +++ b/arch/powerpc/mm/pgtable_32.c
> > > @@ -67,6 +67,28 @@ extern unsigned long p_mapped_by_tlbcam(phys_addr_t pa);
> > >  #define p_mapped_by_tlbcam(x)	(0UL)
> > >  #endif /* HAVE_TLBCAM */
> > >
> > > +static inline unsigned long p_mapped_by_other(phys_addr_t pa)
> > > +{
> > > +	unsigned long v;
> > > +
> > > +	v = p_mapped_by_bats(pa);
> > > +	if (v /*&& p_mapped_by_bats(p+size-1)*/)
> > > +		return v;
> > > +
> > > +	return p_mapped_by_tlbcam(pa);
> > > +}
> >
> > Did you forget to remove that comment?
>
> No I didn't, I thought it was there for a reason, it has been there since
> 2005. Do you think I should remove it?

Oh, you took it from __ioremap_caller. Commented-out code is generally
frowned upon, and it makes even less sense now because there's no "size" in
p_mapped_by_other.

-Scott
Re: [PATCH v2 11/25] powerpc/8xx: map 16M RAM at startup
On Tue, 2015-10-06 at 16:10 +0200, Christophe Leroy wrote:
> On 29/09/2015 01:58, Scott Wood wrote:
> > On Tue, Sep 22, 2015 at 06:50:50PM +0200, Christophe Leroy wrote:
> > > On recent kernels, with some debug options like for instance
> > > CONFIG_LOCKDEP, the BSS requires more than 8M memory, although
> > > the kernel code fits in the first 8M.
> > > Today, it is necessary to activate CONFIG_PIN_TLB to get more than 8M
> > > at startup, although pinning the TLB is not necessary for that.
> > >
> > > This patch adds a second 8M page to the initial mapping in order to
> > > have 16M mapped regardless of CONFIG_PIN_TLB, like several other
> > > 32 bit PPCs (40x, 601, ...)
> > >
> > > Signed-off-by: Christophe Leroy
> > > ---
> >
> > Is the assumption that nobody is still running 8xx systems with only 8
> > MiB RAM on current kernels?
>
> No, setup_initial_memory_limit() limits the memory to the minimum between
> 16M and the real memory size, so if a platform has only 8M, it will still
> be limited to 8M even with 16M mapped.

And you just hope you don't get a speculative fetch from the second 8M?

-Scott
[PATCH v2 0/6] kernel/cpu.c: eliminate some indirection
v2: fix build failure on ppc, add acks.

The four cpumasks cpu_{possible,online,present,active}_bits are exposed
readonly via the corresponding const variables cpu_xyz_mask. But they are
also accessible for arbitrary writing via the exposed functions
set_cpu_xyz. There's quite a bit of code throughout the kernel which
iterates over or otherwise accesses these bitmaps, and having the access
go via the cpu_xyz_mask variables is nowadays [1] simply a useless
indirection.

It may be that any problem in CS can be solved by an extra level of
indirection, but that doesn't mean every extra indirection solves a
problem. In this case, it even necessitates some minor ugliness (see 4/6).

Patch 1/6 is new in v2, and fixes a build failure on ppc by renaming a
struct member, to avoid problems when the identifier cpu_online_mask
becomes a macro later in the series. The next four patches eliminate the
cpu_xyz_mask variables by simply exposing the actual bitmaps, after
renaming them to discourage direct access - that still happens through
cpu_xyz_mask, which are now simply macros with the same type and value as
they used to have. After that, there's no longer any reason to have the
setter functions be out-of-line: The boolean parameter is almost always a
literal true or false, so by making them static inlines they will usually
compile to one or two instructions.

For a defconfig build on x86_64, bloat-o-meter says we save ~3000 bytes.
We also save a little stack (stackdelta says 127 functions have a 16 byte
smaller stack frame, while two grow by that amount). Mostly because, when
iterating over the mask, gcc typically loads the value of cpu_xyz_mask
into a callee-saved register and from there into %rdi before each
find_next_bit call - now it can just load the appropriate immediate
address into %rdi before each call.

[1] See Rusty's kind explanation
http://thread.gmane.org/gmane.linux.kernel/2047078/focus=2047722 for some
historic context.

Rasmus Villemoes (6):
  powerpc/fadump: rename cpu_online_mask member of struct
    fadump_crash_info_header
  kernel/cpu.c: change type of cpu_possible_bits and friends
  kernel/cpu.c: export __cpu_*_mask
  drivers/base/cpu.c: use __cpu_*_mask directly
  kernel/cpu.c: eliminate cpu_*_mask
  kernel/cpu.c: make set_cpu_* static inlines

 arch/powerpc/include/asm/fadump.h |  2 +-
 arch/powerpc/kernel/fadump.c      |  4 +--
 drivers/base/cpu.c                | 10 +++---
 include/linux/cpumask.h           | 55 -
 kernel/cpu.c                      | 64 ---
 5 files changed, 68 insertions(+), 67 deletions(-)

--
2.1.3
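[Editorial note: as a rough illustration of the final step (6/6), the
out-of-line setters become something like the following in
include/linux/cpumask.h. This is a sketch based on the cover letter's
description, not the exact patch text:]

/* The underlying bitmap, renamed to discourage direct access */
extern struct cpumask __cpu_possible_mask;

/* The read-only accessor keeps its old name, now as a macro with the
 * same type and value as the old const pointer variable */
#define cpu_possible_mask ((const struct cpumask *)&__cpu_possible_mask)

/* The setter becomes a static inline: with a literal true/false
 * argument the compiler reduces it to a single set/clear-bit call */
static inline void set_cpu_possible(unsigned int cpu, bool possible)
{
	if (possible)
		cpumask_set_cpu(cpu, &__cpu_possible_mask);
	else
		cpumask_clear_cpu(cpu, &__cpu_possible_mask);
}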
[PATCH v2 1/6] powerpc/fadump: rename cpu_online_mask member of struct fadump_crash_info_header
As preparation for eliminating the indirect access to the various global
cpu_*_bits bitmaps via the pointer variables cpu_*_mask, rename the
cpu_online_mask member of struct fadump_crash_info_header to simply
online_mask, thus allowing cpu_online_mask to become a macro.

Acked-by: Michael Ellerman
Signed-off-by: Rasmus Villemoes
---
 arch/powerpc/include/asm/fadump.h | 2 +-
 arch/powerpc/kernel/fadump.c      | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 493e72f64b35..b4407d0add27 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -191,7 +191,7 @@ struct fadump_crash_info_header {
 	u64		elfcorehdr_addr;
 	u32		crashing_cpu;
 	struct pt_regs	regs;
-	struct cpumask	cpu_online_mask;
+	struct cpumask	online_mask;
 };

 /* Crash memory ranges */
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 26d091a1a54c..3cb3b02a13dd 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -415,7 +415,7 @@ void crash_fadump(struct pt_regs *regs, const char *str)
 	else
 		ppc_save_regs(&fdh->regs);

-	fdh->cpu_online_mask = *cpu_online_mask;
+	fdh->online_mask = *cpu_online_mask;

 	/* Call ibm,os-term rtas call to trigger firmware assisted dump */
 	rtas_os_term((char *)str);
@@ -646,7 +646,7 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm)
 	}
 	/* Lower 4 bytes of reg_value contains logical cpu id */
 	cpu = be64_to_cpu(reg_entry->reg_value) & FADUMP_CPU_ID_MASK;
-	if (fdh && !cpumask_test_cpu(cpu, &fdh->cpu_online_mask)) {
+	if (fdh && !cpumask_test_cpu(cpu, &fdh->online_mask)) {
 		SKIP_TO_NEXT_CPU(reg_entry);
 		continue;
 	}
--
2.1.3
Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler
On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote:
> On 29/09/2015 00:07, Scott Wood wrote:
> > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
> > > We are spending between 40 and 160 cycles with a mean of 65 cycles in
> > > the TLB handling routines (measured with mftbl) so make it more
> > > simple although it adds one instruction.
> > >
> > > Signed-off-by: Christophe Leroy
> >
> > Does this just make it simpler or does it make it faster? What is the
> > performance impact? Is the performance impact seen with or without
> > CONFIG_8xx_CPU6 enabled? Without it, it looks like you're adding an
> > mtspr/mfspr combo in order to replace one mfspr.
>
> The performance impact is not noticeable. Theoretically it adds 1 cycle
> on a mean of 65 cycles, that is 1.5%. Even in the worst case where we
> spend around 10% of the time in TLB handling exceptions, that represents
> only 0.15% of the total CPU time. So that's almost nothing.
> Beyond the fact that it gets simpler, the main reason is that I need a
> third register for the following patch in the set, otherwise I would
> spend more time saving and restoring CR several times.

If you had said in the changelog that it was because future patches would
need the register to be saved, we could have avoided this exchange...

Especially with large patchsets, I review the patches one at a time. Don't
assume I know what's coming in patch n+1 (and especially not n+m) when I
review patch n.

-Scott
Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler
On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote:
> On 29/09/2015 00:07, Scott Wood wrote:
> > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
> > > We are spending between 40 and 160 cycles with a mean of 65 cycles in
> > > the TLB handling routines (measured with mftbl) so make it more
> > > simple although it adds one instruction.
> > >
> > > Signed-off-by: Christophe Leroy
> >
> > Does this just make it simpler or does it make it faster? What is the
> > performance impact? Is the performance impact seen with or without
> > CONFIG_8xx_CPU6 enabled? Without it, it looks like you're adding an
> > mtspr/mfspr combo in order to replace one mfspr.
>
> The performance impact is not noticeable. Theoretically it adds 1 cycle
> on a mean of 65 cycles, that is 1.5%. Even in the worst case where we
> spend around 10% of the time in TLB handling exceptions, that represents
> only 0.15% of the total CPU time. So that's almost nothing.
> Beyond the fact that it gets simpler, the main reason is that I need a
> third register for the following patch in the set, otherwise I would
> spend more time saving and restoring CR several times.

FWIW, the added instruction is an SPR access and I doubt that's only one
cycle.

-Scott
Re: [PATCH v2 13/25] powerpc/8xx: also use r3 in the ITLB miss in all situations
On Tue, 2015-10-06 at 16:12 +0200, Christophe Leroy wrote:
> On 29/09/2015 02:00, Scott Wood wrote:
> > On Tue, Sep 22, 2015 at 06:50:54PM +0200, Christophe Leroy wrote:
> > > We are spending between 40 and 160 cycles with a mean of 65 cycles
> > > in the TLB handling routines (measured with mftbl) so make it more
> > > simple although it adds one instruction
> > >
> > > Signed-off-by: Christophe Leroy
> > > ---
> > > No change in v2
> > >
> > >  arch/powerpc/kernel/head_8xx.S | 15 ---
> > >  1 file changed, 4 insertions(+), 11 deletions(-)
> >
> > Why is this a separate patch from 1/25?
> >
> > Same comments as on that patch.
>
> Just because here there is no real need behind the simplification of the
> code, whereas the first one was a pre-requisite for the following patch.
> Should I merge them together anyway?

If there's no real need, why do it? It's not really a major readability
enhancement...

-Scott
Re: [PATCH v2 15/25] powerpc/8xx: move 8xx SPRN defines into reg_8xx.h and add some missing ones
On Tue, 2015-10-06 at 16:35 +0200, Christophe Leroy wrote:
> On 29/09/2015 02:03, Scott Wood wrote:
> > On Tue, Sep 22, 2015 at 06:50:58PM +0200, Christophe Leroy wrote:
> > > Move 8xx SPRN defines into reg_8xx.h and add some missing ones
> > >
> > > Signed-off-by: Christophe Leroy
> > > ---
> > > No change in v2
> >
> > Why are they being moved? Why are they being separated from the bit
> > definitions?
>
> It was to keep asm/reg_8xx.h self-sufficient for the following patch.

Again, it would have been nice if this were in the commit message.

> Also because including asm/mmu-8xx.h creates a circular inclusion issue
> (mmu-8xx.h needs page.h which includes page-32.h, page-32.h includes
> cache.h, cache.h includes reg.h which includes reg_8xx.h). The circle
> starts with an inclusion of asm/cache.h by linux/cache.h, itself included
> by linux/printk.h, and I end up with 'implicit declaration' issues.
>
> How can I fix that?

mmu-8xx.h should have been including page.h instead of assuming the caller
has done so... but another option is to do what mmu-book3e.h does, and use
the kconfig symbols instead of PAGE_SHIFT.

-Scott
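[Editorial note: the mmu-book3e.h approach Scott refers to amounts to keying
off the Kconfig page-size symbols instead of PAGE_SHIFT, roughly as below.
This is a sketch only; the exact symbols an 8xx version would use (8xx
supports 4K and 16K pages) would need checking against the tree:]

/* Instead of computing the page size from PAGE_SHIFT (which drags in
 * page.h and creates the include cycle), select it directly from the
 * Kconfig page-size option: */
#if defined(CONFIG_PPC_4K_PAGES)
#define mmu_virtual_psize	MMU_PAGE_4K
#elif defined(CONFIG_PPC_16K_PAGES)
#define mmu_virtual_psize	MMU_PAGE_16K
#else
#error "Unsupported page size"
#endif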
Re: Missing operand for tlbie instruction on Power7
On 10/05/2015 08:35 PM, Michael Ellerman wrote:
> On Fri, 2015-10-02 at 08:43 -0700, Laura Abbott wrote:
> > Hi,
> >
> > We received a report (https://bugzilla.redhat.com/show_bug.cgi?id=1267395)
> > of bad assembly when compiling on powerpc with little endian
...
> > After some discussion with the binutils folks, it turns out that the
> > tlbie instruction actually requires another operand and binutils was
> > updated to check for this
> > https://sourceware.org/ml/binutils/2015-05/msg00133.html .
> >
> > The code sequence in arch/powerpc/include/asm/ppc_asm.h now needs to be
> > updated:
> >
> > #if !defined(CONFIG_4xx) && !defined(CONFIG_8xx)
> > #define tlbia					\
> > 	li	r4,1024;			\
> > 	mtctr	r4;				\
> > 	lis	r4,KERNELBASE@h;		\
> > 0:	tlbie	r4;				\
> > 	addi	r4,r4,0x1000;			\
> > 	bdnz	0b
> > #endif
> >
> > I don't know enough ppc assembly to properly fix this but I can test.
>
> How are you testing? This code is fairly old and I'm dubious if it still
> works.
>
> These days we have a ppc_md hook for flushing the TLB, ppc_md.flush_tlb().
> Ideally the swsusp code would use that.
>
> cheers

Testing would probably just be compile and maybe boot. I don't have regular
access to the hardware. This problem just showed up for me when someone
tried to compile Fedora rawhide with the latest binutils.

From what I can tell, it looks like the .flush_tlb of the cpu_spec is only
defined for power7 and power8 and I don't see a ppc_md.flush_tlb on the
master branch. It's not clear what to do for the case where there is no
flush_tlb function. Would filling in a .flush_tlb for all of PPC_BOOK3S_64
with the existing tlbia sequence work? It's also worth noting that
__flush_power7 uses tlbiel instead of tlbie.

Thanks,
Laura
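[Editorial note: for reference, one way to satisfy the two-operand form the
newer binutils insists on is to pass an explicit RS register. This is a
sketch only, under the assumption that RS=0 (i.e. LPID 0, the host
partition) is the right value here; it is not the fix that was eventually
merged and would still need review by someone who knows this code:]

#if !defined(CONFIG_4xx) && !defined(CONFIG_8xx)
#define tlbia					\
	li	r4,1024;			\
	mtctr	r4;				\
	lis	r4,KERNELBASE@h;		\
	li	r5,0;	/* RS = 0, assumed LPID 0 */ \
0:	tlbie	r4,r5;				\
	addi	r4,r4,0x1000;			\
	bdnz	0b
#endif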
[PATCH-RFC 3/7] powerpc: convert to generic builtin command line
This updates the powerpc code to use the CONFIG_GENERIC_CMDLINE option.

Cc: xe-ker...@external.cisco.com
Cc: Daniel Walker
Signed-off-by: Daniel Walker
---
 arch/powerpc/Kconfig            | 23 +--
 arch/powerpc/kernel/prom.c      |  4
 arch/powerpc/kernel/prom_init.c |  8
 3 files changed, 9 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9a7057e..26252dc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -160,6 +160,7 @@ config PPC
 	select EDAC_ATOMIC_SCRUB
 	select ARCH_HAS_DMA_SET_COHERENT_MASK
 	select HAVE_ARCH_SECCOMP_FILTER
+	select GENERIC_CMDLINE

 config GENERIC_CSUM
 	def_bool CPU_LITTLE_ENDIAN
@@ -640,28 +641,6 @@ config PPC_DENORMALISATION
 	  Add support for handling denormalisation of single precision
 	  values. Useful for bare metal only. If unsure say Y here.

-config CMDLINE_BOOL
-	bool "Default bootloader kernel arguments"
-
-config CMDLINE
-	string "Initial kernel command string"
-	depends on CMDLINE_BOOL
-	default "console=ttyS0,9600 console=tty0 root=/dev/sda2"
-	help
-	  On some platforms, there is currently no way for the boot loader to
-	  pass arguments to the kernel. For these platforms, you can supply
-	  some command-line options at build time by entering them here. In
-	  most cases you will need to specify the root device here.
-
-config CMDLINE_FORCE
-	bool "Always use the default kernel command string"
-	depends on CMDLINE_BOOL
-	help
-	  Always use the default kernel command string, even if the boot
-	  loader passes other arguments to the kernel.
-	  This is useful if you cannot or don't want to change the
-	  command-line options your boot loader passes to the kernel.
-
 config EXTRA_TARGETS
 	string "Additional default image types"
 	help
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index bef76c5..3281d5a 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -670,6 +671,9 @@ void __init early_init_devtree(void *params)
 	 */
 	of_scan_flat_dt(early_init_dt_scan_chosen_ppc, boot_command_line);

+	/* append and prepend any arguments built into the kernel. */
+	cmdline_add_builtin(boot_command_line, NULL, COMMAND_LINE_SIZE);
+
 	/* Scan memory nodes and rebuild MEMBLOCKs */
 	of_scan_flat_dt(early_init_dt_scan_root, NULL);
 	of_scan_flat_dt(early_init_dt_scan_memory_ppc, NULL);
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 15099c4..2dd2608 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -595,11 +596,10 @@ static void __init early_cmdline_parse(void)
 	p = prom_cmd_line;
 	if ((long)prom.chosen > 0)
 		l = prom_getprop(prom.chosen, "bootargs", p, COMMAND_LINE_SIZE-1);
-#ifdef CONFIG_CMDLINE
+
 	if (l <= 0 || p[0] == '\0') /* dbl check */
-		strlcpy(prom_cmd_line,
-			CONFIG_CMDLINE, sizeof(prom_cmd_line));
-#endif /* CONFIG_CMDLINE */
+		cmdline_add_builtin(prom_cmd_line, NULL, sizeof(prom_cmd_line));
+
 	prom_printf("command line: %s\n", prom_cmd_line);

 #ifdef CONFIG_PPC64
--
2.1.4
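[Editorial note: the generic helper is used here without its definition
being shown. Conceptually it does something like the following hypothetical
sketch (the CONFIG_CMDLINE_PREPEND/APPEND symbols and the unused second
parameter are assumptions about the series' patch 1/7, which is not part of
this thread):]

/* Hypothetical sketch: combine the bootloader-provided command line
 * with strings built into the kernel config. */
static inline void cmdline_add_builtin(char *dest, char *tmp, size_t len)
{
#ifdef CONFIG_GENERIC_CMDLINE
	char buf[COMMAND_LINE_SIZE];

	snprintf(buf, sizeof(buf), "%s %s %s",
		 CONFIG_CMDLINE_PREPEND, dest, CONFIG_CMDLINE_APPEND);
	strlcpy(dest, buf, len);
#endif
}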
Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler
On 06/10/2015 18:46, Scott Wood wrote:
> On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote:
> > On 29/09/2015 00:07, Scott Wood wrote:
> > > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
> > > > We are spending between 40 and 160 cycles with a mean of 65 cycles
> > > > in the TLB handling routines (measured with mftbl) so make it more
> > > > simple although it adds one instruction.
> > > >
> > > > Signed-off-by: Christophe Leroy
> > >
> > > Does this just make it simpler or does it make it faster? What is the
> > > performance impact? Is the performance impact seen with or without
> > > CONFIG_8xx_CPU6 enabled? Without it, it looks like you're adding an
> > > mtspr/mfspr combo in order to replace one mfspr.
> >
> > The performance impact is not noticeable. Theoretically it adds 1 cycle
> > on a mean of 65 cycles, that is 1.5%. Even in the worst case where we
> > spend around 10% of the time in TLB handling exceptions, that
> > represents only 0.15% of the total CPU time. So that's almost nothing.
> > Beyond the fact that it gets simpler, the main reason is that I need a
> > third register for the following patch in the set, otherwise I would
> > spend more time saving and restoring CR several times.
>
> FWIW, the added instruction is an SPR access and I doubt that's only one
> cycle.

According to the mpc885 reference manual (table 9-1), the Instruction
Execution Timing for "Move to: mtspr, mtcrf, mtmsr, mcrxr except mtspr to
LR and CTR and to SPRs external to the core" is "serialize + 1 cycle".
Taking into account that the preceding instructions are also 'mtspr', we
are already serialized, so it is only one cycle I believe.
Am I interpreting it wrong?

Christophe
Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler
On Tue, 2015-10-06 at 22:30 +0200, christophe leroy wrote:
> On 06/10/2015 18:46, Scott Wood wrote:
> > On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote:
> > > On 29/09/2015 00:07, Scott Wood wrote:
> > > > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
> > > > > We are spending between 40 and 160 cycles with a mean of 65
> > > > > cycles in the TLB handling routines (measured with mftbl) so
> > > > > make it more simple although it adds one instruction.
> > > > >
> > > > > Signed-off-by: Christophe Leroy
> > > >
> > > > Does this just make it simpler or does it make it faster? What is
> > > > the performance impact? Is the performance impact seen with or
> > > > without CONFIG_8xx_CPU6 enabled? Without it, it looks like you're
> > > > adding an mtspr/mfspr combo in order to replace one mfspr.
> > >
> > > The performance impact is not noticeable. Theoretically it adds 1
> > > cycle on a mean of 65 cycles, that is 1.5%. Even in the worst case
> > > where we spend around 10% of the time in TLB handling exceptions,
> > > that represents only 0.15% of the total CPU time. So that's almost
> > > nothing.
> > > Beyond the fact that it gets simpler, the main reason is that I need
> > > a third register for the following patch in the set, otherwise I
> > > would spend more time saving and restoring CR several times.
> >
> > FWIW, the added instruction is an SPR access and I doubt that's only
> > one cycle.
>
> According to the mpc885 reference manual (table 9-1), the Instruction
> Execution Timing for "Move to: mtspr, mtcrf, mtmsr, mcrxr except mtspr
> to LR and CTR and to SPRs external to the core" is "serialize + 1 cycle".
> Taking into account that the preceding instructions are also 'mtspr',
> we are already serialized, so it is only one cycle I believe.
> Am I interpreting it wrong?

I don't know. The manual doesn't go into much detail about the mechanics of
serialization. If it's just about "block[ing] all execution units" without
any effect on fetching, decoding, etc. then maybe you're right.

-Scott
[PATCH v2 1/3] ppc64: Fix warnings
Produce a warning-free build on ppc64 (at least, when built as 64-bit
userspace -- if a 64-bit binary for ppc64 is a requirement, why is -m64 set
only on purgatory?). Mostly unused (or write-only) variable warnings, but
also one nasty one where reserve() was used without a prototype, causing
long long arguments to be passed as int.

Signed-off-by: Scott Wood
---
v2: no change

 kexec/arch/ppc64/crashdump-ppc64.c | 3 ++-
 kexec/arch/ppc64/kexec-elf-ppc64.c | 9 +
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/kexec/arch/ppc64/crashdump-ppc64.c b/kexec/arch/ppc64/crashdump-ppc64.c
index 6214b83..b3c8928 100644
--- a/kexec/arch/ppc64/crashdump-ppc64.c
+++ b/kexec/arch/ppc64/crashdump-ppc64.c
@@ -33,6 +33,7 @@
 #include "../../kexec-syscall.h"
 #include "../../crashdump.h"
 #include "kexec-ppc64.h"
+#include "../../fs2dt.h"
 #include "crashdump-ppc64.h"

 static struct crash_elf_info elf_info64 =
@@ -187,7 +188,7 @@ static int get_crash_memory_ranges(struct memory_range **range, int *ranges)
 	DIR *dir, *dmem;
 	FILE *file;
 	struct dirent *dentry, *mentry;
-	int i, n, crash_rng_len = 0;
+	int n, crash_rng_len = 0;
 	unsigned long long start, end;
 	int page_size;

diff --git a/kexec/arch/ppc64/kexec-elf-ppc64.c b/kexec/arch/ppc64/kexec-elf-ppc64.c
index 4a1540e..adcee4c 100644
--- a/kexec/arch/ppc64/kexec-elf-ppc64.c
+++ b/kexec/arch/ppc64/kexec-elf-ppc64.c
@@ -97,7 +97,6 @@ int elf_ppc64_load(int argc, char **argv, const char *buf, off_t len,
 	struct mem_ehdr ehdr;
 	char *cmdline, *modified_cmdline = NULL;
 	const char *devicetreeblob;
-	int cmdline_len, modified_cmdline_len;
 	uint64_t max_addr, hole_addr;
 	char *seg_buf = NULL;
 	off_t seg_size = 0;
@@ -107,7 +106,6 @@ int elf_ppc64_load(int argc, char **argv, const char *buf, off_t len,
 	uint64_t *rsvmap_ptr;
 	struct bootblock *bb_ptr;
 #endif
-	int i;
 	int result, opt;
 	uint64_t my_kernel, my_dt_offset;
 	uint64_t my_opal_base = 0, my_opal_entry = 0;
@@ -162,10 +160,7 @@ int elf_ppc64_load(int argc, char **argv, const char *buf, off_t len,
 		}
 	}

-	cmdline_len = 0;
-	if (cmdline)
-		cmdline_len = strlen(cmdline) + 1;
-	else
+	if (!cmdline)
 		fprintf(stdout, "Warning: append= option is not passed. Using the first kernel root partition\n");

 	if (ramdisk && reuse_initrd)
@@ -181,7 +176,6 @@ int elf_ppc64_load(int argc, char **argv, const char *buf, off_t len,
 			strncpy(modified_cmdline, cmdline, COMMAND_LINE_SIZE);
 			modified_cmdline[COMMAND_LINE_SIZE - 1] = '\0';
 		}
-		modified_cmdline_len = strlen(modified_cmdline);
 	}

 	/* Parse the Elf file */
@@ -219,7 +213,6 @@ int elf_ppc64_load(int argc, char **argv, const char *buf, off_t len,
 		return -1;
 	/* Use new command line. */
 	cmdline = modified_cmdline;
-	cmdline_len = strlen(modified_cmdline) + 1;
 	}

 	/* Add v2wrap to the current image */
--
2.1.4
[PATCH v2 2/3] ppc64: Avoid rfid if no need to clear MSR_LE
Commit a304e2d82a8c3 ("ppc64: purgatory: Reset primary cpu endian to
big-endian") changed bctr to rfid. rfid is book3s-only and will cause a
fatal exception on book3e.

Purgatory is an isolated environment which makes importing information
about the subarch awkward, so instead rely on the fact that MSR_LE should
never be set on book3e, and the rfid is only needed if MSR_LE is set (and
thus needs to be cleared). In theory that MSR bit is reserved on book3e,
rather than zero, but in practice I have not seen it set.

Signed-off-by: Scott Wood
Cc: Samuel Mendoza-Jonas
---
v2: new patch

 purgatory/arch/ppc64/v2wrap.S | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/purgatory/arch/ppc64/v2wrap.S b/purgatory/arch/ppc64/v2wrap.S
index 179ade9..3534080 100644
--- a/purgatory/arch/ppc64/v2wrap.S
+++ b/purgatory/arch/ppc64/v2wrap.S
@@ -116,9 +116,17 @@ master:
 	stw	7,0x5c(4)	# and patch it into the kernel
 	mr	3,16		# restore dt address

+	mfmsr	5
+	andi.	10,5,1		# test MSR_LE
+	bne	little_endian
+
+	li	5,0		# r5 will be 0 for kernel
+	mtctr	4		# prepare branch to
+	bctr			# start kernel
+
+little_endian:			# book3s-only
 	mtsrr0	4		# prepare branch to

-	mfmsr	5
 	clrrdi	5,5,1		# clear MSR_LE
 	mtsrr1	5

--
2.1.4
[PATCH v2 3/3] ppc64: Add a flag to tell the kernel it's booting from kexec
It needs to know this because the SMP release mechanism for Freescale book3e is different from when booting with normal hardware. In theory we could simulate the normal spin table mechanism, but not (easily) at the addresses U-Boot put in the device tree -- so there'd need to be even more communication between the kernel and kexec to set that up. Signed-off-by: Scott Wood --- v2: Use a device tree property rather than setting a flag in the kernel image, as requested by Michael Ellerman. --- kexec/arch/ppc64/Makefile | 6 +++ kexec/arch/ppc64/fdt.c | 78 + kexec/arch/ppc64/include/arch/fdt.h | 9 + kexec/arch/ppc64/kexec-elf-ppc64.c | 7 4 files changed, 100 insertions(+) create mode 100644 kexec/arch/ppc64/fdt.c create mode 100644 kexec/arch/ppc64/include/arch/fdt.h diff --git a/kexec/arch/ppc64/Makefile b/kexec/arch/ppc64/Makefile index 9a6e475..37cd233 100644 --- a/kexec/arch/ppc64/Makefile +++ b/kexec/arch/ppc64/Makefile @@ -1,11 +1,15 @@ # # kexec ppc64 (linux booting linux) # +include $(srcdir)/kexec/libfdt/Makefile.libfdt + ppc64_KEXEC_SRCS = kexec/arch/ppc64/kexec-elf-rel-ppc64.c ppc64_KEXEC_SRCS += kexec/arch/ppc64/kexec-zImage-ppc64.c ppc64_KEXEC_SRCS += kexec/arch/ppc64/kexec-elf-ppc64.c ppc64_KEXEC_SRCS += kexec/arch/ppc64/kexec-ppc64.c ppc64_KEXEC_SRCS += kexec/arch/ppc64/crashdump-ppc64.c +ppc64_KEXEC_SRCS += kexec/arch/ppc64/fdt.c +ppc64_KEXEC_SRCS += $(LIBFDT_SRCS:%=kexec/libfdt/%) ppc64_ARCH_REUSE_INITRD = @@ -13,6 +17,8 @@ ppc64_FS2DT = kexec/fs2dt.c ppc64_FS2DT_INCLUDE = -include $(srcdir)/kexec/arch/ppc64/crashdump-ppc64.h \ -include $(srcdir)/kexec/arch/ppc64/kexec-ppc64.h +ppc64_CPPFLAGS = -I$(srcdir)/kexec/libfdt + dist += kexec/arch/ppc64/Makefile $(ppc64_KEXEC_SRCS) \ kexec/arch/ppc64/kexec-ppc64.h kexec/arch/ppc64/crashdump-ppc64.h \ kexec/arch/ppc64/include/arch/options.h diff --git a/kexec/arch/ppc64/fdt.c b/kexec/arch/ppc64/fdt.c new file mode 100644 index 000..8bc6d2d --- /dev/null +++ b/kexec/arch/ppc64/fdt.c @@ -0,0 +1,78 @@ +/* + * ppc64 fdt fixups + * + * Copyright 2015 Freescale Semiconductor, Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation (version 2 of the License). + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include +#include +#include +#include + +/* + * Let the kernel know it booted from kexec, as some things (e.g. + * secondary CPU release) may work differently. + */ +static int fixup_kexec_prop(void *fdt) +{ + int err, nodeoffset; + + nodeoffset = fdt_subnode_offset(fdt, 0, "chosen"); + if (nodeoffset < 0) + nodeoffset = fdt_add_subnode(fdt, 0, "chosen"); + if (nodeoffset < 0) { + printf("%s: add /chosen %s\n", __func__, + fdt_strerror(nodeoffset)); + return -1; + } + + err = fdt_setprop(fdt, nodeoffset, "linux,booted-from-kexec", + NULL, 0); + if (err < 0) { + printf("%s: couldn't write linux,booted-from-kexec: %s\n", + __func__, fdt_strerror(err)); + return -1; + } + + return 0; +} + + +/* + * For now, assume that the added content fits in the file. + * This should be the case when flattening from /proc/device-tree, + * and when passing in a dtb, dtc can be told to add padding. 
+ */ +int fixup_dt(char **fdt, off_t *size) +{ + int ret; + + *size += 4096; + *fdt = realloc(*fdt, *size); + if (!*fdt) { + fprintf(stderr, "%s: out of memory\n", __func__); + return -1; + } + + ret = fdt_open_into(*fdt, *fdt, *size); + if (ret < 0) { + fprintf(stderr, "%s: fdt_open_into: %s\n", __func__, + fdt_strerror(ret)); + return -1; + } + + ret = fixup_kexec_prop(*fdt); + if (ret < 0) + return ret; + + return 0; +} diff --git a/kexec/arch/ppc64/include/arch/fdt.h b/kexec/arch/ppc64/include/arch/fdt.h new file mode 100644 index 000..14f8be2 --- /dev/null +++ b/kexec/arch/ppc64/include/arch/fdt.h @@ -0,0 +1,9 @@ +#ifndef KEXEC_ARCH_PPC64_FDT +#define KEXEC_ARCH_PPC64_FDT + +#include + +int fixup_dt(char **fdt, off_t *size); + +#endif + diff --git a/kexec/arch/ppc64/kexec-elf-ppc64.c b/kexec/arch/ppc64/kexec-elf-ppc64.c index adcee4c..ddd3de8 100644 --- a/kexec/arch/ppc64/kexec-elf-ppc64.c +++ b/kexec/arch/ppc64/kexec-elf-ppc64.c @@ -37,6 +37,8 @@ #include "kexec-ppc64.h" #include "../../fs2dt.h" #include "crashdump-pp
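[Editor's note] On the consuming side, a kernel can test for the new property with the standard OF helpers. A minimal hedged sketch -- booted_from_kexec() is an illustrative name, not something this patch set adds, while of_chosen and of_property_read_bool() are the usual kernel device-tree APIs:

	#include <linux/of.h>

	/* Sketch only: returns true if kexec's fixup above ran on this boot. */
	static bool booted_from_kexec(void)
	{
		return of_property_read_bool(of_chosen, "linux,booted-from-kexec");
	}

The companion kernel series later in this digest is the intended consumer of the property, in its Freescale book3e SMP release path.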
Re: [kexec-lite PATCH V2] trampoline: Reset primary cpu endian to big-endian
On Wed, 2015-07-08 at 13:49 +1000, Samuel Mendoza-Jonas wrote: > On 08/07/15 13:37, Scott Wood wrote: > > On Wed, 2015-07-08 at 13:29 +1000, Samuel Mendoza-Jonas wrote: > > > Older big-endian ppc64 kernels don't include the FIXUP_ENDIAN check, > > > meaning if we kexec from a little-endian kernel the target kernel will > > > fail to boot. > > > Returning to big-endian before we enter the target kernel ensures that > > > the target kernel can boot whether or not it includes FIXUP_ENDIAN. > > > > > > Signed-off-by: Samuel Mendoza-Jonas > > > --- > > > V2: As suggested by Anton take advantage of the rfid call and switch off > > > MSR_LE and branch to the target kernel in the same step. > > > > > > kexec_trampoline.S | 11 +-- > > > 1 file changed, 9 insertions(+), 2 deletions(-) > > > > > > diff --git a/kexec_trampoline.S b/kexec_trampoline.S > > > index a3eb314..3751112 100644 > > > --- a/kexec_trampoline.S > > > +++ b/kexec_trampoline.S > > > @@ -88,8 +88,15 @@ start: > > > > > > li r5,0 > > > > > > - mtctr r4 > > > - bctr > > > + mtsrr0 r4 > > > + > > > + mfmsr r5 > > > + clrrdi r5,r5,1 /* Clear MSR_LE */ > > > + mtsrr1 r5 > > > + > > > + li r5,0 > > > + > > > + rfid > > > > Is kexec-lite meant to be specific to book3s-64? The README just says "A > > simple kexec for flattened device tree platforms" and I see a > > __powerpc64__ > > ifdef in kexec_trampoline.S (but not in the above patch)... > > > > -Scott > > > > I believe that particular ifdef is to check if we're little-endian when > reading > the device tree, but that's still a good point - I'll check with Anton. It looks like this ended up going into main kexec, which means I get to find some way to distinguish book3s from book3e to avoid that rfid. Yay. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2] perf: Fix build break on powerpc due to sample_reg_masks
On Wed, 2015-09-30 at 16:45 -0300, Arnaldo Carvalho de Melo wrote: > Em Wed, Sep 30, 2015 at 09:09:09PM +0200, Jiri Olsa escreveu: > > On Wed, Sep 30, 2015 at 11:28:36AM -0700, Sukadev Bhattiprolu wrote: > > > From e29a7236122c4d807ec9ebc721b5d7d75c8d Mon Sep 17 00:00:00 2001 > > > From: Sukadev Bhattiprolu > > > Date: Thu, 24 Sep 2015 17:53:49 -0400 > > > Subject: [PATCH v2] perf: Fix build break on powerpc due to > > > sample_reg_masks > > > > > > perf_regs.c does not get built on Powerpc as CONFIG_PERF_REGS is false. > > > So the weak definition for 'sample_regs_masks' doesn't get picked up. > > > > > > Adding perf_regs.o to util/Build unconditionally, exposes a redefinition > > > error for 'perf_reg_value()' function (due to the static inline version > > > in util/perf_regs.h). So use #ifdef HAVE_PERF_REGS_SUPPORT' around that > > > function. > > > > > > Signed-off-by: Sukadev Bhattiprolu > > > > Acked-by: Jiri Olsa > > Thanks, applied. Is this going to Linus' tree any time soon? I have folks pinging me to say that perf is broken on powerpc. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
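[Editor's note] For context, the breakage follows from the usual weak-symbol pattern: a __weak fallback definition only takes effect if the object file containing it is linked at all, and CONFIG_PERF_REGS being false kept util/perf_regs.o out of the powerpc build entirely, leaving references to sample_reg_masks unresolved. A simplified standalone sketch of the pattern (field layout reduced for illustration; not perf's exact struct):

	/* util/perf_regs.c -- weak default, overridable by an arch */
	struct sample_reg {
		const char *name;
		unsigned long long mask;
	};

	const struct sample_reg __attribute__((weak)) sample_reg_masks[] = {
		{ .name = 0, .mask = 0 },
	};

If this file is not compiled, there is no definition anywhere -- weak or strong -- and the link fails, which is why the fix builds perf_regs.o unconditionally.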
[PATCH v9 1/4] perf, kvm/{x86, s390}: Remove dependency on uapi/kvm_perf.h
It's better to remove the dependency on uapi/kvm_perf.h to allow dynamic discovery of kvm events (if it's needed). To do this, some extern variables have been introduced with which we can keep the generic functions generic. Signed-off-by: Hemant Kumar --- Changelog: v8 to v9: - Removed the macro definitions. - Changed the access of kvm_entry_trace and kvm_exit_trace - Removed unnecessary formatting. v7 to v8: - Removed unnecessary __unused_parameter modifiers. tools/perf/arch/s390/util/kvm-stat.c | 8 +++- tools/perf/arch/x86/util/kvm-stat.c | 14 +++--- tools/perf/builtin-kvm.c | 32 ++-- tools/perf/util/kvm-stat.h | 5 + 4 files changed, 45 insertions(+), 14 deletions(-) diff --git a/tools/perf/arch/s390/util/kvm-stat.c b/tools/perf/arch/s390/util/kvm-stat.c index a5dbc07..b85a94b 100644 --- a/tools/perf/arch/s390/util/kvm-stat.c +++ b/tools/perf/arch/s390/util/kvm-stat.c @@ -10,7 +10,7 @@ */ #include "../../util/kvm-stat.h" -#include +#include define_exit_reasons_table(sie_exit_reasons, sie_intercept_code); define_exit_reasons_table(sie_icpt_insn_codes, icpt_insn_codes); @@ -18,6 +18,12 @@ define_exit_reasons_table(sie_sigp_order_codes, sigp_order_codes); define_exit_reasons_table(sie_diagnose_codes, diagnose_codes); define_exit_reasons_table(sie_icpt_prog_codes, icpt_prog_codes); +const char *vcpu_id_str = "id"; +const int decode_str_len = 40; +const char *kvm_exit_reason = "icptcode"; +const char *kvm_entry_trace = "kvm:kvm_s390_sie_enter"; +const char *kvm_exit_trace = "kvm:kvm_s390_sie_exit"; + static void event_icpt_insn_get_key(struct perf_evsel *evsel, struct perf_sample *sample, struct event_key *key) diff --git a/tools/perf/arch/x86/util/kvm-stat.c b/tools/perf/arch/x86/util/kvm-stat.c index 14e4e66..babefda 100644 --- a/tools/perf/arch/x86/util/kvm-stat.c +++ b/tools/perf/arch/x86/util/kvm-stat.c @@ -1,5 +1,7 @@ #include "../../util/kvm-stat.h" -#include +#include +#include +#include define_exit_reasons_table(vmx_exit_reasons, VMX_EXIT_REASONS); define_exit_reasons_table(svm_exit_reasons, SVM_EXIT_REASONS); @@ -11,6 +13,12 @@ static struct kvm_events_ops exit_events = { .name = "VM-EXIT" }; +const char *vcpu_id_str = "vcpu_id"; +const int decode_str_len = 20; +const char *kvm_exit_reason = "exit_reason"; +const char *kvm_entry_trace = "kvm:kvm_entry"; +const char *kvm_exit_trace = "kvm:kvm_exit"; + /* * For the mmio events, we treat: * the time of MMIO write: kvm_mmio(KVM_TRACE_MMIO_WRITE...) -> kvm_entry @@ -65,7 +73,7 @@ static void mmio_event_decode_key(struct perf_kvm_stat *kvm __maybe_unused, struct event_key *key, char *decode) { - scnprintf(decode, DECODE_STR_LEN, "%#lx:%s", + scnprintf(decode, decode_str_len, "%#lx:%s", (unsigned long)key->key, key->info == KVM_TRACE_MMIO_WRITE ? "W" : "R"); } @@ -109,7 +117,7 @@ static void ioport_event_decode_key(struct perf_kvm_stat *kvm __maybe_unused, struct event_key *key, char *decode) { - scnprintf(decode, DECODE_STR_LEN, "%#llx:%s", + scnprintf(decode, decode_str_len, "%#llx:%s", (unsigned long long)key->key, key->info ? 
"POUT" : "PIN"); } diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c index fc1cffb..5104c7e 100644 --- a/tools/perf/builtin-kvm.c +++ b/tools/perf/builtin-kvm.c @@ -31,7 +31,6 @@ #include #ifdef HAVE_KVM_STAT_SUPPORT -#include #include "util/kvm-stat.h" void exit_event_get_key(struct perf_evsel *evsel, @@ -39,12 +38,12 @@ void exit_event_get_key(struct perf_evsel *evsel, struct event_key *key) { key->info = 0; - key->key = perf_evsel__intval(evsel, sample, KVM_EXIT_REASON); + key->key = perf_evsel__intval(evsel, sample, kvm_exit_reason); } bool kvm_exit_event(struct perf_evsel *evsel) { - return !strcmp(evsel->name, KVM_EXIT_TRACE); + return !strcmp(evsel->name, kvm_exit_trace); } bool exit_event_begin(struct perf_evsel *evsel, @@ -60,7 +59,7 @@ bool exit_event_begin(struct perf_evsel *evsel, bool kvm_entry_event(struct perf_evsel *evsel) { - return !strcmp(evsel->name, KVM_ENTRY_TRACE); + return !strcmp(evsel->name, kvm_entry_trace); } bool exit_event_end(struct perf_evsel *evsel, @@ -92,7 +91,7 @@ void exit_event_decode_key(struct perf_kvm_stat *kvm, const char *exit_reason = get_exit_reason(kvm, key->exit_reasons, key->key); - scnprintf(decode, DECODE_STR_LEN, "%s", exit_reason); + scnprintf(decode, decode_str_len, "%s", exit_reason); } static bool register_kvm_events_ops(struct perf_kvm_stat *kvm) @@ -358,7 +357,12 @@
[PATCH v9 2/4] perf,kvm/{x86,s390}: Remove const from kvm_events_tp
This patch removes the "const" qualifier from kvm_events_tp declaration to account for the fact that some architectures may need to update this variable dynamically. For instance, powerpc will need to update this variable dynamically depending on the machine type. Signed-off-by: Hemant Kumar --- tools/perf/arch/s390/util/kvm-stat.c | 2 +- tools/perf/arch/x86/util/kvm-stat.c | 2 +- tools/perf/util/kvm-stat.h | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/perf/arch/s390/util/kvm-stat.c b/tools/perf/arch/s390/util/kvm-stat.c index b85a94b..ed57df2 100644 --- a/tools/perf/arch/s390/util/kvm-stat.c +++ b/tools/perf/arch/s390/util/kvm-stat.c @@ -79,7 +79,7 @@ static struct kvm_events_ops exit_events = { .name = "VM-EXIT" }; -const char * const kvm_events_tp[] = { +const char *kvm_events_tp[] = { "kvm:kvm_s390_sie_enter", "kvm:kvm_s390_sie_exit", "kvm:kvm_s390_intercept_instruction", diff --git a/tools/perf/arch/x86/util/kvm-stat.c b/tools/perf/arch/x86/util/kvm-stat.c index babefda..b63d4be 100644 --- a/tools/perf/arch/x86/util/kvm-stat.c +++ b/tools/perf/arch/x86/util/kvm-stat.c @@ -129,7 +129,7 @@ static struct kvm_events_ops ioport_events = { .name = "IO Port Access" }; -const char * const kvm_events_tp[] = { +const char *kvm_events_tp[] = { "kvm:kvm_entry", "kvm:kvm_exit", "kvm:kvm_mmio", diff --git a/tools/perf/util/kvm-stat.h b/tools/perf/util/kvm-stat.h index dd55548..c965dc8 100644 --- a/tools/perf/util/kvm-stat.h +++ b/tools/perf/util/kvm-stat.h @@ -133,7 +133,7 @@ bool kvm_entry_event(struct perf_evsel *evsel); */ int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid); -extern const char * const kvm_events_tp[]; +extern const char *kvm_events_tp[]; extern struct kvm_reg_events_ops kvm_reg_events_ops[]; extern const char * const kvm_skip_events[]; extern const char *vcpu_id_str; -- 1.9.3 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
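[Editor's note] With the const gone, an architecture can rewrite the table at runtime before the generic code registers the events. A hedged sketch of the kind of setup hook the powerpc patch later in this series adds -- names and array size here are illustrative, and it assumes the generic walker stops at a NULL entry, as perf's builtin-kvm.c does:

	#include <stddef.h>

	/* Illustrative only: pick machine-specific tracepoints at runtime. */
	const char *kvm_events_tp[4];

	static void setup_book3s_hv_events(void)
	{
		kvm_events_tp[0] = "kvm_hv:kvm_guest_enter";
		kvm_events_tp[1] = "kvm_hv:kvm_guest_exit";
		kvm_events_tp[2] = NULL;	/* terminator for the generic walker */
	}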
[PATCH v9 3/4] perf,kvm/powerpc: Port perf kvm stat to powerpc
perf kvm can be used to analyze guest exit reasons. This support already exists in x86. Hence, porting it to powerpc. - To trace KVM events : perf kvm stat record If many guests are running, we can track for a specific guest by using --pid as in : perf kvm stat record --pid - To see the results : perf kvm stat report The result shows the number of exits (from the guest context to host/hypervisor context) grouped by their respective exit reasons with their frequency. Since different powerpc machines have different KVM tracepoints, this patch discovers the available tracepoints dynamically and accordingly looks for them. If any single tracepoint is not present, this support won't be enabled for reporting. To record, this will fail if any of the events we are looking to record isn't available. Right now, it's only supported on PowerPC Book3S_HV architectures. To analyze the different exits, group them and present them (in a slightly descriptive way) to the user, we need a mapping between the "exit code" (dumped in the kvm_guest_exit tracepoint data) and its related Interrupt vector description (exit reason). This patch adds this mapping in book3s_hv_exits.h. It records on two available KVM tracepoints for book3s_hv: "kvm_hv:kvm_guest_exit" and "kvm_hv:kvm_guest_enter". Here is a sample output:

# pgrep qemu
19378
60515

2 Guests are running on the host.

# perf kvm stat record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.153 MB perf.data.guest (39624 samples) ]

# perf kvm stat report -p 60515

Analyze events for pid(s) 60515, all VCPUs:

VM-EXIT          Samples  Samples%   Time%   Min Time     Max Time      Avg time
SYSCALL             9141    63.67%   7.49%     1.26us    5782.39us      9.87us ( +-  6.46% )
H_DATA_STORAGE      4114    28.66%   5.07%     1.72us    4597.68us     14.84us ( +- 20.06% )
HV_DECREMENTER       418     2.91%   4.26%     0.70us   30002.22us    122.58us ( +- 70.29% )
EXTERNAL             392     2.73%   0.06%     0.64us     104.10us      1.94us ( +- 18.83% )
RETURN_TO_HOST       287     2.00%  83.11%     1.53us  124240.15us   3486.52us ( +- 16.81% )
H_INST_STORAGE         5     0.03%   0.00%     1.88us       3.73us      2.39us ( +- 14.20% )

Total Samples: 14357, Total events handled time: 1203918.42us.

Signed-off-by: Srikar Dronamraju Signed-off-by: Hemant Kumar --- Changelog: v8 to v9: - Moved the book3s specific setup into one function. - Removed the macros (which were being used only once). - Formatting changes. v7 to v8: - Fixed a perf kvm stat live bug. v6 to v7: - Removed dependency on uapi. v4 to v5: - Removed dependency on arch/powerpc/kvm/trace_book3s.h and added them in the userspace side. - No more arch side dependency. 
v1 to v3: - Split the patches for powerpc and perf tools/perf/arch/powerpc/Makefile | 2 + tools/perf/arch/powerpc/util/Build | 1 + tools/perf/arch/powerpc/util/book3s_hv_exits.h | 33 tools/perf/arch/powerpc/util/kvm-stat.c| 100 + tools/perf/builtin-kvm.c | 18 + tools/perf/util/kvm-stat.h | 1 + 6 files changed, 155 insertions(+) create mode 100644 tools/perf/arch/powerpc/util/book3s_hv_exits.h create mode 100644 tools/perf/arch/powerpc/util/kvm-stat.c diff --git a/tools/perf/arch/powerpc/Makefile b/tools/perf/arch/powerpc/Makefile index 7fbca17..9f9cea3 100644 --- a/tools/perf/arch/powerpc/Makefile +++ b/tools/perf/arch/powerpc/Makefile @@ -1,3 +1,5 @@ ifndef NO_DWARF PERF_HAVE_DWARF_REGS := 1 endif + +HAVE_KVM_STAT_SUPPORT := 1 diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build index 7b8b0d1..c8fe207 100644 --- a/tools/perf/arch/powerpc/util/Build +++ b/tools/perf/arch/powerpc/util/Build @@ -1,5 +1,6 @@ libperf-y += header.o libperf-y += sym-handling.o +libperf-y += kvm-stat.o libperf-$(CONFIG_DWARF) += dwarf-regs.o libperf-$(CONFIG_DWARF) += skip-callchain-idx.o diff --git a/tools/perf/arch/powerpc/util/book3s_hv_exits.h b/tools/perf/arch/powerpc/util/book3s_hv_exits.h new file mode 100644 index 000..e68ba2d --- /dev/null +++ b/tools/perf/arch/powerpc/util/book3s_hv_exits.h @@ -0,0 +1,33 @@ +#ifndef ARCH_PERF_BOOK3S_HV_EXITS_H +#define ARCH_PERF_BOOK3S_HV_EXITS_H + +/* + * PowerPC Interrupt vectors : exit code to name mapping + */ + +#define kvm_trace_symbol_exit \ + {0x0, "RETURN_TO_HOST"}, \ + {0x100, "SYSTEM_RESET"}, \ + {0x200, "MACHINE_CHECK"}, \ + {0x300, "DATA_STORAGE"}, \ + {0x380, "DATA_SEGMENT"}, \ + {0x400, "INST_STORAGE"}, \ + {0x480, "INST_SEGMENT"}, \ + {0x500, "EXTERNAL"}, \ + {0x501, "EXTERNAL_LEVEL"}, \ + {0x502, "EXTERNAL_HV"}, \ + {0x600, "ALIGNMENT"}, \ + {0x700, "PROGRAM"}, \ + {0x800, "FP_UNAVAIL"}, \ + {0x900, "DECREMENTER"}, \ + {0x980, "HV_DECREMENTER"}, \ +
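[Editor's note] How a code-to-name table like book3s_hv_exits.h is typically consumed on the report side -- a hedged sketch in the style of perf's existing exit_reasons_table machinery; hv_exit_reasons and exit_name() are illustrative names and the struct is simplified:

	#include "book3s_hv_exits.h"	/* provides kvm_trace_symbol_exit */

	struct exit_reasons_table {
		unsigned long exit_code;
		const char *reason;
	};

	static struct exit_reasons_table hv_exit_reasons[] = {
		kvm_trace_symbol_exit	/* the mapping macro defined above */
	};

	static const char *exit_name(unsigned long code)
	{
		unsigned long i;

		for (i = 0; i < sizeof(hv_exit_reasons) / sizeof(hv_exit_reasons[0]); i++)
			if (hv_exit_reasons[i].exit_code == code)
				return hv_exit_reasons[i].reason;

		return "UNKNOWN";
	}

The same lookup shape serves the hcall table added in patch 4/4.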
[PATCH v9 4/4] perf,kvm/powerpc: Add support for HCALL reasons
Powerpc provides hcall events that also provide insights into guest behaviour. Enhance perf kvm stat to record and analyze hcall events. - To trace hcall events : perf kvm stat record - To show the results : perf kvm stat report --event=hcall The result shows the number of hypervisor calls from the guest grouped by their respective reasons displayed with the frequency. This patch makes use of two additional tracepoints "kvm_hv:kvm_hcall_enter" and "kvm_hv:kvm_hcall_exit". To map the hcall codes to their respective names, it needs a mapping. Such a mapping is added in this patch in book3s_hcalls.h. A sample output:

# pgrep qemu
19378
60515

2 VMs running.

# perf kvm stat record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.153 MB perf.data.guest (39624 samples) ]

# perf kvm stat report -p 60515 --event=hcall

Analyze events for all VMs, all VCPUs:

HCALL-EVENT      Samples  Samples%   Time%   Min Time   Max Time   Avg time
H_IPI                822    66.08%  88.10%     0.63us    11.38us    2.05us ( +-  1.42% )
H_SEND_CRQ           144    11.58%   3.77%     0.41us     0.88us    0.50us ( +-  1.47% )
H_VIO_SIGNAL         118     9.49%   2.86%     0.37us     0.83us    0.47us ( +-  1.43% )
H_PUT_TERM_CHAR       76     6.11%   2.07%     0.37us     0.90us    0.52us ( +-  2.43% )
H_GET_TERM_CHAR       74     5.95%   2.23%     0.37us     1.70us    0.58us ( +-  4.77% )
H_RTAS                 6     0.48%   0.85%     1.10us     9.25us    2.70us ( +- 48.57% )
H_PERFMON              4     0.32%   0.12%     0.41us     0.96us    0.59us ( +- 20.92% )

Total Samples: 1244, Total events handled time: 1916.69us.

Signed-off-by: Hemant Kumar --- Changelog: v8 to v9: - Removed the macros (which were being used only once). v6 to v7: - Removed dependency on uapi. v4 to v5: - Removed dependency on arch/powerpc/include/asm/hvall.h and added them in userspace side. - No more arch side dependency. v1 to v2: - Split the patches for powerpc and perf. tools/perf/arch/powerpc/util/book3s_hcalls.h | 123 +++ tools/perf/arch/powerpc/util/kvm-stat.c | 65 +- 2 files changed, 187 insertions(+), 1 deletion(-) create mode 100644 tools/perf/arch/powerpc/util/book3s_hcalls.h diff --git a/tools/perf/arch/powerpc/util/book3s_hcalls.h b/tools/perf/arch/powerpc/util/book3s_hcalls.h new file mode 100644 index 000..0dd6b7f --- /dev/null +++ b/tools/perf/arch/powerpc/util/book3s_hcalls.h @@ -0,0 +1,123 @@ +#ifndef ARCH_PERF_BOOK3S_HV_HCALLS_H +#define ARCH_PERF_BOOK3S_HV_HCALLS_H + +/* + * PowerPC HCALL codes : hcall code to name mapping + */ +#define kvm_trace_symbol_hcall \ + {0x4, "H_REMOVE"}, \ + {0x8, "H_ENTER"}, \ + {0xc, "H_READ"},\ + {0x10, "H_CLEAR_MOD"}, \ + {0x14, "H_CLEAR_REF"}, \ + {0x18, "H_PROTECT"},\ + {0x1c, "H_GET_TCE"},\ + {0x20, "H_PUT_TCE"},\ + {0x24, "H_SET_SPRG0"}, \ + {0x28, "H_SET_DABR"}, \ + {0x2c, "H_PAGE_INIT"}, \ + {0x30, "H_SET_ASR"},\ + {0x34, "H_ASR_ON"}, \ + {0x38, "H_ASR_OFF"},\ + {0x3c, "H_LOGICAL_CI_LOAD"},\ + {0x40, "H_LOGICAL_CI_STORE"}, \ + {0x44, "H_LOGICAL_CACHE_LOAD"}, \ + {0x48, "H_LOGICAL_CACHE_STORE"},\ + {0x4c, "H_LOGICAL_ICBI"}, \ + {0x50, "H_LOGICAL_DCBF"}, \ + {0x54, "H_GET_TERM_CHAR"}, \ + {0x58, "H_PUT_TERM_CHAR"}, \ + {0x5c, "H_REAL_TO_LOGICAL"},\ + {0x60, "H_HYPERVISOR_DATA"},\ + {0x64, "H_EOI"},\ + {0x68, "H_CPPR"}, \ + {0x6c, "H_IPI"},\ + {0x70, "H_IPOLL"}, \ + {0x74, "H_XIRR"}, \ + {0x78, "H_MIGRATE_DMA"},\ + {0x7c, "H_PERFMON"},\ + {0xdc, "H_REGISTER_VPA"}, \ + {0xe0, "H_CEDE"}, \ + {0xe4, "H_CONFER"},
Re: [PATCH V4 0/6] Redesign SR-IOV on PowerNV
On Fri, 2015-10-02 at 20:07 +1000, Alexey Kardashevskiy wrote: > On 08/19/2015 12:01 PM, Wei Yang wrote: > > In original design, it tries to group VFs to enable more number of VFs in > > the > > system, when VF BAR is bigger than 64MB. This design has a flaw in which one > > error on a VF will interfere other VFs in the same group. > > > > This patch series change this design by using M64 BAR in Single PE mode to > > cover only one VF BAR. By doing so, it gives absolute isolation between VFs. > > > > Wei Yang (6): > >powerpc/powernv: don't enable SRIOV when VF BAR has non > > 64bit-prefetchable BAR > >powerpc/powernv: simplify the calculation of iov resource alignment > >powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR > >powerpc/powernv: replace the hard coded boundary with gate > >powerpc/powernv: boundary the total VF BAR size instead of the > > individual one > >powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE > > mode > > > > arch/powerpc/include/asm/pci-bridge.h |7 +- > > arch/powerpc/platforms/powernv/pci-ioda.c | 328 > > +++-- > > 2 files changed, 175 insertions(+), 160 deletions(-) > > I have posted few comments but in general the patchset makes things simpler > by removing a compound PE and does not seem to make things worse so: > > Acked-by: Alexey Kardashevskiy Thanks for reviewing it. I'll wait for a v5 that incorporates your comments. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/5] powerpc/eeh: Don't unfreeze PHB PE after reset
On the PowerNV platform, a PE is kept in the frozen state until the PE reset completes, to avoid recursive EEH errors caused by MMIO access during the reset. The PE's frozen state is cleared after the BARs of the PCI devices included in the PE are restored and enabled. However, we needn't clear the frozen state explicitly for a PHB PE at this point, as there is no real PE behind a PHB PE. Because the PHB PE is always bound to PE#0, we actually clear PE#0, which is wrong, even though it doesn't cause any visible problem. This checks whether the PE is a PHB PE and skips clearing the frozen state if so. Signed-off-by: Gavin Shan --- arch/powerpc/kernel/eeh_driver.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 89eb4bc..3a626ed 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -587,10 +587,16 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus) eeh_ops->configure_bridge(pe); eeh_pe_restore_bars(pe); - /* Clear frozen state */ - rc = eeh_clear_pe_frozen_state(pe, false); - if (rc) - return rc; + /* +* If it's PHB PE, the frozen state on all available PEs should have +* been cleared by the PHB reset. Otherwise, we unfreeze the PE and its +* child PEs because they might be in frozen state. +*/ + if (!(pe->type & EEH_PE_PHB)) { + rc = eeh_clear_pe_frozen_state(pe, false); + if (rc) + return rc; + } /* Give the system 5 seconds to finish running the user-space * hotplug shutdown scripts, e.g. ifdown for ethernet. Yes, -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 3/5] powerpc/eeh: Force reset on fenced PHB
On fenced PHB, the error handlers in the drivers of its subordinate devices could return PCI_ERS_RESULT_CAN_RECOVER, indicating no reset will be issued during the recovery. It's conflicting with the fact that fenced PHB won't be recovered without reset. This limits the return value from the error handlers in the drivers of the fenced PHB's subordinate devices to PCI_ERS_RESULT_NEED_NONE or PCI_ERS_RESULT_NEED_RESET, to ensure reset will be issued during recovery. Signed-off-by: Gavin Shan --- arch/powerpc/kernel/eeh_driver.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 32178a4..76d918b 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -664,9 +664,17 @@ static void eeh_handle_normal_event(struct eeh_pe *pe) * to accomplish the reset. Each child gets a report of the * status ... if any child can't handle the reset, then the entire * slot is dlpar removed and added. +* +* When the PHB is fenced, we have to issue a reset to recover from +* the error. Override the result if necessary to have partially +* hotplug for this case. */ pr_info("EEH: Notify device drivers to shutdown\n"); eeh_pe_dev_traverse(pe, eeh_report_error, &result); + if ((pe->type & EEH_PE_PHB) && + result != PCI_ERS_RESULT_NEED_NONE && + result != PCI_ERS_RESULT_NEED_RESET) + result = PCI_ERS_RESULT_NEED_RESET; /* Get the current PCI slot state. This can take a long time, * sometimes over 300 seconds for certain systems. -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 5/5] powerpc/pseries: Cleanup on pseries_eeh_get_state()
This cleans up pseries_eeh_get_state(), no functional changes: * Return EEH_STATE_NOT_SUPPORT early when the 2nd RTAS output argument is zero to avoid nested if statements. * Skip clearing bits in the PE state represented by variable "result" to simplify the code. Signed-off-by: Gavin Shan --- arch/powerpc/platforms/pseries/eeh_pseries.c | 60 1 file changed, 26 insertions(+), 34 deletions(-) diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c index 1ba55d0..ac3ffd9 100644 --- a/arch/powerpc/platforms/pseries/eeh_pseries.c +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c @@ -433,42 +433,34 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state) return ret; /* Parse the result out */ - result = 0; - if (rets[1]) { - switch(rets[0]) { - case 0: - result &= ~EEH_STATE_RESET_ACTIVE; - result |= EEH_STATE_MMIO_ACTIVE; - result |= EEH_STATE_DMA_ACTIVE; - break; - case 1: - result |= EEH_STATE_RESET_ACTIVE; - result |= EEH_STATE_MMIO_ACTIVE; - result |= EEH_STATE_DMA_ACTIVE; - break; - case 2: - result &= ~EEH_STATE_RESET_ACTIVE; - result &= ~EEH_STATE_MMIO_ACTIVE; - result &= ~EEH_STATE_DMA_ACTIVE; - break; - case 4: - result &= ~EEH_STATE_RESET_ACTIVE; - result &= ~EEH_STATE_MMIO_ACTIVE; - result &= ~EEH_STATE_DMA_ACTIVE; - result |= EEH_STATE_MMIO_ENABLED; - break; - case 5: - if (rets[2]) { - if (state) *state = rets[2]; - result = EEH_STATE_UNAVAILABLE; - } else { - result = EEH_STATE_NOT_SUPPORT; - } - break; - default: + if (!rets[1]) + return EEH_STATE_NOT_SUPPORT; + + switch(rets[0]) { + case 0: + result = EEH_STATE_MMIO_ACTIVE | +EEH_STATE_DMA_ACTIVE; + break; + case 1: + result = EEH_STATE_RESET_ACTIVE | +EEH_STATE_MMIO_ACTIVE | +EEH_STATE_DMA_ACTIVE; + break; + case 2: + result = 0; + break; + case 4: + result = EEH_STATE_MMIO_ENABLED; + break; + case 5: + if (rets[2]) { + if (state) *state = rets[2]; + result = EEH_STATE_UNAVAILABLE; + } else { result = EEH_STATE_NOT_SUPPORT; } - } else { + break; + default: result = EEH_STATE_NOT_SUPPORT; } -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 4/5] powerpc/eeh: More relaxed condition for enabled IO path
When either of the two flags below -- or both -- is set in the PE state, the PE's IO path is regarded as enabled: EEH_STATE_MMIO_ACTIVE or EEH_STATE_MMIO_ENABLED. Signed-off-by: Gavin Shan --- arch/powerpc/kernel/eeh.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index e968533..ddbf406 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -630,7 +630,7 @@ int eeh_pci_enable(struct eeh_pe *pe, int function) */ switch (function) { case EEH_OPT_THAW_MMIO: - active_flag = EEH_STATE_MMIO_ACTIVE; + active_flag = EEH_STATE_MMIO_ACTIVE | EEH_STATE_MMIO_ENABLED; break; case EEH_OPT_THAW_DMA: active_flag = EEH_STATE_DMA_ACTIVE; -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 2/5] powerpc/eeh: More relaxed hotplug criterion
Currently, we rely on the mere existence of struct pci_driver::err_handler to judge whether the corresponding PCI device should be unplugged during EEH recovery (the partial hotplug case). However, that test is not precise enough: some device drivers implement only part of the EEH error handlers, just to collect diag-data, and such a driver still expects a hotplug to recover from the EEH error. This makes the hotplug criterion more relaxed: if the device driver doesn't provide all the necessary EEH error handlers, it will experience a hotplug during EEH recovery. Signed-off-by: Gavin Shan --- arch/powerpc/kernel/eeh_driver.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 3a626ed..32178a4 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -416,7 +416,10 @@ static void *eeh_rmv_device(void *data, void *userdata) driver = eeh_pcid_get(dev); if (driver) { eeh_pcid_put(dev); - if (driver->err_handler) + if (driver->err_handler && + driver->err_handler->error_detected && + driver->err_handler->slot_reset && + driver->err_handler->resume) return NULL; } -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
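[Editor's note] Under the new criterion, a driver avoids the hotplug path only if its pci_error_handlers provides all three callbacks. A minimal, hypothetical driver-side sketch of a struct that satisfies the check -- the foo_* names are illustrative, while struct pci_error_handlers and the PCI_ERS_RESULT_* values are the standard kernel API:

	#include <linux/pci.h>

	static pci_ers_result_t foo_error_detected(struct pci_dev *pdev,
						   enum pci_channel_state state)
	{
		/* collect diag-data, quiesce the device, ask for a reset */
		return PCI_ERS_RESULT_NEED_RESET;
	}

	static pci_ers_result_t foo_slot_reset(struct pci_dev *pdev)
	{
		/* re-initialise the device after the slot reset */
		return PCI_ERS_RESULT_RECOVERED;
	}

	static void foo_resume(struct pci_dev *pdev)
	{
		/* restart I/O */
	}

	static const struct pci_error_handlers foo_err_handler = {
		.error_detected	= foo_error_detected,
		.slot_reset	= foo_slot_reset,
		.resume		= foo_resume,
	};

A driver providing only .error_detected (e.g. just for diag-data) now correctly falls through to the hotplug path.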
Re: [PATCH] cxl: Fix number of allocated pages in SPA
Excerpts from Michael Ellerman's message of 2015-10-06 17:19:02 +1100: > On Fri, 2015-10-02 at 16:01 +0200, Christophe Lombard wrote: > > This moves the initialisation of the num_procs to before the SPA > > allocation. > > Why? What does it fix? I can't tell from the diff or the change log. Without this change, we only ever allocate a fixed number of pages for the scheduled process area (which in itself looks like it has a minor bug, as it will start trying at two pages instead of one), which limits us to 958 processes with 2 x 64K pages. If we actually try to use more processes than that, we'd probably overrun the buffer and corrupt memory or crash. The only reason we haven't hit this out in the field so far is that any AFU that requires at least three interrupts per process is already limited to fewer processes than that anyway (e.g. a min of 4 interrupts limits it to 509 processes, and all the AFUs I'm aware of require at least that many interrupts), but we could hit it on an AFU that requires 0, 1 or 2 interrupts per process, or when using 4K pages. This fix should go to stable. @Christophe, can you resend with this info in the commit message? Cheers, -Ian ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
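[Editor's note] To make Ian's numbers concrete: the scheduled process area is sized from the maximum process count, so computing it with a stale num_procs under-sizes the buffer. A hedged back-of-the-envelope sketch -- the 128-byte process-element size is an assumption for illustration, and the real SPA also reserves control space, which is why 2 x 64K pages top out at 958 rather than 1024 processes:

	#define SPA_PE_SIZE   128UL	/* assumed bytes per scheduled process element */
	#define SPA_PAGE_SIZE 65536UL	/* 64K pages, as in Ian's example */

	/* Pages needed for the SPA given the real maximum process count --
	 * illustrative only, not the cxl driver's exact formula. */
	static unsigned long spa_pages(unsigned long num_procs)
	{
		unsigned long bytes = num_procs * SPA_PE_SIZE;

		return (bytes + SPA_PAGE_SIZE - 1) / SPA_PAGE_SIZE;
	}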
[PATCH v2 00/18] powerpc/fsl-book3e-64: kexec/kdump support
This patchset adds support for kexec and kdump to e5500 and e6500 based systems running 64-bit kernels. It depends on the kexec-tools patch http://patchwork.ozlabs.org/patch/527050/ ("ppc64: Add a flag to tell the kernel it's booting from kexec"). Scott Wood (12): powerpc/fsl-booke-64: Allow booting from the secondary thread powerpc/fsl-corenet: Disable coreint if kexec is enabled powerpc/85xx: Don't use generic timebase sync on 64-bit powerpc/fsl_pci: Don't set up inbound windows in kdump crash kernel powerpc/85xx: Load all early TLB entries at once powerpc/fsl-booke-64: Don't limit ppc64_rma_size to one TLB entry powerpc/e6500: kexec: Handle hardware threads powerpc/book3e/kdump: Enable crash_kexec_wait_realmode powerpc/book3e-64: Don't limit paca to 256 MiB powerpc/book3e-64/kexec: Enable SMP release powerpc/booke: Only use VIRT_PHYS_OFFSET on booke32 powerpc/book3e-64/kexec: Set "r4 = 0" when entering spinloop Tiejun Chen (6): powerpc/85xx: Implement 64-bit kexec support powerpc/book3e-64: rename interrupt_end_book3e with __end_interrupts powerpc/booke64: Fix args to copy_and_flush powerpc/book3e: support CONFIG_RELOCATABLE powerpc/book3e-64/kexec: create an identity TLB mapping powerpc/book3e-64: Enable kexec Documentation/devicetree/bindings/chosen.txt | 8 +++ arch/powerpc/Kconfig | 2 +- arch/powerpc/include/asm/exception-64e.h | 4 +- arch/powerpc/include/asm/page.h | 7 ++- arch/powerpc/kernel/crash.c | 6 +- arch/powerpc/kernel/exceptions-64e.S | 17 -- arch/powerpc/kernel/head_64.S | 43 -- arch/powerpc/kernel/machine_kexec_64.c| 18 ++ arch/powerpc/kernel/misc_64.S | 60 ++- arch/powerpc/kernel/paca.c| 6 +- arch/powerpc/kernel/setup_64.c| 25 +++- arch/powerpc/mm/fsl_booke_mmu.c | 35 --- arch/powerpc/mm/mmu_decl.h| 4 +- arch/powerpc/mm/tlb_nohash.c | 41 ++--- arch/powerpc/mm/tlb_nohash_low.S | 63 arch/powerpc/platforms/85xx/corenet_generic.c | 4 ++ arch/powerpc/platforms/85xx/smp.c | 86 --- arch/powerpc/sysdev/fsl_pci.c | 84 +++--- 18 files changed, 443 insertions(+), 70 deletions(-) -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 01/18] powerpc/fsl-booke-64: Allow booting from the secondary thread
This allows SMP kernels to work as kdump crash kernels. While crash kernels don't really need to be SMP, this prevents things from breaking if a user does it anyway (which is not something you want to only find out once the main kernel has crashed in the field, especially if whether it works or not depends on which cpu crashed). Signed-off-by: Scott Wood --- arch/powerpc/platforms/85xx/smp.c | 27 --- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c index b8b8216..c2ded03 100644 --- a/arch/powerpc/platforms/85xx/smp.c +++ b/arch/powerpc/platforms/85xx/smp.c @@ -173,15 +173,22 @@ static inline u32 read_spin_table_addr_l(void *spin_table) static void wake_hw_thread(void *info) { void fsl_secondary_thread_init(void); - unsigned long imsr1, inia1; + unsigned long imsr, inia; int nr = *(const int *)info; - imsr1 = MSR_KERNEL; - inia1 = *(unsigned long *)fsl_secondary_thread_init; - - mttmr(TMRN_IMSR1, imsr1); - mttmr(TMRN_INIA1, inia1); - mtspr(SPRN_TENS, TEN_THREAD(1)); + imsr = MSR_KERNEL; + inia = *(unsigned long *)fsl_secondary_thread_init; + + if (cpu_thread_in_core(nr) == 0) { + /* For when we boot on a secondary thread with kdump */ + mttmr(TMRN_IMSR0, imsr); + mttmr(TMRN_INIA0, inia); + mtspr(SPRN_TENS, TEN_THREAD(0)); + } else { + mttmr(TMRN_IMSR1, imsr); + mttmr(TMRN_INIA1, inia); + mtspr(SPRN_TENS, TEN_THREAD(1)); + } smp_generic_kick_cpu(nr); } @@ -224,6 +231,12 @@ static int smp_85xx_kick_cpu(int nr) smp_call_function_single(primary, wake_hw_thread, &nr, 0); return 0; + } else if (cpu_thread_in_core(boot_cpuid) != 0 && + cpu_first_thread_sibling(boot_cpuid) == nr) { + if (WARN_ON_ONCE(!cpu_has_feature(CPU_FTR_SMT))) + return -ENOENT; + + smp_call_function_single(boot_cpuid, wake_hw_thread, &nr, 0); } #endif -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 02/18] powerpc/fsl-corenet: Disable coreint if kexec is enabled
Problems have been observed in coreint (EPR) mode when interrupts are left pending (due to the lack of device quiescence with kdump) after the MPIC has tried to deliver them to a CPU but could not because MSR[EE] was clear -- interrupts no longer get reliably delivered in the new kernel. I tried various ways of fixing it up inside the crash kernel itself, and none worked (including resetting the entire mpic). Masking all interrupts and issuing EOIs in the crashing kernel did help a lot of the time, but the behavior was not consistent. Thus, stick to standard IACK mode when kdump is a possibility. Signed-off-by: Scott Wood --- Previously I discussed the possibility of removing coreint entirely, but I think we want to keep it for virtualized guests. --- arch/powerpc/platforms/85xx/corenet_generic.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/powerpc/platforms/85xx/corenet_generic.c b/arch/powerpc/platforms/85xx/corenet_generic.c index b395571..04ffbcb 100644 --- a/arch/powerpc/platforms/85xx/corenet_generic.c +++ b/arch/powerpc/platforms/85xx/corenet_generic.c @@ -214,7 +214,11 @@ define_machine(corenet_generic) { .pcibios_fixup_bus = fsl_pcibios_fixup_bus, .pcibios_fixup_phb = fsl_pcibios_fixup_phb, #endif +#ifdef CONFIG_KEXEC + .get_irq = mpic_get_irq, +#else .get_irq = mpic_get_coreint_irq, +#endif .restart = fsl_rstcr_restart, .calibrate_decr = generic_calibrate_decr, .progress = udbg_progress, -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 03/18] powerpc/85xx: Don't use generic timebase sync on 64-bit
85xx currently uses the generic timebase sync mechanism when CONFIG_KEXEC is enabled, because 32-bit 85xx kexec support does a hard reset of each core. 64-bit 85xx kexec does not do this, so we neither need nor want this (nor is the generic timebase sync code built on ppc64). FWIW, I don't like the fact that the hard reset is done on 32-bit kexec, and I especially don't like the timebase sync being triggered only on the presence of CONFIG_KEXEC rather than actually booting in that environment, but that's beyond the scope of this patch... Signed-off-by: Scott Wood --- arch/powerpc/platforms/85xx/smp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c index c2ded03..a0763be 100644 --- a/arch/powerpc/platforms/85xx/smp.c +++ b/arch/powerpc/platforms/85xx/smp.c @@ -344,7 +344,7 @@ struct smp_ops_t smp_85xx_ops = { .cpu_disable= generic_cpu_disable, .cpu_die= generic_cpu_die, #endif -#ifdef CONFIG_KEXEC +#if defined(CONFIG_KEXEC) && !defined(CONFIG_PPC64) .give_timebase = smp_generic_give_timebase, .take_timebase = smp_generic_take_timebase, #endif -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 04/18] powerpc/fsl_pci: Don't set up inbound windows in kdump crash kernel
Otherwise, because the top end of the crash kernel is treated as the absolute top of memory rather than the beginning of a reserved region, in-flight DMA from the previous kernel that targets areas above the crash kernel can trigger a storm of PCI errors. We only do this for kdump, not normal kexec, in case kexec is being used to upgrade to a kernel that wants a different inbound memory map. Signed-off-by: Scott Wood Cc: Mingkai Hu --- v2: new patch arch/powerpc/sysdev/fsl_pci.c | 84 +++ 1 file changed, 61 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c index ebc1f412..98d671c 100644 --- a/arch/powerpc/sysdev/fsl_pci.c +++ b/arch/powerpc/sysdev/fsl_pci.c @@ -179,6 +179,19 @@ static int setup_one_atmu(struct ccsr_pci __iomem *pci, return i; } +static bool is_kdump(void) +{ + struct device_node *node; + + node = of_find_node_by_type(NULL, "memory"); + if (!node) { + WARN_ON_ONCE(1); + return false; + } + + return of_property_read_bool(node, "linux,usable-memory"); +} + /* atmu setup for fsl pci/pcie controller */ static void setup_pci_atmu(struct pci_controller *hose) { @@ -192,6 +205,16 @@ static void setup_pci_atmu(struct pci_controller *hose) const char *name = hose->dn->full_name; const u64 *reg; int len; + bool setup_inbound; + + /* +* If this is kdump, we don't want to trigger a bunch of PCI +* errors by closing the window on in-flight DMA. +* +* We still run most of the function's logic so that things like +* hose->dma_window_size still get set. +*/ + setup_inbound = !is_kdump(); if (early_find_capability(hose, 0, 0, PCI_CAP_ID_EXP)) { if (in_be32(&pci->block_rev1) >= PCIE_IP_REV_2_2) { @@ -204,8 +227,11 @@ static void setup_pci_atmu(struct pci_controller *hose) /* Disable all windows (except powar0 since it's ignored) */ for(i = 1; i < 5; i++) out_be32(&pci->pow[i].powar, 0); - for (i = start_idx; i < end_idx; i++) - out_be32(&pci->piw[i].piwar, 0); + + if (setup_inbound) { + for (i = start_idx; i < end_idx; i++) + out_be32(&pci->piw[i].piwar, 0); + } /* Setup outbound MEM window */ for(i = 0, j = 1; i < 3; i++) { @@ -278,6 +304,7 @@ static void setup_pci_atmu(struct pci_controller *hose) /* Setup inbound mem window */ mem = memblock_end_of_DRAM(); + pr_info("%s: end of DRAM %llx\n", __func__, mem); /* * The msi-address-64 property, if it exists, indicates the physical @@ -320,12 +347,14 @@ static void setup_pci_atmu(struct pci_controller *hose) piwar |= ((mem_log - 1) & PIWAR_SZ_MASK); - /* Setup inbound memory window */ - out_be32(&pci->piw[win_idx].pitar, 0x); - out_be32(&pci->piw[win_idx].piwbar, 0x); - out_be32(&pci->piw[win_idx].piwar, piwar); - win_idx--; + if (setup_inbound) { + /* Setup inbound memory window */ + out_be32(&pci->piw[win_idx].pitar, 0x); + out_be32(&pci->piw[win_idx].piwbar, 0x); + out_be32(&pci->piw[win_idx].piwar, piwar); + } + win_idx--; hose->dma_window_base_cur = 0x; hose->dma_window_size = (resource_size_t)sz; @@ -343,13 +372,15 @@ static void setup_pci_atmu(struct pci_controller *hose) piwar = (piwar & ~PIWAR_SZ_MASK) | (mem_log - 1); - /* Setup inbound memory window */ - out_be32(&pci->piw[win_idx].pitar, 0x); - out_be32(&pci->piw[win_idx].piwbear, - pci64_dma_offset >> 44); - out_be32(&pci->piw[win_idx].piwbar, - pci64_dma_offset >> 12); - out_be32(&pci->piw[win_idx].piwar, piwar); + if (setup_inbound) { + /* Setup inbound memory window */ + out_be32(&pci->piw[win_idx].pitar, 0x); + out_be32(&pci->piw[win_idx].piwbear, + pci64_dma_offset >> 44); + out_be32(&pci->piw[win_idx].piwbar, + 
pci64_dma_offset >> 12); + out_be32(&pci->piw[win_idx].piwar, piwar); + } /* * install our own dma_set_mask handler to fixup dma_ops @@ -362,12 +393,15 @@ static void setup_pci_atmu(stru
[PATCH v2 05/18] powerpc/85xx: Load all early TLB entries at once
Use an AS=1 trampoline TLB entry to allow all normal TLB1 entries to be loaded at once. This avoids the need to keep the translation that code is executing from in the same TLB entry in the final TLB configuration as during early boot, which in turn is helpful for relocatable kernels (e.g. kdump) where the kernel is not running from what would be the first TLB entry. On e6500, we limit map_mem_in_cams() to the primary hwthread of a core (the boot cpu is always considered primary, as a kdump kernel can be entered on any cpu). Each TLB only needs to be set up once, and when we do, we don't want another thread to be running when we create a temporary trampoline TLB1 entry. Signed-off-by: Scott Wood --- arch/powerpc/kernel/setup_64.c | 8 + arch/powerpc/mm/fsl_booke_mmu.c | 15 -- arch/powerpc/mm/mmu_decl.h | 1 + arch/powerpc/mm/tlb_nohash.c | 19 +++- arch/powerpc/mm/tlb_nohash_low.S | 63 5 files changed, 102 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index bdcbb71..505ec2c 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -108,6 +108,14 @@ static void setup_tlb_core_data(void) for_each_possible_cpu(cpu) { int first = cpu_first_thread_sibling(cpu); + /* +* If we boot via kdump on a non-primary thread, +* make sure we point at the thread that actually +* set up this TLB. +*/ + if (cpu_first_thread_sibling(boot_cpuid) == first) + first = boot_cpuid; + paca[cpu].tcd_ptr = &paca[first].tcd; /* diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c index 354ba3c..36d3c55 100644 --- a/arch/powerpc/mm/fsl_booke_mmu.c +++ b/arch/powerpc/mm/fsl_booke_mmu.c @@ -105,8 +105,9 @@ unsigned long p_mapped_by_tlbcam(phys_addr_t pa) * an unsigned long (for example, 32-bit implementations cannot support a 4GB * size). 
*/ -static void settlbcam(int index, unsigned long virt, phys_addr_t phys, - unsigned long size, unsigned long flags, unsigned int pid) +static void preptlbcam(int index, unsigned long virt, phys_addr_t phys, + unsigned long size, unsigned long flags, + unsigned int pid) { unsigned int tsize; @@ -141,7 +142,13 @@ static void settlbcam(int index, unsigned long virt, phys_addr_t phys, tlbcam_addrs[index].start = virt; tlbcam_addrs[index].limit = virt + size - 1; tlbcam_addrs[index].phys = phys; +} +void settlbcam(int index, unsigned long virt, phys_addr_t phys, + unsigned long size, unsigned long flags, + unsigned int pid) +{ + preptlbcam(index, virt, phys, size, flags, pid); loadcam_entry(index); } @@ -181,13 +188,15 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t phys, unsigned long virt, unsigned long cam_sz; cam_sz = calc_cam_sz(ram, virt, phys); - settlbcam(i, virt, phys, cam_sz, pgprot_val(PAGE_KERNEL_X), 0); + preptlbcam(i, virt, phys, cam_sz, pgprot_val(PAGE_KERNEL_X), 0); ram -= cam_sz; amount_mapped += cam_sz; virt += cam_sz; phys += cam_sz; } + + loadcam_multi(0, i, max_cam_idx); tlbcam_index = i; #ifdef CONFIG_PPC64 diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h index 085b66b..27c3a2d 100644 --- a/arch/powerpc/mm/mmu_decl.h +++ b/arch/powerpc/mm/mmu_decl.h @@ -152,6 +152,7 @@ extern int switch_to_as1(void); extern void restore_to_as0(int esel, int offset, void *dt_ptr, int bootcpu); #endif extern void loadcam_entry(unsigned int index); +extern void loadcam_multi(int first_idx, int num, int tmp_idx); struct tlbcam { u32 MAS0; diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c index 723a099..a7381fb 100644 --- a/arch/powerpc/mm/tlb_nohash.c +++ b/arch/powerpc/mm/tlb_nohash.c @@ -42,6 +42,7 @@ #include #include #include +#include #include #include @@ -628,10 +629,26 @@ static void early_init_this_mmu(void) #ifdef CONFIG_PPC_FSL_BOOK3E if (mmu_has_feature(MMU_FTR_TYPE_FSL_E)) { unsigned int num_cams; + int __maybe_unused cpu = smp_processor_id(); + bool map = true; /* use a quarter of the TLBCAM for bolted linear map */ num_cams = (mfspr(SPRN_TLB1CFG) & TLBnCFG_N_ENTRY) / 4; - linear_map_top = map_mem_in_cams(linear_map_top, num_cams); + + /* +* Only do the mapping once per core, or else the +* transient mapping would cause problems. +*/ +#ifdef CONFIG_SMP + if (cpu != boot_cpuid && +
[PATCH v2 06/18] powerpc/fsl-booke-64: Don't limit ppc64_rma_size to one TLB entry
This is required for kdump to work when loaded at an address that does not fall within the first TLB entry -- which can easily happen because while the lower limit is enforced via reserved memory, which doesn't affect how much is mapped, the upper limit is enforced via a different mechanism that does. Thus, more TLB entries are needed than would normally be used, as the total memory to be mapped might not be a power of two. Signed-off-by: Scott Wood --- arch/powerpc/mm/fsl_booke_mmu.c | 22 +++--- arch/powerpc/mm/mmu_decl.h | 3 ++- arch/powerpc/mm/tlb_nohash.c | 24 +--- 3 files changed, 34 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c index 36d3c55..5eef7d7 100644 --- a/arch/powerpc/mm/fsl_booke_mmu.c +++ b/arch/powerpc/mm/fsl_booke_mmu.c @@ -178,7 +178,8 @@ unsigned long calc_cam_sz(unsigned long ram, unsigned long virt, } static unsigned long map_mem_in_cams_addr(phys_addr_t phys, unsigned long virt, - unsigned long ram, int max_cam_idx) + unsigned long ram, int max_cam_idx, + bool dryrun) { int i; unsigned long amount_mapped = 0; @@ -188,7 +189,9 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t phys, unsigned long virt, unsigned long cam_sz; cam_sz = calc_cam_sz(ram, virt, phys); - preptlbcam(i, virt, phys, cam_sz, pgprot_val(PAGE_KERNEL_X), 0); + if (!dryrun) + preptlbcam(i, virt, phys, cam_sz, + pgprot_val(PAGE_KERNEL_X), 0); ram -= cam_sz; amount_mapped += cam_sz; @@ -196,6 +199,9 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t phys, unsigned long virt, phys += cam_sz; } + if (dryrun) + return amount_mapped; + loadcam_multi(0, i, max_cam_idx); tlbcam_index = i; @@ -208,12 +214,12 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t phys, unsigned long virt, return amount_mapped; } -unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx) +unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx, bool dryrun) { unsigned long virt = PAGE_OFFSET; phys_addr_t phys = memstart_addr; - return map_mem_in_cams_addr(phys, virt, ram, max_cam_idx); + return map_mem_in_cams_addr(phys, virt, ram, max_cam_idx, dryrun); } #ifdef CONFIG_PPC32 @@ -244,7 +250,7 @@ void __init adjust_total_lowmem(void) ram = min((phys_addr_t)__max_low_memory, (phys_addr_t)total_lowmem); i = switch_to_as1(); - __max_low_memory = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM); + __max_low_memory = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM, false); restore_to_as0(i, 0, 0, 1); pr_info("Memory CAM mapping: "); @@ -312,10 +318,12 @@ notrace void __init relocate_init(u64 dt_ptr, phys_addr_t start) n = switch_to_as1(); /* map a 64M area for the second relocation */ if (memstart_addr > start) - map_mem_in_cams(0x400, CONFIG_LOWMEM_CAM_NUM); + map_mem_in_cams(0x400, CONFIG_LOWMEM_CAM_NUM, + false); else map_mem_in_cams_addr(start, PAGE_OFFSET + offset, - 0x400, CONFIG_LOWMEM_CAM_NUM); + 0x400, CONFIG_LOWMEM_CAM_NUM, + false); restore_to_as0(n, offset, __va(dt_ptr), 1); /* We should never reach here */ panic("Relocation error"); diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h index 27c3a2d..9f58ff4 100644 --- a/arch/powerpc/mm/mmu_decl.h +++ b/arch/powerpc/mm/mmu_decl.h @@ -141,7 +141,8 @@ extern void MMU_init_hw(void); extern unsigned long mmu_mapin_ram(unsigned long top); #elif defined(CONFIG_PPC_FSL_BOOK3E) -extern unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx); +extern unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx, +bool dryrun); extern unsigned long 
calc_cam_sz(unsigned long ram, unsigned long virt, phys_addr_t phys); #ifdef CONFIG_PPC32 diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c index a7381fb..bb04e4d 100644 --- a/arch/powerpc/mm/tlb_nohash.c +++ b/arch/powerpc/mm/tlb_nohash.c @@ -648,7 +648,7 @@ static void early_init_this_mmu(void) if (map) linear_map_top = map_mem_in_cams(linear_map_top, -num_cams); +
[PATCH v2 07/18] powerpc/85xx: Implement 64-bit kexec support
From: Tiejun Chen Unlike 32-bit 85xx kexec, we don't do a core reset. Signed-off-by: Tiejun Chen [scottwood: edit changelog, and cleanup] Signed-off-by: Scott Wood --- arch/powerpc/platforms/85xx/smp.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c index a0763be..2e46684 100644 --- a/arch/powerpc/platforms/85xx/smp.c +++ b/arch/powerpc/platforms/85xx/smp.c @@ -351,6 +351,7 @@ struct smp_ops_t smp_85xx_ops = { }; #ifdef CONFIG_KEXEC +#ifdef CONFIG_PPC32 atomic_t kexec_down_cpus = ATOMIC_INIT(0); void mpc85xx_smp_kexec_cpu_down(int crash_shutdown, int secondary) @@ -370,9 +371,18 @@ static void mpc85xx_smp_kexec_down(void *arg) if (ppc_md.kexec_cpu_down) ppc_md.kexec_cpu_down(0,1); } +#else +void mpc85xx_smp_kexec_cpu_down(int crash_shutdown, int secondary) +{ + local_irq_disable(); + hard_irq_disable(); + mpic_teardown_this_cpu(secondary); +} +#endif static void mpc85xx_smp_machine_kexec(struct kimage *image) { +#ifdef CONFIG_PPC32 int timeout = INT_MAX; int i, num_cpus = num_present_cpus(); @@ -393,6 +403,7 @@ static void mpc85xx_smp_machine_kexec(struct kimage *image) if ( i == smp_processor_id() ) continue; mpic_reset_core(i); } +#endif default_machine_kexec(image); } -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 08/18] powerpc/e6500: kexec: Handle hardware threads
The new kernel will be expecting secondary threads to be disabled, not spinning. Signed-off-by: Scott Wood --- v2: minor cleanup arch/powerpc/kernel/head_64.S | 16 ++ arch/powerpc/platforms/85xx/smp.c | 46 +++ 2 files changed, 62 insertions(+) diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index d48125d..8b2bf0d 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -182,6 +182,8 @@ exception_marker: #ifdef CONFIG_PPC_BOOK3E _GLOBAL(fsl_secondary_thread_init) + mfspr r4,SPRN_BUCSR + /* Enable branch prediction */ lis r3,BUCSR_INIT@h ori r3,r3,BUCSR_INIT@l @@ -196,10 +198,24 @@ _GLOBAL(fsl_secondary_thread_init) * number. There are two threads per core, so shift everything * but the low bit right by two bits so that the cpu numbering is * continuous. +* +* If the old value of BUCSR is non-zero, this thread has run +* before. Thus, we assume we are coming from kexec or a similar +* scenario, and PIR is already set to the correct value. This +* is a bit of a hack, but there are limited opportunities for +* getting information into the thread and the alternatives +* seemed like they'd be overkill. We can't tell just by looking +* at the old PIR value which state it's in, since the same value +* could be valid for one thread out of reset and for a different +* thread in Linux. */ + mfspr r3, SPRN_PIR + cmpwi r4,0 + bne 1f rlwimi r3, r3, 30, 2, 30 mtspr SPRN_PIR, r3 +1: #endif _GLOBAL(generic_secondary_thread_init) diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c index 2e46684..712764f 100644 --- a/arch/powerpc/platforms/85xx/smp.c +++ b/arch/powerpc/platforms/85xx/smp.c @@ -374,9 +374,55 @@ static void mpc85xx_smp_kexec_down(void *arg) #else void mpc85xx_smp_kexec_cpu_down(int crash_shutdown, int secondary) { + int cpu = smp_processor_id(); + int sibling = cpu_last_thread_sibling(cpu); + bool notified = false; + int disable_cpu; + int disable_threadbit = 0; + long start = mftb(); + long now; + local_irq_disable(); hard_irq_disable(); mpic_teardown_this_cpu(secondary); + + if (cpu == crashing_cpu && cpu_thread_in_core(cpu) != 0) { + /* +* We enter the crash kernel on whatever cpu crashed, +* even if it's a secondary thread. If that's the case, +* disable the corresponding primary thread. +*/ + disable_threadbit = 1; + disable_cpu = cpu_first_thread_sibling(cpu); + } else if (sibling != crashing_cpu && + cpu_thread_in_core(cpu) == 0 && + cpu_thread_in_core(sibling) != 0) { + disable_threadbit = 2; + disable_cpu = sibling; + } + + if (disable_threadbit) { + while (paca[disable_cpu].kexec_state < KEXEC_STATE_REAL_MODE) { + barrier(); + now = mftb(); + if (!notified && now - start > 100) { + pr_info("%s/%d: waiting for cpu %d to enter KEXEC_STATE_REAL_MODE (%d)\n", + __func__, smp_processor_id(), + disable_cpu, + paca[disable_cpu].kexec_state); + notified = true; + } + } + + if (notified) { + pr_info("%s: cpu %d done waiting\n", + __func__, disable_cpu); + } + + mtspr(SPRN_TENC, disable_threadbit); + while (mfspr(SPRN_TENSR) & disable_threadbit) + cpu_relax(); + } } #endif -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 09/18] powerpc/book3e-64: rename interrupt_end_book3e to __end_interrupts
From: Tiejun Chen Rename 'interrupt_end_book3e' to '__end_interrupts' so that the symbol can be used by both book3s and book3e. Signed-off-by: Tiejun Chen [scottwood: edit changelog] Signed-off-by: Scott Wood --- arch/powerpc/kernel/exceptions-64e.S | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index f3bd5e7..9d4a006 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -542,8 +542,8 @@ interrupt_base_book3e: /* fake trap */ EXCEPTION_STUB(0x320, ehpriv) EXCEPTION_STUB(0x340, lrat_error) - .globl interrupt_end_book3e -interrupt_end_book3e: + .globl __end_interrupts +__end_interrupts: /* Critical Input Interrupt */ START_EXCEPTION(critical_input); @@ -736,7 +736,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) beq+1f LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e) - LOAD_REG_IMMEDIATE(r15,interrupt_end_book3e) + LOAD_REG_IMMEDIATE(r15,__end_interrupts) cmpld cr0,r10,r14 cmpld cr1,r10,r15 blt+cr0,1f @@ -800,7 +800,7 @@ kernel_dbg_exc: beq+1f LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e) - LOAD_REG_IMMEDIATE(r15,interrupt_end_book3e) + LOAD_REG_IMMEDIATE(r15,__end_interrupts) cmpld cr0,r10,r14 cmpld cr1,r10,r15 blt+cr0,1f -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 10/18] powerpc/booke64: Fix args to copy_and_flush
From: Tiejun Chen Convert r4/r5, not r6, to a virtual address when calling copy_and_flush. Otherwise, since r3 is already virtual and copy_and_flush accesses r3+r6, PAGE_OFFSET would get added twice. This isn't normally seen because on book3e we normally enter with the kernel at zero and thus skip copy_and_flush -- but it will be needed for kexec support. Signed-off-by: Tiejun Chen [scottwood: split patch and rewrote changelog] Signed-off-by: Scott Wood --- arch/powerpc/kernel/head_64.S | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index 8b2bf0d..a1e85ca 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -474,15 +474,15 @@ __after_prom_start: */ li r3,0 /* target addr */ #ifdef CONFIG_PPC_BOOK3E - tovirt(r3,r3) /* on booke, we already run at PAGE_OFFSET */ + tovirt(r3,r3) /* on booke, we already run at PAGE_OFFSET */ #endif mr. r4,r26 /* In some cases the loader may */ +#if defined(CONFIG_PPC_BOOK3E) + tovirt(r4,r4) +#endif beq 9f /* have already put us at zero */ li r6,0x100 /* Start offset, the first 0x100 */ /* bytes were copied earlier. */ -#ifdef CONFIG_PPC_BOOK3E - tovirt(r6,r6) /* on booke, we already run at PAGE_OFFSET */ -#endif #ifdef CONFIG_RELOCATABLE /* @@ -514,6 +514,9 @@ __after_prom_start: p_end: .llong _end - _stext 4: /* Now copy the rest of the kernel up to _end */ +#if defined(CONFIG_PPC_BOOK3E) + tovirt(r26,r26) +#endif addis r5,r26,(p_end - _stext)@ha ld r5,(p_end - _stext)@l(r5) /* get _end */ 5: bl copy_and_flush /* copy the rest */ -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
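For readers unfamiliar with copy_and_flush: it copies through dest+offset and src+offset, so the addressing can be pictured roughly like this (a simplification of the real cacheline-wise copy-and-flush loop, for illustration only):

/*
 * r3 = dest, r4 = src, r6 = offset.  If dest is already virtual
 * and the offset is converted too, dest + offset ends up with
 * PAGE_OFFSET added twice; converting src instead keeps both
 * sums correct.
 */
for (; offset < limit; offset += 8)
	*(u64 *)(dest + offset) = *(u64 *)(src + offset);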
[PATCH v2 11/18] powerpc/book3e: support CONFIG_RELOCATABLE
From: Tiejun Chen book3e differs from book3s here: book3s includes the exception vectors code in head_64.S, since it relies on absolute addressing, which is only possible within that compilation unit. So we have to get the label address via the GOT. And when booting a relocated kernel, we must reset IVPR properly again after .relocate. Signed-off-by: Tiejun Chen [scottwood: cleanup and ifdef removal] Signed-off-by: Scott Wood --- arch/powerpc/include/asm/exception-64e.h | 4 ++-- arch/powerpc/kernel/exceptions-64e.S | 9 +++-- arch/powerpc/kernel/head_64.S | 22 +++--- 3 files changed, 28 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h index a8b52b6..344fc43 100644 --- a/arch/powerpc/include/asm/exception-64e.h +++ b/arch/powerpc/include/asm/exception-64e.h @@ -204,8 +204,8 @@ exc_##label##_book3e: #endif #define SET_IVOR(vector_number, vector_offset) \ - li r3,vector_offset@l; \ - ori r3,r3,interrupt_base_book3e@l; \ + LOAD_REG_ADDR(r3,interrupt_base_book3e);\ + ori r3,r3,vector_offset@l; \ mtspr SPRN_IVOR##vector_number,r3; #endif /* _ASM_POWERPC_EXCEPTION_64E_H */
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 9d4a006..488e631 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -1351,7 +1351,10 @@ skpinv: addi r6,r6,1 /* Increment */ * r4 = MAS0 w/TLBSEL & ESEL for the temp mapping */ /* Now we branch to the new virtual address mapped by this entry */ - LOAD_REG_IMMEDIATE(r6,2f) + bl 1f /* Find our address */ +1: mflr r6 + addi r6,r6,(2f - 1b) + tovirt(r6,r6) lis r7,MSR_KERNEL@h ori r7,r7,MSR_KERNEL@l mtspr SPRN_SRR0,r6 @@ -1583,9 +1586,11 @@ _GLOBAL(book3e_secondary_thread_init) mflr r28 b 3b + .globl init_core_book3e init_core_book3e: /* Establish the interrupt vector base */ - LOAD_REG_IMMEDIATE(r3, interrupt_base_book3e) + tovirt(r2,r2) + LOAD_REG_ADDR(r3, interrupt_base_book3e) mtspr SPRN_IVPR,r3 sync blr
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index a1e85ca..1b77956 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -457,12 +457,22 @@ __after_prom_start: /* process relocations for the final address of the kernel */ lis r25,PAGE_OFFSET@highest /* compute virtual base of kernel */ sldi r25,r25,32 +#if defined(CONFIG_PPC_BOOK3E) + tovirt(r26,r26) /* on booke, we already run at PAGE_OFFSET */ +#endif lwz r7,__run_at_load-_stext(r26) +#if defined(CONFIG_PPC_BOOK3E) + tophys(r26,r26) +#endif cmplwi cr0,r7,1 /* flagged to stay where we are ? */ bne 1f add r25,r25,r26 1: mr r3,r25 bl relocate +#if defined(CONFIG_PPC_BOOK3E) + /* IVPR needs to be set after relocation. */ + bl init_core_book3e +#endif #endif /* @@ -490,12 +500,21 @@ __after_prom_start: * variable __run_at_load, if it is set the kernel is treated as relocatable * kernel, otherwise it will be moved to PHYSICAL_START */ +#if defined(CONFIG_PPC_BOOK3E) + tovirt(r26,r26) /* on booke, we already run at PAGE_OFFSET */ +#endif lwz r7,__run_at_load-_stext(r26) cmplwi cr0,r7,1 bne 3f +#ifdef CONFIG_PPC_BOOK3E + LOAD_REG_ADDR(r5, __end_interrupts) + LOAD_REG_ADDR(r11, _stext) + sub r5,r5,r11 +#else /* just copy interrupts */ LOAD_REG_IMMEDIATE(r5, __end_interrupts - _stext) +#endif b 5f 3: #endif @@ -514,9 +533,6 @@ __after_prom_start: p_end: .llong _end - _stext 4: /* Now copy the rest of the kernel up to _end */ -#if defined(CONFIG_PPC_BOOK3E) - tovirt(r26,r26) -#endif addis r5,r26,(p_end - _stext)@ha ld r5,(p_end - _stext)@l(r5) /* get _end */ 5: bl copy_and_flush /* copy the rest */ -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
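The distinction the patch relies on, summarised as a comment (my paraphrase, not text from the patch):

/*
 * LOAD_REG_IMMEDIATE(rX, sym) encodes sym's link-time value in a
 * lis/ori instruction sequence, so it goes stale once the kernel
 * runs at a different address.  LOAD_REG_ADDR(rX, sym) instead
 * loads the address from the GOT/TOC via r2, which the relocation
 * pass fixes up -- hence the tovirt(r2,r2) before the GOT load in
 * init_core_book3e.
 */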
[PATCH v2 12/18] powerpc/book3e/kdump: Enable crash_kexec_wait_realmode
While book3e doesn't have "real mode", we still want to wait for all the non-crash cpus to complete their shutdown. Signed-off-by: Scott Wood --- arch/powerpc/kernel/crash.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c index 51dbace..2bb252c 100644 --- a/arch/powerpc/kernel/crash.c +++ b/arch/powerpc/kernel/crash.c @@ -221,8 +221,8 @@ void crash_kexec_secondary(struct pt_regs *regs) #endif /* CONFIG_SMP */ /* wait for all the CPUs to hit real mode but timeout if they don't come in */ -#if defined(CONFIG_SMP) && defined(CONFIG_PPC_STD_MMU_64) -static void crash_kexec_wait_realmode(int cpu) +#if defined(CONFIG_SMP) && defined(CONFIG_PPC64) +static void __maybe_unused crash_kexec_wait_realmode(int cpu) { unsigned int msecs; int i; @@ -244,7 +244,7 @@ static void crash_kexec_wait_realmode(int cpu) } #else static inline void crash_kexec_wait_realmode(int cpu) {} -#endif /* CONFIG_SMP && CONFIG_PPC_STD_MMU_64 */ +#endif /* CONFIG_SMP && CONFIG_PPC64 */ /* * Register a function to be called on shutdown. Only use this if you -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
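For context, the loop this enables polls each other cpu's paca until it reports KEXEC_STATE_REAL_MODE, with a timeout; condensed, it behaves something like the sketch below (a paraphrase of the crash.c loop, not a verbatim copy):

/* Condensed sketch of crash_kexec_wait_realmode(cpu): */
msecs = 10000;
for (i = 0; i < nr_cpu_ids && msecs > 0; i++) {
	if (i == cpu)
		continue;
	while (paca[i].kexec_state < KEXEC_STATE_REAL_MODE) {
		barrier();
		if (!cpu_possible(i) || !cpu_online(i) || msecs <= 0)
			break;
		msecs--;
		mdelay(1);
	}
}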
[PATCH v2 13/18] powerpc/book3e-64: Don't limit paca to 256 MiB
This limit only makes sense on book3s, and on book3e it can cause problems with kdump if we don't have any memory under 256 MiB. Signed-off-by: Scott Wood --- arch/powerpc/kernel/paca.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c index 5a23b69..7fdff63 100644 --- a/arch/powerpc/kernel/paca.c +++ b/arch/powerpc/kernel/paca.c @@ -206,12 +206,16 @@ void __init allocate_pacas(void) { int cpu, limit; + limit = ppc64_rma_size; + +#ifdef CONFIG_PPC_BOOK3S_64 /* * We can't take SLB misses on the paca, and we want to access them * in real mode, so allocate them within the RMA and also within * the first segment. */ - limit = min(0x1000ULL, ppc64_rma_size); + limit = min(0x1000ULL, limit); +#endif paca_size = PAGE_ALIGN(sizeof(struct paca_struct) * nr_cpu_ids); -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
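A concrete failure case, with hypothetical numbers purely for illustration:

/*
 * Suppose kdump reserves crashkernel memory at 768M..1G, so the
 * crash kernel sees no RAM below 256M.
 *   old: limit = min(0x10000000ULL, ppc64_rma_size)
 *        -> paca must come from below 256M -> allocation fails
 *   new (book3e): limit = ppc64_rma_size
 *        -> paca can be allocated from the 768M..1G region
 */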
[PATCH v2 14/18] powerpc/book3e-64/kexec: create an identity TLB mapping
From: Tiejun Chen book3e has no real MMU mode so we have to create an identity TLB mapping to make sure we can access the real physical address. Signed-off-by: Tiejun Chen [scottwood: cleanup, and split off some changes] Signed-off-by: Scott Wood --- arch/powerpc/kernel/misc_64.S | 52 ++- 1 file changed, 51 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S index 6e4168c..246ad8c 100644 --- a/arch/powerpc/kernel/misc_64.S +++ b/arch/powerpc/kernel/misc_64.S @@ -26,6 +26,7 @@ #include #include #include +#include .text @@ -496,6 +497,51 @@ kexec_flag: #ifdef CONFIG_KEXEC +#ifdef CONFIG_PPC_BOOK3E +/* + * BOOK3E has no real MMU mode, so we have to setup the initial TLB + * for a core to identity map v:0 to p:0. This current implementation + * assumes that 1G is enough for kexec. + */ +kexec_create_tlb: + /* +* Invalidate all non-IPROT TLB entries to avoid any TLB conflict. +* IPROT TLB entries should be >= PAGE_OFFSET and thus not conflict. +*/ + PPC_TLBILX_ALL(0,R0) + sync + isync + + mfspr r10,SPRN_TLB1CFG + andi. r10,r10,TLBnCFG_N_ENTRY /* Extract # entries */ + subir10,r10,1 /* Last entry: no conflict with kernel text */ + lis r9,MAS0_TLBSEL(1)@h + rlwimi r9,r10,16,4,15 /* Setup MAS0 = TLBSEL | ESEL(r9) */ + +/* Set up a temp identity mapping v:0 to p:0 and return to it. */ +#if defined(CONFIG_SMP) || defined(CONFIG_PPC_E500MC) +#define M_IF_NEEDEDMAS2_M +#else +#define M_IF_NEEDED0 +#endif + mtspr SPRN_MAS0,r9 + + lis r9,(MAS1_VALID|MAS1_IPROT)@h + ori r9,r9,(MAS1_TSIZE(BOOK3E_PAGESZ_1GB))@l + mtspr SPRN_MAS1,r9 + + LOAD_REG_IMMEDIATE(r9, 0x0 | M_IF_NEEDED) + mtspr SPRN_MAS2,r9 + + LOAD_REG_IMMEDIATE(r9, 0x0 | MAS3_SR | MAS3_SW | MAS3_SX) + mtspr SPRN_MAS3,r9 + li r9,0 + mtspr SPRN_MAS7,r9 + + tlbwe + isync + blr +#endif /* kexec_smp_wait(void) * @@ -525,6 +571,10 @@ _GLOBAL(kexec_smp_wait) * don't overwrite r3 here, it is live for kexec_wait above. */ real_mode: /* assume normal blr return */ +#ifdef CONFIG_PPC_BOOK3E + /* Create an identity mapping. */ + b kexec_create_tlb +#else 1: li r9,MSR_RI li r10,MSR_DR|MSR_IR mflrr11 /* return address to SRR0 */ @@ -536,7 +586,7 @@ real_mode: /* assume normal blr return */ mtspr SPRN_SRR1,r10 mtspr SPRN_SRR0,r11 rfid - +#endif /* * kexec_sequence(newstack, start, image, control, clear_all()) -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
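Rewritten as C-style pseudo-register assignments, the entry being programmed is roughly the following (field macros as in asm/mmu-book3e.h; a sketch, not kernel code):

/* 1 GiB identity mapping, v:0 -> p:0, in the last TLB1 entry: */
mas0 = MAS0_TLBSEL(1) | MAS0_ESEL(tlb1_entries - 1);
mas1 = MAS1_VALID | MAS1_IPROT | MAS1_TSIZE(BOOK3E_PAGESZ_1GB);
mas2 = 0 | M_IF_NEEDED;			/* EPN 0; M (coherent) on SMP/e500mc */
mas3 = 0 | MAS3_SR | MAS3_SW | MAS3_SX;	/* RPN 0; read/write/execute */
mas7 = 0;				/* upper physical address bits */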
[PATCH v2 15/18] powerpc/book3e-64/kexec: Enable SMP release
The SMP release mechanism for FSL book3e is different from when booting with normal hardware. In theory we could simulate the normal spin table mechanism, but not at the addresses U-Boot put in the device tree -- so there'd need to be even more communication between the kernel and kexec to set that up. Instead, kexec-tools will set a boolean property linux,booted-from-kexec in the /chosen node. Signed-off-by: Scott Wood Cc: devicet...@vger.kernel.org --- v2: Use a device tree property instead of a flag in the kernel image This depends on the kexec-tools patch v2 "ppc64: Add a flag to tell the kernel it's booting from kexec": http://patchwork.ozlabs.org/patch/527050/ --- Documentation/devicetree/bindings/chosen.txt | 8 arch/powerpc/kernel/setup_64.c | 17 - 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt index ed838f4..6ae9d82 100644 --- a/Documentation/devicetree/bindings/chosen.txt +++ b/Documentation/devicetree/bindings/chosen.txt @@ -44,3 +44,11 @@ Implementation note: Linux will look for the property "linux,stdout-path" or on PowerPC "stdout" if "stdout-path" is not found. However, the "linux,stdout-path" and "stdout" properties are deprecated. New platforms should only use the "stdout-path" property. + +linux,booted-from-kexec +--- + +This property is set (currently only on PowerPC, and only needed on +book3e) by some versions of kexec-tools to tell the new kernel that it +is being booted by kexec, as the booting environment may differ (e.g. +a different secondary CPU release mechanism) diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 505ec2c..5c03a6a 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -340,11 +340,26 @@ void early_setup_secondary(void) #endif /* CONFIG_SMP */ #if defined(CONFIG_SMP) || defined(CONFIG_KEXEC) +static bool use_spinloop(void) +{ + if (!IS_ENABLED(CONFIG_PPC_BOOK3E)) + return true; + + /* +* When book3e boots from kexec, the ePAPR spin table does +* not get used. +*/ + return of_property_read_bool(of_chosen, "linux,booted-from-kexec"); +} + void smp_release_cpus(void) { unsigned long *ptr; int i; + if (!use_spinloop()) + return; + DBG(" -> smp_release_cpus()\n"); /* All secondary cpus are spinning on a common spinloop, release them @@ -524,7 +539,7 @@ void __init setup_system(void) * Freescale Book3e parts spin in a loop provided by firmware, * so smp_release_cpus() does nothing for them */ -#if defined(CONFIG_SMP) && !defined(CONFIG_PPC_FSL_BOOK3E) +#if defined(CONFIG_SMP) /* Release secondary cpus out of their spinloops at 0x60 now that * we can map physical -> logical CPU ids */ -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
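On the tools side, setting the property amounts to adding an empty boolean to /chosen. With libfdt that could look roughly like the sketch below -- this is an illustration, not the actual kexec-tools change (see the patchwork link above for that):

#include <libfdt.h>

/* Mark the device tree so the next kernel knows it was kexec'd. */
static int mark_booted_from_kexec(void *fdt)
{
	int off = fdt_path_offset(fdt, "/chosen");

	if (off < 0)
		return off;

	/* Boolean property: present, with a zero-length value. */
	return fdt_setprop(fdt, off, "linux,booted-from-kexec", NULL, 0);
}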
[PATCH v2 16/18] powerpc/booke: Only use VIRT_PHYS_OFFSET on booke32
The way VIRT_PHYS_OFFSET is calculated is not correct on book3e-64, because it does not account for CONFIG_RELOCATABLE other than via the 32-bit-only virt_phys_offset. book3e-64 can (and, if the comment about a GCC miscompilation is still relevant, should) use the normal ppc64 __va/__pa. At this point only booke-32 will use VIRT_PHYS_OFFSET, so given the issues with its calculation, restrict its definition to booke-32. Signed-off-by: Scott Wood --- arch/powerpc/include/asm/page.h | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h index 168ca67..6b67239 100644 --- a/arch/powerpc/include/asm/page.h +++ b/arch/powerpc/include/asm/page.h @@ -107,12 +107,13 @@ extern long long virt_phys_offset; #endif /* See Description below for VIRT_PHYS_OFFSET */ -#ifdef CONFIG_RELOCATABLE_PPC32 +#if defined(CONFIG_PPC32) && defined(CONFIG_BOOKE) +#ifdef CONFIG_RELOCATABLE #define VIRT_PHYS_OFFSET virt_phys_offset #else #define VIRT_PHYS_OFFSET (KERNELBASE - PHYSICAL_START) #endif - +#endif #ifdef CONFIG_PPC64 #define MEMORY_START 0UL @@ -205,7 +206,7 @@ extern long long virt_phys_offset; * On non-Book-E PPC64 PAGE_OFFSET and MEMORY_START are constants so use * the other definitions for __va & __pa. */ -#ifdef CONFIG_BOOKE +#if defined(CONFIG_PPC32) && defined(CONFIG_BOOKE) #define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) + VIRT_PHYS_OFFSET)) #define __pa(x) ((unsigned long)(x) - VIRT_PHYS_OFFSET) #else -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
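A worked example of why the booke32 form is fine there but not on book3e-64 (illustrative values):

/*
 * booke32, non-relocatable: KERNELBASE = 0xc0000000 and
 * PHYSICAL_START = 0, so VIRT_PHYS_OFFSET = 0xc0000000:
 *   __va(0x1000)     = 0x1000 + 0xc0000000 = 0xc0001000
 *   __pa(0xc0001000) = 0xc0001000 - 0xc0000000 = 0x1000
 * With CONFIG_RELOCATABLE on book3e-64 there is no such
 * compile-time constant, so the generic ppc64 definitions based
 * on PAGE_OFFSET/MEMORY_START are the right ones to use.
 */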
[PATCH v2 17/18] powerpc/book3e-64/kexec: Set "r4 = 0" when entering spinloop
book3e_secondary_core_init will only create a TLB entry if r4 = 0, so do so. Signed-off-by: Scott Wood --- arch/powerpc/kernel/misc_64.S | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S index 246ad8c..ddbc535 100644 --- a/arch/powerpc/kernel/misc_64.S +++ b/arch/powerpc/kernel/misc_64.S @@ -485,6 +485,8 @@ _GLOBAL(kexec_wait) mtsrr1 r11 rfid #else + /* Create TLB entry in book3e_secondary_core_init */ + li r4,0 ba 0x60 #endif #endif -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 18/18] powerpc/book3e-64: Enable kexec
From: Tiejun Chen Allow KEXEC for book3e, and bypass or convert non-book3e stuff in kexec code. Signed-off-by: Tiejun Chen [scottw...@freescale.com: move code to minimize diff, and cleanup] Signed-off-by: Scott Wood --- arch/powerpc/Kconfig | 2 +- arch/powerpc/kernel/machine_kexec_64.c | 18 ++ arch/powerpc/kernel/misc_64.S | 6 ++ 3 files changed, 25 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 9a7057e..db49e0d 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -419,7 +419,7 @@ config PPC64_SUPPORTS_MEMORY_FAILURE config KEXEC bool "kexec system call" - depends on (PPC_BOOK3S || FSL_BOOKE || (44x && !SMP)) + depends on (PPC_BOOK3S || FSL_BOOKE || (44x && !SMP)) || PPC_BOOK3E select KEXEC_CORE help kexec is a system call that implements the ability to shutdown your diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c index 1a74446..0fbd75d 100644 --- a/arch/powerpc/kernel/machine_kexec_64.c +++ b/arch/powerpc/kernel/machine_kexec_64.c @@ -30,6 +30,21 @@ #include #include +#ifdef CONFIG_PPC_BOOK3E +int default_machine_kexec_prepare(struct kimage *image) +{ + int i; + /* +* Since we use the kernel fault handlers and paging code to +* handle the virtual mode, we must make sure no destination +* overlaps kernel static data or bss. +*/ + for (i = 0; i < image->nr_segments; i++) + if (image->segment[i].mem < __pa(_end)) + return -ETXTBSY; + return 0; +} +#else int default_machine_kexec_prepare(struct kimage *image) { int i; @@ -95,6 +110,7 @@ int default_machine_kexec_prepare(struct kimage *image) return 0; } +#endif /* !CONFIG_PPC_BOOK3E */ static void copy_segments(unsigned long ind) { @@ -365,6 +381,7 @@ void default_machine_kexec(struct kimage *image) /* NOTREACHED */ } +#ifndef CONFIG_PPC_BOOK3E /* Values we need to export to the second kernel via the device tree. */ static unsigned long htab_base; static unsigned long htab_size; @@ -411,3 +428,4 @@ static int __init export_htab_values(void) return 0; } late_initcall(export_htab_values); +#endif /* !CONFIG_PPC_BOOK3E */ diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S index ddbc535..db475d4 100644 --- a/arch/powerpc/kernel/misc_64.S +++ b/arch/powerpc/kernel/misc_64.S @@ -631,9 +631,13 @@ _GLOBAL(kexec_sequence) lhz r25,PACAHWCPUID(r13)/* get our phys cpu from paca */ /* disable interrupts, we are overwriting kernel data next */ +#ifdef CONFIG_PPC_BOOK3E + wrteei 0 +#else mfmsr r3 rlwinm r3,r3,0,17,15 mtmsrd r3,1 +#endif /* copy dest pages, flush whole dest image */ mr r3,r29 @@ -655,6 +659,7 @@ _GLOBAL(kexec_sequence) li r6,1 stw r6,kexec_flag-1b(5) +#ifndef CONFIG_PPC_BOOK3E /* clear out hardware hash page table and tlb */ #if !defined(_CALL_ELF) || _CALL_ELF != 2 ld r12,0(r27) /* deref function descriptor */ @@ -663,6 +668,7 @@ _GLOBAL(kexec_sequence) #endif mtctr r12 bctrl /* ppc_md.hpte_clear_all(void); */ +#endif /* !CONFIG_PPC_BOOK3E */ /* * kexec image calling is: -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2] cxl: Fix number of allocated pages in SPA
The explanation probably still needs to be expanded more (e.g. this could cause a crash for an AFU that supports more than about a thousand processes) - see my other email in reply to v1 for more, but I'm happy for this to go in as is (but ultimately that's mpe's call). It should also be CCd to stable: this bug was introduced before the driver was originally upstreamed; we just never hit it because all our AFUs are limited to fewer processes by their interrupt requirements. Cc: stable Acked-by: Ian Munsie Excerpts from Christophe Lombard's message of 2015-10-07 01:19:49 +1100: > This moves the initialisation of the num_procs to before the SPA > allocation. > The field 'num_procs' of the structure cxl_afu is not updated to the > right value (maximum number of processes that can be supported by > the AFU) when the pages are allocated (i.e. when cxl_alloc_spa() is called). > The number of allocated pages depends on the max number of processes. > > Signed-off-by: Christophe Lombard > --- > drivers/misc/cxl/native.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c > index b37f2e8..d2e75c8 100644 > --- a/drivers/misc/cxl/native.c > +++ b/drivers/misc/cxl/native.c > @@ -457,6 +457,7 @@ static int activate_afu_directed(struct cxl_afu *afu) > > dev_info(&afu->dev, "Activating AFU directed mode\n"); > > +afu->num_procs = afu->max_procs_virtualised; > if (afu->spa == NULL) { > if (cxl_alloc_spa(afu)) > return -ENOMEM; > @@ -468,7 +469,6 @@ static int activate_afu_directed(struct cxl_afu *afu) > cxl_p1n_write(afu, CXL_PSL_ID_An, CXL_PSL_ID_An_F | CXL_PSL_ID_An_L); > > afu->current_mode = CXL_MODE_DIRECTED; > -afu->num_procs = afu->max_procs_virtualised; > > if ((rc = cxl_chardev_m_afu_add(afu))) > return rc; ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2] cxl: Fix number of allocated pages in SPA
On Wed, 2015-10-07 at 14:51 +1100, Ian Munsie wrote: > The explanation probably still needs to be expanded more (e.g. this > could cause a crash for an AFU that supports more than about a thousand > processes) - see my other email in reply to v1 for more, but I'm happy > for this to go in as is (but ultimately that's mpe's call). > > It should also be CCd to stable, this bug was introduced before the > driver was originally upstreamed, we just never hit it because all our > AFUs are limited to less processes by their interrupt requirements. > > Cc: stable So the driver went into 3.18, so this should be: Cc: stable # 3.18+ One of you please resend with a coherent change log with all the details included. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] cxl: Fix number of allocated pages in SPA
From: Christophe Lombard The scheduled process area is currently allocated before assigning the correct maximum processes to the AFU, which will mean we only ever allocate a fixed number of pages for the scheduled process area. This will limit us to 958 processes with 2 x 64K pages. If we try to use more processes than that, we'd probably overrun the buffer and corrupt memory or crash. AFUs that require three or more interrupts per process will not be affected, as they are already limited to fewer processes than that, but we could hit it on an AFU that requires 0, 1 or 2 interrupts per process, or when using 4K pages. This patch moves the initialisation of num_procs to before the SPA allocation so that enough pages will be allocated for the number of processes that the AFU supports. Signed-off-by: Christophe Lombard Signed-off-by: Ian Munsie Cc: stable # 3.18+ --- Changes since v2: - Expanded commit message Changes since v1: - Expanded commit message drivers/misc/cxl/native.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c index b37f2e8..d2e75c8 100644 --- a/drivers/misc/cxl/native.c +++ b/drivers/misc/cxl/native.c @@ -457,6 +457,7 @@ static int activate_afu_directed(struct cxl_afu *afu) dev_info(&afu->dev, "Activating AFU directed mode\n"); + afu->num_procs = afu->max_procs_virtualised; if (afu->spa == NULL) { if (cxl_alloc_spa(afu)) return -ENOMEM; @@ -468,7 +469,6 @@ static int activate_afu_directed(struct cxl_afu *afu) cxl_p1n_write(afu, CXL_PSL_ID_An, CXL_PSL_ID_An_F | CXL_PSL_ID_An_L); afu->current_mode = CXL_MODE_DIRECTED; - afu->num_procs = afu->max_procs_virtualised; if ((rc = cxl_chardev_m_afu_add(afu))) return rc; -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
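For reference, the 958 figure follows from the driver's SPA sizing formula; as I read spa_max_procs() in drivers/misc/cxl/native.c (worth double-checking against the source):

/*
 * spa_size  = 2 pages * 64K = 131072 bytes
 * max_procs = ((spa_size / 8) - 96) / 17
 *           = (16384 - 96) / 17 = 958
 */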
Re: Missing operand for tlbie instruction on Power7
On Tue, 2015-10-06 at 11:25 -0700, Laura Abbott wrote: > On 10/05/2015 08:35 PM, Michael Ellerman wrote: > > On Fri, 2015-10-02 at 08:43 -0700, Laura Abbott wrote: > >> Hi, > >> > >> We received a report (https://bugzilla.redhat.com/show_bug.cgi?id=1267395) > >> of bad assembly > >> when compiling on powerpc with little endian > > > > ... > > > >> After some discussion with the binutils folks, it turns out that the tlbie > >> instruction actually requires another operand and binutils was updated to > >> check for this https://sourceware.org/ml/binutils/2015-05/msg00133.html . > >> > >> The code sequence in arch/powerpc/include/asm/ppc_asm.h now needs to be > >> updated: > >> > >> #if !defined(CONFIG_4xx) && !defined(CONFIG_8xx) > >> #define tlbia \ > >> li r4,1024;\ > >> mtctr r4; \ > >> lis r4,KERNELBASE@h;\ > >> 0: tlbie r4; \ > >> addi r4,r4,0x1000; \ > >> bdnz 0b > >> #endif > >> > >> I don't know enough ppc assembly to properly fix this but I can test. > > > > How are you testing? This code is fairly old and I'm dubious if it still > > works. > > > > These days we have a ppc_md hook for flushing the TLB, ppc_md.flush_tlb(). > > Ideally the swsusp code would use that. > > Testing would probably just be compile and maybe boot. I don't have regular > access to the hardware. This problem just showed up for me when someone > tried to compile Fedora rawhide with the latest binutils. Right. The code in question is for software suspend, i.e. hibernation, so that's what needs testing if the code is going to change. It was mostly written for G5 (543b9fd3528f6), though it later gained support for 64-bit BookE (5a31057fc06c3). I just tested it on a G5 here and amazingly it worked. So it is working code, even if it is old and crufty. > From what I can tell, it looks like the .flush_tlb of the cpu_spec is only > defined for power7 and power8 and I don't see a ppc_md.flush_tlb on the > master branch. Yes it's only defined for Power7 and Power8 at the moment. It definitely does exist in Linus' master branch, but I'm not sure if that's the master branch you're referring to. > It's not clear what to do for the case where there is no > flush_tlb function. Would filling in a .flush_tlb for all the PPC_BOOK3S_64 > with the existing tlbia sequence work? It might, but it's not much of an improvement. Ideally we'd have an actually correct sequence for each cpu type. > It's also worth noting that the __flush_power7 uses tlbiel instead of tlbie. Yeah that's a good point. It's not clear if the swsusp code wants to do a local or a global invalidate. As an alternative, can you try adding a .machine push / .machine "power4" / .machine pop around the tlbie? That should tell the assembler to drop back to power4 mode for that instruction, which should then do the right thing. There are some examples in that file. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
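For anyone who wants to test that suggestion, the resulting sequence would look something like this (an untested sketch of Michael's suggestion, not a confirmed fix):

#if !defined(CONFIG_4xx) && !defined(CONFIG_8xx)
#define tlbia					\
	li	r4,1024;			\
	mtctr	r4;				\
	lis	r4,KERNELBASE@h;		\
	.machine push;				\
	.machine "power4";			\
0:	tlbie	r4;				\
	.machine pop;				\
	addi	r4,r4,0x1000;			\
	bdnz	0b
#endif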