[PATCH] powerpc32: memcpy: only use dcbz once cache is enabled

2015-09-07 Thread Christophe Leroy
we patch memcpy() by replacing the temporary 'dcbt' by 'dcbz' Reported-by: Michal Sojka Signed-off-by: Christophe Leroy --- @Michal, can you please test it ? arch/powerpc/kernel/setup_32.c | 12 arch/powerpc/lib/copy_32.S | 11 ++- 2 files changed, 22 in

[PATCH v2] powerpc32: memcpy/memset: only use dcbz once cache is enabled

2015-09-09 Thread Christophe Leroy
ed-by: Michal Sojka Signed-off-by: Christophe Leroy --- changes in v2: Using feature-fixups instead of hardcoded call to patch_instruction() Handling of memset() added arch/powerpc/include/asm/cache.h | 8 arch/powerpc/include/asm/feature-fixups.h

[PATCH v3] powerpc32: memcpy: only use dcbz once cache is enabled

2015-09-11 Thread Christophe Leroy
d-off-by: Christophe Leroy --- changes in v2: Using feature-fixups instead of hardcoded call to patch_instruction() Handling of memset() added changes in v3: Not using anymore feature-fixups Handling of memset() removed arch/powerpc/kernel/setup_32.c | 3 +++ arch/powerpc/lib/copy_32.S

Re: [PATCH v3] powerpc32: memcpy: only use dcbz once cache is enabled

2015-09-12 Thread christophe leroy
On 11/09/2015 23:35, Scott Wood wrote: On Fri, 2015-09-11 at 16:33 +0200, Christophe Leroy wrote: memcpy() uses the dcbz instruction to speed up copying by not wasting time loading a cache line with data that will be overwritten. Some platforms like mpc52xx do not have the cache active at startup and can

Re: [PATCH v2] powerpc32: memcpy/memset: only use dcbz once cache is enabled

2015-09-12 Thread christophe leroy
On 11/09/2015 03:24, Michael Ellerman wrote: On Thu, 2015-09-10 at 17:05 -0500, Scott Wood wrote: I don't think this duplication is what Michael meant by "the normal cpu feature sections". What else is going to use this very specific infrastructure? Yeah, sorry, I was hoping you could do

[PATCH v3] powerpc32: memset: only use dcbz once cache is enabled

2015-09-13 Thread Christophe Leroy
calls to it. This patch modifies memset() such that at startup, memset() unconditionally jumps to simple_memset() which doesn't use the dcbz instruction. Once the initial MMU is set up, in machine_init() we patch memset() by replacing this unconditional jump with a NOP Signed-off-by: Christophe

Re: [PATCH v3] powerpc32: memset: only use dcbz once cache is enabled

2015-09-14 Thread Christophe LEROY
On 14/09/2015 17:20, Scott Wood wrote: On Mon, 2015-09-14 at 08:21 +0200, Christophe Leroy wrote: memset() uses the dcbz instruction to speed up clearing by not wasting time loading a cache line with data that will be overwritten. Some platforms like mpc52xx do not have the cache active at startup and

[PATCH v4 0/2] powerpc32: memcpy/memset: only use dcbz once cache is enabled

2015-09-16 Thread Christophe Leroy
memset(), GCC makes calls to them. Christophe Leroy (2): powerpc32: memcpy: only use dcbz once cache is enabled powerpc32: memset: only use dcbz once cache is enabled arch/powerpc/kernel/setup_32.c | 6 ++ arch/powerpc/lib/copy_32.S | 11 +++ 2 files changed, 17 insertions

[PATCH v4 1/2] powerpc32: memcpy: only use dcbz once cache is enabled

2015-09-16 Thread Christophe Leroy
d-off-by: Christophe Leroy --- changes in v2: Using feature-fixups instead of hardcoded call to patch_instruction() Handling of memset() added changes in v3: Not using anymore feature-fixups Handling of memset() removed changes in v4: None arch/powerpc/kernel/setup_32.c | 3 +++ arch/po

[PATCH v4 2/2] powerpc32: memset: only use dcbz once cache is enabled

2015-09-16 Thread Christophe Leroy
calls to it. This patch modifies memset() such that at startup, memset() unconditionally skips the optimised block that uses the dcbz instruction. Once the initial MMU is set up, in machine_init() we patch memset() by replacing this unconditional jump with a NOP Signed-off-by: Christophe Leroy --- Changes
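
The "slow path first, fast path once the cache is on" idea described in this snippet can be sketched in portable C. This is only a user-space analogy under stated assumptions: the kernel patch rewrites a branch instruction in place, whereas this sketch uses an ordinary flag, and every name in it (cache_enabled, simple_fill, fast_fill, early_safe_fill, enable_fast_fill, fast_path_calls) is hypothetical and does not come from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the branch that the kernel patches to a NOP. */
static bool cache_enabled = false;

/* Counts how often the fast path ran; for illustration only. */
static unsigned long fast_path_calls;

/* Byte-by-byte fill: plays the role of the path that avoids dcbz. */
static void *simple_fill(void *dst, int c, size_t n)
{
	unsigned char *p = dst;

	while (n--)
		*p++ = (unsigned char)c;
	return dst;
}

/* Stand-in for the dcbz-based path; here it simply defers to libc memset(). */
static void *fast_fill(void *dst, int c, size_t n)
{
	fast_path_calls++;
	return memset(dst, c, n);
}

/* Always safe to call: takes the simple path until the switch is flipped. */
void *early_safe_fill(void *dst, int c, size_t n)
{
	if (!cache_enabled)
		return simple_fill(dst, c, n);
	return fast_fill(dst, c, n);
}

/* Plays the role of the one-time patch applied from machine_init(). */
void enable_fast_fill(void)
{
	cache_enabled = true;
}
```

Patching the instruction, as the snippet describes, avoids the per-call flag test that this sketch pays; the flag is used here only so the example runs anywhere.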

[PATCH 0/9] powerpc32: set of optimisation of network checksum functions

2015-09-22 Thread Christophe Leroy
This patch series gathers patches related to checksum functions on powerpc. Some of those patches have already been submitted individually. Christophe Leroy (9): powerpc: unexport csum_tcpudp_magic powerpc: mark xer clobbered in csum_add() powerpc32: checksum_wrappers_64 becomes

[PATCH 1/9] powerpc: unexport csum_tcpudp_magic

2015-09-22 Thread Christophe Leroy
csum_tcpudp_magic is now an inline function, so there is nothing to export Signed-off-by: Christophe Leroy --- arch/powerpc/lib/ppc_ksyms.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/powerpc/lib/ppc_ksyms.c b/arch/powerpc/lib/ppc_ksyms.c index c7f8e95..f5e427e 100644 --- a/arch

[PATCH 2/9] powerpc: mark xer clobbered in csum_add()

2015-09-22 Thread Christophe Leroy
addc uses carry so xer is clobbered in csum_add() Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/checksum.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h index e8d9ef4..d2ca07b 100644
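
Why the clobber matters can be shown with a small sketch of a 32-bit ones' complement add. The function name csum_add32 and the portable fallback are assumptions made for illustration; only the addc instruction and the XER clobber come from the snippet above, and the addze step is an assumed way to fold the carry back in.

```c
#include <stdint.h>

/* 32-bit ones' complement addition with end-around carry (illustrative). */
static inline uint32_t csum_add32(uint32_t csum, uint32_t addend)
{
#if defined(__GNUC__) && defined(__powerpc__) && !defined(__powerpc64__)
	/*
	 * addc writes the carry bit (CA), which lives in XER, and addze
	 * adds that carry back in. Because XER is modified, it has to be
	 * listed as clobbered so the compiler does not assume it survives.
	 */
	__asm__("addc %0,%0,%1\n\t"
		"addze %0,%0"
		: "+r" (csum)
		: "r" (addend)
		: "xer");
	return csum;
#else
	/* Portable fallback: add, then fold the carry-out back in. */
	uint32_t sum = csum + addend;

	return sum + (sum < addend);
#endif
}
```

On non-PowerPC hosts only the fallback branch is compiled, so the sketch builds and behaves the same everywhere.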

[PATCH 4/9] powerpc: inline ip_fast_csum()

2015-09-22 Thread Christophe Leroy
In several architectures, ip_fast_csum() is inlined. There are functions like ip_send_check() which do little more than call ip_fast_csum(). Inlining ip_fast_csum() allows the compiler to optimise better Suggested-by: Eric Dumazet Signed-off-by: Christophe Leroy --- arch/powerpc

[PATCH 5/9] powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()

2015-09-22 Thread Christophe Leroy
for the checksum r7 and r8 which contain pointers to error feedback are used, so we stack them. On a TCP benchmark using socklib on the loopback interface on which checksum offload and scatter/gather have been deactivated, we get about a 20% performance increase. Signed-off-by: Christophe Leroy -

[PATCH 6/9] powerpc32: optimise a few instructions in csum_partial()

2015-09-22 Thread Christophe Leroy
n the value of bit 31 and bit 30 of r4 instead of anding r4 with 3 then proceeding with comparisons and subtractions. Signed-off-by: Christophe Leroy --- arch/powerpc/lib/checksum_32.S | 37 + 1 file changed, 17 insertions(+), 20 deletions(-) diff --git a

[PATCH 7/9] powerpc32: optimise csum_partial() loop

2015-09-22 Thread Christophe Leroy
h and parallel execution) Signed-off-by: Christophe Leroy --- arch/powerpc/lib/checksum_32.S | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S index 9c12602..0d34f47 100644 --- a/arch/powerpc/lib/checksum

[PATCH 3/9] powerpc32: checksum_wrappers_64 becomes checksum_wrappers

2015-09-22 Thread Christophe Leroy
also exists on powerpc32 This patch renames arch/powerpc/lib/checksum_wrappers_64.c to arch/powerpc/lib/checksum_wrappers.c and makes it non-conditional to CONFIG_WORD_SIZE Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/checksum.h | 9 - arch

[PATCH 8/9] powerpc: simplify csum_add(a, b) in case a or b is constant 0

2015-09-22 Thread Christophe Leroy
Simplify csum_add(a, b) in case a or b is constant 0 Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/checksum.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h index 56deea8..f8a9704 100644 --- a
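
The constant-zero shortcut named in this subject can be sketched as follows. This is a minimal illustration, not the patch itself: the name csum_add_fold and the 64-bit folding arithmetic are assumptions, and __builtin_constant_p() is a GCC/Clang builtin, so the sketch assumes one of those compilers.

```c
#include <stdint.h>

/* Ones' complement add that short-circuits when an operand is a literal 0. */
static inline uint32_t csum_add_fold(uint32_t csum, uint32_t addend)
{
	uint64_t res;

	/* If either operand is a compile-time 0, the sum is the other one. */
	if (__builtin_constant_p(csum) && csum == 0)
		return addend;
	if (__builtin_constant_p(addend) && addend == 0)
		return csum;

	/* General case: add in 64 bits, then fold the carry back in. */
	res = (uint64_t)csum + addend;
	return (uint32_t)res + (uint32_t)(res >> 32);
}
```

When the function is inlined at a call site such as csum_add_fold(0, x), the compiler can drop the addition entirely; for non-constant arguments the two tests cost nothing at run time because they are resolved during compilation.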

[PATCH 9/9] powerpc: optimise csum_partial() call when len is constant

2015-09-22 Thread Christophe Leroy
multiple constant length * uses ip_fast_csum() for other 32bits multiple constant * uses __csum_partial() in all other cases Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/checksum.h | 80 ++--- arch/powerpc/lib/checksum_32.S | 4 +- arch/powerpc

[PATCH v2 02/25] powerpc/8xx: Map linear kernel RAM with 8M pages

2015-09-22 Thread Christophe Leroy
increased to 313s and the overall time spent in DTLB miss handler is 6.3s, which represents 1% of the overall time and 2.2% of non-idle time. Signed-off-by: Christophe Leroy --- v2: using bt instead of bgt and named the label explicitly arch/powerpc/kernel/head_8xx.S | 35 +- arch

[PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvments

2015-09-22 Thread Christophe Leroy
other miscellaneous improvements: 1/ Handling of CPU6 ERRATA directly in mtspr() C macro to reduce code specific to PPC8xx 2/ Rewrite of a few non critical ASM functions in C 3/ Removal of some unused items See related patches for details Christophe Leroy (25): powerpc/8xx: Save r3 all the time

[PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler

2015-09-22 Thread Christophe Leroy
We are spending between 40 and 160 cycles with a mean of 65 cycles in the TLB handling routines (measured with mftbl) so make it simpler, although it adds one instruction. Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/kernel/head_8xx.S | 13 - 1 file changed

[PATCH v2 03/25] powerpc: Update documentation for noltlbs kernel parameter

2015-09-22 Thread Christophe Leroy
Now the noltlbs kernel parameter is also applicable to PPC8xx Signed-off-by: Christophe Leroy --- No change in v2 Documentation/kernel-parameters.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt

[PATCH v2 05/25] powerpc/8xx: Fix vaddr for IMMR early remap

2015-09-22 Thread Christophe Leroy
only 2Mbytes aligned) the same way. Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/Kconfig.debug | 1 - arch/powerpc/kernel/head_8xx.S | 10 +- 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug index 3a5

[PATCH v2 04/25] powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c

2015-09-22 Thread Christophe Leroy
Now we have an 8xx-specific .c file for that, so put it in there as other powerpc variants do Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/mm/8xx_mmu.c | 17 + arch/powerpc/mm/init_32.c | 19 --- 2 files changed, 17 insertions(+), 19 deletions

[PATCH v2 06/25] powerpc32: iounmap() cannot vunmap() area mapped by TLBCAMs either

2015-09-22 Thread Christophe Leroy
iounmap() cannot vunmap() area mapped by TLBCAMs either Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/mm/pgtable_32.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c index 7692d1b..03a073a

[PATCH v2 07/25] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together

2015-09-22 Thread Christophe Leroy
x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of purpose, so let's group them into a single function. Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/mm/pgtable_32.c | 33 ++--- 1 file changed, 26 insertions(+), 7 deletions(-) diff

[PATCH v2 09/25] powerpc/8xx: show IMMR area in startup memory layout

2015-09-22 Thread Christophe Leroy
show IMMR area in startup memory layout Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/mm/mem.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index 22d94c3..e105ca6 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc

[PATCH v2 08/25] powerpc/8xx: Map IMMR area with 512k page at a fixed address

2015-09-22 Thread Christophe Leroy
hence yet another 10% reduction. Signed-off-by: Christophe Leroy --- v2: - using bt instead of blt/bgt - reorganised in order to have only one taken branch for both 512k and 8M instead of a first branch for both 8M and 512k then a second branch for 512k arch/powerpc/include/asm/pgtable-ppc32.h

[PATCH v2 10/25] powerpc/8xx: CONFIG_PIN_TLB unneeded for CONFIG_PPC_EARLY_DEBUG_CPM

2015-09-22 Thread Christophe Leroy
IMMR is now mapped at 0xff00 by page tables so it is no longer necessary to pin TLBs Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/Kconfig.debug | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug index 70168a2

[PATCH v2 11/25] powerpc/8xx: map 16M RAM at startup

2015-09-22 Thread Christophe Leroy
. This patch adds a second 8M page to the initial mapping in order to have 16M mapped regardless of CONFIG_PIN_TLB, like several other 32-bit PPC (40x, 601, ...) Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/kernel/head_8xx.S | 2 ++ arch/powerpc/mm/8xx_mmu.c | 4 ++-- 2

[PATCH v2 12/25] powerpc32: Remove useless/wrong MMU:setio progress message

2015-09-22 Thread Christophe Leroy
Commit 771168494719 ("[POWERPC] Remove unused machine call outs") removed the call to setup_io_mappings(), so remove the associated progress line message Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/mm/init_32.c | 4 1 file changed, 4 deletions(-) diff --

[PATCH v2 13/25] powerpc/8xx: also use r3 in the ITLB miss in all situations

2015-09-22 Thread Christophe Leroy
We are spending between 40 and 160 cycles with a mean of 65 cycles in the TLB handling routines (measured with mftbl) so make it simpler, although it adds one instruction Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/kernel/head_8xx.S | 15 --- 1 file changed

[PATCH v2 14/25] powerpc32: remove ioremap_base

2015-09-22 Thread Christophe Leroy
ioremap_base is not initialised and is not used anywhere, so remove it Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/mm/mmu_decl.h | 1 - arch/powerpc/mm/pgtable_32.c| 1 - arch/powerpc/platforms/embedded6xx/mpc10x.h | 8 3 files changed

[PATCH v2 15/25] powerpc/8xx: move 8xx SPRN defines into reg_8xx.h and add some missing ones

2015-09-22 Thread Christophe Leroy
Move 8xx SPRN defines into reg_8xx.h and add some missing ones Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/include/asm/mmu-8xx.h | 26 +- arch/powerpc/include/asm/reg_8xx.h | 24 2 files changed, 37 insertions(+), 13

[PATCH v2 17/25] powerpc/8xx: remove special handling of CPU6 errata in set_dec()

2015-09-22 Thread Christophe Leroy
CPU6 ERRATA is now handled directly in mtspr(), so we can use the standard set_dec() function in all cases. Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/include/asm/time.h | 6 +- arch/powerpc/kernel/head_8xx.S | 18 -- 2 files changed, 1 insertion

[PATCH v2 18/25] powerpc/8xx: rewrite set_context() in C

2015-09-22 Thread Christophe Leroy
There is no real need to have set_context() in assembly. Now that we have mtspr() handling CPU6 ERRATA directly, we can rewrite set_context() in C language for easier maintenance. Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/kernel/head_8xx.S | 44

[PATCH v2 16/25] powerpc/8xx: Handle CPU6 ERRATA directly in mtspr() macro

2015-09-22 Thread Christophe Leroy
MPC8xx has an ERRATA on the use of mtspr() for some registers. This patch includes the ERRATA handling directly in the mtspr() macro so that mtspr() users don't need to bother about that errata Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/include/asm/reg.h | 2 +

[PATCH v2 19/25] powerpc/8xx: rewrite flush_instruction_cache() in C

2015-09-22 Thread Christophe Leroy
On PPC8xx, flushing the instruction cache is performed by writing to register SPRN_IC_CST. This register is affected by the CPU6 ERRATA. The patch rewrites the function in C so that CPU6 ERRATA will be handled transparently Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/kernel/misc_32.S

[PATCH v2 20/25] powerpc32: Remove clear_pages() and define clear_page() inline

2015-09-22 Thread Christophe Leroy
clear_pages() is never used, and PPC32 is the only architecture (still) having this function. Neither PPC64 nor any other architecture has it. This patch removes clear_pages() and moves clear_page() inline (same as PPC64), as it is only a few insns Signed-off-by: Christophe Leroy --- No

[PATCH v2 21/25] powerpc: add inline functions for cache related instructions

2015-09-22 Thread Christophe Leroy
This patch adds inline functions to use dcbz, dcbi, dcbf, dcbst from C functions Signed-off-by: Christophe Leroy --- New in v2 arch/powerpc/include/asm/cache.h | 19 +++ 1 file changed, 19 insertions(+) diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm

[PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline

2015-09-22 Thread Christophe Leroy
calling them Signed-off-by: Christophe Leroy --- New in v2 arch/powerpc/include/asm/cacheflush.h | 55 +++-- arch/powerpc/kernel/misc_32.S | 65 --- arch/powerpc/kernel/ppc_ksyms.c | 2 ++ 3 files changed, 54 insertions(+), 68

[PATCH v2 23/25] powerpc: Simplify test in __dma_sync()

2015-09-22 Thread Christophe Leroy
This simplification helps the compiler. We now have only one test instead of two, so it reduces the number of branches. Signed-off-by: Christophe Leroy --- New in v2 arch/powerpc/mm/dma-noncoherent.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/mm/dma

[PATCH v2 24/25] powerpc32: small optimisation in flush_icache_range()

2015-09-22 Thread Christophe Leroy
Inlining of _dcache_range() functions has shown that the compiler does the same thing a bit better with one insn less Signed-off-by: Christophe Leroy --- New in v2 arch/powerpc/kernel/misc_32.S | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/misc_32

[PATCH v2 25/25] powerpc32: Remove one insn in mulhdu

2015-09-22 Thread Christophe Leroy
Remove one instruction in mulhdu Signed-off-by: Christophe Leroy --- New in v2 arch/powerpc/kernel/misc_32.S | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S index 1597424..870dc63 100644 --- a/arch

Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline

2015-09-22 Thread Christophe Leroy
at 14:42 -0500, Scott Wood wrote: On Tue, 2015-09-22 at 19:34 +, Joakim Tjernlund wrote: On Tue, 2015-09-22 at 13:58 -0500, Scott Wood wrote: On Tue, 2015-09-22 at 18:12 +, Joakim Tjernlund wrote: On Tue, 2015-09-22 at 18:51 +0200, Christophe Leroy wrote: flush/clean/invalidate

Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline

2015-09-22 Thread Christophe Leroy
On 23/09/2015 00:34, Scott Wood wrote: On Tue, 2015-09-22 at 22:57 +0200, Christophe Leroy wrote: >Here is what I get in asm. First one is with "if (i) mb();". We see gcc >puts a beqlr. This is the form that is closest to what we had in the >former misc_32.S >

Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler

2015-10-06 Thread Christophe Leroy
On 29/09/2015 00:07, Scott Wood wrote: On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote: We are spending between 40 and 160 cycles with a mean of 65 cycles in the TLB handling routines (measured with mftbl) so make it simpler, although it adds one instruction. Signed

Re: [PATCH v2 06/25] powerpc32: iounmap() cannot vunmap() area mapped by TLBCAMs either

2015-10-06 Thread Christophe Leroy
On 29/09/2015 01:41, Scott Wood wrote: On Tue, Sep 22, 2015 at 06:50:40PM +0200, Christophe Leroy wrote: iounmap() cannot vunmap() area mapped by TLBCAMs either Signed-off-by: Christophe Leroy --- No change in v2 arch/powerpc/mm/pgtable_32.c | 4 +++- 1 file changed, 3 insertions

Re: [PATCH v2 07/25] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together

2015-10-06 Thread Christophe Leroy
On 29/09/2015 01:47, Scott Wood wrote: On Tue, Sep 22, 2015 at 06:50:42PM +0200, Christophe Leroy wrote: x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of purpose, so let's group them into a single function. Signed-off-by: Christophe Leroy --- No change in v2 arch

Re: [PATCH v2 13/25] powerpc/8xx: also use r3 in the ITLB miss in all situations

2015-10-06 Thread Christophe Leroy
On 29/09/2015 02:00, Scott Wood wrote: On Tue, Sep 22, 2015 at 06:50:54PM +0200, Christophe Leroy wrote: We are spending between 40 and 160 cycles with a mean of 65 cycles in the TLB handling routines (measured with mftbl) so make it simpler, although it adds one instruction Signed

Re: [PATCH v2 15/25] powerpc/8xx: move 8xx SPRN defines into reg_8xx.h and add some missing ones

2015-10-06 Thread Christophe Leroy
On 29/09/2015 02:03, Scott Wood wrote: On Tue, Sep 22, 2015 at 06:50:58PM +0200, Christophe Leroy wrote: Move 8xx SPRN defines into reg_8xx.h and add some missing ones Signed-off-by: Christophe Leroy --- No change in v2 Why are they being moved? Why are they being separated from the

Re: [PATCH v2 11/25] powerpc/8xx: map 16M RAM at startup

2015-10-06 Thread Christophe Leroy
On 29/09/2015 01:58, Scott Wood wrote: On Tue, Sep 22, 2015 at 06:50:50PM +0200, Christophe Leroy wrote: On recent kernels, with some debug options like for instance CONFIG_LOCKDEP, the BSS requires more than 8M memory, although the kernel code fits in the first 8M. Today, it is necessary

Re: [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler

2015-10-06 Thread christophe leroy
On 06/10/2015 18:46, Scott Wood wrote: On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote: On 29/09/2015 00:07, Scott Wood wrote: On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote: We are spending between 40 and 160 cycles with a mean of 65 cycles in the TLB

Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline

2015-10-07 Thread Christophe Leroy
On 29/09/2015 02:29, Scott Wood wrote: On Tue, Sep 22, 2015 at 06:51:13PM +0200, Christophe Leroy wrote: flush/clean/invalidate _dcache_range() functions are all very similar and are quite short. They are mainly used in __dma_sync(). perf_event locates them in the top 3 consuming functions

Re: [PATCH v2 05/25] powerpc/8xx: Fix vaddr for IMMR early remap

2015-10-08 Thread Christophe Leroy
On 29/09/2015 01:39, Scott Wood wrote: On Tue, Sep 22, 2015 at 06:50:38PM +0200, Christophe Leroy wrote: Memory: 124428K/131072K available (3748K kernel code, 188K rwdata, 648K rodata, 508K init, 290K bss, 6644K reserved) Kernel virtual memory layout: * 0xfffdf000..0xf000 : fixmap

Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline

2015-10-12 Thread christophe leroy
On 08/10/2015 21:12, Scott Wood wrote: On Wed, 2015-10-07 at 14:49 +0200, Christophe Leroy wrote: On 29/09/2015 02:29, Scott Wood wrote: On Tue, Sep 22, 2015 at 06:51:13PM +0200, Christophe Leroy wrote: flush/clean/invalidate _dcache_range() functions are all very similar and are

[PATCH v3 22/23] powerpc32: small optimisation in flush_icache_range()

2015-11-17 Thread Christophe Leroy
Inlining of _dcache_range() functions has shown that the compiler does the same thing a bit better with one insn less Signed-off-by: Christophe Leroy --- v2: new v3: no change arch/powerpc/kernel/misc_32.S | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/powerpc

[PATCH v3 14/23] powerpc/8xx: Handle CPU6 ERRATA directly in mtspr() macro

2015-11-17 Thread Christophe Leroy
MPC8xx has an ERRATA on the use of mtspr() for some registers. This patch includes the ERRATA handling directly in the mtspr() macro so that mtspr() users don't need to bother about that errata Signed-off-by: Christophe Leroy --- v2: no change v3: no change arch/powerpc/include/asm/reg.h

[PATCH v3 23/23] powerpc32: Remove one insn in mulhdu

2015-11-17 Thread Christophe Leroy
Remove one instruction in mulhdu Signed-off-by: Christophe Leroy --- v2: new v3: no change arch/powerpc/kernel/misc_32.S | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S index 1597424..870dc63 100644

[PATCH v3 06/23] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together

2015-11-17 Thread Christophe Leroy
x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of purpose, and are never defined at the same time. So rename them x_block_mapped() and define them in the relevant places Signed-off-by: Christophe Leroy --- v2: no change v3: Functions are mutually exclusive so renamed iaw Scott

[PATCH v3 20/23] powerpc32: move xxxxx_dcache_range() functions inline

2015-11-17 Thread Christophe Leroy
calling them Signed-off-by: Christophe Leroy --- v2: new v3: no change arch/powerpc/include/asm/cacheflush.h | 52 ++-- arch/powerpc/kernel/misc_32.S | 65 --- arch/powerpc/kernel/ppc_ksyms.c | 2 ++ 3 files changed, 51

[PATCH v3 21/23] powerpc: Simplify test in __dma_sync()

2015-11-17 Thread Christophe Leroy
This simplification helps the compiler. We now have only one test instead of two, so it reduces the number of branches. Signed-off-by: Christophe Leroy --- v2: new v3: no change arch/powerpc/mm/dma-noncoherent.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc

[PATCH v3 19/23] powerpc32: Remove clear_pages() and define clear_page() inline

2015-11-17 Thread Christophe Leroy
: Christophe Leroy --- v2: no change v3: no change arch/powerpc/include/asm/page_32.h | 17 ++--- arch/powerpc/kernel/misc_32.S | 16 arch/powerpc/kernel/ppc_ksyms_32.c | 1 - 3 files changed, 14 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/include/asm

[PATCH v3 01/23] powerpc/8xx: Save r3 all the time in DTLB miss handler

2015-11-17 Thread Christophe Leroy
: Christophe Leroy --- v2: no change v3: no change arch/powerpc/kernel/head_8xx.S | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 78c1eba..1557926 100644 --- a/arch/powerpc/kernel/head_8xx.S

[PATCH v3 15/23] powerpc/8xx: remove special handling of CPU6 errata in set_dec()

2015-11-17 Thread Christophe Leroy
CPU6 ERRATA is now handled directly in mtspr(), so we can use the standard set_dec() function in all cases. Signed-off-by: Christophe Leroy --- v2: no change v3: no change arch/powerpc/include/asm/time.h | 6 +- arch/powerpc/kernel/head_8xx.S | 18 -- 2 files changed, 1

[PATCH v3 13/23] powerpc/8xx: Add missing SPRN defines into reg_8xx.h

2015-11-17 Thread Christophe Leroy
Add missing SPRN defines into reg_8xx.h. Some of them are defined in mmu-8xx.h, so we include mmu-8xx.h in reg_8xx.h. For that, we remove references to PAGE_SHIFT in mmu-8xx.h to make it self-sufficient, as includers of reg_8xx.h don't all include asm/page.h Signed-off-by: Christophe Leroy -

[PATCH v3 03/23] powerpc: Update documentation for noltlbs kernel parameter

2015-11-17 Thread Christophe Leroy
Now the noltlbs kernel parameter is also applicable to PPC8xx Signed-off-by: Christophe Leroy --- v2: no change v3: no change Documentation/kernel-parameters.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel

[PATCH v3 18/23] powerpc: add inline functions for cache related instructions

2015-11-17 Thread Christophe Leroy
This patch adds inline functions to use dcbz, dcbi, dcbf, dcbst from C functions Signed-off-by: Christophe Leroy --- v2: new v3: no change arch/powerpc/include/asm/cache.h | 19 +++ 1 file changed, 19 insertions(+) diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc

[PATCH v3 11/23] powerpc32: Remove useless/wrong MMU:setio progress message

2015-11-17 Thread Christophe Leroy
Commit 771168494719 ("[POWERPC] Remove unused machine call outs") removed the call to setup_io_mappings(), so remove the associated progress line message Signed-off-by: Christophe Leroy --- v2: no change v3: no change arch/powerpc/mm/init_32.c | 4 1 file changed, 4 deletion

[PATCH v3 04/23] powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c

2015-11-17 Thread Christophe Leroy
Now we have an 8xx-specific .c file for that, so put it in there as other powerpc variants do Signed-off-by: Christophe Leroy --- v2: no change v3: no change arch/powerpc/mm/8xx_mmu.c | 17 + arch/powerpc/mm/init_32.c | 19 --- 2 files changed, 17 insertions

[PATCH v3 00/23] powerpc/8xx: Use large pages for RAM and IMMR and other improvments

2015-11-17 Thread Christophe Leroy
for mapping IMMR Christophe Leroy (23): powerpc/8xx: Save r3 all the time in DTLB miss handler powerpc/8xx: Map linear kernel RAM with 8M pages powerpc: Update documentation for noltlbs kernel parameter powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c powerpc32: Fix

[PATCH v3 09/23] powerpc/8xx: CONFIG_PIN_TLB unneeded for CONFIG_PPC_EARLY_DEBUG_CPM

2015-11-17 Thread Christophe Leroy
IMMR is now mapped by page tables so it is no longer necessary to pin TLBs Signed-off-by: Christophe Leroy --- v2: no change v3: no change arch/powerpc/Kconfig.debug | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug index 3a510f4

[PATCH v3 08/23] powerpc/8xx: Map IMMR area with 512k page at a fixed address

2015-11-17 Thread Christophe Leroy
will be no speculative accesses. With this patch applied, the number of DTLB misses during the 10 min period is reduced to 11.8 million for a duration of 5.8s, which represents 2% of the non-idle time hence yet another 10% reduction. Signed-off-by: Christophe Leroy --- v2: - using bt instead of blt

[PATCH v3 05/23] powerpc32: Fix pte_offset_kernel() to return NULL for bad pages

2015-11-17 Thread Christophe Leroy
The fixmap related functions try to map kernel pages that are already mapped through Large TLBs. pte_offset_kernel() has to return NULL for LTLBs, otherwise the caller will try to access level 2 table which doesn't exist Signed-off-by: Christophe Leroy --- v3: New arch/powerpc/includ

[PATCH v3 07/23] powerpc/8xx: Fix vaddr for IMMR early remap

2015-11-17 Thread Christophe Leroy
s at 0xfa20 which overlaps with VM ioremap area This patch fixes the virtual address for remapping IMMR with the fixmap regardless of the value of IMMR. The size of IMMR area is 256kbytes (CPM at offset 0, security engine at offset 128k) so a 512k page is enough Signed-off-by: Christophe L

[PATCH v3 12/23] powerpc32: remove ioremap_base

2015-11-17 Thread Christophe Leroy
ioremap_base is not initialised and is not used anywhere, so remove it Signed-off-by: Christophe Leroy --- v2: no change v3: fix comment as well arch/powerpc/include/asm/pgtable-ppc32.h| 2 +- arch/powerpc/mm/mmu_decl.h | 1 - arch/powerpc/mm/pgtable_32.c| 3

[PATCH v3 16/23] powerpc/8xx: rewrite set_context() in C

2015-11-17 Thread Christophe Leroy
There is no real need to have set_context() in assembly. Now that we have mtspr() handling CPU6 ERRATA directly, we can rewrite set_context() in C language for easier maintenance. Signed-off-by: Christophe Leroy --- v2: no change v3: no change arch/powerpc/kernel/head_8xx.S | 44

[PATCH v3 02/23] powerpc/8xx: Map linear kernel RAM with 8M pages

2015-11-17 Thread Christophe Leroy
increased to 313s and the overall time spent in DTLB miss handler is 6.3s, which represents 1% of the overall time and 2.2% of non-idle time. Signed-off-by: Christophe Leroy --- v2: using bt instead of bgt and named the label explicitly v3: no change arch/powerpc/kernel/head_8xx.S | 35

[PATCH v3 17/23] powerpc/8xx: rewrite flush_instruction_cache() in C

2015-11-17 Thread Christophe Leroy
On PPC8xx, flushing the instruction cache is performed by writing to register SPRN_IC_CST. This register is affected by the CPU6 ERRATA. The patch rewrites the function in C so that CPU6 ERRATA will be handled transparently Signed-off-by: Christophe Leroy --- v2: no change v3: no change arch/powerpc/kernel

[PATCH v3 10/23] powerpc/8xx: map more RAM at startup when needed

2015-11-17 Thread Christophe Leroy
selected. Signed-off-by: Christophe Leroy --- v2: no change v3: Automatic detection of available/needed memory instead of allocating 16M for all. arch/powerpc/kernel/head_8xx.S | 56 +++--- arch/powerpc/mm/8xx_mmu.c | 10 +++- 2 files changed, 56

Recurring Oops in link_path_walk()

2015-11-20 Thread Christophe Leroy
Al, We've been running Kernel 3.18 for several months on our embedded boards, and we have a recurring Oops in link_path_walk(). It doesn't happen very often (approximately once every month on one board among a set of 50 boards, never the same board). Below is the last oops I got, with ker

Re: Recurring Oops in link_path_walk()

2015-11-21 Thread christophe leroy
Le 20/11/2015 22:17, Al Viro a écrit : On Fri, Nov 20, 2015 at 12:58:40PM -0600, Scott Wood wrote: Looks like garbage in dentry->d_inode, assuming that reconstruction of the mapping of line numbers to addresses is correct... Not sure it is, though; what's more, just how does LR manage to poi

[PATCH] crypto: talitos - add new crypto modes

2015-12-01 Thread Christophe Leroy
This patch adds the following algorithms to the talitos driver: * ecb(aes) * ctr(aes) * ecb(des) * cbc(des) * ecb(des3_ede) Signed-off-by: Christophe Leroy --- drivers/crypto/talitos.c | 83 drivers/crypto/talitos.h | 1 + 2 files changed, 84

Re: 86xx

2016-01-25 Thread christophe leroy
Le 25/01/2016 12:15, Alessio Igor Bogani a écrit : Hi All, Sorry for my very bad English! I'm looking for who takes care of the 86xx subtree (arch/powerpc/platform/86xx) but I haven't found any entry in the MAINTAINERS file. Ciao, Alessio Looks like Kumar Gala is the committer for many fi

[PATCH] powerpc/8xx: CONFIG_DEBUG_PAGEALLOC requires ITLBmiss for kernel addresses

2016-02-03 Thread Christophe Leroy
When CONFIG_DEBUG_PAGEALLOC is activated, the initial TLB mapping gets flushed to track accesses to wrong areas. Therefore, kernel addresses will also generate ITLB misses. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_8xx.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion

[PATCH v5 00/23] powerpc/8xx: Use large pages for RAM and IMMR and other improvments

2016-02-03 Thread Christophe Leroy
for mapping IMMR Change in v4: * Fix of a wrong #if notified by kbuild robot in 07/23 Change in v5: * Removed use of pmd_val() as L-value * Adapted to match the new include files layout in Linux 4.5 Christophe Leroy (23): powerpc/8xx: Save r3 all the time in DTLB miss handler powerpc/8xx: Map

[PATCH v5 01/23] powerpc/8xx: Save r3 all the time in DTLB miss handler

2016-02-03 Thread Christophe Leroy
: Christophe Leroy --- v2: no change v3: no change v4: no change v5: no change arch/powerpc/kernel/head_8xx.S | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index e629e28..a89492e 100644 --- a/arch

[PATCH v5 03/23] powerpc: Update documentation for noltlbs kernel parameter

2016-02-03 Thread Christophe Leroy
Now the noltlbs kernel parameter is also applicable to PPC8xx Signed-off-by: Christophe Leroy --- v2: no change v3: no change v4: no change v5: no change Documentation/kernel-parameters.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/kernel-parameters.txt

[PATCH v5 02/23] powerpc/8xx: Map linear kernel RAM with 8M pages

2016-02-03 Thread Christophe Leroy
increased to 313s and the overall time spent in DTLB miss handler is 6.3s, which represents 1% of the overall time and 2.2% of non-idle time. Signed-off-by: Christophe Leroy --- v2: using bt instead of bgt and named the label explicitly v3: no change v4: no change v5: removed use of pmd_val() as L-value

[PATCH v5 04/23] powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c

2016-02-03 Thread Christophe Leroy
Now we have an 8xx-specific .c file for that, so put it in there as other powerpc variants do Signed-off-by: Christophe Leroy --- v2: no change v3: no change v4: no change v5: no change arch/powerpc/mm/8xx_mmu.c | 17 + arch/powerpc/mm/init_32.c | 19 --- 2 files

[PATCH v5 05/23] powerpc32: Fix pte_offset_kernel() to return NULL for bad pages

2016-02-03 Thread Christophe Leroy
The fixmap related functions try to map kernel pages that are already mapped through Large TLBs. pte_offset_kernel() has to return NULL for LTLBs, otherwise the caller will try to access level 2 table which doesn't exist Signed-off-by: Christophe Leroy --- v3: new v4: no change v5: no c
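
The behaviour described in this snippet can be illustrated with a small user-space sketch. It is not the kernel code: the types `demo_pmd_t` and `demo_pte_t`, the flag `DEMO_PMD_LARGE`, and the function `demo_pte_offset()` are hypothetical names invented for illustration. The only point shown is that the lookup returns NULL when the first-level entry describes a large page, so a caller never dereferences a second-level table that does not exist.

```c
#include <stddef.h>

/* Hypothetical second-level entry; not the kernel's pte_t. */
typedef struct { unsigned long val; } demo_pte_t;

/* Hypothetical first-level entry; not the kernel's pmd_t. */
typedef struct {
    int flags;          /* DEMO_PMD_LARGE when mapped by a large TLB entry */
    demo_pte_t *table;  /* second-level table, or NULL if there is none */
} demo_pmd_t;

#define DEMO_PMD_LARGE 0x1  /* assumption: marks a large-page mapping */

/*
 * Return a pointer to the second-level entry at 'index', or NULL when the
 * first-level entry is a large-page mapping or has no second-level table.
 */
static demo_pte_t *demo_pte_offset(const demo_pmd_t *pmd, size_t index)
{
    if (pmd == NULL || (pmd->flags & DEMO_PMD_LARGE) || pmd->table == NULL)
        return NULL;
    return &pmd->table[index];
}
```

A caller in this sketch would test the result against NULL before touching the entry, which mirrors the requirement stated in the patch description.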

[PATCH v5 06/23] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together

2016-02-03 Thread Christophe Leroy
x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of purpose, and are never defined at the same time. So rename them x_block_mapped() and define them in the relevant places Signed-off-by: Christophe Leroy --- v2: no change v3: Functions are mutually exclusive so renamed iaw Scott

[PATCH v5 07/23] powerpc/8xx: Fix vaddr for IMMR early remap

2016-02-03 Thread Christophe Leroy
s at 0xfa20 which overlaps with VM ioremap area This patch fixes the virtual address for remapping IMMR with the fixmap regardless of the value of IMMR. The size of IMMR area is 256kbytes (CPM at offset 0, security engine at offset 128k) so a 512k page is enough Signed-off-by: Christophe L
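
The sizing argument in this snippet can be checked with simple arithmetic. The sketch below is an illustration only: `DEMO_IMMR_SIZE`, `DEMO_PAGE_512K`, and both functions are hypothetical names, and it assumes the IMMR base is aligned to the 256 KB size of the area, which the snippet does not state. Under that assumption a 256 KB area never straddles a 512 KB boundary, so a single 512 KB page covers it.

```c
#include <stdint.h>

#define DEMO_IMMR_SIZE  0x40000u  /* 256 KB, as stated in the snippet */
#define DEMO_PAGE_512K  0x80000u  /* 512 KB page size */

/* Base address of the 512 KB page that contains 'addr'. */
static uint32_t demo_page_base(uint32_t addr)
{
    return addr & ~(DEMO_PAGE_512K - 1u);
}

/* Return 1 if [immr, immr + 256 KB) lies within a single 512 KB page. */
static int demo_fits_in_one_page(uint32_t immr)
{
    uint32_t last = immr + (DEMO_IMMR_SIZE - 1u);
    return demo_page_base(immr) == demo_page_base(last);
}
```

If the base were not 256 KB aligned, the area could cross a 512 KB boundary and the check would fail, which is why the alignment assumption is called out explicitly.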

[PATCH v5 09/23] powerpc/8xx: CONFIG_PIN_TLB unneeded for CONFIG_PPC_EARLY_DEBUG_CPM

2016-02-03 Thread Christophe Leroy
IMMR is now mapped by page tables so it is no longer necessary to pin TLBs Signed-off-by: Christophe Leroy --- v2: no change v3: no change v4: no change v5: no change arch/powerpc/Kconfig.debug | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc

[PATCH v5 08/23] powerpc/8xx: Map IMMR area with 512k page at a fixed address

2016-02-03 Thread Christophe Leroy
will be no speculative accesses. With this patch applied, the number of DTLB misses during the 10 min period is reduced to 11.8 million for a duration of 5.8s, which represents 2% of the non-idle time hence yet another 10% reduction. Signed-off-by: Christophe Leroy --- v2: - using bt instead of blt

[PATCH v5 10/23] powerpc/8xx: map more RAM at startup when needed

2016-02-03 Thread Christophe Leroy
selected. Signed-off-by: Christophe Leroy --- v2: no change v3: Automatic detection of available/needed memory instead of allocating 16M for all. v4: no change v5: no change arch/powerpc/kernel/head_8xx.S | 56 +++--- arch/powerpc/mm/8xx_mmu.c | 10

[PATCH v5 11/23] powerpc32: Remove useless/wrong MMU:setio progress message

2016-02-03 Thread Christophe Leroy
Commit 771168494719 ("[POWERPC] Remove unused machine call outs") removed the call to setup_io_mappings(), so remove the associated progress line message Signed-off-by: Christophe Leroy --- v2: no change v3: no change v4: no change v5: no change arch/powerpc/mm/init_32.c | 4 --

[PATCH v5 12/23] powerpc32: remove ioremap_base

2016-02-03 Thread Christophe Leroy
ioremap_base is not initialised and is nowhere used so remove it Signed-off-by: Christophe Leroy --- v2: no change v3: fix comment as well v4: no change v5: no change arch/powerpc/include/asm/nohash/32/pgtable.h | 2 +- arch/powerpc/mm/mmu_decl.h | 1 - arch/powerpc/mm
