On Friday 25 May 2018 09:18 AM, Josh Poimboeuf wrote:
The toc field in the mod_arch_specific struct isn't actually used
anywhere, so remove it.
Also the ftrace-specific fields are now common between 32-bit and
64-bit, so simplify the struct definition a bit by moving them out of
the __powerpc64__ #ifdef.
On Thu, 2018-05-24 at 20:30 +1000, Michael Ellerman wrote:
> Michael Neuling writes:
>
> > This tests perf hardware breakpoints (ie PERF_TYPE_BREAKPOINT) on
> > powerpc.
>
> This doesn't work for me on a P8 guest:
>
> test: perf-hwbreak
> tags: git_version:bb5602e
!! killing perf-hwbreak
On 24/05/2018 at 19:24, Segher Boessenkool wrote:
On Wed, May 23, 2018 at 09:47:32AM +0200, Christophe Leroy wrote:
At the time being, memcmp() compares two chunks of memory
byte per byte.
This patch optimises the comparison by comparing word by word.
A small benchmark performed on an 8xx
From: Simon Guo
This patch reworks the memcmp_64 selftest so that it can
cover more test cases.
It adds testcases for:
- memcmp over 4K bytes size.
- s1/s2 with different/random offset on 16 bytes boundary.
- enter/exit_vmx_ops pairing.
Signed-off-by: Simon Guo
---
.../selftests/po
From: Simon Guo
This patch is based on the previous VMX patch on memcmp().
To optimize ppc64 memcmp() with VMX instructions, we need to think about
the VMX penalty this brings: if the kernel uses VMX instructions, it needs
to save/restore the current thread's VMX registers. There are 32 x 128-bit
VMX re
From: Simon Guo
This patch adds VMX primitives to do memcmp() in case the compare size
is equal or greater than 4K bytes. KSM feature can benefit from this.
Test result with following test program(replace the "^>" with ""):
--
># cat tools/testing/selftests/powerpc/stringloops/memcmp.c
>#incl
From: Simon Guo
Currently the powerpc 64-bit version of memcmp() will fall back to .Lshort
(compare-per-byte mode) if either the src or dst address is not 8-byte aligned.
It can be optimized in 2 situations:
1) if both addresses are with the same offset with 8 bytes boundary:
memcmp() can compare the
From: Simon Guo
There is some room to optimize the powerpc 64-bit version of memcmp() for
the following 2 cases:
(1) Even if src/dst addresses are not 8-byte aligned at the beginning,
memcmp() can align them and go with .Llong comparison mode without
fallback to .Lshort comparison mode to compare b
The toc field in the mod_arch_specific struct isn't actually used
anywhere, so remove it.
Also the ftrace-specific fields are now common between 32-bit and
64-bit, so simplify the struct definition a bit by moving them out of
the __powerpc64__ #ifdef.
Signed-off-by: Josh Poimboeuf
---
arch/powe
The EEH report functions now share a fair bit of code around the start
and end of each function.
So factor out as much as possible, and move the traversal into a
custom function. This also allows accurate debug to be generated more
easily.
Signed-off-by: Sam Bobroff
---
== v1 -> v2: ==
*
The traversal functions eeh_pe_traverse() and eeh_pe_dev_traverse()
both provide their first argument as void * but every single user casts
it to the expected type.
Change the type of the first parameter from void * to the appropriate
type, and clean up all uses.
Signed-off-by: Sam Bobroff
---
To ease future refactoring, extract calls to eeh_enable_irq() and
eeh_disable_irq() from the various report functions. This makes
the report functions' initial sequences more similar, as well as making
the IRQ changes visible when reading eeh_handle_normal_event().
Signed-off-by: Sam Bobroff
---
=
The same test is done in every EEH report function, so factor it out.
Since eeh_dev_removed() needs to be moved higher up in the file,
simplify it a little while we're at it.
Signed-off-by: Sam Bobroff
---
arch/powerpc/kernel/eeh_driver.c | 30 --
1 file changed, 16
To ease future refactoring, extract setting of the channel state
from the report functions out into their own functions. This increases
the amount of code that is identical across all of the report
functions.
Signed-off-by: Sam Bobroff
---
arch/powerpc/kernel/eeh_driver.c | 19 --
To aid debugging, add a message to show when EEH processing for a PE
will be done at the device's parent, rather than directly at the
device.
Signed-off-by: Sam Bobroff
---
arch/powerpc/kernel/eeh.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/eeh.
If a device without a driver is recovered via EEH, the flag
EEH_DEV_NO_HANDLER is incorrectly left set on the device after
recovery, because the test in eeh_report_resume() for the existence of
a bound driver is done before the flag is cleared. If a driver is
later bound, and EEH experienced again,
As EEH event handling progresses, a cumulative result of type
pci_ers_result is built up by (some of) the eeh_report_*() functions
using either:
if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
if (*res == PCI_ERS_RESULT_NONE) *res = rc;
or:
if ((*res == PCI_ERS_RESULT_NONE)
Hello everyone,
Here is a second, somewhat deeper, set of cleanups for the EEH code
(mostly eeh_driver.c).
These changes are not intended to significantly alter the actual processing,
but rather to improve the readability and maintainability of the code. They are
subjective by nature so I would ap
Signed-off-by: Sam Bobroff
---
arch/powerpc/kernel/eeh_driver.c | 14 --
1 file changed, 14 deletions(-)
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 54333f6c9d67..ca9a73fe9cc5 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/
Add a for_each-style macro for iterating through PEs without the
boilerplate required by a traversal function. eeh_pe_next() is now
exported, as it is used directly in place.
Signed-off-by: Sam Bobroff
---
arch/powerpc/include/asm/eeh.h | 4
arch/powerpc/kernel/eeh_pe.c | 7 +++
2
Add a single log line at the end of successful EEH recovery, so that
it's clear that event processing has finished.
Signed-off-by: Sam Bobroff
---
arch/powerpc/kernel/eeh_driver.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_drive
Correct two cases where eeh_pcid_get() is used to reference the driver's
module but the reference is dropped before the driver pointer is used.
In eeh_rmv_device() also refactor a little so that only two calls to
eeh_pcid_put() are needed, rather than three and the reference isn't
taken at all if
The current failure message includes the number of failures that have
occurred in the last hour (for a device) but it does not indicate
how many failures will be tolerated before the device is permanently
disabled.
Include the limit (eeh_max_freezes) to make this less surprising when
it happens.
From: Thomas Falcon
Date: Wed, 23 May 2018 13:37:54 -0500
> Introduce additional transport event hardening to handle
> events during device reset. In the driver's current state,
> if a transport event is received during device reset, it can
> cause the device to become unresponsive as invalid ope
Michael Ellerman writes:
> Thiago Jung Bauermann writes:
>
>> This test exercises read and write access to the AMR, IAMR and UAMOR.
>>
>> Signed-off-by: Thiago Jung Bauermann
>> ---
>> tools/testing/selftests/powerpc/include/reg.h | 1 +
>> tools/testing/selftests/powerpc/ptrace/Makefi
This test verifies that the AMR, IAMR and UAMOR are being written to a
process' core file.
Signed-off-by: Thiago Jung Bauermann
---
tools/testing/selftests/powerpc/ptrace/Makefile | 5 +-
tools/testing/selftests/powerpc/ptrace/core-pkey.c | 461 +
2 files changed, 465 in
This test exercises read and write access to the AMR, IAMR and UAMOR.
Signed-off-by: Thiago Jung Bauermann
---
tools/testing/selftests/powerpc/include/reg.h | 1 +
tools/testing/selftests/powerpc/ptrace/Makefile | 5 +-
tools/testing/selftests/powerpc/ptrace/child.h | 139 +++
On Thu, May 24, 2018 at 05:01:26PM +0800, wei.guo.si...@gmail.com wrote:
> From: Simon Guo
>
> Originally PR KVM MMIO emulation uses only 0~31#(5 bits) for VSR
> reg number, and use mmio_vsx_tx_sx_enabled field together for
> 0~63# VSR regs.
>
> Currently PR KVM MMIO emulation is reimplemented w
On Wed, 2018-05-23 at 07:01:46 UTC, wei.guo.si...@gmail.com wrote:
> From: Simon Guo
>
> This patch exports tm_enable()/tm_disable/tm_abort() APIs, which
> will be used for PR KVM transaction memory logic.
>
> Signed-off-by: Simon Guo
> Reviewed-by: Paul Mackerras
Applied to powerpc topic/ppc
On Wed, 2018-05-23 at 07:01:45 UTC, wei.guo.si...@gmail.com wrote:
> From: Simon Guo
>
> This patch adds some macros for CR0/TEXASR bits so that PR KVM TM
> logic (tbegin./treclaim./tabort.) can make use of them later.
>
> Signed-off-by: Simon Guo
> Reviewed-by: Paul Mackerras
Applied to powe
On Wed, 2018-05-23 at 07:01:44 UTC, wei.guo.si...@gmail.com wrote:
> From: Simon Guo
>
> PR KVM will need to reuse msr_check_and_set().
> This patch exports this API for reuse.
>
> Signed-off-by: Simon Guo
> Reviewed-by: Paul Mackerras
Applied to powerpc topic/ppc-kvm, thanks.
https://git.ke
On Wed, 2018-05-09 at 02:20:18 UTC, Nicholas Piggin wrote:
> Implement a local TLB flush for invalidating an LPID with variants for
> process or partition scope. And a global TLB flush for invalidating
> a partition scoped page of an LPID.
>
> These will be used by KVM in subsequent patches.
>
>
On Wed, 2018-03-28 at 19:58:11 UTC, Mathieu Malaterre wrote:
> Directly use fault_in_pages_readable instead of manual __get_user code. Fix
> warning treated as error with W=1:
>
> arch/powerpc/kernel/kvm.c:675:6: error: variable 'tmp' set but not used
> [-Werror=unused-but-set-variable]
>
EEH recovery currently fails on pSeries for some IOV capable PCI
devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide
certain device tree properties for the device. (Found on an IOV
capable device using the ipr driver.)
Recovery fails in pci_enable_resources() at the check on r->par
On Sun, May 20, 2018 at 08:50:34AM +0200, Wolfram Sang wrote:
> Since commit 1eace8344c02 ("i2c: add param sanity check to
> i2c_transfer()") and b7f625840267 ("i2c: add quirk checks to core"), the
> I2C core does this check now. We can remove it here.
>
> Signed-off-by: Wolfram Sang
Applied to
Looks fine to me (one comment below):
Reviewed-by: Segher Boessenkool
On Thu, May 24, 2018 at 11:33:18AM +, Christophe Leroy wrote:
> +_GLOBAL(csum_ipv6_magic)
> + lwz r8, 0(r3)
> + lwz r9, 4(r3)
> + addc r0, r7, r8
> + lwz r10, 8(r3)
> + adde r0, r0, r9
On Thu, May 24, 2018 at 11:22:27AM +, Christophe Leroy wrote:
> Improve __csum_partial by interleaving loads and adds.
>
> On a 8xx, it brings neither improvement nor degradation.
> On a 83xx, it brings a 25% improvement.
Thanks! Looks fine to me.
> Signed-off-by: Christophe Leroy
Reviewe
On Thu, May 24, 2018 at 10:18:44AM +, Christophe Leroy wrote:
> On 05/24/2018 06:20 AM, Christophe LEROY wrote:
> >On 23/05/2018 at 20:34, Segher Boessenkool wrote:
> >>On Tue, May 22, 2018 at 08:57:01AM +0200, Christophe Leroy wrote:
> >>>The generic csum_ipv6_magic() generates a pretty bad
On Thu, May 24, 2018 at 08:20:16AM +0200, Christophe LEROY wrote:
> On 23/05/2018 at 20:34, Segher Boessenkool wrote:
> >On Tue, May 22, 2018 at 08:57:01AM +0200, Christophe Leroy wrote:
> >>+_GLOBAL(csum_ipv6_magic)
> >>+ lwz r8, 0(r3)
> >>+ lwz r9, 4(r3)
> >>+ lwz r10, 8(r3)
>
Michael Ellerman wrote:
"Naveen N. Rao" writes:
diff --git a/tools/testing/selftests/powerpc/security/rfi_flush.c
b/tools/testing/selftests/powerpc/security/rfi_flush.c
new file mode 100644
index ..a20fe8eca161
--- /dev/null
+++ b/tools/testing/selftests/powerpc/security/rfi_flush.
When a single-threaded process has a non-local mm_cpumask, try to use
that point to flush the TLBs out of other CPUs in the cpumask.
An IPI is used for clearing remote CPUs for a few reasons:
- An IPI can end lazy TLB use of the mm, which is required to prevent
TLB entries being created on the r
Implementing pte_update with pte_xchg (which uses cmpxchg) is
inefficient. A single larx/stcx. works fine, no need for the less
efficient cmpxchg sequence.
Then remove the memory barriers from the operation. There is a
requirement for TLB flushing to load mm_cpumask after the store
that reduces pt
The ISA suggests ptesync after setting a pte, to prevent a table walk
initiated by a subsequent access from missing that store and causing a
spurious fault. This is an architectural allowance that allows an
implementation's page table walker to be incoherent with the store
queue.
However there is n
Prefetch the faulting address in update_mmu_cache to give the page
table walker perhaps 100 cycles head start as locks are dropped and
the interrupt completed.
Signed-off-by: Nicholas Piggin
---
arch/powerpc/mm/mem.c | 4 +++-
arch/powerpc/mm/pgtable-book3s64.c | 3 ++-
2 files chan
This matches other architectures, when we know there will be no
further accesses to the address (e.g., for teardown), page table
entries can be cleared non-atomically.
The comments about NMMU are bogus: all MMU notifiers (including NMMU)
are released at this point, with their TLBs flushed. An NMMU
In the case of a spurious fault (which can happen due to a race with
another thread that changes the page table), the default Linux mm code
calls flush_tlb_page for that address. This is not required because
the pte will be re-fetched. Hash does not wire this up to a hardware
TLB flush for this rea
Radix flushes the TLB when updating ptes to increase permissiveness
of protection (increase access authority). Book3S does not require
TLB flushing in this case, and it is not done on hash. This patch
avoids the flush for radix.
From Power ISA v3.0B, p.1090:
Setting a Reference or Change Bit
Since last time:
- Fixed compile error on ppc32
- Significantly reworked mm_cpumask reset patch to restore the
lazy PID context switch optimisation, and not over-flush the
local CPU when flushing remotes (using IPIs).
- Moved mm_cpumask reset patch to the end of the series.
Nicholas Piggin (7)
On Wed, May 23, 2018 at 09:47:32AM +0200, Christophe Leroy wrote:
> At the time being, memcmp() compares two chunks of memory
> byte per byte.
>
> This patch optimises the comparison by comparing word by word.
>
> A small benchmark performed on an 8xx comparing two chunks
> of 512 bytes performe
The generic implementation of strlen() reads strings byte per byte.
This patch implements strlen() in assembly for PPC32 based on
a read of entire words, in the same spirit as what some other
arches and glibc do.
For long strings, the time spent in strlen is reduced by 50-60%
Signed-off-by: Chri
On 2018-05-20 08:50, Wolfram Sang wrote:
> Since commit 1eace8344c02 ("i2c: add param sanity check to
> i2c_transfer()") and b7f625840267 ("i2c: add quirk checks to core"), the
> I2C core does this check now. We can remove it here.
>
> Signed-off-by: Wolfram Sang
Reviewed-by: Peter Rosin
> ---
"Naveen N. Rao" writes:
> diff --git a/tools/testing/selftests/powerpc/security/rfi_flush.c
> b/tools/testing/selftests/powerpc/security/rfi_flush.c
> new file mode 100644
> index ..a20fe8eca161
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/security/rfi_flush.c
> @@ -0,0 +1,
The generic csum_ipv6_magic() generates a pretty bad result (PPC32):
0: 81 23 00 00 lwz r9,0(r3)
4: 81 03 00 04 lwz r8,4(r3)
8: 7c e7 4a 14 add r7,r7,r9
c: 7d 29 38 10 subfc r9,r9,r7
10: 7d 4a 51 10 subfe r10,r10,r10
14: 7d
Improve __csum_partial by interleaving loads and adds.
On a 8xx, it brings neither improvement nor degradation.
On a 83xx, it brings a 25% improvement.
Signed-off-by: Christophe Leroy
---
arch/powerpc/lib/checksum_32.S | 13 +++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff -
New binutils generate the following warning
AS arch/powerpc/kernel/head_8xx.o
arch/powerpc/kernel/head_8xx.S: Assembler messages:
arch/powerpc/kernel/head_8xx.S:916: Warning: invalid register expression
This patch fixes it.
Signed-off-by: Christophe Leroy
---
arch/powerpc/kernel/head_8x
On Mon, 21 May 2018 11:36:12 +0530
"Aneesh Kumar K.V" wrote:
> Nicholas Piggin writes:
>
> > In the case of a spurious fault (which can happen due to a race with
> > another thread that changes the page table), the default Linux mm code
> > calls flush_tlb_page for that address. This is not req
Michael Neuling writes:
> This tests perf hardware breakpoints (ie PERF_TYPE_BREAKPOINT) on
> powerpc.
This doesn't work for me on a P8 guest:
test: perf-hwbreak
tags: git_version:bb5602e
!! killing perf-hwbreak
!! child died by signal 15
failure: perf-hwbreak
That means the harness
On 05/24/2018 06:20 AM, Christophe LEROY wrote:
On 23/05/2018 at 20:34, Segher Boessenkool wrote:
On Tue, May 22, 2018 at 08:57:01AM +0200, Christophe Leroy wrote:
The generic csum_ipv6_magic() generates a pretty bad result
Please try with a more recent compiler, what you used is pret
4.4-stable review patch. If anyone has any objections, please let me know.
--
From: Jiri Slaby
commit 30d6e0a4190d37740e9447e4e4815f06992dd8c3 upstream.
There is code duplicated over all architecture's headers for
futex_atomic_op_inuser. Namely op decoding, access_ok check for
On 05/24/2018 10:25 AM, Sandipan Das wrote:
> On 05/24/2018 01:04 PM, Daniel Borkmann wrote:
>> On 05/24/2018 08:56 AM, Sandipan Das wrote:
>>> For multi-function programs, loading the address of a callee
>>> function to a register requires emitting instructions whose
>>> count varies from one to f
From: Simon Guo
Originally PR KVM MMIO emulation uses only 0~31# (5 bits) for the VSR
reg number, and uses the mmio_vsx_tx_sx_enabled field together for
0~63# VSR regs.
Currently PR KVM MMIO emulation is reimplemented with analyse_instr()
assistance. analyse_instr() returns 0~63 for the VSR register number, s
On 05/24/2018 01:04 PM, Daniel Borkmann wrote:
> On 05/24/2018 08:56 AM, Sandipan Das wrote:
>> For multi-function programs, loading the address of a callee
>> function to a register requires emitting instructions whose
>> count varies from one to five depending on the nature of the
>> address.
>
Hi Michael,
On Thu, May 24, 2018 at 05:44:33PM +1000, Michael Ellerman wrote:
> Hi Simon,
>
> wei.guo.si...@gmail.com writes:
> > From: Simon Guo
> >
> > This patch add VMX primitives to do memcmp() in case the compare size
> > exceeds 4K bytes. KSM feature can benefit from this.
>
> You say "ex
Hi Simon,
wei.guo.si...@gmail.com writes:
> From: Simon Guo
>
> This patch add VMX primitives to do memcmp() in case the compare size
> exceeds 4K bytes. KSM feature can benefit from this.
You say "exceeds 4K" here.
> diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S
> in
On Thu, May 24, 2018 at 08:27:04AM +1000, Benjamin Herrenschmidt wrote:
> - First qemu doesn't know that the guest will switch to "secure mode"
> in advance. There is no difference between a normal and a secure
> partition until the partition does the magic UV call to "enter secure
> mode" and qemu
On 05/24/2018 08:56 AM, Sandipan Das wrote:
> For multi-function programs, loading the address of a callee
> function to a register requires emitting instructions whose
> count varies from one to five depending on the nature of the
> address.
>
> Since we come to know of the callee's address only
On 05/24/2018 08:56 AM, Sandipan Das wrote:
> [1] Support for bpf-to-bpf function calls in the powerpc64 JIT compiler.
>
> [2] Provide a way for resolving function calls because of the way JITed
> images are allocated in powerpc64.
>
> [3] Fix to get JITed instruction dumps for multi-function
On Wed, May 23, 2018 at 09:50:02PM +0300, Michael S. Tsirkin wrote:
> subj: s/virito/virtio/
>
..snip..
> > machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
> > +
> > +bool platform_forces_virtio_dma(struct virtio_device *vdev)
> > +{
> > + /*
> > +* On protected guest pl
On Thu, 24 May 2018 12:26:54 +0530, Sandipan Das wrote:
> This splits up the contiguous JITed dump obtained via the bpf
> system call into more relatable chunks for each function in
> the program. If the kernel symbols corresponding to these are
> known, they are printed in the header for each JIT
This splits up the contiguous JITed dump obtained via the bpf
system call into more relatable chunks for each function in
the program. If the kernel symbols corresponding to these are
known, they are printed in the header for each JIT image dump
otherwise the masked start address is printed.
Befor
Syncing the bpf.h uapi header with tools so that struct
bpf_prog_info has the two new fields for passing on the
JITed image lengths of each function in a multi-function
program.
Signed-off-by: Sandipan Das
---
tools/include/uapi/linux/bpf.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/t
This adds two new fields to struct bpf_prog_info. For
multi-function programs, these fields can be used to pass
a list of the JITed image lengths of each function for a
given program to userspace using the bpf system call with
the BPF_OBJ_GET_INFO_BY_FD command.
This can be used by userspace a
Currently, for multi-function programs, we cannot get the JITed
instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD
command. Because of this, userspace tools such as bpftool fail
to identify a multi-function program as being JITed or not.
With the JIT enabled and the test program runni
Currently, we resolve the callee's address for a JITed function
call by using the imm field of the call instruction as an offset
from __bpf_call_base. If bpf_jit_kallsyms is enabled, we further
use this address to get the callee's kernel symbol's name.
For some architectures, such as powerpc64, th
Syncing the bpf.h uapi header with tools so that struct
bpf_prog_info has the two new fields for passing on the
addresses of the kernel symbols corresponding to each
function in a program.
Signed-off-by: Sandipan Das
---
v3:
- Move new fields to the end of bpf_prog_info to avoid
breaking user
This adds support for bpf-to-bpf function calls in the powerpc64
JIT compiler. The JIT compiler converts the bpf call instructions
to native branch instructions. After a round of the usual passes,
the start addresses of the JITed images for the callee functions
are known. Finally, to fixup the bran
This adds two new fields to struct bpf_prog_info. For
multi-function programs, these fields can be used to pass
a list of kernel symbol addresses for all functions in a
given program to userspace using the bpf system call with
the BPF_OBJ_GET_INFO_BY_FD command.
When bpf_jit_kallsyms is enable
For multi-function programs, loading the address of a callee
function to a register requires emitting instructions whose
count varies from one to five depending on the nature of the
address.
Since we come to know of the callee's address only before the
extra pass, the number of instructions requir
The imm field of a bpf instruction is a signed 32-bit integer.
For JITed bpf-to-bpf function calls, it holds the offset of the
start address of the callee's JITed image from __bpf_call_base.
For some architectures, such as powerpc64, this offset may be
as large as 64 bits and cannot be accommodated