Add support for user space receive window (for the Fast thread-wakeup
coprocessor type)
Signed-off-by: Sukadev Bhattiprolu
---
arch/powerpc/platforms/powernv/vas-window.c | 59 +
1 file changed, 52 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/platforms/pow
Define an interface to return a system-wide unique id for a given VAS
window.
The vas_win_id() will be used in a follow-on patch to generate a unique
handle for a user space receive window. Applications can use this handle
to pair send and receive windows for fast thread-wakeup.
The hardware ref
Define an interface that the NX drivers can use to find the physical
paste address of a send window. This interface is expected to be used
with the mmap() operation of the NX driver's device, i.e. the user space
process can use the driver's mmap() operation to map the send window's paste
address into th
From: Michael Neuling
On POWER9 DD2.1 and below there are issues when the paste instruction
generates an error. If an error occurs when thread reconfiguration
happens (i.e. another thread in the core goes into/out of powersave) the
core may hang.
To avoid this a special sequence is required which
A CP_ABORT instruction is required in processes that have mapped a VAS
"paste address" with the intention of using COPY/PASTE instructions.
But since CP_ABORT is expensive, we want to restrict it to only processes
that use/intend to use COPY/PASTE.
Define an interface, set_thread_used_vas(), that
We need the SPRN_TIDR to be set for use with fast thread-wakeup (core-
to-core wakeup) and also with CAPI.
Each thread in a process needs to have a unique id within the process.
But as explained below, for now, we assign globally unique thread ids
to all threads in the system.
Signed-off-by: Suka
Have the COPY/PASTE instructions depend on CONFIG_BOOK3S_64 rather than
CONFIG_PPC_STD_MMU_64.
Signed-off-by: Sukadev Bhattiprolu
---
arch/powerpc/kernel/process.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process
Export the VAS Window context information to debugfs.
We need to hold a mutex when closing the window to prevent a race
with the debugfs read(). Rather than introduce a per-instance mutex,
we use the global vas_mutex for now, since it is not heavily contended.
The window->cop field is only releva
Define a helper, chip_to_vas_id(), to map a given chip id to the
corresponding VAS id.
Normally, callers of vas_rx_win_open() and vas_tx_win_open() want the VAS
window to be on the same chip where the calling thread is executing. These
callers can pass in -1 for the VAS id.
This interface will be usef
Create a cpu to vasid mapping so callers can specify -1 instead of
trying to find a VAS id.
Changelog[v2]
[Michael Ellerman] Use per-cpu variables to simplify code.
Signed-off-by: Sukadev Bhattiprolu
---
arch/powerpc/platforms/powernv/vas.c | 14 +-
1 file changed, 13 insert
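The "-1 means the local VAS instance" convention described in the cpu to vasid patch above can be illustrated with a minimal sketch. This is not the kernel code: the patch uses per-cpu variables, whereas the table demo_cpu_to_vas_id, the constant DEMO_NR_CPUS and the function resolve_vas_id() below are hypothetical names invented for illustration.

```c
/* Hypothetical topology: 4 CPUs, two per chip, one VAS instance per chip. */
#define DEMO_NR_CPUS 4

static const int demo_cpu_to_vas_id[DEMO_NR_CPUS] = { 0, 0, 1, 1 };

/*
 * Return the VAS id to use for a window.
 * - A caller that passes an explicit id gets it back unchanged.
 * - A caller that passes -1 gets the id mapped to the given CPU.
 * - An out-of-range CPU yields -1 so the caller can fail the open.
 */
int resolve_vas_id(int vasid, unsigned int cpu)
{
	if (vasid != -1)
		return vasid;

	if (cpu >= DEMO_NR_CPUS)
		return -1;

	return demo_cpu_to_vas_id[cpu];
}
```

With such a lookup in place, a caller on CPU 2 that passes -1 would end up with VAS id 1 without having to discover the id itself.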
Normally, the NX driver waits for the CRBs to be processed before closing
the window. But it is better to ensure that the credits are returned before
the window gets reassigned later.
Signed-off-by: Sukadev Bhattiprolu
---
arch/powerpc/platforms/powernv/vas-window.c | 45
Save the configured max window credits for a window in the vas_window
structure. We will need this when polling for return of window credits.
Signed-off-by: Sukadev Bhattiprolu
---
arch/powerpc/platforms/powernv/vas-window.c | 6 --
arch/powerpc/platforms/powernv/vas.h | 1 +
2 files
A VAS window is normally in "busy" state for only a short duration.
Reduce the time we wait for the window to go to "not-busy" state to
speed up vas_win_close() a bit.
Signed-off-by: Sukadev Bhattiprolu
---
arch/powerpc/platforms/powernv/vas-window.c | 10 ++
1 file changed, 6 insertions
Use a helper to have the hardware unpin and mark a window closed.
Signed-off-by: Sukadev Bhattiprolu
---
arch/powerpc/platforms/powernv/vas-window.c | 22 +++---
1 file changed, 15 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/vas-window.c
b/arch/pow
Polling for window cast out is listed in the spec, but it turns out that
it is not strictly necessary and slows down window close. Make it a
stub for now.
Signed-off-by: Sukadev Bhattiprolu
---
arch/powerpc/platforms/powernv/vas-window.c | 34 ++---
1 file changed, 17 inse
Clean up vas.h and the debug code around ifdef vas_debug.
Signed-off-by: Sukadev Bhattiprolu
---
arch/powerpc/platforms/powernv/vas-window.c | 8 +++--
arch/powerpc/platforms/powernv/vas.h | 56 +++--
2 files changed, 18 insertions(+), 46 deletions(-)
diff --git
NX-842, the only user of VAS, sets the window credits to default values
but VAS should check the credits against the possible max values.
The VAS_WCREDS_MIN is not needed and can be dropped.
Signed-off-by: Sukadev Bhattiprolu
---
arch/powerpc/platforms/powernv/vas-window.c | 6 ++
arch/powe
Initialize a few missing window context fields from the window attributes
specified by the caller. These fields are currently set to their default
values by the caller (NX-842), but it would be good to apply them anyway.
Signed-off-by: Sukadev Bhattiprolu
---
arch/powerpc/platforms/powernv/vas-wind
The first 10 patches in this set were posted earlier[1] and don't have
any significant changes since then. This set sanitizes cpu/chip id to
VAS id mapping, improves vas_win_close() performance, adds a check
for return of credits, and cleans up some code.
Patch 11 adds debugfs support for the VA
From: Bjorn Helgaas
The default VGA device is normally set in vga_arbiter_add_pci_device() when
we call it for the first enabled device that can be accessed with the
legacy VGA resources ([mem 0xa0000-0xbffff], etc.)
That default device can be overridden by an EFI device that owns the boot
frame
From: Bjorn Helgaas
Daniel Axtens reported that on the HiSilicon D05 board, the VGA device is
behind a bridge that doesn't support PCI_BRIDGE_CTL_VGA, so the VGA arbiter
never selects it as the default, which means Xorg auto-detection doesn't
work.
VGA is a legacy PCI feature: a VGA device can r
These patches are supposed to fix a problem Daniel Axtens found on the
HiSilicon D05 board. The VGA device there is behind a bridge that doesn't
support PCI_BRIDGE_CTL_VGA, so the arbiter never selects the device as the
default.
The first patch extends the arbiter so that if it can't find an enab
Nathan Fontenot writes:
> This patch set provides a set of updates to de-couple the LMB information
> provided in the ibm,dynamic-memory device tree property from the device
> tree property format. The goal is to provide a data set of LMB information
> so that consumers of this data do not need t
- On Oct 6, 2017, at 5:08 PM, Paul E. McKenney paul...@linux.vnet.ibm.com
wrote:
> On Thu, Oct 05, 2017 at 06:33:26PM -0400, Mathieu Desnoyers wrote:
>> Architectures without membarrier hooks don't need to emit the
>> empty membarrier_arch_switch_mm() static inline when
>> CONFIG_MEMBARRIER=
On Thu, Oct 05, 2017 at 06:33:26PM -0400, Mathieu Desnoyers wrote:
> Architectures without membarrier hooks don't need to emit the
> empty membarrier_arch_switch_mm() static inline when
> CONFIG_MEMBARRIER=y.
>
> Adapt the CONFIG_MEMBARRIER=n counterpart to only emit the empty
> membarrier_arch_sw
On Wed, Sep 27, 2017 at 01:52:55PM +1000, Daniel Axtens wrote:
> Hi Bjorn,
>
> Yes, this works:
>
> Tested-by: Daniel Axtens # arm64, ppc64-qemu-tcg
I guess I was assuming you'd pick this up, but that doesn't really
make sense because I didn't give you a signed-off-by or anything.
I'll post thi
Hello there,
linux-4.14-rc3/arch/powerpc/perf/imc-pmu.c:599]: (style) Unsigned variable
'ncpu' can't be negative so it is unnecessary to test it.
Source code is
if (ncpu >= 0 && ncpu < nr_cpu_ids) {
but
unsigned int ncpu, core_id;
Suggest remove test.
Regards
David Binderman
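The point of the report above can be shown with a small standalone sketch. It is not the imc-pmu.c code: cpu_in_range() and signed_cpu_in_range() are hypothetical helpers, and the upper bound is passed in as a parameter instead of using the kernel's nr_cpu_ids.

```c
#include <stdbool.h>

/*
 * Range check on an unsigned index. There is no "ncpu >= 0" test here
 * because an unsigned value can never be negative, so such a test is
 * always true and only the upper bound carries information.
 */
bool cpu_in_range(unsigned int ncpu, unsigned int nr_ids)
{
	return ncpu < nr_ids;
}

/*
 * If a negative value reaches the check through a signed variable, the
 * conversion to unsigned turns it into a very large number, which the
 * upper-bound test alone already rejects.
 */
bool signed_cpu_in_range(int cpu, unsigned int nr_ids)
{
	return cpu_in_range((unsigned int)cpu, nr_ids);
}
```

Dropping the lower-bound test therefore does not change behaviour as long as the variable really is unsigned, which is what the style warning relies on.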
Hi Michal,
As I've said in other reply this should go in only if the scenario you
describe is real. I am somehow suspicious to be honest. I simply do not
see how those weird struct pages would be in a valid pfn range of any
zone.
There are examples of both when unavailable memory is not part
Hi Anshuman,
Thank you very much for looking at this. My reply below:
On 10/06/2017 02:48 AM, Anshuman Khandual wrote:
On 10/04/2017 08:59 PM, Pavel Tatashin wrote:
This patch fixes another existing issue on systems that have holes in
zones, i.e. CONFIG_HOLES_IN_ZONE is defined.
In for_each_me
This patch avoids copying buffered data to hash from bufnext to buf.
Signed-off-by: Christophe Leroy
---
drivers/crypto/talitos.c | 36 ++--
1 file changed, 22 insertions(+), 14 deletions(-)
diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 5c
SEC1 doesn't support S/G in descriptors so for hash operations,
the CPU has to build a buffer containing the buffered block and
the incoming data. This generates a lot of memory copies which
represent more than 50% of the CPU time of an md5sum operation, as
shown below with a 'perf record'.
|--86.24%--
At every request, we map and unmap the same hash hw_context.
This patch moves the dma mapping/unmapping in functions ahash_init()
and ahash_import().
Signed-off-by: Christophe Leroy
---
drivers/crypto/talitos.c | 80 ++--
1 file changed, 57 insertions
dma_map_single() is a heavy operation which doesn't need to
be done at each request as the key doesn't change.
Instead of DMA mapping the key at every request, this patch maps it
once in setkey().
Signed-off-by: Christophe Leroy
---
drivers/crypto/talitos.c | 56
Do (desc->hdr & DESC_HDR_TYPE_IPSEC_ESP) only once.
Limit number of if/else paths
Signed-off-by: Christophe Leroy
---
drivers/crypto/talitos.c | 42 --
1 file changed, 20 insertions(+), 22 deletions(-)
diff --git a/drivers/crypto/talitos.c b/drivers/crypt
to_talitos_ptr() and to_talitos_ptr_len() are always called together
in order to fully set a ptr, so let's merge them into a single
helper.
Signed-off-by: Christophe Leroy
---
drivers/crypto/talitos.c | 56 ++--
1 file changed, 21 insertions(+), 35 dele
The number of channels is known from the beginning, no need to
test it every time.
This patch defines two additional done functions handling only channel 0.
Then the probe registers the correct one based on the number of channels.
Signed-off-by: Christophe Leroy
---
drivers/crypto/talitos.c | 27
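The idea in the patch description above, choosing the completion handler once at probe time instead of testing the channel count on every call, could be sketched roughly as follows. This is an illustration under stated assumptions, not the talitos implementation: done_fn, demo_done_4ch(), demo_done_ch0() and select_done_handler() are hypothetical names, and the handlers only bump counters where the driver would flush completed requests.

```c
#define DEMO_MAX_CHANNELS 4

/* Hypothetical handler type: records which channels were serviced. */
typedef void (*done_fn)(unsigned int processed[DEMO_MAX_CHANNELS]);

/* Handler for multi-channel devices: services every channel. */
void demo_done_4ch(unsigned int processed[DEMO_MAX_CHANNELS])
{
	for (int ch = 0; ch < DEMO_MAX_CHANNELS; ch++)
		processed[ch]++;
}

/* Handler for single-channel devices: services channel 0 only. */
void demo_done_ch0(unsigned int processed[DEMO_MAX_CHANNELS])
{
	processed[0]++;
}

/*
 * Called once at probe time. The channel count is fixed for the life
 * of the device, so the branch is taken here and never again in the
 * completion path.
 */
done_fn select_done_handler(unsigned int num_channels)
{
	return num_channels == 1 ? demo_done_ch0 : demo_done_4ch;
}
```

The completion path then calls through the stored pointer, which keeps the per-interrupt work free of the channel-count test.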
Use devm_ioremap()
Signed-off-by: Christophe Leroy
---
drivers/crypto/talitos.c | 8 +---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index f139a0cef2e2..83b2a70a1ba7 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/cry
Use of_property_read_u32() to simplify DT read
Signed-off-by: Christophe Leroy
---
drivers/crypto/talitos.c | 21 +
1 file changed, 5 insertions(+), 16 deletions(-)
diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 2a53d0f2a869..f139a0cef2e2 100644
--- a
Replace kmalloc() by devm_kmalloc()
Signed-off-by: Christophe Leroy
---
drivers/crypto/talitos.c | 30 --
1 file changed, 12 insertions(+), 18 deletions(-)
diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index dd6b1fc90020..2a53d0f2a869 100644
--- a/
talitos_handle_buggy_hash() and talitos_sg_map() are only used
locally, make them static
Signed-off-by: Christophe Leroy
---
drivers/crypto/talitos.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 266e7e626e12..dd
This patch zeroizes the descriptor at allocation using memset().
This has two advantages:
- It reduces the number of places where data has to be set to 0
- It avoids reading memory and loading the cache with data that
will be entirely replaced.
Signed-off-by: Christophe Leroy
---
drivers/crypto/t
ctr-aes-talitos test fails as follows on SEC2
[0.837427] alg: skcipher: Test 1 failed (invalid result) on encryption for
ctr-aes-talitos
[0.845763] : 16 36 d5 ee 34 f8 06 25 d7 7f 8e 56 ca 88 43 45
[0.852345] 0010: f9 3f f7 17 2a b2 12 23 30 43 09 15 82 dd e1 97
[0.858
sg_link_tbl_len shall be used instead of cryptlen, otherwise
SECs which perform HW CICV verification will fail.
Signed-off-by: Christophe Leroy
---
drivers/crypto/talitos.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
sha224 AEAD test fails with:
[2.803125] talitos ff02.crypto: DEUISR 0x_
[2.808743] talitos ff02.crypto: MDEUISR 0x8010_
[2.814678] talitos ff02.crypto: DESCBUF 0x20731f21_0018
[2.820616] talitos ff02.crypto: DESCBUF 0x0628d64c_001
Crypto manager tests report the following failures:
[3.061081] alg: skcipher: setkey failed on test 5 for ecb-des-talitos:
flags=100
[3.069342] alg: skcipher-ddst: setkey failed on test 5 for ecb-des-talitos:
flags=100
[3.077754] alg: skcipher-ddst: setkey failed on test 5 for ecb-des-
On SEC2, when using the old descriptors type (hmac snoop no afeu)
for doing IPsec, the CICV out pointer points out of the allocated
memory.
[2.502554]
=
[2.510740] BUG dma-kmalloc-256 (Not tainted): Redzone overw
AEAD tests fail when destination SG list has more than 1 element.
[2.058752] alg: aead: Test 1 failed on encryption for
authenc-hmac-sha1-cbc-aes-talitos
[2.066965] : 53 69 6e 67 6c 65 20 62 6c 6f 63 6b 20 6d 73 67
0010: c0 43 ff 74 c0 43 ff e0 de 83 d1 20 de 84 8e 54
0020
This series fixes and improves the talitos crypto driver.
The first 6 patches fix failures reported by the new tests in the
kernel crypto test manager.
The 8 following patches are cleanups and simplifications.
The last 4 are performance improvements. The main improvement is
in the one bef
On Thu 05-10-17 17:11:19, Pavel Tatashin wrote:
> Some memory is reserved but unavailable: not present in memblock.memory
> (because not backed by physical pages), but present in memblock.reserved.
> Such memory has backing struct pages, but they are not initialized by going
> through __init_single
On Fri 06-10-17 12:11:42, David Laight wrote:
> From: Michal Hocko
> > Sent: 06 October 2017 12:47
> > On Fri 06-10-17 11:10:14, David Laight wrote:
> > > From: Pavel Tatashin
> > > > Sent: 05 October 2017 22:11
> > > > vmemmap_alloc_block() will no longer zero the block, so zero memory
> > > > at
From: Michal Hocko
> Sent: 06 October 2017 12:47
> On Fri 06-10-17 11:10:14, David Laight wrote:
> > From: Pavel Tatashin
> > > Sent: 05 October 2017 22:11
> > > vmemmap_alloc_block() will no longer zero the block, so zero memory
> > > at its call sites for everything except struct pages. Struct p
On Fri 06-10-17 11:10:14, David Laight wrote:
> From: Pavel Tatashin
> > Sent: 05 October 2017 22:11
> > vmemmap_alloc_block() will no longer zero the block, so zero memory
> > at its call sites for everything except struct pages. Struct page memory
> > is zero'd by struct page initialization.
>
On 2017/09/18 09:23AM, Santosh Sivaraj wrote:
> Current vDSO64 implementation does not have support for coarse clocks
> (CLOCK_MONOTONIC_COARSE, CLOCK_REALTIME_COARSE), for which it falls back
> to system call, increasing the response time, vDSO implementation reduces
> the cycle time. Below is a b
Hi Cyril,
On 06-10-2017 04:46, Cyril Bur wrote:
> [added by Cyril Bur]
> As the no-suspend firmware change is novel and untested using it should
> be opt in by users. Furthumore, currently the kernel has no method to
I forgot to mention on my last reply, but should s/Furthumore/Furthermore/ ?
Re
On 2017-10-06 16:00, Michael Ellerman wrote:
Shriya writes:
Make /proc/cpuinfo read the frequency of the CPU it is running at
instead of reading the cached value of the last requested frequency.
In conditions like WOF/throttle CPU can be running at a different
frequency than the requested freq
Hi Cyril,
On 06-10-2017 04:46, Cyril Bur wrote:
> From: Michael Neuling
>
> Unfortunately userspace can construct a sigcontext which enables
> suspend. Thus userspace can force Linux into a path where trechkpt is
> executed.
>
> This patch blocks this from happening on POWER9 but sanity checkin
From: Pavel Tatashin
> Sent: 05 October 2017 22:11
> vmemmap_alloc_block() will no longer zero the block, so zero memory
> at its call sites for everything except struct pages. Struct page memory
> is zero'd by struct page initialization.
It seems dangerous to change an allocator to stop zeroing
On Thu, 2017-09-07 at 05:05:51 UTC, Anton Blanchard wrote:
> From: Anton Blanchard
>
> Memory hot unplug on PowerNV radix hosts is broken. Our memory block
> size is 256MB but since we map the linear region with very large pages,
> each pte we tear down maps 1GB.
>
> A hot unplug of one 256MB me
Hi Linus,
Please pull some more powerpc fixes for 4.14.
This is two weeks worth of fixes, and the diffstat is reasonably small,
so I think we're on track.
The following changes since commit e19b205be43d11bff638cad4487008c48d21c103:
Linux 4.14-rc2 (2017-09-24 16:38:56 -0700)
are available in
Unless you (Daniel) think there's some reason lmb_is_removable() is
incorrectly returning false. But most likely it's correct and there's
just an unmovable allocation in that range.
I am not educated enough to say that the current behavior is wrong. What I
can say is that in 4.11 and older ker
Shriya writes:
> Make /proc/cpuinfo read the frequency of the CPU it is running at
> instead of reading the cached value of the last requested frequency.
> In conditions like WOF/throttle CPU can be running at a different
> frequency than the requested frequency.
Sounds like a bug fix to me ?
c
Benjamin Herrenschmidt writes:
> On Fri, 2017-10-06 at 18:46 +1100, Cyril Bur wrote:
...
>> diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
>> index 760872916013..2cb01b48123a 100644
>> --- a/arch/powerpc/kernel/cputable.c
>> +++ b/arch/powerpc/kernel/cputable.c
>> @@
Thanks for reviewing Naveen.
"Naveen N. Rao" writes:
> On 2017/09/18 09:23AM, Santosh Sivaraj wrote:
>> diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S
>> b/arch/powerpc/kernel/vdso64/gettimeofday.S
>> index 382021324883..a0b4943811db 100644
>> --- a/arch/powerpc/kernel/vdso64/gettimeofda
On 2017/09/18 09:23AM, Santosh Sivaraj wrote:
> Current vDSO64 implementation does not have support for coarse clocks
> (CLOCK_MONOTONIC_COARSE, CLOCK_REALTIME_COARSE), for which it falls back
> to system call, increasing the response time, vDSO implementation reduces
> the cycle time. Below is a b
Hi Santosh,
On 2017/09/18 09:23AM, Santosh Sivaraj wrote:
> Reorganize code to make it easy to introduce CLOCK_REALTIME_COARSE and
> CLOCK_MONOTONIC_COARSE timer support.
>
> Signed-off-by: Santosh Sivaraj
> ---
> arch/powerpc/kernel/vdso64/gettimeofday.S | 14 --
> 1 file changed,
> AFAIU the scheduler rq->lock is held while preemption is disabled.
> synchronize_sched() is used here to ensure that all pre-existing
> preempt-off critical sections have completed.
>
> So saying that we use synchronize_sched() to synchronize with rq->lock
> would be stretching the truth a bit.
On Fri, 2017-10-06 at 18:46 +1100, Cyril Bur wrote:
> From: Michael Neuling
>
> Unfortunately userspace can construct a sigcontext which enables
> suspend. Thus userspace can force Linux into a path where trechkpt is
> executed.
>
> This patch blocks this from happening on POWER9 but sanity chec
On Fri, 2017-10-06 at 18:46 +1100, Cyril Bur wrote:
> [from Michael Neuling's original patch]
> Each POWER9 core is made of two super slices. Each super slice can
> only have one thread at a time in TM suspend mode. The super slice
> restricts ever entering a state where both threads are in suspend
From: Michael Neuling
Unfortunately userspace can construct a sigcontext which enables
suspend. Thus userspace can force Linux into a path where trechkpt is
executed.
This patch blocks this from happening on POWER9 by sanity checking
sigcontexts passed in.
ptrace doesn't have this problem as o
[from Michael Neuling's original patch]
Each POWER9 core is made of two super slices. Each super slice can
only have one thread at a time in TM suspend mode. The super slice
restricts ever entering a state where both threads are in suspend by
aborting transactions on tsuspend or exceptions into the
Currently the kernel relies on firmware to inform it whether or not the
CPU supports HTM and as long as the kernel was built with
CONFIG_PPC_TRANSACTIONAL_MEM=y then it will allow userspace to make use
of the facility.
There may be situations where it would be advantageous for the kernel
to not al
On Thu, Oct 05, 2017 at 03:38:42PM -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas
>
> All users of pcibios_set_master() include <linux/pci.h>, which already has
> a declaration. Remove the unnecessary declarations from the <asm/pci.h>
> files.
>
> Signed-off-by: Bjorn Helgaas
> ---
> arch/alpha/include/asm/pci.h
On Thu, Oct 05, 2017 at 03:38:49PM -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas
>
> <linux/pci.h> defines struct pci_bus and struct pci_dev and includes the
> struct resource definition before including <asm/pci.h>. Nobody includes
> <asm/pci.h> directly, so they don't need their own declarations.
>
> Remove the redundant s
Make /proc/cpuinfo read the frequency of the CPU it is running at
instead of reading the cached value of the last requested frequency.
In conditions like WOF/throttle CPU can be running at a different
frequency than the requested frequency.
Signed-off-by: Shriya
---
arch/powerpc/platforms/powern