Hi Michael,
On Thu, Feb 7, 2019 at 3:49 AM Michael Ellerman wrote:
> The recent rework of PCI kconfig symbols exposed an existing bug in
> the CURRITUCK kconfig logic.
>
> It selects PPC4xx_PCI_EXPRESS which depends on PCI, but PCI is user
> selectable and might be disabled, leading to a warning:
On 2/7/19 3:51 AM, David Gibson wrote:
> On Wed, Feb 06, 2019 at 08:35:24AM +0100, Cédric Le Goater wrote:
>> On 2/6/19 2:18 AM, David Gibson wrote:
>>> On Wed, Feb 06, 2019 at 09:13:15AM +1100, Paul Mackerras wrote:
On Tue, Feb 05, 2019 at 12:31:28PM +0100, Cédric Le Goater wrote:
As
On 2/7/19 3:49 AM, David Gibson wrote:
> On Wed, Feb 06, 2019 at 08:21:10AM +0100, Cédric Le Goater wrote:
>> On 2/6/19 2:23 AM, David Gibson wrote:
>>> On Tue, Feb 05, 2019 at 01:55:40PM +0100, Cédric Le Goater wrote:
On 2/5/19 6:28 AM, David Gibson wrote:
> On Mon, Feb 04, 2019 at 12:30:
On 07/02/2019 05:33, Michael Ellerman wrote:
> Hi Laurent,
>
> I'm not sure I'm convinced about this one. It seems like we're just
> throwing away the warning because it's annoying.
>
> Laurent Vivier writes:
>> resize_hpt_for_hotplug() reports a warning when it cannot
>> increase the hash page
On 2/7/19 3:48 AM, David Gibson wrote:
> On Wed, Feb 06, 2019 at 08:07:36AM +0100, Cédric Le Goater wrote:
>> On 2/6/19 2:24 AM, David Gibson wrote:
>>> On Wed, Feb 06, 2019 at 12:23:29PM +1100, David Gibson wrote:
On Tue, Feb 05, 2019 at 02:03:11PM +0100, Cédric Le Goater wrote:
> On 2/5/
On 07/02/2019 04:03, David Gibson wrote:
> On Tue, Feb 05, 2019 at 09:21:33PM +0100, Laurent Vivier wrote:
>> resize_hpt_for_hotplug() reports a warning when it cannot
>> increase the hash page table ("Unable to resize hash page
>> table to target order") but this is not blocking and
>> can make us
Hello Joel,
On Wed, 6 Feb 2019 11:06:58 +1030
Joel Stanley wrote:
> This converts the powernv flash driver to use SPDX, and adds some
> clarifying comments that came out of a discussion on how the mtd driver
> works.
Can you split that in 2 patches, one adding the SPDX header, and the
other on
On Thu, Feb 7, 2019 at 6:18 AM Michael Ellerman wrote:
>
> Add a generic 32-bit defconfig called ppc_defconfig. This means we'll
> have a defconfig matching "uname -m" for all cases.
Looks good to me:
$ make defconfig
*** Default configuration is based on 'ppc_defconfig'
arch/powerpc/configs/ppc
On Thu, Feb 7, 2019 at 6:20 AM Michael Ellerman wrote:
>
> Our logic for choosing defconfig doesn't work well in some situations.
>
> For example if you're on a ppc64le machine but you specify a non-empty
> CROSS_COMPILE, in order to use a non-default toolchain, then defconfig
> will give you ppc6
Chandan reported that fstests' generic/026 test hit a crash:
BUG: Unable to handle kernel data access at 0xc0062ac4
Faulting instruction address: 0xc0092240
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries
CPU: 0 PID: 27828 Co
Arch code can set a "dump stack arch description string" which is
displayed with oops output to describe the hardware platform.
It is useful to initialise this as early as possible, so that an early
oops will have the hardware description.
However in practice we discover the hardware platform in
As soon as we've done some basic setup, add the PVR and CPU name to
the dump stack arch description, which is printed in case of an oops.
eg: Hardware name: ... POWER8E (raw) pvr:0x4b0201
Signed-off-by: Michael Ellerman
---
arch/powerpc/kernel/cputable.c | 1 +
arch/powerpc/kernel/prom.c |
If we detect a logical PVR, add that to the dump stack arch
description, which is printed in case of an oops.
eg: Hardware name: ... lpvr:0xf04
Signed-off-by: Michael Ellerman
---
arch/powerpc/kernel/prom.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/k
As soon as we know the model of the machine we're on, add it to the
dump stack arch description, which is printed in case of an oops.
eg: Hardware name: model:'IBM,8247-22L'
Signed-off-by: Michael Ellerman
---
arch/powerpc/kernel/prom.c | 20
1 file changed, 20 insertions(+
As soon as we know the name of the machine description we're using, add
it to the dump stack arch description, which is printed in case of an
oops.
eg: Hardware name: ... machine:pSeries
Signed-off-by: Michael Ellerman
---
arch/powerpc/kernel/setup-common.c | 3 +++
1 file changed, 3 insertions
Once we have unflattened the device tree we can easily grab these OPAL
version details and add them to the dump stack arch description, which is
printed in case of an oops.
eg: Hardware name: ... opal:v6.2
Signed-off-by: Michael Ellerman
---
arch/powerpc/platforms/powernv/setup.c | 25 +
Once we have unflattened the device tree we can easily grab these
firmware version details and add them to the dump stack arch description,
which is printed in case of an oops.
Currently /hypervisor only exists on KVM, so if we don't find that, look
for something that suggests we're on phyp and if so t
On Thu, Feb 07, 2019 at 10:53:13PM +1100, Michael Ellerman wrote:
> Chandan reported that fstests' generic/026 test hit a crash:
> The instruction dump decodes as:
> subfic r6,r5,8
> rlwinm r6,r6,3,0,28
> ldbrx r9,0,r3
> ldbrx r10,0,r4<-
>
> Which shows us doing an 8 byte load f
On Wed, Jan 16, 2019 at 04:59:27PM +, Christophe Leroy wrote:
> v3: Moved 'Returns:' comment after description.
> Explained in the commit log why the function is defined static inline
>
> v2: Added "Returns:" comment and removed probe_user_address()
The correct spelling is 'Return:', n
Russell Currey writes:
> On Thu, 2019-02-07 at 15:08 +1000, Nicholas Piggin wrote:
>> Russell Currey's on February 6, 2019 4:28 pm:
>> > Without restoring the IAMR after idle, execution prevention on
>> > POWER9
>> > with Radix MMU is overwritten and the kernel can freely execute
>> > userspace
On Thu, Feb 07, 2019 at 02:07:08PM -0500, Waiman Long wrote:
> +static inline int __down_read_trylock(struct rw_semaphore *sem)
> +{
> + long tmp;
> +
> + while ((tmp = atomic_long_read(&sem->count)) >= 0) {
> + if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp,
> +
On Thu, Feb 07, 2019 at 02:07:19PM -0500, Waiman Long wrote:
> On 32-bit architectures, there aren't enough bits to hold both.
> 64-bit architectures, however, can have enough bits to do that. For
> x86-64, the physical address can use up to 52 bits. That is 4PB of
> memory. That leaves 12 bits ava
On Thu, Feb 07, 2019 at 08:36:56PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 07, 2019 at 02:07:08PM -0500, Waiman Long wrote:
>
> > +static inline int __down_read_trylock(struct rw_semaphore *sem)
> > +{
> > + long tmp;
> > +
> > + while ((tmp = atomic_long_read(&sem->count)) >= 0) {
> > +
On Thu, 07 Feb 2019, Waiman Long wrote:
> 30 files changed, 1197 insertions(+), 1594 deletions(-)
Performance numbers on numerous workloads, pretty please.
I'll go and throw this at my mmap_sem intensive workloads
I've collected.
Thanks,
Davidlohr
On Thu, Feb 07, 2019 at 02:07:19PM -0500, Waiman Long wrote:
> On 32-bit architectures, there aren't enough bits to hold both.
> 64-bit architectures, however, can have enough bits to do that. For
> x86-64, the physical address can use up to 52 bits. That is 4PB of
> memory. That leaves 12 bits ava
This patchset revamps the current rwsem-xadd implementation to make
it saner and easier to work with. This patchset removes all the
architecture specific assembly code and uses generic C code for all
architectures. This eases maintenance and enables us to enhance the
code more easily.
This patchse
The percpu event counts used by qspinlock code can be useful for
other locking code as well. So a new set of lockevent_* counting APIs
is introduced with the lock event names extracted out into the new
lock_events_list.h header file for easier addition in the future.
The existing qstat_inc() calls
The QUEUED_LOCK_STAT option to report queued spinlock event counts
was previously allowed only on the x86 architecture. To make the locking
event counting code more useful, it is now renamed to a more generic
LOCK_EVENT_COUNTS config option. This new option will be available to
all the architectures t
The rwsem_down_read_failed*() functions were relocated from above the
optimistic spinning section to below that section. This enables the
reader functions to use optimistic spinning in future patches. There
is no code change.
Signed-off-by: Waiman Long
---
kernel/locking/rwsem-xadd.c | 172 +
As the generic rwsem-xadd code is using the appropriate acquire and
release versions of the atomic operations, the arch specific rwsem.h
files will not be that much faster than the generic code. So we can
remove those arch specific rwsem.h and stop building asm/rwsem.h to
reduce maintenance effort.
The setting of the owner field is specific to the rwsem-xadd code; it is
not needed for rwsem-spinlock. This patch moves all the owner-setting code
closer to the rwsem-xadd fast paths, directly within the rwsem.h file.
For __down_read() and __down_read_killable(), it is assumed that
rwsem will be marked as rea
The content of kernel/locking/rwsem.h is now specific to rwsem-xadd only.
Rename it to rwsem-xadd.h to indicate that it is specific to rwsem-xadd
and include it only when CONFIG_RWSEM_XCHGADD_ALGORITHM is set.
Signed-off-by: Waiman Long
---
kernel/locking/percpu-rwsem.c | 4 +-
kernel/locking/
We don't need to expose rwsem internal functions which are not supposed
to be called directly from other kernel code.
Signed-off-by: Waiman Long
---
include/linux/rwsem.h | 7 ---
kernel/locking/rwsem-xadd.h | 7 +++
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/in
When rwsem_down_read_failed*() return, the read lock is acquired
indirectly by others. So debug checks are added in __down_read() and
__down_read_killable() to make sure the rwsem is really reader-owned.
The other debug check calls in kernel/locking/rwsem.c except the
one in up_read_non_owner() ar
Currently, the DEBUG_RWSEMS_WARN_ON() macro just dumps a stack trace
when the rwsem isn't in the right state. It does not show the actual
states of the rwsem. This may not be that helpful in the debugging
process.
Enhance the DEBUG_RWSEMS_WARN_ON() macro to also show the current
content of the rws
Add lock event counting calls so that we can track the number of lock
events happening in the rwsem code.
With CONFIG_LOCK_EVENT_COUNTS on and booting a 1-socket 22-core
44-thread x86-64 system, the non-zero rwsem counts after system bootup
were as follows:
rwsem_opt_fail=113
rwsem_opt_wlock=
The current way of using various reader, writer and waiting biases
in the rwsem code is confusing and hard to understand. I have to
reread the rwsem count guide in the rwsem-xadd.c file from time to
time to remind myself how this whole thing works. It also makes the
rwsem code harder to be optimiz
Because of writer lock stealing, it is possible that a constant
stream of incoming writers will cause a waiting writer or reader to
wait indefinitely, leading to lock starvation.
The mutex code has a lock handoff mechanism to prevent lock starvation.
This patch implements a similar lock handoff mec
With the commit 59aabfc7e959 ("locking/rwsem: Reduce spinlock contention
in wakeup after up_read()/up_write()"), the rwsem_wake() forgoes doing
a wakeup if the wait_lock cannot be directly acquired and an optimistic
spinning locker is present. This can help performance by avoiding
spinning on the
Before combining owner and count, we add two new helpers for
accessing the owner value in the rwsem.
1) struct task_struct *rwsem_get_owner(struct rw_semaphore *sem)
2) bool is_rwsem_reader_owned(struct rw_semaphore *sem)
Signed-off-by: Waiman Long
---
kernel/locking/rwsem-xadd.c | 11
With separate count and owner fields, there are timing windows where the
two values are inconsistent. That can cause problems when trying to figure
out the exact state of the rwsem. For instance, an RT task will stop
optimistic spinning if the lock is acquired by a writer but the owner
field isn't set yet.
On 64-bit architectures, each rwsem writer will have its unique lock
word for acquiring the lock. Right now, the writer code recomputes the
lock word every time it tries to acquire the lock. This is a waste of
time. The lock word is now cached and reused when it is needed.
On 32-bit architectures,
After merging the owner value directly into the count field, it was
found that the number of failed optimistic spinning operations increased
significantly during the bootup process.
The cause of these increased failures was tracked down to the condition
that a lock holder might have just released
This patch modifies rwsem_spin_on_owner() to return a tri-state value
to better reflect the state of the lock holder, which enables us to make
a better decision about what to do next.
Signed-off-by: Waiman Long
---
kernel/locking/rwsem-xadd.c | 25 +++--
kernel/locking/rwsem-xadd.h |
This patch enables readers to optimistically spin on a
rwsem when it is owned by a writer instead of going to sleep
directly. The rwsem_can_spin_on_owner() function is extracted
out of rwsem_optimistic_spin() and is called directly by
__rwsem_down_read_failed_common() and __rwsem_down_write_failed
When the rwsem is owned by a reader, writers stop optimistic spinning
simply because there is no easy way to figure out if all the readers
are actively running or not. However, there are scenarios where
the readers are unlikely to sleep and optimistic spinning can help
performance.
This patch provid
When the front of the wait queue is a reader, other readers
immediately following the first reader will also be woken up at the
same time. However, if there is a writer in between, those readers
behind the writer will not be woken up.
Because of optimistic spinning, the lock acquisition order is n
An RT task can do optimistic spinning only if the lock holder is
actually running. If the state of the lock holder isn't known, there
is a possibility that the high priority of the RT task may block the
forward progress of the lock holder if they happen to reside on the same CPU.
This will lead to deadlock
On 02/07/2019 02:36 PM, Peter Zijlstra wrote:
> On Thu, Feb 07, 2019 at 02:07:08PM -0500, Waiman Long wrote:
>
>> +static inline int __down_read_trylock(struct rw_semaphore *sem)
>> +{
>> +	long tmp;
>> +
>> +	while ((tmp = atomic_long_read(&sem->count)) >= 0) {
>> +		if (tmp == ato
On 02/07/2019 02:45 PM, Peter Zijlstra wrote:
> On Thu, Feb 07, 2019 at 02:07:19PM -0500, Waiman Long wrote:
>> On 32-bit architectures, there aren't enough bits to hold both.
>> 64-bit architectures, however, can have enough bits to do that. For
>> x86-64, the physical address can use up to 52 bit
On 02/07/2019 02:51 PM, Davidlohr Bueso wrote:
> On Thu, 07 Feb 2019, Waiman Long wrote:
>> 30 files changed, 1197 insertions(+), 1594 deletions(-)
>
> Performance numbers on numerous workloads, pretty please.
>
> I'll go and throw this at my mmap_sem intensive workloads
> I've collected.
>
> Thank
On 02/07/2019 03:08 PM, Peter Zijlstra wrote:
> On Thu, Feb 07, 2019 at 02:07:19PM -0500, Waiman Long wrote:
>> On 32-bit architectures, there aren't enough bits to hold both.
>> 64-bit architectures, however, can have enough bits to do that. For
>> x86-64, the physical address can use up to 52 bit
On Thu, 2019-02-07 at 14:37 -0200, Thiago Jung Bauermann wrote:
> Russell Currey writes:
> > On Thu, 2019-02-07 at 15:08 +1000, Nicholas Piggin wrote:
> > > Russell Currey's on February 6, 2019 4:28 pm:
> > > >
> > > > Fixes: 3b10d0095a1e ("powerpc/mm/radix: Prevent kernel
> > > > execution of
>
On 2/6/19 10:03 PM, Tobin C. Harding wrote:
> The PowerPC docs have yet to be converted to RST format. Let's kick it
> off by doing all the files that _don't_ contain ASCII art.
>
> - Add SPDX license identifier to each new RST file.
>
> .. SPDX-License-Identifier: GPL-2.0
>
> - User correc
On Thu, 7 Feb 2019 17:03:15 +1100
"Tobin C. Harding" wrote:
> As discussed at LCA here is the start to the docs conversion for PowerPC
> to RST.
>
> This applies cleanly on top of the mainline (5.0-rc5) and Jon's tree
> (docs-next branch).
>
> I'm guessing it should go in through the PowerPC
(+ Nick)
On 7/2/19 6:49 pm, Segher Boessenkool wrote:
On Thu, Feb 07, 2019 at 05:59:48PM +1100, Andrew Donnellan wrote:
On 7/2/19 5:37 pm, Segher Boessenkool wrote:
On Thu, Feb 07, 2019 at 04:33:23PM +1100, Andrew Donnellan wrote:
Some older gccs (cpu_ftr_bit_mask)
It seems to me the warnin
Nicholas Piggin writes:
> Russell Currey's on February 6, 2019 4:28 pm:
>> Without restoring the IAMR after idle, execution prevention on POWER9
>> with Radix MMU is overwritten and the kernel can freely execute userspace
>> without
>> faulting.
>>
>> This is necessary when returning from any st
Cc-ing Steven
https://lore.kernel.org/lkml/20190207124635.3885-1-...@ellerman.id.au/T/#u
On (02/07/19 23:46), Michael Ellerman wrote:
> Arch code can set a "dump stack arch description string" which is
> displayed with oops output to describe the hardware platform.
>
> It is useful to initiali
Jann Horn writes:
> On Thu, Feb 7, 2019 at 10:22 AM Christophe Leroy
> wrote:
>> In powerpc code, there are several places implementing safe
>> access to user data. This is sometimes implemented using
>> probe_kernel_address() with additional access_ok() verification,
>> sometimes with get_user()
Andrew Donnellan writes:
> (+ Nick)
>
> On 7/2/19 6:49 pm, Segher Boessenkool wrote:
>> On Thu, Feb 07, 2019 at 05:59:48PM +1100, Andrew Donnellan wrote:
>>> On 7/2/19 5:37 pm, Segher Boessenkool wrote:
On Thu, Feb 07, 2019 at 04:33:23PM +1100, Andrew Donnellan wrote:
> Some older gccs (>
There's no need for the custom getter/setter functions so we should remove
them in favour of using the generic ones. While we're here, change the type
of eeh_max_freeze to uint32_t and print the value in decimal rather than
hex because printing it in hex makes no sense.
Signed-off-by: Oliver O'Hallo
The EEH address cache is used to map a physical MMIO address back to a PCI
device. It's useful to know when it's being manipulated, but currently this
requires recompiling with #define DEBUG set. This is pointless since we
have dynamic_debug nowadays, so remove the #ifdef guard and add a pr_debug()
Adds a debugfs file that can be read to view the contents of the EEH
address cache. This is pretty similar to the existing
eeh_addr_cache_print() function, but that function is intended to debug
issues inside the kernel since it's #ifdef'ed out by default, and writes
into the kernel log.
Signed
To use this function at all, #define DEBUG needs to be set in eeh_cache.c.
Printing at pr_debug is probably not all that useful since it adds the
additional hurdle of requiring you to enable the debug print if
dynamic_debug is in use, so this patch bumps it to pr_info.
Signed-off-by
Add a helper to find the pci_controller structure based on the domain
number / phb id.
Signed-off-by: Oliver O'Halloran
---
arch/powerpc/include/asm/pci-bridge.h | 2 ++
arch/powerpc/kernel/pci-common.c | 11 +++
2 files changed, 13 insertions(+)
diff --git a/arch/powerpc/include/
Currently when we detect an error we automatically invoke the EEH recovery
handler. This can be annoying when debugging EEH problems, or when working
on EEH itself, so this patch adds a debugfs knob that will prevent a
recovery event from being queued up when an issue is detected.
Signed-off-by: Ol
This patch adds a debugfs interface to force scheduling a recovery event.
This can be used to recover a specific PE or schedule a "special" recovery
event that checks for errors at the PHB level.
To force a recovery of a normal PE, use:
echo '<#pe>:<#phb>' > /sys/kernel/debug/powerpc/eeh_force_rec
On 8/2/19 2:02 pm, Michael Ellerman wrote:
> I'd prefer a minimal fix
> that we can backport. How about:
> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c
> b/arch/powerpc/kernel/dt_cpu_ftrs.c
> index 8be3721d9302..a1acccd25839 100644
> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> +++ b/arch/powerpc/kernel/dt_c
Jonathan Corbet writes:
> On Thu, 7 Feb 2019 17:03:15 +1100
> "Tobin C. Harding" wrote:
>
>> As discussed at LCA here is the start to the docs conversion for PowerPC
>> to RST.
>>
> >> This applies cleanly on top of the mainline (5.0-rc5) and Jon's tree
>> (docs-next branch).
>>
>> I'm guessing
On Thu, Feb 07, 2019 at 09:31:06AM +0100, Cédric Le Goater wrote:
> On 2/7/19 3:51 AM, David Gibson wrote:
> > On Wed, Feb 06, 2019 at 08:35:24AM +0100, Cédric Le Goater wrote:
> >> On 2/6/19 2:18 AM, David Gibson wrote:
> >>> On Wed, Feb 06, 2019 at 09:13:15AM +1100, Paul Mackerras wrote:
> O
On Thu, Feb 07, 2019 at 10:03:15AM +0100, Cédric Le Goater wrote:
> On 2/7/19 3:49 AM, David Gibson wrote:
> > On Wed, Feb 06, 2019 at 08:21:10AM +0100, Cédric Le Goater wrote:
> >> On 2/6/19 2:23 AM, David Gibson wrote:
> >>> On Tue, Feb 05, 2019 at 01:55:40PM +0100, Cédric Le Goater wrote:
>
>
> int arch_update_cpu_topology(void)
> {
> - return numa_update_cpu_topology(true);
> + int changed = topology_changed;
> +
> + topology_changed = 0;
> + return changed;
> }
>
Do we need a powerpc override for arch_update_cpu_topology() now? That
topology_changed sometime bac
Segher Boessenkool writes:
> On Thu, Feb 07, 2019 at 10:53:13PM +1100, Michael Ellerman wrote:
>> Chandan reported that fstests' generic/026 test hit a crash:
>
>> The instruction dump decodes as:
>> subfic r6,r5,8
>> rlwinm r6,r6,3,0,28
>> ldbrx r9,0,r3
>> ldbrx r10,0,r4 <-
>>
>>
On Thu, Feb 07, 2019 at 10:13:48AM +0100, Cédric Le Goater wrote:
> On 2/7/19 3:48 AM, David Gibson wrote:
> > On Wed, Feb 06, 2019 at 08:07:36AM +0100, Cédric Le Goater wrote:
> >> On 2/6/19 2:24 AM, David Gibson wrote:
> >>> On Wed, Feb 06, 2019 at 12:23:29PM +1100, David Gibson wrote:
> On
>> With the dual mode, the interrupt mode
>> is negotiated at CAS time and when merged, the KVM device will be created
>> at reset. In case of failure, QEMU will abort.
>>
>> I am not saying it is not possible but we will need some internal
>> infrastructure to handle dynamically the fall back
On 2/8/19 6:15 AM, David Gibson wrote:
> On Thu, Feb 07, 2019 at 10:03:15AM +0100, Cédric Le Goater wrote:
>> On 2/7/19 3:49 AM, David Gibson wrote:
>>> On Wed, Feb 06, 2019 at 08:21:10AM +0100, Cédric Le Goater wrote:
On 2/6/19 2:23 AM, David Gibson wrote:
> On Tue, Feb 05, 2019 at 01:55: