Re: [PATCH kernel] powerpc/powernv/ioda2: Update iommu table base on ownership change

2017-03-14 Thread Mauricio Faria de Oliveira

On 02/20/2017 11:41 PM, Alexey Kardashevskiy wrote:

Cc: Gavin Shan 
Signed-off-by: Alexey Kardashevskiy 


Tested-by: Mauricio Faria de Oliveira 

P.S.: sorry, late, but for the record.


--
Mauricio Faria de Oliveira
IBM Linux Technology Center



Re: 1M hugepage size being registered on Linux

2017-06-21 Thread Mauricio Faria de Oliveira

On 06/21/2017 07:33 AM, Michael Ellerman wrote:

I am working on a bug related to 1M hugepage size being registered on
Linux (Power 8 Baremetal - Garrison).



Wasn't that caused by a firmware bug?


Ben/Stewart, does that ring a bell, something new, intended or not? :- )

Thanks,


I was checking dmesg and it seems that 1M page size is coming from
firmware to Linux.

[0.00] base_shift=20: shift=20, sllp=0x0130, avpnm=0x,
tlbiel=0, penc=2
[1.528867] HugeTLB registered 1 MB page size, pre-allocated 0 pages

Should Linux support this page size?

Does it work?:)

The user manual says it's a supported size, but I thought it didn't work
(in hardware) for some reason.


--
Mauricio Faria de Oliveira
IBM Linux Technology Center



Re: [PATCH] drivers: of: increase MAX_RESERVED_REGIONS to 32

2017-10-02 Thread Mauricio Faria de Oliveira

On 09/26/2017 05:40 AM, Stewart Smith wrote:

The simple fix is to bump the length of the array to 32 which "should be
enough for everyone(TM)".


Tested-by: Mauricio Faria de Oliveira 

# uname -r
4.14.0-rc3

# dmesg
[0.00] opal: OPAL detected !
[0.00] crashkernel: memory value expected
[0.00] OF: reserved mem: not enough space all defined regions.
[0.00] OF: reserved mem: not enough space all defined regions.
[0.00] OF: reserved mem: not enough space all defined regions.
[0.00] OF: reserved mem: not enough space all defined regions.
[0.00] Allocated 2883584 bytes for 2048 pacas at cfd4
<...>

# uname -r
4.14.0-rc3.of32maxrsvdmemregions

# dmesg | head
[0.00] opal: OPAL detected !
[0.00] crashkernel: memory value expected
[0.00] Allocated 2883584 bytes for 2048 pacas at cfd4
<...>


--
Mauricio Faria de Oliveira
IBM Linux Technology Center



Re: Kernel 4.15 lost set_robust_list support on POWER 9

2018-02-05 Thread Mauricio Faria de Oliveira

Nick, Michael,

On 02/05/2018 10:48 AM, Florian Weimer wrote:
7041  set_robust_list(0x7fff93dc3980, 24) = -1 ENOSYS (Function not 
implemented)


The regression was introduced by commit 371b8044 ("powerpc/64s: 
Initialize ISAv3 MMU registers before setting partition table").


The problem is Radix MMU specific (does not occur with 'disable_radix'),
and does not occur with that code reverted (ie do not set PIDR to zero).

Do you see any reasons why?
(wondering if at all related to access_ok() in include/asm/uaccess.h)

with:

# strace -e set_robust_list -f ./test
set_robust_list(0x7fffa4b03910, 24) = -1 ENOSYS (Function not 
implemented)

+++ exited with 1 +++

# uname -r
4.15.0

without:

# strace -e set_robust_list -f ./test
set_robust_list(0x7fff889c3910, 24) = 0
+++ exited with 0 +++

# uname -r
4.15.0.nopidr



Re: Kernel 4.15 lost set_robust_list support on POWER 9

2018-02-06 Thread Mauricio Faria de Oliveira

On 02/05/2018 11:06 PM, Nicholas Piggin wrote:

Does this help?

powerpc/64s/radix: allocate guard-PID for kernel contexts at boot


Yes, the test-case passes:

# strace -e set_robust_list -f ./test
set_robust_list(0x7fff8d453910, 24) = 0
+++ exited with 0 +++

# uname -r
4.15.0.guardpid

Thanks!

cheers,
Mauricio



[PATCH 1/3] rfi-flush: Move the logic to avoid a redo into the debugfs code

2018-02-14 Thread Mauricio Faria de Oliveira
From: Michael Ellerman 

rfi_flush_enable() includes a check to see if we're already
enabled (or disabled), and in that case does nothing.

But that means calling setup_rfi_flush() a 2nd time doesn't actually
work, which is a bit confusing.

Move that check into the debugfs code, where it really belongs.

Signed-off-by: Michael Ellerman 
Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/kernel/setup_64.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index c388cc3..3efc01a 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -846,9 +846,6 @@ static void do_nothing(void *unused)
 
 void rfi_flush_enable(bool enable)
 {
-   if (rfi_flush == enable)
-   return;
-
if (enable) {
do_rfi_flush_fixups(enabled_flush_types);
on_each_cpu(do_nothing, NULL, 1);
@@ -902,13 +899,19 @@ void __init setup_rfi_flush(enum l1d_flush_type types, 
bool enable)
 #ifdef CONFIG_DEBUG_FS
 static int rfi_flush_set(void *data, u64 val)
 {
+   bool enable;
+
if (val == 1)
-   rfi_flush_enable(true);
+   enable = true;
else if (val == 0)
-   rfi_flush_enable(false);
+   enable = false;
else
return -EINVAL;
 
+   /* Only do anything if we're changing state */
+   if (enable != rfi_flush)
+   rfi_flush_enable(enable);
+
return 0;
 }
 
-- 
2.7.4



[PATCH 0/3] Setup RFI flush after PowerVM LPM migration

2018-02-14 Thread Mauricio Faria de Oliveira
This patchset allows for setup_rfi_flush() to be called again
after PowerVM LPM (live partition mobility) aka LPAR migration.

It's originally written by Michael Ellerman; I just rebased it
on top of powerpc/merge as of 2018-02-14 BRST (commit 3f6f556).

I tested it with a debug patch (sending shortly) to shortcut
the LPM functions migration_store() and post_mobility_fixup()
to just reach pseries_setup_rfi_flush(), and abuse the boot
option 'no_rfi_flush' not to allocate the fallback flush area
at boot (to simulate the migration from patched to unpatched
system).

Testing:
---

Fallback flush area allocated at boot time:

# dmesg | grep rfi-flush
[0.00] rfi-flush: Using fallback displacement flush
[0.00] rfi-flush: patched 8 locations

# echo > /sys/kernel/mobility/migration 

# dmesg | grep rfi-flush
[0.00] rfi-flush: Using fallback displacement flush
[0.00] rfi-flush: patched 8 locations
[   51.793238] rfi-flush: Using fallback displacement flush
[   51.793258] rfi-flush: patched 8 locations

Fallback flush area NOT allocated at boot time:

# grep -o no_rfi_flush /proc/cmdline
no_rfi_flush

# dmesg | grep rfi-flush
[0.00] rfi-flush: disabled on command line.
[0.00] rfi-flush: re-enabled

# echo > /sys/kernel/mobility/migration 

# dmesg | grep rfi-flush
[0.00] rfi-flush: disabled on command line.
[0.00] rfi-flush: re-enabled
[   31.817921] rfi-flush: Error unable to use fallback displacement flush!
[   31.817927] rfi-flush: patched 8 locations

Michael Ellerman (3):
  rfi-flush: Move the logic to avoid a redo into the debugfs code
  rfi-flush: Make it possible to call setup_rfi_flush() again
  rfi-flush: Call setup_rfi_flush() after LPM migration

 arch/powerpc/include/asm/setup.h  |  2 +-
 arch/powerpc/kernel/setup_64.c| 38 +++
 arch/powerpc/platforms/pseries/mobility.c |  3 +++
 arch/powerpc/platforms/pseries/pseries.h  |  2 ++
 arch/powerpc/platforms/pseries/setup.c|  2 +-
 5 files changed, 36 insertions(+), 11 deletions(-)

-- 
2.7.4



[PATCH 2/3] rfi-flush: Make it possible to call setup_rfi_flush() again

2018-02-14 Thread Mauricio Faria de Oliveira
From: Michael Ellerman 

For PowerVM migration we want to be able to call setup_rfi_flush()
again after we've migrated the partition.

To support that we need to check that we're not trying to allocate the
fallback flush area after memblock has gone away. If so we just fail,
we don't support migrating from a patched to an unpatched machine. Or
we do support it, but there will be no RFI flush enabled on the
destination.

Signed-off-by: Michael Ellerman 
Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/include/asm/setup.h |  2 +-
 arch/powerpc/kernel/setup_64.c   | 25 +
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index 469b7fd..bbcdf929 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -49,7 +49,7 @@ enum l1d_flush_type {
L1D_FLUSH_MTTRIG= 0x8,
 };
 
-void __init setup_rfi_flush(enum l1d_flush_type, bool enable);
+void setup_rfi_flush(enum l1d_flush_type, bool enable);
 void do_rfi_flush_fixups(enum l1d_flush_type types);
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 3efc01a..d692f71 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -855,11 +855,22 @@ void rfi_flush_enable(bool enable)
rfi_flush = enable;
 }
 
-static void init_fallback_flush(void)
+static bool init_fallback_flush(void)
 {
u64 l1d_size, limit;
int cpu;
 
+   if (l1d_flush_fallback_area)
+   return true;
+
+   /*
+* Once the slab allocator is up it's too late to allocate the fallback
+* flush area, so return an error. This could happen if we migrated from
+* a patched machine to an unpatched machine.
+*/
+   if (slab_is_available())
+   return false;
+
l1d_size = ppc64_caches.l1d.size;
limit = min(ppc64_bolted_size(), ppc64_rma_size);
 
@@ -875,13 +886,19 @@ static void init_fallback_flush(void)
paca[cpu].rfi_flush_fallback_area = l1d_flush_fallback_area;
paca[cpu].l1d_flush_size = l1d_size;
}
+
+   return true;
 }
 
-void __init setup_rfi_flush(enum l1d_flush_type types, bool enable)
+void setup_rfi_flush(enum l1d_flush_type types, bool enable)
 {
if (types & L1D_FLUSH_FALLBACK) {
-   pr_info("rfi-flush: Using fallback displacement flush\n");
-   init_fallback_flush();
+   if (init_fallback_flush())
+   pr_info("rfi-flush: Using fallback displacement 
flush\n");
+   else {
+   pr_warn("rfi-flush: Error unable to use fallback 
displacement flush!\n");
+   types &= ~L1D_FLUSH_FALLBACK;
+   }
}
 
if (types & L1D_FLUSH_ORI)
-- 
2.7.4



[PATCH 3/3] rfi-flush: Call setup_rfi_flush() after LPM migration

2018-02-14 Thread Mauricio Faria de Oliveira
From: Michael Ellerman 

We might have migrated to a machine that uses a different flush type,
or doesn't need flushing at all.

If we migrate to a machine with no flush support, ie. that would use
fallback, we just print an error and switch flushing off. We could
support that, but it would complicate the implementation of the
fallback flush, and we don't expect anyone will ever do it.

Signed-off-by: Michael Ellerman 
Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/platforms/pseries/mobility.c | 3 +++
 arch/powerpc/platforms/pseries/pseries.h  | 2 ++
 arch/powerpc/platforms/pseries/setup.c| 2 +-
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c 
b/arch/powerpc/platforms/pseries/mobility.c
index 0f7fb71..8a8033a 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -348,6 +348,9 @@ void post_mobility_fixup(void)
printk(KERN_ERR "Post-mobility device tree update "
"failed: %d\n", rc);
 
+   /* Possibly switch to a new RFI flush type */
+   pseries_setup_rfi_flush();
+
return;
 }
 
diff --git a/arch/powerpc/platforms/pseries/pseries.h 
b/arch/powerpc/platforms/pseries/pseries.h
index 1ae1d9f..27cdcb6 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -100,4 +100,6 @@ static inline unsigned long cmo_get_page_size(void)
 
 int dlpar_workqueue_init(void);
 
+void pseries_setup_rfi_flush(void);
+
 #endif /* _PSERIES_PSERIES_H */
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 372d7ad..dad8197 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -459,7 +459,7 @@ static void __init find_and_init_phbs(void)
of_pci_check_probe_only();
 }
 
-static void pseries_setup_rfi_flush(void)
+void pseries_setup_rfi_flush(void)
 {
struct h_cpu_char_result result;
enum l1d_flush_type types;
-- 
2.7.4



[PATCH] DEBUG: shortcut mobility fixup/migration store, and abuse no_rfi_flush

2018-02-14 Thread Mauricio Faria de Oliveira
Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/kernel/setup_64.c| 8 
 arch/powerpc/platforms/pseries/mobility.c | 4 
 2 files changed, 12 insertions(+)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index d692f71..a05b9f4 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -892,6 +892,9 @@ static bool init_fallback_flush(void)
 
 void setup_rfi_flush(enum l1d_flush_type types, bool enable)
 {
+   if (no_rfi_flush)
+   types = L1D_FLUSH_NONE;
+
if (types & L1D_FLUSH_FALLBACK) {
if (init_fallback_flush())
pr_info("rfi-flush: Using fallback displacement 
flush\n");
@@ -911,6 +914,11 @@ void setup_rfi_flush(enum l1d_flush_type types, bool 
enable)
 
if (!no_rfi_flush)
rfi_flush_enable(enable);
+
+   if (no_rfi_flush) {
+   pr_info("rfi-flush: re-enabled\n");
+   no_rfi_flush = 0;
+   }
 }
 
 #ifdef CONFIG_DEBUG_FS
diff --git a/arch/powerpc/platforms/pseries/mobility.c 
b/arch/powerpc/platforms/pseries/mobility.c
index 8a8033a..201710e 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -326,6 +326,7 @@ int pseries_devicetree_update(s32 scope)
 
 void post_mobility_fixup(void)
 {
+#if 0
int rc;
int activate_fw_token;
 
@@ -347,6 +348,7 @@ void post_mobility_fixup(void)
if (rc)
printk(KERN_ERR "Post-mobility device tree update "
"failed: %d\n", rc);
+#endif
 
/* Possibly switch to a new RFI flush type */
pseries_setup_rfi_flush();
@@ -358,6 +360,7 @@ static ssize_t migration_store(struct class *class,
   struct class_attribute *attr, const char *buf,
   size_t count)
 {
+#if 0
u64 streamid;
int rc;
 
@@ -373,6 +376,7 @@ static ssize_t migration_store(struct class *class,
 
if (rc)
return rc;
+#endif
 
post_mobility_fixup();
return count;
-- 
2.7.4



Re: [PATCH 2/3] rfi-flush: Make it possible to call setup_rfi_flush() again

2018-02-16 Thread Mauricio Faria de Oliveira

Hi Michal and Michael,

On 02/15/2018 05:13 AM, Michal Suchánek wrote:

From: Michael Ellerman

For PowerVM migration we want to be able to call setup_rfi_flush()
again after we've migrated the partition.

To support that we need to check that we're not trying to allocate the
fallback flush area after memblock has gone away. If so we just fail,
we don't support migrating from a patched to an unpatched machine. Or
we do support it, but there will be no RFI flush enabled on the
destination.


This sounds bad to me. Either we support RFI flush or we don't.

If we do the fallback area should be allocated at boot so it is always
available. [snip]


I think the problem with this is the size of the fallback area might
have to be different between the origin and destination systems, say,
a larger L1 data cache at the destination.

In that case, the original size might not be enough to fully flush
the L1 data cache.

Michael, is that the reason it is done that way?  I thought of that,
but don't know for sure.

Thanks!
Mauricio



Re: [PATCH 2/3] rfi-flush: Make it possible to call setup_rfi_flush() again

2018-03-05 Thread Mauricio Faria de Oliveira

Hi Michael, Michal,

I got back from vacation. Checking this one.

On 02/20/2018 02:06 PM, Michal Suchánek wrote:

I did it the way I did because otherwise we waste memory on every
system on earth just to support a use case that we don't actually
intend for anyone to ever use - ie. migrating from a patched machine
to an unpatched machine.


If this thread eventually closes in 'ok, so that memory has to be
reserved/wasted anyway', that can be done only in pseries, right?

It seems not so much memory for this particular platform/hardware.


If you have multiple hosts running some LPMs and want to update them
without shutting down the whole thing I suppose it might easily happen
that a machine (re)started on a patched host is migrated to unpatched
host.


Right, but that should be temporary, I think -- after updating some of
the hosts, the LPAR(s) can be migrated back to one of them, where the
fallback flush is not required anymore.


I think I'm inclined to leave it the way it is, unless you feel
strongly about it Michal?



I think it would be more user friendly to either support the fallback
method 100% or remove it and require patched firmware.


I beg to disagree.  Since the matter is a security issue, the option
of still have some sort of fix that works on unpatched firmware does
look good and friendly to users (rather than require 'you _must_ get
the firmware update') IMHO.

cheers,
Mauricio



[PATCH] powerpc/64: Fix section mismatch warnings for early boot symbols

2018-03-09 Thread Mauricio Faria de Oliveira
Some of the boot code located at the start of kernel text is "init"
class, in that it only runs at boot time, however marking it as normal
init code is problematic because that puts it into a different section
located at the very end of kernel text.

e.g., in case the TOC is not set up, we may not be able to tolerate a
branch trampoline to reach the init function.

Credits: code and message are based on 2016 patch by Nicholas Piggin,
and slightly modified so not to rename the powerpc code/symbol names.

Subject: [PATCH] powerpc/64: quieten section mismatch warnings
From: Nicholas Piggin 
Date: Fri Dec 23 00:14:19 AEDT 2016

Signed-off-by: Mauricio Faria de Oliveira 
---
 scripts/mod/modpost.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 9917f92..c65d5e2 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1174,8 +1174,15 @@ static const struct sectioncheck *section_mismatch(
  *   fromsec = text section
  *   refsymname = *.constprop.*
  *
+ * Pattern 6:
+ *   powerpc64 has boot functions that reference init, but must remain in text.
+ *   This pattern is identified by
+ *   tosec   = init section
+ *   fromsym = 
+ *
  **/
-static int secref_whitelist(const struct sectioncheck *mismatch,
+static int secref_whitelist(const struct elf_info *elf,
+   const struct sectioncheck *mismatch,
const char *fromsec, const char *fromsym,
const char *tosec, const char *tosym)
 {
@@ -1212,6 +1219,17 @@ static int secref_whitelist(const struct sectioncheck 
*mismatch,
match(fromsym, optim_symbols))
return 0;
 
+   /* Check for pattern 6 */
+   if (elf->hdr->e_machine == EM_PPC64)
+   if (match(tosec, init_sections) &&
+   (!strncmp(fromsym, "__boot_from_prom",
+   strlen("__boot_from_prom")) ||
+!strncmp(fromsym, "start_here_multiplatform",
+   strlen("start_here_multiplatform")) ||
+!strncmp(fromsym, "start_here_common",
+   strlen("start_here_common")))
+   return 0;
+
return 1;
 }
 
@@ -1552,7 +1570,7 @@ static void default_mismatch_handler(const char *modname, 
struct elf_info *elf,
tosym = sym_name(elf, to);
 
/* check whitelist - we may ignore it */
-   if (secref_whitelist(mismatch,
+   if (secref_whitelist(elf, mismatch,
 fromsec, fromsym, tosec, tosym)) {
report_sec_mismatch(modname, mismatch,
fromsec, r->r_offset, fromsym,
-- 
1.8.3.1



[PATCH] powerpc/mm: Fix section mismatch warning in stop_machine_change_mapping()

2018-03-09 Thread Mauricio Faria de Oliveira
Fix the warning messages for stop_machine_change_mapping(), and a number
of other affected functions in its call chain.

All modified functions are under CONFIG_MEMORY_HOTPLUG, so __meminit
is okay (keeps them / does not discard them).

Boot-tested on powernv/power9/radix-mmu and pseries/power8/hash-mmu.

$ make -j$(nproc) CONFIG_DEBUG_SECTION_MISMATCH=y vmlinux
...
  MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x6b130): Section mismatch in reference from the 
function stop_machine_change_mapping() to the function 
.meminit.text:create_physical_mapping()
The function stop_machine_change_mapping() references
the function __meminit create_physical_mapping().
This is often because stop_machine_change_mapping lacks a __meminit 
annotation or the annotation of create_physical_mapping is wrong.

WARNING: vmlinux.o(.text+0x6b13c): Section mismatch in reference from the 
function stop_machine_change_mapping() to the function 
.meminit.text:create_physical_mapping()
The function stop_machine_change_mapping() references
the function __meminit create_physical_mapping().
This is often because stop_machine_change_mapping lacks a __meminit 
annotation or the annotation of create_physical_mapping is wrong.
...

Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/mm/mem.c  |  4 ++--
 arch/powerpc/mm/pgtable-book3s64.c |  4 ++--
 arch/powerpc/mm/pgtable-radix.c| 12 ++--
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index fe8c611..85245ef 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -127,7 +127,7 @@ int __weak remove_section_mapping(unsigned long start, 
unsigned long end)
return -ENODEV;
 }
 
-int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
+int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap 
*altmap,
bool want_memblock)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
@@ -148,7 +148,7 @@ int arch_add_memory(int nid, u64 start, u64 size, struct 
vmem_altmap *altmap,
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+int __meminit arch_remove_memory(u64 start, u64 size, struct vmem_altmap 
*altmap)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
diff --git a/arch/powerpc/mm/pgtable-book3s64.c 
b/arch/powerpc/mm/pgtable-book3s64.c
index 422e802..bd6ca74 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -155,7 +155,7 @@ void mmu_cleanup_all(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int create_section_mapping(unsigned long start, unsigned long end)
+int __meminit create_section_mapping(unsigned long start, unsigned long end)
 {
if (radix_enabled())
return radix__create_section_mapping(start, end);
@@ -163,7 +163,7 @@ int create_section_mapping(unsigned long start, unsigned 
long end)
return hash__create_section_mapping(start, end);
 }
 
-int remove_section_mapping(unsigned long start, unsigned long end)
+int __meminit remove_section_mapping(unsigned long start, unsigned long end)
 {
if (radix_enabled())
return radix__remove_section_mapping(start, end);
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 2e10a96..ab9db0a 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -695,7 +695,7 @@ struct change_mapping_params {
unsigned long aligned_end;
 };
 
-static int stop_machine_change_mapping(void *data)
+static int __meminit stop_machine_change_mapping(void *data)
 {
struct change_mapping_params *params =
(struct change_mapping_params *)data;
@@ -742,7 +742,7 @@ static void remove_pte_table(pte_t *pte_start, unsigned 
long addr,
 /*
  * clear the pte and potentially split the mapping helper
  */
-static void split_kernel_mapping(unsigned long addr, unsigned long end,
+static void __meminit split_kernel_mapping(unsigned long addr, unsigned long 
end,
unsigned long size, pte_t *pte)
 {
unsigned long mask = ~(size - 1);
@@ -835,7 +835,7 @@ static void remove_pud_table(pud_t *pud_start, unsigned 
long addr,
}
 }
 
-static void remove_pagetable(unsigned long start, unsigned long end)
+static void __meminit remove_pagetable(unsigned long start, unsigned long end)
 {
unsigned long addr, next;
pud_t *pud_base;
@@ -863,12 +863,12 @@ static void remove_pagetable(unsigned long start, 
unsigned long end)
radix__flush_tlb_kernel_range(start, end);
 }
 
-int __ref radix__create_section_mapping(unsigned long start, unsigned long end)
+int __meminit radix__create_section_mapping(unsigned long start, unsigned long 
end)
 {
return create_physical_mapping(star

[PATCH v2] powerpc/64: Fix section mismatch warnings for early boot symbols

2018-03-09 Thread Mauricio Faria de Oliveira
Some of the boot code located at the start of kernel text is "init"
class, in that it only runs at boot time, however marking it as normal
init code is problematic because that puts it into a different section
located at the very end of kernel text.

e.g., in case the TOC is not set up, we may not be able to tolerate a
branch trampoline to reach the init function.

Credits: code and message are based on 2016 patch by Nicholas Piggin,
and slightly modified so not to rename the powerpc code/symbol names.

Subject: [PATCH] powerpc/64: quieten section mismatch warnings
From: Nicholas Piggin 
Date: Fri Dec 23 00:14:19 AEDT 2016

Signed-off-by: Mauricio Faria de Oliveira 
---
v2: fix missing close parenthesis in conditional (wrong patch file, sorry)

 scripts/mod/modpost.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 9917f92..c65d5e2 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1174,8 +1174,15 @@ static const struct sectioncheck *section_mismatch(
  *   fromsec = text section
  *   refsymname = *.constprop.*
  *
+ * Pattern 6:
+ *   powerpc64 has boot functions that reference init, but must remain in text.
+ *   This pattern is identified by
+ *   tosec   = init section
+ *   fromsym = 
+ *
  **/
-static int secref_whitelist(const struct sectioncheck *mismatch,
+static int secref_whitelist(const struct elf_info *elf,
+   const struct sectioncheck *mismatch,
const char *fromsec, const char *fromsym,
const char *tosec, const char *tosym)
 {
@@ -1212,6 +1219,17 @@ static int secref_whitelist(const struct sectioncheck 
*mismatch,
match(fromsym, optim_symbols))
return 0;
 
+   /* Check for pattern 6 */
+   if (elf->hdr->e_machine == EM_PPC64)
+   if (match(tosec, init_sections) &&
+   (!strncmp(fromsym, "__boot_from_prom",
+   strlen("__boot_from_prom")) ||
+!strncmp(fromsym, "start_here_multiplatform",
+   strlen("start_here_multiplatform")) ||
+!strncmp(fromsym, "start_here_common",
+   strlen("start_here_common"
+   return 0;
+
return 1;
 }
 
@@ -1552,7 +1570,7 @@ static void default_mismatch_handler(const char *modname, 
struct elf_info *elf,
tosym = sym_name(elf, to);
 
/* check whitelist - we may ignore it */
-   if (secref_whitelist(mismatch,
+   if (secref_whitelist(elf, mismatch,
 fromsec, fromsym, tosec, tosym)) {
report_sec_mismatch(modname, mismatch,
fromsec, r->r_offset, fromsym,
-- 
1.8.3.1



Re: [PATCH] powerpc/mm: Fix section mismatch warning in stop_machine_change_mapping()

2018-03-12 Thread Mauricio Faria de Oliveira

Balbir,

On 03/11/2018 03:23 AM, Balbir Singh wrote:

Looks reasonable, I'd recommend trying to compile with MEMORY_HOTPLUG
and MEMORY_HOTREMOVE enabled/disabled as well


Thanks for reviewing.

I should have mentioned it :) I did that (disable CONFIG_MEMORY_HOTPLUG)
and all the related functions are not included in vmlinux, since they
are guarded under that config option.

cheers,
Mauricio



[PATCH v2 0/4] Setup RFI flush after PowerVM LPM migration

2018-03-12 Thread Mauricio Faria de Oliveira
This patchset allows for setup_rfi_flush() to be called again
after PowerVM LPM (live partition mobility) aka LPAR migration,
in order to possibly switch to a different flush method.

The patches are mostly from Michael Ellerman, I have rebased to
powerpc/linux.git merge branch as of commit 249d7ba (March 12th),
and added one extra commit to force init of fallback flush area
on pseries.

Testing:
---

I've disabled most code in the LPM functions migration_store()
and post_mobility_fixup() to just reach pseries_setup_rfi_flush(),
and modified pseries_setup_rfi_flush() a few times to fake the
first and non-first calls to either enable instructions or the
fallback flush method, to cover all 4 possible scenarios.

Case 1) fallback -> fallback

# dmesg | grep rfi-flush
[0.00] rfi-flush: Using fallback displacement flush
[0.00] rfi-flush: patched 8 locations

# echo > /sys/kernel/mobility/migration 

# dmesg | grep rfi-flush
[0.00] rfi-flush: Using fallback displacement flush
[0.00] rfi-flush: patched 8 locations
[   20.583029] rfi-flush: Using fallback displacement flush
[   20.583038] rfi-flush: patched 8 locations

Case 2) instructions -> fallback

# dmesg | grep rfi-flush
[0.00] rfi-flush: Using ori type flush
[0.00] rfi-flush: Using mttrig type flush
[0.00] rfi-flush: patched 8 locations

# echo > /sys/kernel/mobility/migration 

# dmesg | grep rfi-flush
[0.00] rfi-flush: Using ori type flush
[0.00] rfi-flush: Using mttrig type flush
[0.00] rfi-flush: patched 8 locations
[   36.023759] rfi-flush: Using fallback displacement flush
[   36.023764] rfi-flush: patched 8 locations

Case 3) fallback -> instructions

# dmesg | grep rfi-flush
[0.00] rfi-flush: Using fallback displacement flush
[0.00] rfi-flush: patched 8 locations

# echo > /sys/kernel/mobility/migration 

# dmesg | grep rfi-flush
[0.00] rfi-flush: Using fallback displacement flush
[0.00] rfi-flush: patched 8 locations
[   23.373037] rfi-flush: Using ori type flush
[   23.373039] rfi-flush: Using mttrig type flush
[   23.373044] rfi-flush: patched 8 locations

Case 4) instructions -> instructions

# dmesg | grep rfi-flush
[0.00] rfi-flush: Using ori type flush
[0.00] rfi-flush: Using mttrig type flush
[0.00] rfi-flush: patched 8 locations

# echo > /sys/kernel/mobility/migration 

# dmesg | grep rfi-flush
[0.00] rfi-flush: Using ori type flush
[0.00] rfi-flush: Using mttrig type flush
[0.00] rfi-flush: patched 8 locations
[   23.243564] rfi-flush: Using ori type flush
[   23.243566] rfi-flush: Using mttrig type flush
[   23.243570] rfi-flush: patched 8 locations


Mauricio Faria de Oliveira (1):
  rfi-flush: Allow pseries to force init of fallback flush area

Michael Ellerman (3):
  rfi-flush: Move the logic to avoid a redo into the debugfs code
  rfi-flush: Make it possible to call setup_rfi_flush() again
  rfi-flush: Call setup_rfi_flush() after LPM migration

 arch/powerpc/include/asm/setup.h  |  2 +-
 arch/powerpc/kernel/setup_64.c| 22 +++---
 arch/powerpc/platforms/powernv/setup.c|  3 ++-
 arch/powerpc/platforms/pseries/mobility.c |  3 +++
 arch/powerpc/platforms/pseries/pseries.h  |  2 ++
 arch/powerpc/platforms/pseries/setup.c|  6 +++---
 6 files changed, 26 insertions(+), 12 deletions(-)

-- 
2.7.4



[PATCH v2 1/4] rfi-flush: Move the logic to avoid a redo into the debugfs code

2018-03-12 Thread Mauricio Faria de Oliveira
From: Michael Ellerman 

rfi_flush_enable() includes a check to see if we're already
enabled (or disabled), and in that case does nothing.

But that means calling setup_rfi_flush() a 2nd time doesn't actually
work, which is a bit confusing.

Move that check into the debugfs code, where it really belongs.

Signed-off-by: Michael Ellerman 
Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/kernel/setup_64.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index c388cc3..3efc01a 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -846,9 +846,6 @@ static void do_nothing(void *unused)
 
 void rfi_flush_enable(bool enable)
 {
-   if (rfi_flush == enable)
-   return;
-
if (enable) {
do_rfi_flush_fixups(enabled_flush_types);
on_each_cpu(do_nothing, NULL, 1);
@@ -902,13 +899,19 @@ void __init setup_rfi_flush(enum l1d_flush_type types, 
bool enable)
 #ifdef CONFIG_DEBUG_FS
 static int rfi_flush_set(void *data, u64 val)
 {
+   bool enable;
+
if (val == 1)
-   rfi_flush_enable(true);
+   enable = true;
else if (val == 0)
-   rfi_flush_enable(false);
+   enable = false;
else
return -EINVAL;
 
+   /* Only do anything if we're changing state */
+   if (enable != rfi_flush)
+   rfi_flush_enable(enable);
+
return 0;
 }
 
-- 
2.7.4



[PATCH v2 2/4] rfi-flush: Make it possible to call setup_rfi_flush() again

2018-03-12 Thread Mauricio Faria de Oliveira
From: Michael Ellerman 

For PowerVM migration we want to be able to call setup_rfi_flush()
again after we've migrated the partition.

To support that we need to check that we're not trying to allocate the
fallback flush area after memblock has gone away (i.e., boot-time only).

Signed-off-by: Michael Ellerman 
Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/include/asm/setup.h | 2 +-
 arch/powerpc/kernel/setup_64.c   | 6 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index 469b7fd..bbcdf929 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -49,7 +49,7 @@ enum l1d_flush_type {
L1D_FLUSH_MTTRIG= 0x8,
 };
 
-void __init setup_rfi_flush(enum l1d_flush_type, bool enable);
+void setup_rfi_flush(enum l1d_flush_type, bool enable);
 void do_rfi_flush_fixups(enum l1d_flush_type types);
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 3efc01a..d60e2f7 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -860,6 +860,10 @@ static void init_fallback_flush(void)
u64 l1d_size, limit;
int cpu;
 
+   /* Only allocate the fallback flush area once (at boot time). */
+   if (l1d_flush_fallback_area)
+   return;
+
l1d_size = ppc64_caches.l1d.size;
limit = min(ppc64_bolted_size(), ppc64_rma_size);
 
@@ -877,7 +881,7 @@ static void init_fallback_flush(void)
}
 }
 
-void __init setup_rfi_flush(enum l1d_flush_type types, bool enable)
+void setup_rfi_flush(enum l1d_flush_type types, bool enable)
 {
if (types & L1D_FLUSH_FALLBACK) {
pr_info("rfi-flush: Using fallback displacement flush\n");
-- 
2.7.4



[PATCH v2 3/4] rfi-flush: Allow pseries to force init of fallback flush area

2018-03-12 Thread Mauricio Faria de Oliveira
This ensures the fallback flush area is always allocated at boot time
on the pseries platform, so PowerVM migration to an unpatched system
can rely on the fallback flush method.

Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/include/asm/setup.h   | 2 +-
 arch/powerpc/kernel/setup_64.c | 5 +++--
 arch/powerpc/platforms/powernv/setup.c | 3 ++-
 arch/powerpc/platforms/pseries/setup.c | 4 ++--
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index bbcdf929..efdaf10 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -49,7 +49,7 @@ enum l1d_flush_type {
L1D_FLUSH_MTTRIG= 0x8,
 };
 
-void setup_rfi_flush(enum l1d_flush_type, bool enable);
+void setup_rfi_flush(enum l1d_flush_type, bool enable, bool 
force_init_fallback);
 void do_rfi_flush_fixups(enum l1d_flush_type types);
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index d60e2f7..cb886a8 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -881,12 +881,13 @@ static void init_fallback_flush(void)
}
 }
 
-void setup_rfi_flush(enum l1d_flush_type types, bool enable)
+void setup_rfi_flush(enum l1d_flush_type types, bool enable, bool 
force_init_fallback)
 {
if (types & L1D_FLUSH_FALLBACK) {
pr_info("rfi-flush: Using fallback displacement flush\n");
init_fallback_flush();
-   }
+   } else if (force_init_fallback)
+   init_fallback_flush();
 
if (types & L1D_FLUSH_ORI)
pr_info("rfi-flush: Using ori type flush\n");
diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 092715b..a4ab8f6 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -46,6 +46,7 @@ static void pnv_setup_rfi_flush(void)
struct device_node *np, *fw_features;
enum l1d_flush_type type;
int enable;
+   bool force_init_fallback = false;
 
/* Default to fallback in case fw-features are not available */
type = L1D_FLUSH_FALLBACK;
@@ -88,7 +89,7 @@ static void pnv_setup_rfi_flush(void)
of_node_put(fw_features);
}
 
-   setup_rfi_flush(type, enable > 0);
+   setup_rfi_flush(type, enable > 0, force_init_fallback);
 }
 
 static void __init pnv_setup_arch(void)
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 1a52762..2f82bbb 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -463,7 +463,7 @@ static void pseries_setup_rfi_flush(void)
 {
struct h_cpu_char_result result;
enum l1d_flush_type types;
-   bool enable;
+   bool enable, force_init_fallback = true;
long rc;
 
/* Enable by default */
@@ -490,7 +490,7 @@ static void pseries_setup_rfi_flush(void)
types = L1D_FLUSH_FALLBACK;
}
 
-   setup_rfi_flush(types, enable);
+   setup_rfi_flush(types, enable, force_init_fallback);
 }
 
 #ifdef CONFIG_PCI_IOV
-- 
2.7.4



[PATCH v2 4/4] rfi-flush: Call setup_rfi_flush() after LPM migration

2018-03-12 Thread Mauricio Faria de Oliveira
From: Michael Ellerman 

We might have migrated to a machine that uses a different flush type,
or doesn't need flushing at all.

Signed-off-by: Michael Ellerman 
Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/platforms/pseries/mobility.c | 3 +++
 arch/powerpc/platforms/pseries/pseries.h  | 2 ++
 arch/powerpc/platforms/pseries/setup.c| 2 +-
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c 
b/arch/powerpc/platforms/pseries/mobility.c
index 0f7fb71..8a8033a 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -348,6 +348,9 @@ void post_mobility_fixup(void)
printk(KERN_ERR "Post-mobility device tree update "
"failed: %d\n", rc);
 
+   /* Possibly switch to a new RFI flush type */
+   pseries_setup_rfi_flush();
+
return;
 }
 
diff --git a/arch/powerpc/platforms/pseries/pseries.h 
b/arch/powerpc/platforms/pseries/pseries.h
index 1ae1d9f..27cdcb6 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -100,4 +100,6 @@ static inline unsigned long cmo_get_page_size(void)
 
 int dlpar_workqueue_init(void);
 
+void pseries_setup_rfi_flush(void);
+
 #endif /* _PSERIES_PSERIES_H */
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 2f82bbb..360efe7 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -459,7 +459,7 @@ static void __init find_and_init_phbs(void)
of_pci_check_probe_only();
 }
 
-static void pseries_setup_rfi_flush(void)
+void pseries_setup_rfi_flush(void)
 {
struct h_cpu_char_result result;
enum l1d_flush_type types;
-- 
2.7.4



Re: [PATCH 2/3] rfi-flush: Make it possible to call setup_rfi_flush() again

2018-03-12 Thread Mauricio Faria de Oliveira

Hi Michael and Michal,

Got back to this; sorry for the delay.

On 03/06/2018 09:55 AM, Michal Suchánek wrote:

Michael Ellerman  wrote:



I*think*  the patch below is all we need, as well as some tweaking of
patch 2, are you able to test and repost?



Enabling the fallback flush always looks a bit dodgy but
do_rfi_flush_fixups will overwrite the jump so long any other fixup is
enabled.


I agree; the 'Using fallback displacement flush' message is misleading
(is the system slower/fallback or not? Ô_o)

The suggested patch is definitely more contained within pseries, but
the informational part seemed dodgy IMHO too.

So I wrote something with a new function parameter to force the init of
the fallback flush area (true in pseries, false in powernv).  Not that
contained, but it seemed to convey the intent here in a clear way.

That's v2, just sent.

cheers,
Mauricio



Re: [PATCH 2/3] rfi-flush: Make it possible to call setup_rfi_flush() again

2018-03-13 Thread Mauricio Faria de Oliveira

Hi Michael,

On 03/13/2018 08:39 AM, Michael Ellerman wrote:

I agree; the 'Using fallback displacement flush' message is misleading
(is the system slower/fallback or not? Ô_o)



That message is actually just wrong.

It still prints that even if enable=false.

So we should change all those messages, perhaps:

pr_info("rfi-flush: fallback displacement flush available\n");
pr_info("rfi-flush: ori type flush available\n");
pr_info("rfi-flush: mttrig type flush available\n");


Indeed.


So I wrote something with a new function parameter to force the init of
the fallback flush area (true in pseries, false in powernv).  Not that
contained, but it seemed to convey the intent here in a clear way.

That's v2, just sent.



OK thanks. I don't really like it :D - sorry!


No worries :) fair enough. Well, I didn't like it much, either, TBH.


It's a lot of plumbing of that bool just to avoid the message, whereas I
think we could just change the message like above.


Yup.

And what you think about a more descriptive confirmation of what flush
instructions/methods are _actually_ being used?

Currently and w/ your suggestion aobve, all that is known is what is
_available_, not what has gone in (or out, in the disable case) the
nop slots.

cheers,
mauricio



Re: [PATCH 2/3] rfi-flush: Make it possible to call setup_rfi_flush() again

2018-03-13 Thread Mauricio Faria de Oliveira

On 03/13/2018 02:59 PM, Michal Suchánek wrote:

Maybe it would make more sense to move the messages to the function
that actually patches in the instructions?


That helps, but if the instructions are not patched (e.g., no_rfi_flush)
then there is no information about what the system actually supports,
which is useful for diagnostics/debugging (and patch verification! :-) )

cheers,
mauricio



Re: [PATCH 2/3] rfi-flush: Make it possible to call setup_rfi_flush() again

2018-03-13 Thread Mauricio Faria de Oliveira

On 03/13/2018 03:36 PM, Michal Suchánek wrote:

On Tue, 13 Mar 2018 15:13:11 -0300
Mauricio Faria de Oliveira  wrote:


On 03/13/2018 02:59 PM, Michal Suchánek wrote:

Maybe it would make more sense to move the messages to the function
that actually patches in the instructions?



That helps, but if the instructions are not patched (e.g.,
no_rfi_flush) then there is no information about what the system
actually supports, which is useful for diagnostics/debugging (and
patch verification!:-)  )



Can't you patch with debugfs in that case?


For development purposes, yes, sure; but unfortunately sometimes only a
dmesg output or other offline/postmortem data is available.

And there's the user case where he is not aware/willing/allowed to use
the debugfs switch.

I still think the correct, informative messages are a good way to go :)

cheers,
mauricio



[PATCH v3 0/5] Setup RFI flush after PowerVM LPM migration

2018-03-14 Thread Mauricio Faria de Oliveira
This patchset allows for setup_rfi_flush() to be called again
after PowerVM LPM (live partition mobility) aka LPAR migration,
in order to possibly switch to a different flush method.

The patches are mostly from Michael Ellerman, I have rebased to
powerpc/linux.git merge branch as of commit d8443ef (March 14).

Testcase and results sent as the last email in the series.

v3: add patch 4 to tell flush types are 'available' and not 'Using'.
remove plumbing patch.

v2: add plumbing patch between platforms and setup code to
force init fallback flush.

Mauricio Faria de Oliveira (1):
  rfi-flush: Differentiate enabled and patched flush types

Michael Ellerman (4):
  rfi-flush: Move the logic to avoid a redo into the debugfs code
  rfi-flush: Make it possible to call setup_rfi_flush() again
  rfi-flush: Always enable fallback flush on pseries
  rfi-flush: Call setup_rfi_flush() after LPM migration

 arch/powerpc/include/asm/setup.h  |  2 +-
 arch/powerpc/kernel/setup_64.c| 25 -
 arch/powerpc/lib/feature-fixups.c |  9 -
 arch/powerpc/platforms/pseries/mobility.c |  3 +++
 arch/powerpc/platforms/pseries/pseries.h  |  2 ++
 arch/powerpc/platforms/pseries/setup.c| 12 ++--
 6 files changed, 32 insertions(+), 21 deletions(-)

-- 
2.7.4



[PATCH v3 1/5] rfi-flush: Move the logic to avoid a redo into the debugfs code

2018-03-14 Thread Mauricio Faria de Oliveira
From: Michael Ellerman 

rfi_flush_enable() includes a check to see if we're already
enabled (or disabled), and in that case does nothing.

But that means calling setup_rfi_flush() a 2nd time doesn't actually
work, which is a bit confusing.

Move that check into the debugfs code, where it really belongs.

Signed-off-by: Michael Ellerman 
Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/kernel/setup_64.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index c388cc3..3efc01a 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -846,9 +846,6 @@ static void do_nothing(void *unused)
 
 void rfi_flush_enable(bool enable)
 {
-   if (rfi_flush == enable)
-   return;
-
if (enable) {
do_rfi_flush_fixups(enabled_flush_types);
on_each_cpu(do_nothing, NULL, 1);
@@ -902,13 +899,19 @@ void __init setup_rfi_flush(enum l1d_flush_type types, 
bool enable)
 #ifdef CONFIG_DEBUG_FS
 static int rfi_flush_set(void *data, u64 val)
 {
+   bool enable;
+
if (val == 1)
-   rfi_flush_enable(true);
+   enable = true;
else if (val == 0)
-   rfi_flush_enable(false);
+   enable = false;
else
return -EINVAL;
 
+   /* Only do anything if we're changing state */
+   if (enable != rfi_flush)
+   rfi_flush_enable(enable);
+
return 0;
 }
 
-- 
2.7.4



[PATCH v3 2/5] rfi-flush: Make it possible to call setup_rfi_flush() again

2018-03-14 Thread Mauricio Faria de Oliveira
From: Michael Ellerman 

For PowerVM migration we want to be able to call setup_rfi_flush()
again after we've migrated the partition.

To support that we need to check that we're not trying to allocate the
fallback flush area after memblock has gone away (i.e., boot-time only).

Signed-off-by: Michael Ellerman 
Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/include/asm/setup.h | 2 +-
 arch/powerpc/kernel/setup_64.c   | 6 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index 469b7fd..bbcdf92 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -49,7 +49,7 @@ enum l1d_flush_type {
L1D_FLUSH_MTTRIG= 0x8,
 };
 
-void __init setup_rfi_flush(enum l1d_flush_type, bool enable);
+void setup_rfi_flush(enum l1d_flush_type, bool enable);
 void do_rfi_flush_fixups(enum l1d_flush_type types);
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 3efc01a..d60e2f7 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -860,6 +860,10 @@ static void init_fallback_flush(void)
u64 l1d_size, limit;
int cpu;
 
+   /* Only allocate the fallback flush area once (at boot time). */
+   if (l1d_flush_fallback_area)
+   return;
+
l1d_size = ppc64_caches.l1d.size;
limit = min(ppc64_bolted_size(), ppc64_rma_size);
 
@@ -877,7 +881,7 @@ static void init_fallback_flush(void)
}
 }
 
-void __init setup_rfi_flush(enum l1d_flush_type types, bool enable)
+void setup_rfi_flush(enum l1d_flush_type types, bool enable)
 {
if (types & L1D_FLUSH_FALLBACK) {
pr_info("rfi-flush: Using fallback displacement flush\n");
-- 
2.7.4



[PATCH v3 3/5] rfi-flush: Always enable fallback flush on pseries

2018-03-14 Thread Mauricio Faria de Oliveira
From: Michael Ellerman 

This ensures the fallback flush area is always allocated on pseries,
so in case a LPAR is migrated from a patched to an unpatched system,
it is possible to enable the fallback flush in the target system.

Signed-off-by: Michael Ellerman 
Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/platforms/pseries/setup.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 4642e48..b20d107 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -468,26 +468,18 @@ static void pseries_setup_rfi_flush(void)
 
/* Enable by default */
enable = true;
+   types = L1D_FLUSH_FALLBACK;
 
rc = plpar_get_cpu_characteristics(&result);
if (rc == H_SUCCESS) {
-   types = L1D_FLUSH_NONE;
-
if (result.character & H_CPU_CHAR_L1D_FLUSH_TRIG2)
types |= L1D_FLUSH_MTTRIG;
if (result.character & H_CPU_CHAR_L1D_FLUSH_ORI30)
types |= L1D_FLUSH_ORI;
 
-   /* Use fallback if nothing set in hcall */
-   if (types == L1D_FLUSH_NONE)
-   types = L1D_FLUSH_FALLBACK;
-
if ((!(result.behaviour & H_CPU_BEHAV_L1D_FLUSH_PR)) ||
(!(result.behaviour & H_CPU_BEHAV_FAVOUR_SECURITY)))
enable = false;
-   } else {
-   /* Default to fallback if case hcall is not available */
-   types = L1D_FLUSH_FALLBACK;
}
 
setup_rfi_flush(types, enable);
-- 
2.7.4



[PATCH v3 4/5] rfi-flush: Differentiate enabled and patched flush types

2018-03-14 Thread Mauricio Faria de Oliveira
Currently the rfi-flush messages print 'Using  flush' for all
enabled_flush_types, but that is not necessarily true -- as now the
fallback flush is always enabled on pseries, but the fixup function
overwrites its nop/branch slot with other flush types, if available.

So, replace the 'Using  flush' messages with ' flush is
available'.

Also, print the patched flush types in the fixup function, so users
can know what is (not) being used (e.g., the slower, fallback flush,
or no flush type at all if flush is disabled via the debugfs switch).

Suggested-by: Michael Ellerman 
Signed-off-by: Mauricio Faria de Oliveira 
---

P.S.: Michael, you suggested only hunk 1. please feel free to discard
  hunk 2 if you don't like it.

 arch/powerpc/kernel/setup_64.c| 6 +++---
 arch/powerpc/lib/feature-fixups.c | 9 -
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index d60e2f7..4ec4a27 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -884,15 +884,15 @@ static void init_fallback_flush(void)
 void setup_rfi_flush(enum l1d_flush_type types, bool enable)
 {
if (types & L1D_FLUSH_FALLBACK) {
-   pr_info("rfi-flush: Using fallback displacement flush\n");
+   pr_info("rfi-flush: fallback displacement flush available\n");
init_fallback_flush();
}
 
if (types & L1D_FLUSH_ORI)
-   pr_info("rfi-flush: Using ori type flush\n");
+   pr_info("rfi-flush: ori type flush available\n");
 
if (types & L1D_FLUSH_MTTRIG)
-   pr_info("rfi-flush: Using mttrig type flush\n");
+   pr_info("rfi-flush: mttrig type flush available\n");
 
enabled_flush_types = types;
 
diff --git a/arch/powerpc/lib/feature-fixups.c 
b/arch/powerpc/lib/feature-fixups.c
index 73697c4..35f80ab 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -153,7 +153,14 @@ void do_rfi_flush_fixups(enum l1d_flush_type types)
patch_instruction(dest + 2, instrs[2]);
}
 
-   printk(KERN_DEBUG "rfi-flush: patched %d locations\n", i);
+   printk(KERN_DEBUG "rfi-flush: patched %d locations (%s flush)\n", i,
+   (types == L1D_FLUSH_NONE)   ? "no" :
+   (types == L1D_FLUSH_FALLBACK)   ? "fallback displacement" :
+   (types &  L1D_FLUSH_ORI)? (types & L1D_FLUSH_MTTRIG)
+   ? "ori+mttrig type"
+   : "ori type" :
+   (types &  L1D_FLUSH_MTTRIG) ? "mttrig type"
+   : "unknown");
 }
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
-- 
2.7.4



[PATCH v3 5/5] rfi-flush: Call setup_rfi_flush() after LPM migration

2018-03-14 Thread Mauricio Faria de Oliveira
From: Michael Ellerman 

We might have migrated to a machine that uses a different flush type,
or doesn't need flushing at all.

Signed-off-by: Michael Ellerman 
Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/platforms/pseries/mobility.c | 3 +++
 arch/powerpc/platforms/pseries/pseries.h  | 2 ++
 arch/powerpc/platforms/pseries/setup.c| 2 +-
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c 
b/arch/powerpc/platforms/pseries/mobility.c
index 0f7fb71..8a8033a 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -348,6 +348,9 @@ void post_mobility_fixup(void)
printk(KERN_ERR "Post-mobility device tree update "
"failed: %d\n", rc);
 
+   /* Possibly switch to a new RFI flush type */
+   pseries_setup_rfi_flush();
+
return;
 }
 
diff --git a/arch/powerpc/platforms/pseries/pseries.h 
b/arch/powerpc/platforms/pseries/pseries.h
index c73351c..60db2ee 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -108,4 +108,6 @@ static inline unsigned long cmo_get_page_size(void)
 
 int dlpar_workqueue_init(void);
 
+void pseries_setup_rfi_flush(void);
+
 #endif /* _PSERIES_PSERIES_H */
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index b20d107..f34f908 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -459,7 +459,7 @@ static void __init find_and_init_phbs(void)
of_pci_check_probe_only();
 }
 
-static void pseries_setup_rfi_flush(void)
+void pseries_setup_rfi_flush(void)
 {
struct h_cpu_char_result result;
enum l1d_flush_type types;
-- 
2.7.4



[TESTS] Create 'enabled_flush_types' boot option, and short-circuit LPM

2018-03-14 Thread Mauricio Faria de Oliveira
fallback -> fallback:

# dmesg -c | grep rfi-flush
[0.00] rfi-flush (debug): enabled flush types: 0x0
[0.00] rfi-flush: fallback displacement flush available
[0.00] rfi-flush: patched 8 locations (fallback displacement flush)

# echo > /sys/kernel/mobility/migration 

# dmesg -c | grep rfi-flush
[   32.525443] rfi-flush: fallback displacement flush available
[   32.526269] rfi-flush: patched 8 locations (fallback displacement flush)

fallback -> instructions:

# dmesg -c | grep rfi-flush
[0.00] rfi-flush (debug): enabled flush types: 0x0
[0.00] rfi-flush: fallback displacement flush available
[0.00] rfi-flush: patched 8 locations (fallback displacement flush)

# echo 24 > /sys/kernel/debug/powerpc/rfi_flush 

# dmesg -c | grep rfi-flush
[   30.984247] rfi-flush (debug): enabled flush types: 0xc

# echo > /sys/kernel/mobility/migration 

# dmesg -c | grep rfi-flush
[   51.554357] rfi-flush: fallback displacement flush available
[   51.554360] rfi-flush: ori type flush available
[   51.554361] rfi-flush: mttrig type flush available
[   51.554366] rfi-flush: patched 8 locations (ori+mttrig type flush)

instructions -> instructions:

# dmesg -c | grep rfi-flush
[0.00] rfi-flush (debug): enabled flush types: 0xc
[0.00] rfi-flush: fallback displacement flush available
[0.00] rfi-flush: ori type flush available
[0.00] rfi-flush: mttrig type flush available
[0.00] rfi-flush: patched 8 locations (ori+mttrig type flush)

# echo > /sys/kernel/mobility/migration 

# dmesg -c | grep rfi-flush
[   55.100566] rfi-flush: fallback displacement flush available
[   55.100570] rfi-flush: ori type flush available
[   55.100571] rfi-flush: mttrig type flush available
[   55.100575] rfi-flush: patched 8 locations (ori+mttrig type flush)

instructions -> fallback:

# dmesg -c | grep rfi-flush
[0.00] rfi-flush (debug): enabled flush types: 0xc
[0.00] rfi-flush: fallback displacement flush available
[0.00] rfi-flush: ori type flush available
[0.00] rfi-flush: mttrig type flush available
[0.00] rfi-flush: patched 8 locations (ori+mttrig type flush)

# echo  > /sys/kernel/debug/powerpc/rfi_flush

# dmesg -c | grep rfi-flush
[   18.730782] rfi-flush (debug): enabled flush types: 0x0

# echo > /sys/kernel/mobility/migration 

   

# dmesg -c | grep rfi-flush
[   27.120071] rfi-flush: fallback displacement flush available
[   27.120078] rfi-flush: patched 8 locations (fallback displacement flush)

debugfs switch:

# echo 0 > /sys/kernel/debug/powerpc/rfi_flush 
# echo 1 > /sys/kernel/debug/powerpc/rfi_flush 

# dmesg -c | grep rfi-flush
[  106.031993] rfi-flush: patched 8 locations (no flush)
[  109.670966] rfi-flush: patched 8 locations (fallback displacement flush)

ori type only:

# echo 8 > /sys/kernel/debug/powerpc/rfi_flush
# echo 0 > /sys/kernel/debug/powerpc/rfi_flush 
# echo 1 > /sys/kernel/debug/powerpc/rfi_flush 

# dmesg -c | grep rfi-flush
[  308.988958] rfi-flush (debug): enabled flush types: 0x4
[  314.206548] rfi-flush: patched 8 locations (no flush)
[  316.349916] rfi-flush: patched 8 locations (ori type flush)

mttrig type only:

# echo 16 > /sys/kernel/debug/powerpc/rfi_flush
# echo 0 > /sys/kernel/debug/powerpc/rfi_flush 
# echo 1 > /sys/kernel/debug/powerpc/rfi_flush 

# dmesg -c | grep rfi-flush
[  355.993189] rfi-flush (debug): enabled flush types: 0x8
[  360.644291] rfi-flush: patched 8 locations (no flush)
[  365.300547] rfi-flush: patched 8 locations (mttrig type flush)

Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/kernel/setup_64.c| 29 +
 arch/powerpc/platforms/pseries/mobility.c |  4 
 2 files changed, 33 insertions(+)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 4ec4a27..9c9568e 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -816,6 +816,24 @@ static void *l1d_flush_fallback_area;
 static bool no_rfi_flush;
 bool rfi_flush;
 
+static int __init handle_enabled_flush_types(char *p)
+{
+   int rc;
+   enum l1d_flush_type types;
+
+   rc = kstrtoul(p, 0, (long unsigned int *)&types);
+   if (!rc) {
+   enabled_flush_types = types;
+   pr_info("rfi-flush (debug): enabled flush types: 0x%x\n", 
enabled_flush_types);
+   } else {
+   enabled_flush_types = L1D_FLUSH_NONE;
+   pr_info("rfi-flush (debug): enabled f

Re: [PATCH v3 1/5] rfi-flush: Move the logic to avoid a redo into the debugfs code

2018-03-16 Thread Mauricio Faria de Oliveira

Hi Murilo and Michal,

On 03/16/2018 05:52 AM, Michal Suchánek wrote:

Do we need to take into account if no_rfi_flush == true?



I think it makes sense you are able to override that using debugfs.

It's interface used for diagnostics and testing.

If this was in sysfs it would be a different story.


Yes, I agree. The debugfs is way to override the cmdline option.

Thanks for looking carefully at this :)

cheers,
Mauricio



Re: [PATCH 2/3] rfi-flush: Make it possible to call setup_rfi_flush() again

2018-03-16 Thread Mauricio Faria de Oliveira

Michael,

On 03/16/2018 11:18 AM, Michael Ellerman wrote:

I still think the correct, informative messages are a good way to go:)

Yeah I agree.

We probably want to do both, print what's available at boot, and print
what's actually patched when the patching happens.


Nice.  Not sure you had a chance to review yet, but 'PATCH v3 4/5' does
exactly that :- )

I think its implementation of the latter part looks a bit strange, but I
couldn't figure an elegant way to fit that in (either that one or string
array indexed by flush-type possible values or a long if-else chain).

I'd be happy with suggestions if it's preferred in some other way.

cheers,
Mauricio



[PATCH] powerpc/pseries: Fix to clear security feature flags

2018-03-29 Thread Mauricio Faria de Oliveira
The H_CPU_BEHAV_* flags should be checked for in the 'behaviour' field
of 'struct h_cpu_char_result' -- 'character' is for H_CPU_CHAR_* flags.

Found it by playing around with QEMU's implementation of the hypercall:

Example: 
  H_CPU_CHAR=0xf000
  H_CPU_BEHAV=0x

  This clears H_CPU_BEHAV_FAVOUR_SECURITY and H_CPU_BEHAV_L1D_FLUSH_PR
  so pseries_setup_rfi_flush() disables 'rfi_flush'; and it also clears
  H_CPU_CHAR_L1D_THREAD_PRIV flag.  So there is no RFI flush mitigation
  at all for cpu_show_meltdown() to report; but currently it does:

  Original kernel:

# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: RFI Flush

  Patched kernel:

# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Not affected

Example:
  H_CPU_CHAR=0x
  H_CPU_BEHAV=0xf000

  This sets H_CPU_BEHAV_BNDS_CHK_SPEC_BAR so cpu_show_spectre_v1() should
  report vulnerable; but currently it doesn't:

  Original kernel:

# cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
Not affected

  Patched kernel:

# cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
Vulnerable

Fixes: f636c14790ea ("powerpc/pseries: Set or clear security feature flags")
Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/platforms/pseries/setup.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 1f12235..b11564f 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -484,13 +484,13 @@ static void init_cpu_char_feature_flags(struct 
h_cpu_char_result *result)
 * The features below are enabled by default, so we instead look to see
 * if firmware has *disabled* them, and clear them if so.
 */
-   if (!(result->character & H_CPU_BEHAV_FAVOUR_SECURITY))
+   if (!(result->behaviour & H_CPU_BEHAV_FAVOUR_SECURITY))
security_ftr_clear(SEC_FTR_FAVOUR_SECURITY);
 
-   if (!(result->character & H_CPU_BEHAV_L1D_FLUSH_PR))
+   if (!(result->behaviour & H_CPU_BEHAV_L1D_FLUSH_PR))
security_ftr_clear(SEC_FTR_L1D_FLUSH_PR);
 
-   if (!(result->character & H_CPU_BEHAV_BNDS_CHK_SPEC_BAR))
+   if (!(result->behaviour & H_CPU_BEHAV_BNDS_CHK_SPEC_BAR))
security_ftr_clear(SEC_FTR_BNDS_CHK_SPEC_BAR);
 }
 
-- 
1.8.3.1



Re: [PATCH v2 03/10] powerpc/pseries: Set or clear security feature flags

2018-03-29 Thread Mauricio Faria de Oliveira

Hi Michael,

On 03/27/2018 09:01 AM, Michael Ellerman wrote:

+   if (!(result->character & H_CPU_BEHAV_FAVOUR_SECURITY))
+   security_ftr_clear(SEC_FTR_FAVOUR_SECURITY);
+
+   if (!(result->character & H_CPU_BEHAV_L1D_FLUSH_PR))
+   security_ftr_clear(SEC_FTR_L1D_FLUSH_PR);
+
+   if (!(result->character & H_CPU_BEHAV_BNDS_CHK_SPEC_BAR))
+   security_ftr_clear(SEC_FTR_BNDS_CHK_SPEC_BAR);


Oops, I missed this..

The H_CPU_BEHAV flags should be checked for in 'result->behaviour'.

Just sent '[PATCH] powerpc/pseries: Fix to clear security feature flags'

cheers,
Mauricio



[PATCH 1/2] powerpc: Move default security feature flags

2018-03-30 Thread Mauricio Faria de Oliveira
This moves the definition of the default security feature flags
(i.e., enabled by default) closer to the security feature flags.

This can be used to restore current flags to the default flags.

Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/include/asm/security_features.h | 8 
 arch/powerpc/kernel/security.c   | 7 +--
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/security_features.h 
b/arch/powerpc/include/asm/security_features.h
index 400a905..fa4d2e1 100644
--- a/arch/powerpc/include/asm/security_features.h
+++ b/arch/powerpc/include/asm/security_features.h
@@ -63,4 +63,12 @@ static inline bool security_ftr_enabled(unsigned long 
feature)
 // Firmware configuration indicates user favours security over performance
 #define SEC_FTR_FAVOUR_SECURITY0x0200ull
 
+
+// Features enabled by default
+#define SEC_FTR_DEFAULT \
+   (SEC_FTR_L1D_FLUSH_HV | \
+SEC_FTR_L1D_FLUSH_PR | \
+SEC_FTR_BNDS_CHK_SPEC_BAR | \
+SEC_FTR_FAVOUR_SECURITY)
+
 #endif /* _ASM_POWERPC_SECURITY_FEATURES_H */
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 2cee3dc..bab5a27 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -11,12 +11,7 @@
 #include 
 
 
-unsigned long powerpc_security_features __read_mostly = \
-   SEC_FTR_L1D_FLUSH_HV | \
-   SEC_FTR_L1D_FLUSH_PR | \
-   SEC_FTR_BNDS_CHK_SPEC_BAR | \
-   SEC_FTR_FAVOUR_SECURITY;
-
+unsigned long powerpc_security_features __read_mostly = SEC_FTR_DEFAULT;
 
 ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, 
char *buf)
 {
-- 
1.8.3.1



[PATCH 2/2] powerpc/pseries: Restore default security feature flags on setup

2018-03-30 Thread Mauricio Faria de Oliveira
After migration the security feature flags might have changed (e.g.,
destination system with unpatched firmware), but some flags are not
set/clear again in init_cpu_char_feature_flags() because it assumes
the security flags to be the defaults.

Additionally, if the H_GET_CPU_CHARACTERISTICS hypercall fails then
init_cpu_char_feature_flags() does not run again, which potentially
might leave the system in an insecure or sub-optimal configuration.

So, just restore the security feature flags to the defaults assumed
by init_cpu_char_feature_flags() so it can set/clear them correctly,
and to ensure safe settings are in place in case the hypercall fail.

Fixes: f636c14790ea ("powerpc/pseries: Set or clear security feature flags")
Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/platforms/pseries/setup.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index b11564f..2581fc8 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -462,6 +462,10 @@ static void __init find_and_init_phbs(void)
 
 static void init_cpu_char_feature_flags(struct h_cpu_char_result *result)
 {
+   /*
+* The features below are disabled by default, so we instead look to see
+* if firmware has *enabled* them, and set them if so.
+*/
if (result->character & H_CPU_CHAR_SPEC_BAR_ORI31)
security_ftr_set(SEC_FTR_SPEC_BAR_ORI31);
 
@@ -501,6 +505,13 @@ void pseries_setup_rfi_flush(void)
bool enable;
long rc;
 
+   /*
+* Set features to the defaults assumed by init_cpu_char_feature_flags()
+* so it can set/clear again any features that might have changed after
+* migration, and in case the hypercall fails and it is not even called.
+*/
+   powerpc_security_features = SEC_FTR_DEFAULT;
+
rc = plpar_get_cpu_characteristics(&result);
if (rc == H_SUCCESS)
init_cpu_char_feature_flags(&result);
-- 
1.8.3.1



[PATCH v3] powerpc/64: Fix section mismatch warnings for early boot symbols

2018-04-05 Thread Mauricio Faria de Oliveira
Some of the boot code located at the start of kernel text is "init"
class, in that it only runs at boot time, however marking it as normal
init code is problematic because that puts it into a different section
located at the very end of kernel text.

e.g., in case the TOC is not set up, we may not be able to tolerate a
branch trampoline to reach the init function.

Credits: code and message are based on 2016 patch by Nicholas Piggin,
and slightly modified so not to rename the powerpc code/symbol names.

Subject: [PATCH] powerpc/64: quieten section mismatch warnings
From: Nicholas Piggin 
Date: Fri Dec 23 00:14:19 AEDT 2016

This resolves the following section mismatch warnings:

WARNING: vmlinux.o(.text+0x2fa8): Section mismatch in reference from the 
variable __boot_from_prom to the function .init.text:prom_init()
The function __boot_from_prom() references
the function __init prom_init().
This is often because __boot_from_prom lacks a __init
annotation or the annotation of prom_init is wrong.

WARNING: vmlinux.o(.text+0x3238): Section mismatch in reference from the 
variable start_here_multiplatform to the function .init.text:early_setup()
The function start_here_multiplatform() references
the function __init early_setup().
This is often because start_here_multiplatform lacks a __init
annotation or the annotation of early_setup is wrong.

WARNING: vmlinux.o(.text+0x326c): Section mismatch in reference from the 
variable start_here_common to the function .init.text:start_kernel()
The function start_here_common() references
the function __init start_kernel().
This is often because start_here_common lacks a __init
annotation or the annotation of start_kernel is wrong.

Signed-off-by: Mauricio Faria de Oliveira 
---
v3: reword some comments and include errors in commit message
v2: fix build error due to missing parenthesis
 scripts/mod/modpost.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 4ff08a0..d10c9d8 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1173,8 +1173,15 @@ static const struct sectioncheck *section_mismatch(
  *   fromsec = text section
  *   refsymname = *.constprop.*
  *
+ * Pattern 6:
+ *   powerpc64 has boot functions that reference init, but must remain in text.
+ *   This pattern is identified by
+ *   tosec   = init section
+ *   fromsym = __boot_from_prom, start_here_common, start_here_multiplatform
+ *
  **/
-static int secref_whitelist(const struct sectioncheck *mismatch,
+static int secref_whitelist(const struct elf_info *elf,
+   const struct sectioncheck *mismatch,
const char *fromsec, const char *fromsym,
const char *tosec, const char *tosym)
 {
@@ -1211,6 +1218,17 @@ static int secref_whitelist(const struct sectioncheck 
*mismatch,
match(fromsym, optim_symbols))
return 0;
 
+   /* Check for pattern 6 */
+   if (elf->hdr->e_machine == EM_PPC64)
+   if (match(tosec, init_sections) &&
+   (!strncmp(fromsym, "__boot_from_prom",
+   strlen("__boot_from_prom")) ||
+!strncmp(fromsym, "start_here_common",
+   strlen("start_here_common")) ||
+!strncmp(fromsym, "start_here_multiplatform",
+   strlen("start_here_multiplatform"
+   return 0;
+
return 1;
 }
 
@@ -1551,7 +1569,7 @@ static void default_mismatch_handler(const char *modname, 
struct elf_info *elf,
tosym = sym_name(elf, to);
 
/* check whitelist - we may ignore it */
-   if (secref_whitelist(mismatch,
+   if (secref_whitelist(elf, mismatch,
 fromsec, fromsym, tosec, tosym)) {
report_sec_mismatch(modname, mismatch,
fromsec, r->r_offset, fromsym,
-- 
1.8.3.1



Re: [PATCH v3] powerpc/64: Fix section mismatch warnings for early boot symbols

2018-04-10 Thread Mauricio Faria de Oliveira

On 04/09/2018 11:51 PM, Michael Ellerman wrote:

Thanks for picking this one up.

I hate to be a pain ... but before we merge this and proliferate these
names, I'd like to change the names of some of these early asm
functions. They're terribly named due to historical reasons.


Indeed :) No worries.


I haven't actually thought of good names yet though:)

I'll try and come up with some and post a patch doing the renames.


Alright. Could you please copy me on that, and I can post an update.

cheers,
Mauricio



Re: powerpc: add kernel parameter iommu_alloc_quiet

2016-09-21 Thread Mauricio Faria de Oliveira

Hi Michael,

Thanks for the review.

On 09/21/2016 10:51 AM, Michael Ellerman wrote:

That is important/requirement for the distribution kernels - where
the DMA_ATTR_NO_WARN changes to 'enum dma_attr' are not acceptable
because it breaks the kernel ABI.

[snip]
> Removing an entry from an enum would break the API (Note _P_). But I 
don't see

> how adding one does.

Agree.. I should have worded 'already-built out-of-tree modules'.

If I understand it correctly, this chunk will change the value of
DMA_ATTR_MAX, and thus break the ABI for out-of-tree modules that
depend on it. (Not that I know of any that use it, but that's the
reason that causes distro kernel builds to fail w/ this applied.)

@@ -19,6 +19,7 @@ enum dma_attr {
DMA_ATTR_SKIP_CPU_SYNC,
DMA_ATTR_FORCE_CONTIGUOUS,
DMA_ATTR_ALLOC_SINGLE_PAGES,
+   DMA_ATTR_NO_WARN,
DMA_ATTR_MAX,
 };


Also kernel command line parameters are really poor in terms of usability. Folks
really don't want to have to change their kernel command line.

If we really need a hack for distros then I'd suggest we add a global struct
driver * in the powerpc iommu code, and then when nvme loads it sets that to
point at itself, and then the check becomes:


Agree w/ cmdline args are poor for usability, and that this new hack is
certainly smarter -- however, it does not provide any option/choice for
the user of whether enable/disable it (ie driver automatically sets it).

Although that isn't a problem for older downstream nvme drivers, which
have a single OK-to-fail dma mapping callsite (so the message doesn't
matter at all), the newer downstream drivers also have a Not-OK-to-fail
callsite, which is still worth to get the 'iommu_alloc failed' message
(per Ben's point of 'message useful for debugging').

For the latter, I think an upstream patch like this suggestion is not
a generally acceptable approach.

So, the intent is to have a single/common hack that upstream is OK w/,
then apply that downstream in the multiple distros.  Of course, if you
are not OK w/ this patch (which is obviously fair) we'll have to try it
downstream only, and there we can diverge in the implementations.

Please let me know your thoughts on it.

Thanks!

--
Mauricio Faria de Oliveira
IBM Linux Technology Center



Re: powerpc: add kernel parameter iommu_alloc_quiet

2016-10-03 Thread Mauricio Faria de Oliveira

Hi Michael,

On 09/21/2016 11:34 AM, Mauricio Faria de Oliveira wrote:

So, the intent is to have a single/common hack that upstream is OK w/,
then apply that downstream in the multiple distros.  Of course, if you
are not OK w/ this patch (which is obviously fair) we'll have to try it
downstream only, and there we can diverge in the implementations.

Please let me know your thoughts on it.


Thanks!

--
Mauricio Faria de Oliveira
IBM Linux Technology Center



[PATCH 1/3] powerpc: export cpu_to_core_id()

2016-06-01 Thread Mauricio Faria de Oliveira
Export cpu_to_core_id().  This will be used by the lpfc driver.

Tested on next-20160601.

Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/kernel/smp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 55c924b..67136e7 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -593,6 +593,7 @@ out:
of_node_put(np);
return id;
 }
+EXPORT_SYMBOL_GPL(cpu_to_core_id);
 
 /* Helper routines for cpu to core mapping */
 int cpu_core_index_of_thread(int cpu)
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] powerpc: export cpu_to_core_id()

2016-06-01 Thread Mauricio Faria de Oliveira

Please ignore the 'PATCH 1/3' in the subject; this is a single patch.


--
Mauricio Faria de Oliveira
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] powerpc: export cpu_to_core_id()

2016-06-02 Thread Mauricio Faria de Oliveira

Hi Michael,

On 06/02/2016 04:41 AM, Michael Ellerman wrote:

On Wed, 2016-06-01 at 17:16 -0300, Mauricio Faria de Oliveira wrote:


Export cpu_to_core_id().  This will be used by the lpfc driver.


Can you explain why?


Yup,


I would have thought there'd be architecture neutral APIs you can use - and if
there aren't maybe we should write them.


I actually use topology_core_id() from   in lpfc [1]
(defined to cpu_to_core_id() by arch/powerpc/include/asm/topology.h).

That is arch-neutral, used by eg /sys/devices/system/cpu/cpu*/topology,
but drivers/base/topology.c is built-in (obj-y in ./Makefile), and thus
didn't need the export.

Thus, since the module uses topology_core_id() and this is defined to
cpu_to_core_id(), it needs the export:

ERROR: "cpu_to_core_id" [drivers/scsi/lpfc/lpfc.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

Thanks,

[1] http://marc.info/?l=linux-scsi&m=146481382301686

--
Mauricio Faria de Oliveira
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] powerpc: export cpu_to_core_id()

2016-06-02 Thread Mauricio Faria de Oliveira

On 06/02/2016 06:00 AM, Mauricio Faria de Oliveira wrote:

I actually use topology_core_id() from   in lpfc [1]


Er,  .. kinda early here :)


--
Mauricio Faria de Oliveira
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] powerpc: export cpu_to_core_id()

2016-06-02 Thread Mauricio Faria de Oliveira

On 06/02/2016 08:03 AM, Michael Ellerman wrote:

Can you send me a v2 with a change log that includes all that detail.


Sure; should have done it. Thanks.


--
Mauricio Faria de Oliveira
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2] powerpc: export cpu_to_core_id()

2016-06-02 Thread Mauricio Faria de Oliveira
Export cpu_to_core_id().  This will be used by the lpfc driver.

This enables topology_core_id() from   (defined
to cpu_to_core_id() in arch/powerpc/include/asm/topology.h) to be
used by (non-builtin) modules.

That is arch-neutral, already used by eg, drivers/base/topology.c,
but it is builtin (obj-y in Makefile) thus didn't need the export.

Since the module uses topology_core_id() and this is defined to
cpu_to_core_id(), it needs the export, otherwise:

ERROR: "cpu_to_core_id" [drivers/scsi/lpfc/lpfc.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

Tested on next-20160601.

Changelog:
 - v2: include details about the need for this patch with regards 
   to the architecture-neutral topology API.

Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/kernel/smp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 55c924b..67136e7 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -593,6 +593,7 @@ out:
of_node_put(np);
return id;
 }
+EXPORT_SYMBOL_GPL(cpu_to_core_id);
 
 /* Helper routines for cpu to core mapping */
 int cpu_core_index_of_thread(int cpu)
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc: convert 'iommu_alloc failed' messages to dynamic debug

2016-06-10 Thread Mauricio Faria de Oliveira
This prevents flooding the logs with 'iommu_alloc failed' messages
while I/O is performed (normally) to very fast devices (e.g. NVMe).

That error is not necessarily a problem; device drivers can retry
later / reschedule the requests for which the allocation failed,
and handle things gracefully for the caller stack on top of them.

This helps at least with NVMe devices without "64-bit"/direct DMA
window scenarios (e.g., systems with more than a few terabytes of
memory, on which DDW cannot be enabled, currently), where just an
'dd' command can trigger errors.

  # dd if=/dev/zero of=/dev/nvme0n1p1 bs=64k count=512k
  <...>
  # echo $?
  0

  # dmesg
  nvme :00:06.0: iommu_alloc failed, tbl c001fa67c520 vaddr 
c00151c9 npages 16
  nvme :00:06.0: iommu_alloc failed, tbl c001fa67c520 vaddr 
c00151c9 npages 16
  nvme :00:06.0: iommu_alloc failed, tbl c001fa67c520 vaddr 
c00151c9 npages 16
  <...>
  ppc_iommu_map_sg: 8186 callbacks suppressed
  nvme :00:06.0: iommu_alloc failed, tbl c001fa67c520 vaddr 
c000fa5c npages 16
  nvme :00:06.0: iommu_alloc failed, tbl c001fa67c520 vaddr 
c0010044 npages 16
  nvme :00:06.0: iommu_alloc failed, tbl c001fa67c520 vaddr 
c0010044 npages 16
  <...>
  ppc_iommu_map_sg: 5707 callbacks suppressed
  nvme :00:06.0: iommu_alloc failed, tbl c001fa67c520 vaddr 
c000b5f5 npages 16
  nvme :00:06.0: iommu_alloc failed, tbl c001fa67c520 vaddr 
c000b5c6 npages 16
  nvme :00:06.0: iommu_alloc failed, tbl c001fa67c520 vaddr 
c000b4b3 npages 16
  <...>

Tested on next-20160609.

Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/kernel/iommu.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index a8e3490..b585bdc 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -479,10 +479,9 @@ int ppc_iommu_map_sg(struct device *dev, struct 
iommu_table *tbl,
 
/* Handle failure */
if (unlikely(entry == DMA_ERROR_CODE)) {
-   if (printk_ratelimit())
-   dev_info(dev, "iommu_alloc failed, tbl %p "
-"vaddr %lx npages %lu\n", tbl, vaddr,
-npages);
+   dev_dbg_ratelimited(dev, "iommu_alloc failed, tbl %p "
+"vaddr %lx npages %lu\n", tbl, vaddr,
+npages);
goto failure;
}
 
@@ -776,11 +775,9 @@ dma_addr_t iommu_map_page(struct device *dev, struct 
iommu_table *tbl,
 mask >> tbl->it_page_shift, align,
 attrs);
if (dma_handle == DMA_ERROR_CODE) {
-   if (printk_ratelimit())  {
-   dev_info(dev, "iommu_alloc failed, tbl %p "
-"vaddr %p npages %d\n", tbl, vaddr,
-npages);
-   }
+   dev_dbg_ratelimited(dev, "iommu_alloc failed, tbl %p "
+"vaddr %p npages %d\n", tbl, vaddr,
+npages);
} else
dma_handle |= (uaddr & ~IOMMU_PAGE_MASK(tbl));
}
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc: convert 'iommu_alloc failed' messages to dynamic debug

2016-06-13 Thread Mauricio Faria de Oliveira

Hi Ben,

On 06/11/2016 08:02 PM, Benjamin Herrenschmidt wrote:

I'm not fan of this. This is a very useful message to diagnose why,
for example, your network adapter is not working properly.

A lot of drivers don't deal well with IOMMU errors.

The fact that NVME trigger these is a problem that needs to be solved
differently.


Ok, I understand your points.

Thanks for the review -- helps in setting another direction.

Kind regards,

--
Mauricio Faria de Oliveira
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc: convert 'iommu_alloc failed' messages to dynamic debug

2016-06-13 Thread Mauricio Faria de Oliveira

Hi Ben,

On 06/13/2016 06:26 PM, Benjamin Herrenschmidt wrote:

I've been thinking about this a bit... it might be worthwhile adding
a dma_* call to query the approximate size of the IOMMU window, as
a way for the device to adjust its requirements dynamically.


Ok, cool; something like it was one of the options being discussed here.

What do you mean by 'approximate'? Maybe the size of 'free regions' in 
the pools? -- not sure because iiuic the window size is static / 2 gig,

so didn't get why (or of what) to provide an approximation (for).


Another option would be to use a dma_attr for silencing mapping errors
which NVME could use provided it does handle them gracefully ...


Ah, that's new. Interesting. Thanks for suggestion!


--
Mauricio Faria de Oliveira
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc: convert 'iommu_alloc failed' messages to dynamic debug

2016-06-13 Thread Mauricio Faria de Oliveira

On 06/13/2016 06:51 PM, Benjamin Herrenschmidt wrote:

Approximate wasn't a great choice of word but what I meant is:

  - The size doesn't mean you can do an allocation that size (pools
layout etc..)



  - And it might be shared with another device (though less likely
these days).


Ah, yup - ok, now I get it; thanks.


--
Mauricio Faria de Oliveira
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc: fix oops in pcibios_release_device() after pcibios_free_controller()

2016-07-04 Thread Mauricio Faria de Oliveira
a2d400
pcibios_free_controller QUIRK! ptr = 0123456789abcdef
rpadlpar_io: slot PHB 33 removed

# kill -9 $pid
pcibios_release_device() phb = c001dea2d400, ptr = 0123456789abcdef, 
found 0

Signed-off-by: Mauricio Faria de Oliveira 

Tested on 4.7.0-rc6.
---
 arch/powerpc/kernel/pci-common.c  |  5 +++--
 arch/powerpc/kernel/pci-hotplug.c | 22 --
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 0f7a60f..00a22e7 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -41,7 +41,7 @@
 #include 
 #include 
 
-static DEFINE_SPINLOCK(hose_spinlock);
+DEFINE_SPINLOCK(hose_spinlock);
 LIST_HEAD(hose_list);
 
 /* XXX kill that some day ... */
@@ -95,10 +95,11 @@ void pcibios_free_controller(struct pci_controller *phb)
 {
spin_lock(&hose_spinlock);
list_del(&phb->list_node);
-   spin_unlock(&hose_spinlock);
 
if (phb->is_dynamic)
kfree(phb);
+
+   spin_unlock(&hose_spinlock);
 }
 EXPORT_SYMBOL_GPL(pcibios_free_controller);
 
diff --git a/arch/powerpc/kernel/pci-hotplug.c 
b/arch/powerpc/kernel/pci-hotplug.c
index 2d71269..49b5388 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -21,6 +21,9 @@
 #include 
 #include 
 
+extern spinlock_t hose_spinlock;
+extern struct list_head hose_list;
+
 static struct pci_bus *find_bus_among_children(struct pci_bus *bus,
   struct device_node *dn)
 {
@@ -58,12 +61,27 @@ EXPORT_SYMBOL_GPL(pci_find_bus_by_node);
  */
 void pcibios_release_device(struct pci_dev *dev)
 {
-   struct pci_controller *phb = pci_bus_to_host(dev->bus);
+   struct pci_controller *phb = pci_bus_to_host(dev->bus), *e;
+   int found = 0;
 
eeh_remove_device(dev);
 
-   if (phb->controller_ops.release_device)
+   /*
+* Only access phb if it's still in hose_list; otherwise
+* it's been freed and may contain corrupt data and oops.
+*/
+   spin_lock(&hose_spinlock);
+   list_for_each_entry(e, &hose_list, list_node) {
+   if (e == phb) {
+   found = 1;
+   break;
+   }
+   }
+
+   if (found && phb->controller_ops.release_device)
phb->controller_ops.release_device(dev);
+
+   spin_unlock(&hose_spinlock);
 }
 
 /**
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc: fix oops in pcibios_release_device() after pcibios_free_controller()

2016-07-05 Thread Mauricio Faria de Oliveira

On 07/04/2016 11:55 PM, Benjamin Herrenschmidt wrote:

Have you considered instead adding a kref to the PHB and only freeing
it when all devices have been freed ? Or it's too hard to tract device
creation ?


Yes, considered it, but felt leery of possibly leaving the PHB unfreed
(or block until it is freed) in the call that is supposed to remove it:

for example, if it's not freed yet (because of an userspace program
that holds references, maybe misbehaving or in error)...

- Should the call block? -- in the sense of what would it mean to the
  DLPAR tools (say, drmgr and hmc) that might see a timeout as failure,
  and perhaps won't retry it?
  Or if the administrator requested the adapter/PHB to be removed,
  maybe he's aware of the situation and actually wants the removal
  to happen anyway.

- If it should not block, the phb struct would be left there at mercy
  of the reference holders, and perhaps complicate things if the PHB
  is re-added later on? (say, bring the adapter back to the LPAR)
  Could the tools think it's already added/still present, and then
  stop in error?

Now, putting the complicated scenarios aside, implementing krefs would
touch several other parts of code that I'm less comfortable to change
at this moment, which would probably take a lot more time to do.

So, for simplicity and schedule/backport/time constraints, I've shot
for a less intrusive change that still resolves the problem.

Do you think it's an acceptable solution as is, or needs change or to
be implemented in a different way?

Thanks,

--
Mauricio Faria de Oliveira
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 0/3] dma, nvme, powerpc: introduce and implement DMA_ATTR_NO_WARN

2016-07-06 Thread Mauricio Faria de Oliveira
This patchset introduces dma_attr DMA_ATTR_NO_WARN (just like __GFP_NOWARN),
which tells the DMA-mapping subsystem to supress allocation failure reports.

On some architectures allocation failures are reported with error messages
to the system logs.  Although this can help to identify and debug problems,
drivers which handle failures (eg, retry later) have no problems with them,
and can actually flood the system logs with error messages that aren't any
problem at all, depending on the implementation of the retry mechanism.

So, this provides a way for drivers to avoid those error messages on calls
where allocation failures are not a problem, and shouldn't bother the logs.

 - Patch 1/3 introduces and documents the new dma_attr.

 - Patch 2/3 implements it on the nvme driver (which might repeatedly trip
 on allocation failures due to high load, flooding system logs
 with error messages at least on powerpc: "iommu_alloc failed")

 - Patch 3/3 implements support for it on powerpc arch (where this problem
 was observed.  It's possible to extend support for more archs
 if the patchset is welcome).

Mauricio Faria de Oliveira (3):
  dma: introduce DMA_ATTR_NO_WARN
  nvme: implement DMA_ATTR_NO_WARN
  powerpc: implement DMA_ATTR_NO_WARN

 Documentation/DMA-attributes.txt | 17 +
 arch/powerpc/kernel/iommu.c  |  4 ++--
 drivers/nvme/host/pci.c  | 10 --
 include/linux/dma-attrs.h|  1 +
 4 files changed, 28 insertions(+), 4 deletions(-)

-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/3] dma: introduce DMA_ATTR_NO_WARN

2016-07-06 Thread Mauricio Faria de Oliveira
Introduce the DMA_ATTR_NO_WARN attribute, and document it.

Signed-off-by: Mauricio Faria de Oliveira 
---
 Documentation/DMA-attributes.txt | 17 +
 include/linux/dma-attrs.h|  1 +
 2 files changed, 18 insertions(+)

diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index e8cf9cf..841e771 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -126,3 +126,20 @@ means that we won't try quite as hard to get them.
 
 NOTE: At the moment DMA_ATTR_ALLOC_SINGLE_PAGES is only implemented on ARM,
 though ARM64 patches will likely be posted soon.
+
+DMA_ATTR_NO_WARN
+
+
+This tells the DMA-mapping subsystem to supress allocation failure reports
+(similarly to __GFP_NOWARN).
+
+On some architectures allocation failures are reported with error messages
+to the system logs.  Although this can help to identify and debug problems,
+drivers which handle failures (eg, retry later) have no problems with them,
+and can actually flood the system logs with error messages that aren't any
+problem at all, depending on the implementation of the retry mechanism.
+
+So, this provides a way for drivers to avoid those error messages on calls
+where allocation failures are not a problem, and shouldn't bother the logs.
+
+NOTE: At the moment DMA_ATTR_NO_WARN is only implemented on PowerPC.
diff --git a/include/linux/dma-attrs.h b/include/linux/dma-attrs.h
index f3c5aea..0577389 100644
--- a/include/linux/dma-attrs.h
+++ b/include/linux/dma-attrs.h
@@ -19,6 +19,7 @@ enum dma_attr {
DMA_ATTR_SKIP_CPU_SYNC,
DMA_ATTR_FORCE_CONTIGUOUS,
DMA_ATTR_ALLOC_SINGLE_PAGES,
+   DMA_ATTR_NO_WARN,
DMA_ATTR_MAX,
 };
 
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/3] nvme: implement DMA_ATTR_NO_WARN

2016-07-06 Thread Mauricio Faria de Oliveira
Use the DMA_ATTR_NO_WARN attribute on dma_map_sg() calls of nvme driver.

Signed-off-by: Mauricio Faria de Oliveira 
---
 drivers/nvme/host/pci.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d1a8259..c3c3348 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -65,6 +66,8 @@ MODULE_PARM_DESC(use_cmb_sqes, "use controller's memory 
buffer for I/O SQes");
 
 static struct workqueue_struct *nvme_workq;
 
+static DEFINE_DMA_ATTRS(nvme_dma_attrs);
+
 struct nvme_dev;
 struct nvme_queue;
 
@@ -498,7 +501,7 @@ static int nvme_map_data(struct nvme_dev *dev, struct 
request *req,
goto out;
 
ret = BLK_MQ_RQ_QUEUE_BUSY;
-   if (!dma_map_sg(dev->dev, iod->sg, iod->nents, dma_dir))
+   if (!dma_map_sg_attrs(dev->dev, iod->sg, iod->nents, dma_dir, 
&nvme_dma_attrs))
goto out;
 
if (!nvme_setup_prps(dev, req, size))
@@ -516,7 +519,7 @@ static int nvme_map_data(struct nvme_dev *dev, struct 
request *req,
if (rq_data_dir(req))
nvme_dif_remap(req, nvme_dif_prep);
 
-   if (!dma_map_sg(dev->dev, &iod->meta_sg, 1, dma_dir))
+   if (!dma_map_sg_attrs(dev->dev, &iod->meta_sg, 1, dma_dir, 
&nvme_dma_attrs))
goto out_unmap;
}
 
@@ -2118,6 +2121,9 @@ static int __init nvme_init(void)
result = pci_register_driver(&nvme_driver);
if (result)
destroy_workqueue(nvme_workq);
+
+   dma_set_attr(DMA_ATTR_NO_WARN, &nvme_dma_attrs);
+
return result;
 }
 
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 3/3] powerpc: implement DMA_ATTR_NO_WARN

2016-07-06 Thread Mauricio Faria de Oliveira
Add support for the DMA_ATTR_NO_WARN attribute on powerpc iommu code.

Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/kernel/iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index a8e3490..69bb17f 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -479,7 +479,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table 
*tbl,
 
/* Handle failure */
if (unlikely(entry == DMA_ERROR_CODE)) {
-   if (printk_ratelimit())
+   if (unlikely(!dma_get_attr(DMA_ATTR_NO_WARN, attrs)) && 
printk_ratelimit())
dev_info(dev, "iommu_alloc failed, tbl %p "
 "vaddr %lx npages %lu\n", tbl, vaddr,
 npages);
@@ -776,7 +776,7 @@ dma_addr_t iommu_map_page(struct device *dev, struct 
iommu_table *tbl,
 mask >> tbl->it_page_shift, align,
 attrs);
if (dma_handle == DMA_ERROR_CODE) {
-   if (printk_ratelimit())  {
+   if (unlikely(!dma_get_attr(DMA_ATTR_NO_WARN, attrs)) && 
printk_ratelimit())  {
dev_info(dev, "iommu_alloc failed, tbl %p "
 "vaddr %p npages %d\n", tbl, vaddr,
 npages);
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 0/3] dma, nvme, powerpc: introduce and implement DMA_ATTR_NO_WARN

2016-07-07 Thread Mauricio Faria de Oliveira
This patchset introduces dma_attr DMA_ATTR_NO_WARN (just like __GFP_NOWARN),
which tells the DMA-mapping subsystem to suppress allocation failure reports.

On some architectures allocation failures are reported with error messages
to the system logs.  Although this can help to identify and debug problems,
drivers which handle failures (eg, retry later) have no problems with them,
and can actually flood the system logs with error messages that aren't any
problem at all, depending on the implementation of the retry mechanism.

So, this provides a way for drivers to avoid those error messages on calls
where allocation failures are not a problem, and shouldn't bother the logs.

 - Patch 1/3 introduces and documents the new dma_attr.

 - Patch 2/3 implements it on the nvme driver (which might repeatedly trip
 on allocation failures due to high load, flooding system logs
 with error messages at least on powerpc: "iommu_alloc failed")

 - Patch 3/3 implements support for it on powerpc arch (where this problem
 was observed.  It's possible to extend support for more archs
 if the patchset is welcome).

Changelog:
 v2:
  - address warnings from checkpatch.pl (line wrapping and typos)

Mauricio Faria de Oliveira (3):
  dma: introduce DMA_ATTR_NO_WARN
  nvme: implement DMA_ATTR_NO_WARN
  powerpc: implement DMA_ATTR_NO_WARN

 Documentation/DMA-attributes.txt | 17 +
 arch/powerpc/kernel/iommu.c  |  6 --
 drivers/nvme/host/pci.c  | 12 ++--
 include/linux/dma-attrs.h|  1 +
 4 files changed, 32 insertions(+), 4 deletions(-)

-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 1/3] dma: introduce DMA_ATTR_NO_WARN

2016-07-07 Thread Mauricio Faria de Oliveira
Introduce the DMA_ATTR_NO_WARN attribute, and document it.

Signed-off-by: Mauricio Faria de Oliveira 
---
Changelog:
 v2:
  - address warnings from checkpatch.pl (line wrapping and typos)

 Documentation/DMA-attributes.txt | 17 +
 include/linux/dma-attrs.h|  1 +
 2 files changed, 18 insertions(+)

diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index e8cf9cf..48150c6 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -126,3 +126,20 @@ means that we won't try quite as hard to get them.
 
 NOTE: At the moment DMA_ATTR_ALLOC_SINGLE_PAGES is only implemented on ARM,
 though ARM64 patches will likely be posted soon.
+
+DMA_ATTR_NO_WARN
+
+
+This tells the DMA-mapping subsystem to suppress allocation failure reports
+(similarly to __GFP_NOWARN).
+
+On some architectures allocation failures are reported with error messages
+to the system logs.  Although this can help to identify and debug problems,
+drivers which handle failures (eg, retry later) have no problems with them,
+and can actually flood the system logs with error messages that aren't any
+problem at all, depending on the implementation of the retry mechanism.
+
+So, this provides a way for drivers to avoid those error messages on calls
+where allocation failures are not a problem, and shouldn't bother the logs.
+
+NOTE: At the moment DMA_ATTR_NO_WARN is only implemented on PowerPC.
diff --git a/include/linux/dma-attrs.h b/include/linux/dma-attrs.h
index f3c5aea..0577389 100644
--- a/include/linux/dma-attrs.h
+++ b/include/linux/dma-attrs.h
@@ -19,6 +19,7 @@ enum dma_attr {
DMA_ATTR_SKIP_CPU_SYNC,
DMA_ATTR_FORCE_CONTIGUOUS,
DMA_ATTR_ALLOC_SINGLE_PAGES,
+   DMA_ATTR_NO_WARN,
DMA_ATTR_MAX,
 };
 
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 2/3] nvme: implement DMA_ATTR_NO_WARN

2016-07-07 Thread Mauricio Faria de Oliveira
Use the DMA_ATTR_NO_WARN attribute on dma_map_sg() calls of nvme driver.

Signed-off-by: Mauricio Faria de Oliveira 
Reviewed-by: Gabriel Krisman Bertazi 
---
Changelog:
 v2:
  - address warnings from checkpatch.pl (line wrapping and typos)

 drivers/nvme/host/pci.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d1a8259..a7ccad8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -65,6 +66,8 @@ MODULE_PARM_DESC(use_cmb_sqes, "use controller's memory 
buffer for I/O SQes");
 
 static struct workqueue_struct *nvme_workq;
 
+static DEFINE_DMA_ATTRS(nvme_dma_attrs);
+
 struct nvme_dev;
 struct nvme_queue;
 
@@ -498,7 +501,8 @@ static int nvme_map_data(struct nvme_dev *dev, struct 
request *req,
goto out;
 
ret = BLK_MQ_RQ_QUEUE_BUSY;
-   if (!dma_map_sg(dev->dev, iod->sg, iod->nents, dma_dir))
+   if (!dma_map_sg_attrs(dev->dev, iod->sg, iod->nents, dma_dir,
+   &nvme_dma_attrs))
goto out;
 
if (!nvme_setup_prps(dev, req, size))
@@ -516,7 +520,8 @@ static int nvme_map_data(struct nvme_dev *dev, struct 
request *req,
if (rq_data_dir(req))
nvme_dif_remap(req, nvme_dif_prep);
 
-   if (!dma_map_sg(dev->dev, &iod->meta_sg, 1, dma_dir))
+   if (!dma_map_sg_attrs(dev->dev, &iod->meta_sg, 1, dma_dir,
+   &nvme_dma_attrs))
goto out_unmap;
}
 
@@ -2118,6 +2123,9 @@ static int __init nvme_init(void)
result = pci_register_driver(&nvme_driver);
if (result)
destroy_workqueue(nvme_workq);
+
+   dma_set_attr(DMA_ATTR_NO_WARN, &nvme_dma_attrs);
+
return result;
 }
 
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 3/3] powerpc: implement DMA_ATTR_NO_WARN

2016-07-07 Thread Mauricio Faria de Oliveira
Add support for the DMA_ATTR_NO_WARN attribute on powerpc iommu code.

Signed-off-by: Mauricio Faria de Oliveira 
---
Changelog:
 v2:
  - address warnings from checkpatch.pl (line wrapping and typos)

 arch/powerpc/kernel/iommu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index a8e3490..f1e20ea 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -479,7 +479,8 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table 
*tbl,
 
/* Handle failure */
if (unlikely(entry == DMA_ERROR_CODE)) {
-   if (printk_ratelimit())
+   if (unlikely(!dma_get_attr(DMA_ATTR_NO_WARN, attrs)) &&
+   printk_ratelimit())
dev_info(dev, "iommu_alloc failed, tbl %p "
 "vaddr %lx npages %lu\n", tbl, vaddr,
 npages);
@@ -776,7 +777,8 @@ dma_addr_t iommu_map_page(struct device *dev, struct 
iommu_table *tbl,
 mask >> tbl->it_page_shift, align,
 attrs);
if (dma_handle == DMA_ERROR_CODE) {
-   if (printk_ratelimit())  {
+   if (unlikely(!dma_get_attr(DMA_ATTR_NO_WARN, attrs)) &&
+   printk_ratelimit())  {
dev_info(dev, "iommu_alloc failed, tbl %p "
 "vaddr %p npages %d\n", tbl, vaddr,
 npages);
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/3] nvme: implement DMA_ATTR_NO_WARN

2016-07-07 Thread Mauricio Faria de Oliveira

On 07/06/2016 09:41 PM, Gabriel Krisman Bertazi wrote:

checkpatch.pl complains about line wrapping.  Other than that, this
looks good to me.


I'll submit a v2 w/ that and typos fixed.
Thanks for reviewing.

--
Mauricio Faria de Oliveira
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 2/3] nvme: implement DMA_ATTR_NO_WARN

2016-07-08 Thread Mauricio Faria de Oliveira

On 07/08/2016 04:54 AM, Masayoshi Mizuma wrote:

Here, I think the error messages should not be suppressed because
the return value of nvme_map_data() is BLK_MQ_RQ_QUEUE_ERROR, so
the IO returns as -EIO.


Agree; good point.  fixed in v3.

Thanks for reviewing.

--
Mauricio Faria de Oliveira
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 0/3] dma, nvme, powerpc: introduce and implement DMA_ATTR_NO_WARN

2016-07-08 Thread Mauricio Faria de Oliveira
This patchset introduces dma_attr DMA_ATTR_NO_WARN (just like __GFP_NOWARN),
which tells the DMA-mapping subsystem to suppress allocation failure reports.

On some architectures allocation failures are reported with error messages
to the system logs.  Although this can help to identify and debug problems,
drivers which handle failures (eg, retry later) have no problems with them,
and can actually flood the system logs with error messages that aren't any
problem at all, depending on the implementation of the retry mechanism.

So, this provides a way for drivers to avoid those error messages on calls
where allocation failures are not a problem, and shouldn't bother the logs.

 - Patch 1/3 introduces and documents the new dma_attr.

 - Patch 2/3 implements it on the nvme driver (which might repeatedly trip
 on allocation failures due to high load, flooding system logs
 with error messages at least on powerpc: "iommu_alloc failed")

 - Patch 3/3 implements support for it on powerpc arch (where this problem
 was observed.  It's possible to extend support for more archs
 if the patchset is welcome).

Changelog:
 v3:
  - nvme: use DMA_ATTR_NO_WARN when ret = BLK_MQ_RQ_QUEUE_BUSY (io will be
requeued) but not when ret = BLK_MQ_RQ_QUEUE_ERROR (io will be failed).
thanks: Masayoshi Mizuma 
 v2:
  - all: address warnings from checkpatch.pl (line wrapping and typos)

Mauricio Faria de Oliveira (3):
  dma: introduce DMA_ATTR_NO_WARN
  nvme: implement DMA_ATTR_NO_WARN
  powerpc: implement DMA_ATTR_NO_WARN

 Documentation/DMA-attributes.txt | 17 +
 arch/powerpc/kernel/iommu.c  |  6 --
 drivers/nvme/host/pci.c  |  9 -
 include/linux/dma-attrs.h|  1 +
 4 files changed, 30 insertions(+), 3 deletions(-)

-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 1/3] dma: introduce DMA_ATTR_NO_WARN

2016-07-08 Thread Mauricio Faria de Oliveira
Introduce the DMA_ATTR_NO_WARN attribute, and document it.

Signed-off-by: Mauricio Faria de Oliveira 
---
Changelog:
 v3:
  - dma: none.
 v2:
  - all: address warnings from checkpatch.pl (line wrapping and typos)

 Documentation/DMA-attributes.txt | 17 +
 include/linux/dma-attrs.h|  1 +
 2 files changed, 18 insertions(+)

diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index e8cf9cf..48150c6 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -126,3 +126,20 @@ means that we won't try quite as hard to get them.
 
 NOTE: At the moment DMA_ATTR_ALLOC_SINGLE_PAGES is only implemented on ARM,
 though ARM64 patches will likely be posted soon.
+
+DMA_ATTR_NO_WARN
+
+
+This tells the DMA-mapping subsystem to suppress allocation failure reports
+(similarly to __GFP_NOWARN).
+
+On some architectures allocation failures are reported with error messages
+to the system logs.  Although this can help to identify and debug problems,
+drivers which handle failures (eg, retry later) have no problems with them,
+and can actually flood the system logs with error messages that aren't any
+problem at all, depending on the implementation of the retry mechanism.
+
+So, this provides a way for drivers to avoid those error messages on calls
+where allocation failures are not a problem, and shouldn't bother the logs.
+
+NOTE: At the moment DMA_ATTR_NO_WARN is only implemented on PowerPC.
diff --git a/include/linux/dma-attrs.h b/include/linux/dma-attrs.h
index f3c5aea..0577389 100644
--- a/include/linux/dma-attrs.h
+++ b/include/linux/dma-attrs.h
@@ -19,6 +19,7 @@ enum dma_attr {
DMA_ATTR_SKIP_CPU_SYNC,
DMA_ATTR_FORCE_CONTIGUOUS,
DMA_ATTR_ALLOC_SINGLE_PAGES,
+   DMA_ATTR_NO_WARN,
DMA_ATTR_MAX,
 };
 
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 2/3] nvme: implement DMA_ATTR_NO_WARN

2016-07-08 Thread Mauricio Faria de Oliveira
Use the DMA_ATTR_NO_WARN attribute on dma_map_sg() calls of nvme driver.

Signed-off-by: Mauricio Faria de Oliveira 
Reviewed-by: Gabriel Krisman Bertazi 
---
Changelog:
 v3:
  - nvme: use DMA_ATTR_NO_WARN when ret = BLK_MQ_RQ_QUEUE_BUSY (io will be
requeued) but not when ret = BLK_MQ_RQ_QUEUE_ERROR (io will be failed).
thanks: Masayoshi Mizuma 
 v2:
  - all: address warnings from checkpatch.pl (line wrapping and typos)

 drivers/nvme/host/pci.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d1a8259..187aa6b 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -65,6 +66,8 @@ MODULE_PARM_DESC(use_cmb_sqes, "use controller's memory 
buffer for I/O SQes");
 
 static struct workqueue_struct *nvme_workq;
 
+static DEFINE_DMA_ATTRS(nvme_dma_attrs);
+
 struct nvme_dev;
 struct nvme_queue;
 
@@ -498,7 +501,8 @@ static int nvme_map_data(struct nvme_dev *dev, struct 
request *req,
goto out;
 
ret = BLK_MQ_RQ_QUEUE_BUSY;
-   if (!dma_map_sg(dev->dev, iod->sg, iod->nents, dma_dir))
+   if (!dma_map_sg_attrs(dev->dev, iod->sg, iod->nents, dma_dir,
+   &nvme_dma_attrs))
goto out;
 
if (!nvme_setup_prps(dev, req, size))
@@ -2118,6 +2122,9 @@ static int __init nvme_init(void)
result = pci_register_driver(&nvme_driver);
if (result)
destroy_workqueue(nvme_workq);
+
+   dma_set_attr(DMA_ATTR_NO_WARN, &nvme_dma_attrs);
+
return result;
 }
 
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 3/3] powerpc: implement DMA_ATTR_NO_WARN

2016-07-08 Thread Mauricio Faria de Oliveira
Add support for the DMA_ATTR_NO_WARN attribute on powerpc iommu code.

Signed-off-by: Mauricio Faria de Oliveira 
---
Changelog:
 v3:
  - powerpc: none
 v2:
  - all: address warnings from checkpatch.pl (line wrapping and typos)

 arch/powerpc/kernel/iommu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index a8e3490..f1e20ea 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -479,7 +479,8 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table 
*tbl,
 
/* Handle failure */
if (unlikely(entry == DMA_ERROR_CODE)) {
-   if (printk_ratelimit())
+   if (unlikely(!dma_get_attr(DMA_ATTR_NO_WARN, attrs)) &&
+   printk_ratelimit())
dev_info(dev, "iommu_alloc failed, tbl %p "
 "vaddr %lx npages %lu\n", tbl, vaddr,
 npages);
@@ -776,7 +777,8 @@ dma_addr_t iommu_map_page(struct device *dev, struct 
iommu_table *tbl,
 mask >> tbl->it_page_shift, align,
 attrs);
if (dma_handle == DMA_ERROR_CODE) {
-   if (printk_ratelimit())  {
+   if (unlikely(!dma_get_attr(DMA_ATTR_NO_WARN, attrs)) &&
+   printk_ratelimit())  {
dev_info(dev, "iommu_alloc failed, tbl %p "
 "vaddr %p npages %d\n", tbl, vaddr,
 npages);
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc: fix oops in pcibios_release_device() after pcibios_free_controller()

2016-07-12 Thread Mauricio Faria de Oliveira

Ben,

On 07/04/2016 11:55 PM, Benjamin Herrenschmidt wrote:

Have you considered instead adding a kref to the PHB and only freeing
it when all devices have been freed ? Or it's too hard to tract device
creation ?


Can you clarify which are the devices that should be tracked w/ krefs to
the PHB?

I've been wondering if it's just the root bus (phb->bus) -- which relays
on it (ie its phb->bus->children and phb->bus->devices) being eventually
freed in order to free the phb, or perhaps track the children & devices
directly.

If that's too far from sensible, can you point some interesting places
to look at? I've read much of arch/powerpc/kernel/pci-{common,hotplug}.c
and arch/powerpc/include/asm/pci-bridge.h, and some more in drivers/pci,
but things weren't as obvious to a newcomer in this area.

Thanks,

--
Mauricio Faria de Oliveira
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc: fix oops in pcibios_release_device() after pcibios_free_controller()

2016-07-13 Thread Mauricio Faria de Oliveira

On 07/12/2016 08:07 PM, Mauricio Faria de Oliveira wrote:

Can you clarify which are the devices that should be tracked w/ krefs to
the PHB?


Last night I had forgotten about the fundamental point of krefs - track 
references to pointers - and this answers the question.


I'm looking at the holders of pointers to the phb struct, and it seems
I wasn't too far off w/ the child buses and devices idea -- as the root 
bus (phb->bus) is assigned the phb pointer in the bus->sysdata field,

and it's inherited from parent by child buses and devices.

I'll continue here. Comments always welcome.

--
Mauricio Faria de Oliveira
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/3] dma-mapping: introduce the DMA_ATTR_NO_WARN attribute

2016-08-01 Thread Mauricio Faria de Oliveira
Introduce the DMA_ATTR_NO_WARN attribute, and document it.

Signed-off-by: Mauricio Faria de Oliveira 
---
 Documentation/DMA-attributes.txt | 17 +
 include/linux/dma-mapping.h  |  5 +
 2 files changed, 22 insertions(+)

diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index 2d455a5..98bf7ac 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -126,3 +126,20 @@ means that we won't try quite as hard to get them.
 
 NOTE: At the moment DMA_ATTR_ALLOC_SINGLE_PAGES is only implemented on ARM,
 though ARM64 patches will likely be posted soon.
+
+DMA_ATTR_NO_WARN
+
+
+This tells the DMA-mapping subsystem to suppress allocation failure reports
+(similarly to __GFP_NOWARN).
+
+On some architectures allocation failures are reported with error messages
+to the system logs.  Although this can help to identify and debug problems,
+drivers which handle failures (eg, retry later) have no problems with them,
+and can actually flood the system logs with error messages that aren't any
+problem at all, depending on the implementation of the retry mechanism.
+
+So, this provides a way for drivers to avoid those error messages on calls
+where allocation failures are not a problem, and shouldn't bother the logs.
+
+NOTE: At the moment DMA_ATTR_NO_WARN is only implemented on PowerPC.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 66533e1..6efbd27 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -56,6 +56,11 @@
  * that gives better TLB efficiency.
  */
 #define DMA_ATTR_ALLOC_SINGLE_PAGES(1UL << 7)
+/*
+ * DMA_ATTR_NO_WARN: This tells the DMA-mapping subsystem to suppress
+ * allocation failure reports (similarly to __GFP_NOWARN).
+ */
+#define DMA_ATTR_NO_WARN   (1UL << 8)
 
 /*
  * A dma_addr_t can hold any valid DMA or bus address for the platform.
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 0/3] dma-mapping, powerpc, nvme: introduce the DMA_ATTR_NO_WARN attribute

2016-08-01 Thread Mauricio Faria de Oliveira
This patchset introduces dma_attr DMA_ATTR_NO_WARN (just like __GFP_NOWARN),
which tells the DMA-mapping subsystem to suppress allocation failure reports.

On some architectures allocation failures are reported with error messages
to the system logs.  Although this can help to identify and debug problems,
drivers which handle failures (eg, retry later) have no problems with them,
and can actually flood the system logs with error messages that aren't any
problem at all, depending on the implementation of the retry mechanism.

So, this provides a way for drivers to avoid those error messages on calls
where allocation failures are not a problem, and shouldn't bother the logs.

 - Patch 1/3 introduces the dma_attr DMA_ATTR_NO_WARN.

 - Patch 2/3 implements support for it on powerpc arch (where this problem
 was observed;  it's possible to extend support for more archs)

 - Patch 3/3 implements it on the nvme driver (which might repeatedly trip
 on allocation failures due to high load, flooding system logs
 with error messages at least on powerpc: "iommu_alloc failed")

Changelog:
 v4:
  - rebase for commit 53a4b60 dma-mapping: use unsigned long for dma_attrs.
  - reorder patches 2/3 and 3/3.
 v3:
  - nvme: use DMA_ATTR_NO_WARN when ret = BLK_MQ_RQ_QUEUE_BUSY (io will be
requeued) but not when ret = BLK_MQ_RQ_QUEUE_ERROR (io will be failed).
thanks: Masayoshi Mizuma 
 v2:
  - all: address warnings from checkpatch.pl (line wrapping and typos)

Tested on next-20160801.

Mauricio Faria de Oliveira (3):
  dma-mapping: introduce the DMA_ATTR_NO_WARN attribute
  powerpc: implement the DMA_ATTR_NO_WARN attribute
  nvme: use the DMA_ATTR_NO_WARN attribute

 Documentation/DMA-attributes.txt | 17 +
 arch/powerpc/kernel/iommu.c  |  6 --
 drivers/nvme/host/pci.c  |  3 ++-
 include/linux/dma-mapping.h  |  5 +
 4 files changed, 28 insertions(+), 3 deletions(-)

-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 3/3] nvme: use the DMA_ATTR_NO_WARN attribute

2016-08-01 Thread Mauricio Faria de Oliveira
Use the DMA_ATTR_NO_WARN attribute for the dma_map_sg() call of the nvme
driver that returns BLK_MQ_RQ_QUEUE_BUSY (not for BLK_MQ_RQ_QUEUE_ERROR).

Signed-off-by: Mauricio Faria de Oliveira 
Reviewed-by: Gabriel Krisman Bertazi 
---
 drivers/nvme/host/pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d7c33f9..eeaeae0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -503,7 +503,8 @@ static int nvme_map_data(struct nvme_dev *dev, struct 
request *req,
goto out;
 
ret = BLK_MQ_RQ_QUEUE_BUSY;
-   if (!dma_map_sg(dev->dev, iod->sg, iod->nents, dma_dir))
+   if (!dma_map_sg_attrs(dev->dev, iod->sg, iod->nents, dma_dir,
+   DMA_ATTR_NO_WARN))
goto out;
 
if (!nvme_setup_prps(dev, req, size))
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/3] powerpc: implement the DMA_ATTR_NO_WARN attribute

2016-08-01 Thread Mauricio Faria de Oliveira
Add support for the DMA_ATTR_NO_WARN attribute on powerpc iommu code.

Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/kernel/iommu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 37d6e74..5f202a5 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -479,7 +479,8 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table 
*tbl,
 
/* Handle failure */
if (unlikely(entry == DMA_ERROR_CODE)) {
-   if (printk_ratelimit())
+   if (!(attrs & DMA_ATTR_NO_WARN) &&
+   printk_ratelimit())
dev_info(dev, "iommu_alloc failed, tbl %p "
 "vaddr %lx npages %lu\n", tbl, vaddr,
 npages);
@@ -776,7 +777,8 @@ dma_addr_t iommu_map_page(struct device *dev, struct 
iommu_table *tbl,
 mask >> tbl->it_page_shift, align,
 attrs);
if (dma_handle == DMA_ERROR_CODE) {
-   if (printk_ratelimit())  {
+   if (!(attrs & DMA_ATTR_NO_WARN) &&
+   printk_ratelimit())  {
dev_info(dev, "iommu_alloc failed, tbl %p "
 "vaddr %p npages %d\n", tbl, vaddr,
 npages);
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc: convert 'iommu_alloc failed' messages to dynamic debug

2016-08-03 Thread Mauricio Faria de Oliveira

Hi Ben,

On 06/13/2016 06:26 PM, Benjamin Herrenschmidt wrote:

Another option would be to use a dma_attr for silencing mapping errors
which NVME could use provided it does handle them gracefully ...


I recently submitted patches that implement your suggestion [1].
May you please review/comment if they're OK with you?

Thanks!

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-August/146850.html

--
Mauricio Faria de Oliveira
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc: convert 'iommu_alloc failed' messages to dynamic debug

2016-08-03 Thread Mauricio Faria de Oliveira

On 08/03/2016 06:34 PM, Benjamin Herrenschmidt wrote:

I think this is best done by the relevant community maintainer,
I just threw an idea but I'm not that familiar with the details:-)


Ok, sure; got it.


Did you send them to the lkml list ?


Yup, plus a few others lists from get_maintainer.pl iirc.

Mailing list archive links:
- linux-kernel: http://marc.info/?l=linux-kernel&m=146798084822100&w=2
- linux-doc: http://marc.info/?l=linux-doc&m=146798085522104&w=2
- linux-nvme: 
http://lists.infradead.org/pipermail/linux-nvme/2016-July/005349.html
- linuxppc-dev: 
https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-July/145624.html


Thanks,


--
Mauricio Faria de Oliveira
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 0/2] powerpc: fix oops in pcibios_release_device() after pcibios_free_controller()

2016-08-09 Thread Mauricio Faria de Oliveira
This patchset addresses the problem/suggestion discussed previously [1],
by adding a kref reference counting to the PHB (struct pci_controller):

> It's possible to hit an oops/crash if pcibios_release_device() accesses the
> phb struct and it had been freed earlier -- by pcibios_free_controller() --
> as the memory it pointed to can be reused.

> If after reuse 'phb->controller_ops.release_device' is non-NULL it will be
> called, but it points to an invalid location (that function pointer is not
> set anywhere in the code, so if it's non-NULL, that's not correct), and so
> it hits an oops and the system crashes.

> That problem can happen with the pSeries platform's DLPAR remove operation
> if references to devices are held until after the pcibios_free_controller()
> function runs, and then released - exercising pcibios_release_device() path.

More problem details/call trace are described in the original submission [1].

With the patch applied (tested on 4.8-rc1), the test-case demonstrates that
the PHB is only released/freed after the last reference to the PCI device(s)
is dropped:

  Enable debugging messages:

  # echo 'file pci-common.c +pf; file pci-hotplug.c +pf' > 
/sys/kernel/debug/dynamic_debug/control
  # echo 8 > /proc/sys/kernel/printk

  Hold references to both PCI devices in the PHB:

  # ls -ld /sys/block/sd* | grep -m1 0021:01:00.0
  <...> /sys/block/sdaa -> ../devices/pci0021:01/0021:01:00.0/<...>
  
  # ls -ld /sys/block/sd* | grep -m1 0021:01:00.1
  <...> /sys/block/sdab -> ../devices/pci0021:01/0021:01:00.1/<...>
  
  # cat > /dev/sdaa & pid1=$!
  # cat > /dev/sdab & pid2=$!
  
  Perform DLPAR remove of the PHB:

  # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
  Validating PHB DLPAR capability...yes.
  [  888.776964] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01
  [  888.776983] pci_hp_remove_devices:Removing 0021:01:00.0...
  ...
  [  893.696431] pci_hp_remove_devices:Removing 0021:01:00.1...
  ...
  [  908.352717] pci_bus 0021:01: busn_res: [bus 01-ff] is released
  [  908.352744] pcibios_remove_bus: PCI 0021:01, pci_bus c001e7d59400, phb 
c001e7d57400
  [  908.352753] controller_put: PCI domain 33, phb c001e7d57400
  [  908.352811] pcibios_free_controller: PCI domain 33, phb c001e7d57400, 
phb->is_dynamic 1
  [  908.352820] controller_put: PCI domain 33, phb c001e7d57400
  [  908.352832] rpadlpar_io: slot PHB 33 removed
  
  Notice the PHB was not freed yet (controller_free() was not called)

  Drop the last references to the PCI devices:

  # kill -9 $pid1
  [  991.221998] pcibios_release_device: PCI 0021:01:00.0, pci_dev 
c001ee0b7000, phb c001e7d57400
  [  991.222005] controller_put: PCI domain 33, phb c001e7d57400
  
  # kill -9 $pid2
  [  996.076293] pcibios_release_device: PCI 0021:01:00.1, pci_dev 
c001ee0b3800, phb c001e7d57400
  [  996.076299] controller_put: PCI domain 33, phb c001e7d57400
  [  996.076303] controller_free: PCI domain: 33, phb c001e7d57400, 
phb->is_dynamic 1

  Notice that only now the PHB was freed.

Note: this patchset currently covers references from struct pci_dev/pci_bus,
which _is_ enough to resolve this particular problem; it does not yet cover
references from struct pci_dn/eeh_pe/eeh_dev (but since those are unchanged
by/unrelated to this patchset, they remain working in the very same manner).

I have gone to great lengths in time studying the relevant code for EEH in
order to implement those too, but am not yet sure of all the details (e.g.,
lifetime of eeh_dev, removal of pci_dn, etc) that need to be considered to
kfree() them - will likely ask Gavin & maintainers for RFC after some time.

Links:
  [1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-July/145264.html 

Changelog:
  v2: change approach to use krefs (suggestion by benh & mpe).

Mauricio Faria de Oliveira (2):
  powerpc: add refcount to struct pci_controller
  powerpc: update pci_controller.refcount for PCI devices and buses

 arch/powerpc/include/asm/pci-bridge.h | 15 
 arch/powerpc/kernel/pci-common.c  | 72 +--
 arch/powerpc/kernel/pci-hotplug.c | 29 ++
 arch/powerpc/kernel/pci_of_scan.c |  1 +
 4 files changed, 114 insertions(+), 3 deletions(-)

-- 
1.8.3.1



[PATCH v2 1/2] powerpc: add refcount to struct pci_controller

2016-08-09 Thread Mauricio Faria de Oliveira
This commit introduces the 'refcount' field in struct pci_controller,
along with the corresponding functions 'controller_(get|put|free)()'.

The functions 'pcibios_(alloc|free)_controller()' are modified to use
that in a backwards compatible manner. (i.e., kfree(phb) is performed
when pcibios_free_controller() is called.)

So, this patch adds the infrastructure with no functional differences
to current users of pcibios_(alloc|free)_controller().  Notably, only
the pseries platform calls pcibios_free_controller() for some purpose
other than to release the pci_controller in case of errors just after
the call to pcibios_alloc_controller() (i.e., 'goto error' scenarios).

Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/include/asm/pci-bridge.h | 15 +++
 arch/powerpc/kernel/pci-common.c  | 47 ---
 2 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index b5e88e4..6fde0a9 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device_node;
 
@@ -128,9 +129,23 @@ struct pci_controller {
struct pci_dn *pci_data;
 #endif /* CONFIG_PPC64 */
 
+   /*
+* Reference counting for the structures:
+* - TODO pci_dev
+* - TODO pci_bus
+* - TODO pci_dn
+* - TODO eeh_pe
+* - TODO eeh_dev
+*/
+   struct kref refcount;
+
void *private_data;
 };
 
+void controller_get(struct pci_controller *phb);
+void controller_put(struct pci_controller *phb);
+void controller_free(struct kref *kref);
+
 /* These are used for config access before all the PCI probing
has been done. */
 extern int early_read_config_byte(struct pci_controller *hose, int bus,
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 08afddf..29b37d3 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -114,6 +114,8 @@ struct pci_controller *pcibios_alloc_controller(struct 
device_node *dev)
phb = zalloc_maybe_bootmem(sizeof(struct pci_controller), GFP_KERNEL);
if (phb == NULL)
return NULL;
+
+   kref_init(&phb->refcount); /* use first reference for hose_list entry */
spin_lock(&hose_spinlock);
phb->global_number = get_phb_number(dev);
list_add_tail(&phb->list_node, &hose_list);
@@ -130,12 +132,53 @@ struct pci_controller *pcibios_alloc_controller(struct 
device_node *dev)
PHB_SET_NODE(phb, nid);
}
 #endif
+
+   pr_debug("PCI domain %d, phb %p, phb->is_dynamic %d\n",
+   phb->global_number, phb, phb->is_dynamic);
+
return phb;
 }
 EXPORT_SYMBOL_GPL(pcibios_alloc_controller);
 
+void controller_get(struct pci_controller *phb)
+{
+   if (unlikely(!phb)) {
+   pr_warn("%s: null PHB; refcount bug!\n", __func__);
+   WARN_ON(1);
+   } else {
+   pr_debug("PCI domain %d, phb %p\n", phb->global_number, phb);
+   kref_get(&phb->refcount);
+   }
+}
+
+void controller_put(struct pci_controller *phb)
+{
+   if (unlikely(!phb)) {
+   pr_warn("%s: null PHB; refcount bug!\n", __func__);
+   WARN_ON(1);
+   } else {
+   pr_debug("PCI domain %d, phb %p\n", phb->global_number, phb);
+   kref_put(&phb->refcount, controller_free);
+   }
+}
+
+void controller_free(struct kref *kref)
+{
+   struct pci_controller *phb = container_of(kref, struct pci_controller,
+   refcount);
+
+   pr_info("%s: PCI domain: %d, phb %p, phb->is_dynamic %d\n",
+   __func__, phb->global_number, phb, phb->is_dynamic);
+
+   if (phb->is_dynamic)
+   kfree(phb);
+}
+
 void pcibios_free_controller(struct pci_controller *phb)
 {
+   pr_debug("PCI domain %d, phb %p, phb->is_dynamic %d\n",
+   phb->global_number, phb, phb->is_dynamic);
+
spin_lock(&hose_spinlock);
 
/* Clear bit of phb_bitmap to allow reuse of this PHB number. */
@@ -143,10 +186,8 @@ void pcibios_free_controller(struct pci_controller *phb)
clear_bit(phb->global_number, phb_bitmap);
 
list_del(&phb->list_node);
+   controller_put(phb);
spin_unlock(&hose_spinlock);
-
-   if (phb->is_dynamic)
-   kfree(phb);
 }
 EXPORT_SYMBOL_GPL(pcibios_free_controller);
 
-- 
1.8.3.1



[PATCH v2 2/2] powerpc: update pci_controller.refcount for PCI devices and buses

2016-08-09 Thread Mauricio Faria de Oliveira
This patch employs the refcount in struct pci_controller to track
the references from PCI devices and buses (struct pci_dev/pci_bus).

In order to do that without modifying any PCI scan/probe approach
(e.g., PCI_PROBE_DEVTREE and PCI_PROBE_NORMAL), it leverages some
of the PCI arch-specific callback: pci_(add|release)_device() and
pci_(add|remove)_bus().

(a small change is required for PCI_PROBE_DEVTREE, which makes it
consistent with PCI_PROBE_NORMAL - the pci_dev should inherit the
parent pci_bus's phb pointer - see pci_setup_device() in probe.c)

This also has the advantage that locking for kref_(get|put)() is
satisfied by the 'pci_rescan_remove_lock' mutex, which is normal
practice for usage of the PCI subsystem - thus already in place.
More details added in comment on pcibios_release_device().

Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/include/asm/pci-bridge.h |  4 ++--
 arch/powerpc/kernel/pci-common.c  | 25 +
 arch/powerpc/kernel/pci-hotplug.c | 29 +
 arch/powerpc/kernel/pci_of_scan.c |  1 +
 4 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 6fde0a9..d10eee3 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -131,8 +131,8 @@ struct pci_controller {
 
/*
 * Reference counting for the structures:
-* - TODO pci_dev
-* - TODO pci_bus
+* - pci_dev
+* - pci_bus
 * - TODO pci_dn
 * - TODO eeh_pe
 * - TODO eeh_dev
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 29b37d3..c55e9c0 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1047,6 +1047,17 @@ static void pcibios_setup_device(struct pci_dev *dev)
 
 int pcibios_add_device(struct pci_dev *dev)
 {
+   struct pci_controller *phb = pci_bus_to_host(dev->bus);
+
+   pr_debug("PCI %s, pci_dev %p, phb %p\n", dev_name(&dev->dev), dev, phb);
+
+   if (!phb)
+   pr_warn("%s: PCI device %s has null PHB; refcount bug!",
+   __func__, dev_name(&dev->dev)); /* WARN_ON ahead */
+
+   /* locking: see comment on pcibios_release_device(). */
+   controller_get(phb);
+
/*
 * We can only call pcibios_setup_device() after bus setup is complete,
 * since some of the platform specific DMA setup code depends on it.
@@ -1062,6 +1073,20 @@ int pcibios_add_device(struct pci_dev *dev)
return 0;
 }
 
+void pcibios_add_bus(struct pci_bus *bus)
+{
+   struct pci_controller *phb = pci_bus_to_host(bus);
+
+   pr_debug("PCI %s, pci_bus %p, phb %p\n", dev_name(&bus->dev), bus, phb);
+
+   if (!phb)
+   pr_warn("%s: PCI bus %s has null PHB; refcount bug!",
+   __func__, dev_name(&bus->dev)); /* WARN_ON ahead */
+
+   /* locking: see comment on pcibios_release_device(). */
+   controller_get(phb);
+}
+
 void pcibios_setup_bus_devices(struct pci_bus *bus)
 {
struct pci_dev *dev;
diff --git a/arch/powerpc/kernel/pci-hotplug.c 
b/arch/powerpc/kernel/pci-hotplug.c
index 2d71269..24d1a2a 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -55,15 +55,44 @@ EXPORT_SYMBOL_GPL(pci_find_bus_by_node);
  * @dev: PCI device
  *
  * The function is called before releasing the indicated PCI device.
+ *
+ * The locking for kref_get() and kref_put() of the PHB/pci_controller
+ * in pcibios_(add|release)_device() and pcibios_(add|remove)_bus() is
+ * satisfied by the pci_rescan_remove_lock mutex (required for rescan/
+ * remove paths of PCI devices/buses; the scan path doesn't require it,
+ * as there is only addition of devices/buses - no removal at all.)
  */
 void pcibios_release_device(struct pci_dev *dev)
 {
struct pci_controller *phb = pci_bus_to_host(dev->bus);
 
+   pr_debug("PCI %s, pci_dev %p, phb %p\n", dev_name(&dev->dev), dev, phb);
+
eeh_remove_device(dev);
 
if (phb->controller_ops.release_device)
phb->controller_ops.release_device(dev);
+
+   if (unlikely(!phb))
+   pr_warn("%s: PCI device %s has null PHB; refcount bug!",
+   __func__, dev_name(&dev->dev)); /* WARN_ON ahead */
+
+   /* locking: see comment on pcibios_release_device(). */
+   controller_put(phb);
+}
+
+void pcibios_remove_bus(struct pci_bus *bus)
+{
+   struct pci_controller *phb = pci_bus_to_host(bus);
+
+   pr_debug("PCI %s, pci_bus %p, phb %p\n", dev_name(&bus->dev), bus, phb);
+
+   if (unlikely(!phb))
+   pr_warn("%s: PCI bus %s has null PHB; refcount bug!",
+   __func__, dev_name(&a

Re: [PATCH v2 1/2] powerpc: add refcount to struct pci_controller

2016-08-10 Thread Mauricio Faria de Oliveira

On 08/09/2016 10:45 PM, Andrew Donnellan wrote:

[snip] Notably, only
the pseries platform calls pcibios_free_controller() for some purpose
other than to release the pci_controller in case of errors just after
the call to pcibios_alloc_controller() (i.e., 'goto error' scenarios).


cxl's vPHB API also uses pcibios_free_controller() [snip]


Cool. I see I missed this report line from grep; thanks. I was mostly
biased at arch/powerpc/ and driver/pci/ these days.


I'm currently working on a cxl defect found by an IBM test team where we
run into this - will review this patch more thoroughly and test it shortly.


That's great; thanks!

--
Mauricio Faria de Oliveira
IBM Linux Technology Center



Re: [PATCH v2 2/2] powerpc: update pci_controller.refcount for PCI devices and buses

2016-08-10 Thread Mauricio Faria de Oliveira

On 08/10/2016 12:35 AM, Andrew Donnellan wrote:

 if (phb->controller_ops.release_device)
 phb->controller_ops.release_device(dev);
+
+if (unlikely(!phb))
+pr_warn("%s: PCI device %s has null PHB; refcount bug!",
+__func__, dev_name(&dev->dev)); /* WARN_ON ahead */


This should probably go before trying to dereference phb->controller_ops
above?


You're right; I misplaced this check; will fix that.

Just a bit of explanation/history:

While trying to understand why I didn't hit that null dereference
when I initially hit the WARN_ON (the reason for the 'small change'
in the commit description), I found that back then I checked for
'pci_dev->sysdata' directly (not 'phb' -- early stages of the patch).

The former indeed was null, as it didn't inherit 'pci_bus->sysdata'
on pseries, but the code uses 'phb = dev->bus->sysdata' (and this
was not null as pci_bus->sysdata was actually set).

--
Mauricio Faria de Oliveira
IBM Linux Technology Center



Re: [PATCH v2 1/2] powerpc: add refcount to struct pci_controller

2016-08-10 Thread Mauricio Faria de Oliveira

On 08/09/2016 10:45 PM, Andrew Donnellan wrote:

I'm currently working on a cxl defect found by an IBM test team where we
run into this - will review this patch more thoroughly and test it shortly.


Gavin provided a review/suggestions via chat, pointing to rely on the
refcount that already exists in the PCI subsystem (not reinvent another)
and leverage the release of the PCI root bus -- which is much simpler!

He replied there should be no problems w/ the EEH reset path (PCI root
bus not released) nor w/ other structs w/ refs to the PHB (PCI DN, EEH
PE, EEH DEV).

I'll go down that path for a PATCH v3.

--
Mauricio Faria de Oliveira
IBM Linux Technology Center



[PATCH v3] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)

2016-08-10 Thread Mauricio Faria de Oliveira
This patch leverages 'struct pci_host_bridge' from the PCI subsystem
in order to free the pci_controller only after the last reference to
its devices is dropped (avoiding an oops in pcibios_release_device()
if the last reference is dropped after pcibios_free_controller()).

The patch relies on pci_host_bridge.release_fn() (and .release_data),
which is called automatically by the PCI subsystem when the root bus
is released (i.e., the last reference is dropped).  Those fields are
set via pci_set_host_bridge_release() (e.g. in the platform-specific
implementation of pcibios_root_bridge_prepare()).

It introduces the 'pcibios_host_bridge_release()' function to be set
as .release_fn(), which expects .release_data to hold the pointer to
the pci_controller to kfree().

It enables that functionality for pseries (although it isn't platform
-specific, and may be used by cxl). It keeps pcibios_free_controller()
backwards-compatible (i.e., kfree(phb) in it) in case no .release_fn()
is defined for the pci_controller.

Details on not-so-elegant design choices:

 - Added 'pci_controller.bridge' field (pointer to associated 'struct
   pci_host_bridge') so *not* to use 'pci_find_host_bridge(phb->bus)'
   in pcibios_free_controller().

   That's because remove_phb_dynamic() sets 'phb->bus = NULL' before
   pcibios_free_controller().  That seems to be very important, with
   commit title 'powerpc/pci: Fix various pseries PCI hotplug issues'
   (so I'll not remove it just to avoid this null pointer dereference).

 - Used 'pci_host_bridge.release_data' field (pointer to associated
   'struct pci_controller') so *not* to 'pci_bus_to_host(bridge->bus)'
   in pcibios_host_bridge_release().

   That's because pci_remove_root_bus() sets 'host_bridge->bus = NULL'
   (so, if the last reference is released after pci_remove_root_bus()
   runs, which eventually reaches pcibios_host_bridge_release(), that
   would hit a null pointer dereference).

   The cxl/vphb.c code calls pci_remove_root_bus(), and the cxl folks
   are interested in this fix.

Test-case:

  # ls -ld /sys/block/sd* | grep -m1 0021:01:00.0
  <...> /sys/block/sdaa -> ../devices/pci0021:01/0021:01:00.0/<...>

  # ls -ld /sys/block/sd* | grep -m1 0021:01:00.1
  <...> /sys/block/sdab -> ../devices/pci0021:01/0021:01:00.1/<...>

  # cat >/dev/sdaa & pid1=$!
  # cat >/dev/sdab & pid2=$!

  # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
  Validating PHB DLPAR capability...yes.
  [  479.547020] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01
  [  479.547049] pci_hp_remove_devices:Removing 0021:01:00.0...
  ...
  [  483.536303] pci_hp_remove_devices:Removing 0021:01:00.1...
  ...
  [  497.072130] pci_bus 0021:01: busn_res: [bus 01-ff] is released
  [  497.072209] rpadlpar_io: slot PHB 33 removed

  # kill -9 $pid1
  # kill -9 $pid2
  [  506.604458] pcibios_host_bridge_release: domain 33, dynamic 1

Suggested-By: Gavin Shan 
Signed-off-by: Mauricio Faria de Oliveira 

Changelog:
 - v3: different approach: struct pci_host_bridge.release_fn()
 - v2: different approach: struct pci_controller.refcount
---
 arch/powerpc/include/asm/pci-bridge.h |  2 ++
 arch/powerpc/kernel/pci-common.c  | 15 ++-
 arch/powerpc/platforms/pseries/pci.c  |  3 +++
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index b5e88e4..9b11631 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -54,6 +54,7 @@ struct pci_controller_ops {
  */
 struct pci_controller {
struct pci_bus *bus;
+   struct pci_host_bridge *bridge; /* associated 'PHB' in PCI subsystem */
char is_dynamic;
 #ifdef CONFIG_PPC64
int node;
@@ -301,6 +302,7 @@ extern void pci_process_bridge_OF_ranges(struct 
pci_controller *hose,
 /* Allocate & free a PCI host bridge structure */
 extern struct pci_controller *pcibios_alloc_controller(struct device_node 
*dev);
 extern void pcibios_free_controller(struct pci_controller *phb);
+extern void pcibios_host_bridge_release(struct pci_host_bridge *bridge);
 
 #ifdef CONFIG_PCI
 extern int pcibios_vaddr_is_ioport(void __iomem *address);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index a5c0153..c5b5f60 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -145,11 +145,23 @@ void pcibios_free_controller(struct pci_controller *phb)
list_del(&phb->list_node);
spin_unlock(&hose_spinlock);
 
-   if (phb->is_dynamic)
+   /* if the associated pci_host_bridge has a release_fn(), rely on that. 
*/
+   if (!phb->bridge->release_fn && phb->is_dynamic)
kfree(phb);
 }
 EXP

Re: [PATCH v2 1/2] powerpc: add refcount to struct pci_controller

2016-08-10 Thread Mauricio Faria de Oliveira

On 08/10/2016 10:53 AM, Mauricio Faria de Oliveira wrote:

I'll go down that path for a PATCH v3.


That is,
'powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)'

--
Mauricio Faria de Oliveira
IBM Linux Technology Center



Re: [PATCH v3] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)

2016-08-10 Thread Mauricio Faria de Oliveira

On 08/10/2016 06:45 PM, Mauricio Faria de Oliveira wrote:

Changelog:
 - v3: different approach: struct pci_host_bridge.release_fn()
 - v2: different approach: struct pci_controller.refcount


Oops, the v3 submission has no cover letter, so the subject changed
a bit from what was in v2.  This is the old subject & archive link:

[PATCH v2 0/2] powerpc: fix oops in pcibios_release_device() after 
pcibios_free_controller() [1]


https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-August/147351.html

--
Mauricio Faria de Oliveira
IBM Linux Technology Center



Re: [PATCH v3] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)

2016-08-11 Thread Mauricio Faria de Oliveira

On 08/11/2016 02:01 AM, Andrew Donnellan wrote:

In cxl, we currently call:

pci_remove_root_bus(phb->bus);
pcibios_free_controller(phb);

which appears to break with this patch after I wire up
pci_set_host_bridge_release() in cxl, as phb can be freed before we call
pcibios_free_controller().


Ugh; you're right. I believe the user is expected to use either one way
or another, but now I see it's not that intuitive -- a design fault.

I'll address this w/ the other review/suggestion by Gavin; replying it.


Missing a '---' here :)


Changelog:


Ok, thanks!


--
Mauricio Faria de Oliveira
IBM Linux Technology Center



Re: [PATCH v3] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)

2016-08-11 Thread Mauricio Faria de Oliveira

Hi Gavin,

tl;dr: thanks for the comments & suggestions; i'll submit v4.

On 08/11/2016 03:40 AM, Gavin Shan wrote: [added some line breaks]

It seems the user has two options here:



(1) Setup bridge's release_fn() and call
pcibios_free_controller() explicitly;


I think the v3 design was non-intuitive in that point -- it does not
seem right for an user to use both options:

if release_fn() is set and is called before pcibios_free_controller()
(normal case w/ DLPAR/PCI hotplug/cxl, as buses/devices are supposed
to be removed before the controller is released) the latter will use
an invalid 'phb' pointer. (what Andrew reported)

In that scenario, it's not even possible for pcibios_free_controller()
to try to detect if release_fn() was already run or not, as the only
information it has is the 'phb' pointer, which may be invalid.

So, I believe the elegant way out of this is your suggestion to have
"immediate or deferred release" and make the user *choose* either one.

Obviously, let's make this explicit to the user -- w/ rename & comments.

> (2) Call pcibios_free_controller() without

a valid bridge's release_fn() initialized.


Ok, that looks legitimate for those using immediate release (default).

i.e., once an user decides to use deferred released, it's understood
that pcibios_free_controller() should not be called.

> I think we can provide better interface

to users: what we do in pcibios_free_controller() and 
pcibios_host_bridge_release()
should be (almost) same. pcibios_host_bridge_release() can be a wrapper of
pcibios_free_controller().


Right; I implemented only kfree() in pcibios_host_bridge_release()
because I was focused on when it runs *after* pcibios_free_controller();
but it turns out that if it runs *before*, phb becomes invalid pointer.

So, you're right -- both functions are expected to have the same effect
(slightly different code), that is all of what pcibios_free_controller()
does.  The only difference should be the timing. (good point on wrapper)

> With this, the users have two options: (1) Rely on bridge's

release_fn() to free the PCI controller; (2) Call pcibios_free_controller() as 
we're
doing currently. Those two options corresponds to immediately or deferred 
releasing.


Looks very good.  I'll submit a v4 like this:
-rename pcibios_host_bridge_release()/pcibios_free_controller_deferred()
-add comments about using _either_ one or another
-pcibios_free_controller_deferred() calls pcibios_free_controller().

--
Mauricio Faria de Oliveira
IBM Linux Technology Center



[PATCH v4] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)

2016-08-11 Thread Mauricio Faria de Oliveira
This patch leverages 'struct pci_host_bridge' from the PCI subsystem
in order to free the pci_controller only after the last reference to
its devices is dropped (avoiding an oops in pcibios_release_device()
if the last reference is dropped after pcibios_free_controller()).

The patch relies on pci_host_bridge.release_fn() (and .release_data),
which is called automatically by the PCI subsystem when the root bus
is released (i.e., the last reference is dropped).  Those fields are
set via pci_set_host_bridge_release() (e.g. in the platform-specific
implementation of pcibios_root_bridge_prepare()).

It introduces the 'pcibios_free_controller_deferred()' .release_fn()
and it expects .release_data to hold a pointer to the pci_controller.

The function implictly calls 'pcibios_free_controller()', so an user
must *NOT* explicitly call it if using the new _deferred() callback.

The functionality is enabled for pseries (although it isn't platform
specific, and may be used by cxl).

Details on not-so-elegant design choices:

 - Use 'pci_host_bridge.release_data' field as pointer to associated
   'struct pci_controller' so *not* to 'pci_bus_to_host(bridge->bus)'
   in pcibios_free_controller_deferred().

   That's because pci_remove_root_bus() sets 'host_bridge->bus = NULL'
   (so, if the last reference is released after pci_remove_root_bus()
   runs, which eventually reaches pcibios_free_controller_deferred(),
   that would hit a null pointer dereference).

   The cxl/vphb.c code calls pci_remove_root_bus(), and the cxl folks
   are interested in this fix.

Test-case #1 (hold references)

  # ls -ld /sys/block/sd* | grep -m1 0021:01:00.0
  <...> /sys/block/sdaa -> ../devices/pci0021:01/0021:01:00.0/<...>

  # ls -ld /sys/block/sd* | grep -m1 0021:01:00.1
  <...> /sys/block/sdab -> ../devices/pci0021:01/0021:01:00.1/<...>

  # cat >/dev/sdaa & pid1=$!
  # cat >/dev/sdab & pid2=$!

  # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
  Validating PHB DLPAR capability...yes.
  [  594.306719] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01
  [  594.306738] pci_hp_remove_devices:Removing 0021:01:00.0...
  ...
  [  598.236381] pci_hp_remove_devices:Removing 0021:01:00.1...
  ...
  [  611.972077] pci_bus 0021:01: busn_res: [bus 01-ff] is released
  [  611.972140] rpadlpar_io: slot PHB 33 removed

  # kill -9 $pid1
  # kill -9 $pid2
  [  632.918088] pcibios_free_controller_deferred: domain 33, dynamic 1

Test-case #2 (don't hold references)

  # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
  Validating PHB DLPAR capability...yes.
  [  916.357363] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01
  [  916.357386] pci_hp_remove_devices:Removing 0021:01:00.0...
  ...
  [  920.566527] pci_hp_remove_devices:Removing 0021:01:00.1...
  ...
  [  933.955873] pci_bus 0021:01: busn_res: [bus 01-ff] is released
  [  933.955977] pcibios_free_controller_deferred: domain 33, dynamic 1
  [  933.955999] rpadlpar_io: slot PHB 33 removed

Suggested-By: Gavin Shan 
Signed-off-by: Mauricio Faria de Oliveira 
---
Changelog:
 - v4: improve usability/design/documentation:
   - rename function to pcibios_free_controller_deferred()
   - from function call pcibios_free_controller()
   - no more struct pci_controller.bridge field
   thanks: Gavin Shan, Andrew Donnellan
 - v3: different approach: struct pci_host_bridge.release_fn()
 - v2: different approach: struct pci_controller.refcount 

 arch/powerpc/include/asm/pci-bridge.h  |  1 +
 arch/powerpc/kernel/pci-common.c   | 36 ++
 arch/powerpc/platforms/pseries/pci.c   |  4 
 arch/powerpc/platforms/pseries/pci_dlpar.c |  7 --
 4 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index b5e88e4..c0309c5 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -301,6 +301,7 @@ extern void pci_process_bridge_OF_ranges(struct 
pci_controller *hose,
 /* Allocate & free a PCI host bridge structure */
 extern struct pci_controller *pcibios_alloc_controller(struct device_node 
*dev);
 extern void pcibios_free_controller(struct pci_controller *phb);
+extern void pcibios_free_controller_deferred(struct pci_host_bridge *bridge);
 
 #ifdef CONFIG_PCI
 extern int pcibios_vaddr_is_ioport(void __iomem *address);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index a5c0153..8c48a78 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -151,6 +151,42 @@ void pcibios_free_controller(struct pci_controller *phb)
 EXPORT_SYMBOL_GPL(pcibios_free_controller);
 
 /*
+ * This function is used to call pcibios_free_controller()
+ * in a deferred manner: a callback from the PCI subsystem.
+ 

Re: [PATCH v4] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)

2016-08-12 Thread Mauricio Faria de Oliveira

On 08/12/2016 03:03 AM, Andrew Donnellan wrote:

On 12/08/16 15:54, Gavin Shan wrote:

It might be nicer for users to implement their own
pcibios_free_controller_deferred(),
meaning pSeries needs its own implementation for now. The reason is
more user (pSeries)
specific objects can be released together with the PHB. However, I'm
still fine without
the comment to be covered.


That's probably not a bad idea, though from a cxl perspective I'm fine
with using the current version.


I agree, but in that case the user _still_ can use another function.
This patch just provides an implementation that defaults to what was
already done, in a deferred manner.

If some users need something more specific, they can wire it up too :)
the same way we did - and it's explained in the function's comments.

Thanks for the review, suggestions and reaching a better and much more
interesting patch.

--
Mauricio Faria de Oliveira
IBM Linux Technology Center



Re: [PATCH v4] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)

2016-08-12 Thread Mauricio Faria de Oliveira

Michael,

On 08/12/2016 02:54 AM, Gavin Shan wrote:
> I don't have more obvious comments except below one nitpicky:

I just addressed/replied this in the other e-mail; i think it's OK,
and Andrew/cxl is OK w/ it too.

> Reviewed-by: Gavin Shan 


On 08/12/2016 03:06 AM, Andrew Donnellan wrote:

Suggested-By: Gavin Shan 
Signed-off-by: Mauricio Faria de Oliveira 


Reviewed-by: Andrew Donnellan 
Tested-by: Andrew Donnellan  # cxl

Does this justify a Cc: stable?


I'd think so, since this prevents an oops & crash (w/ panic_on_oops).
Do you agree?

Thanks!

--
Mauricio Faria de Oliveira
IBM Linux Technology Center



[PATCH] powerpc: add kernel parameter iommu_alloc_quiet

2016-09-01 Thread Mauricio Faria de Oliveira
This patch introduces the 'iommu_alloc_quiet=driver_name' parameter
to suppress the 'iommu_alloc failures' messages for that one driver.

This is an additional approach for this 'problem' of flooding logs,
not as fine-grained and not enabled by default as DMA_ATTR_NO_WARN,
but it has the advantage that it doesn't introduce any ABI changes.

That is important/requirement for the distribution kernels - where
the DMA_ATTR_NO_WARN changes to 'enum dma_attr' are not acceptable
because it breaks the kernel ABI.

Tested on next-20160825 + nvme changed not to use DMA_ATTR_NO_WARN.

 - test-case: default / no iommu_alloc_quiet
 - result:messages occur

# dmesg -c | grep iommu_alloc_quiet
#

# dd if=/dev/zero of=/dev/nvme0n1p1 bs=64k count=128k 2>/dev/null; dmesg -c
<...>
[   31.753230] nvme :00:06.0: iommu_alloc failed, tbl c003f7080c00 
vaddr c0022bf3 npages 16

 - test-case: iommu_alloc_quiet=(null)
 - result:messages occur

# dmesg -c | grep iommu_alloc_quiet
[0.00] Kernel command line: root=<...> ro disable_ddw 
iommu_alloc_quiet=
[0.00] iommu_alloc_quiet: driver ''

# dd if=/dev/zero of=/dev/nvme0n1p1 bs=64k count=128k 2>/dev/null; dmesg -c
<...>
[   29.032848] nvme :00:06.0: iommu_alloc failed, tbl c003f7190c00 
vaddr c00238fc npages 16

 - test-case: iommu_alloc_quiet=(length overflow)
 - result:messages occur

# dmesg -c | grep iommu_alloc_quiet
[0.00] Kernel command line: root=<...> ro disable_ddw 
iommu_alloc_quiet=0123456789abcdef0123456789abcdef
[0.00] iommu_alloc_quiet: driver '0123456789abcde'

# dd if=/dev/zero of=/dev/nvme0n1p1 bs=64k count=128k 2>/dev/null; dmesg -c
<...>
[   54.913279] nvme :00:06.0: iommu_alloc failed, tbl c003f7120c00 
vaddr c0022d96 npages 16

 - test-case: iommu_alloc_quiet=nvme
 - result:messages do not occur

# dmesg -c | grep iommu_alloc_quiet
[0.00] Kernel command line: root=<...> ro disable_ddw 
iommu_alloc_quiet=nvme
[0.00] iommu_alloc_quiet: driver 'nvme'

# dd if=/dev/zero of=/dev/nvme0n1p1 bs=64k count=128k 2>/dev/null; dmesg -c

# dd if=/dev/zero of=/dev/nvme0n1p1 bs=64k count=128k 2>/dev/null; dmesg -c

# dd if=/dev/zero of=/dev/nvme0n1p1 bs=64k count=128k 2>/dev/null; dmesg -c

# dd if=/dev/zero of=/dev/nvme0n1p1 bs=64k count=128k 2>/dev/null; dmesg -c

# dd if=/dev/zero of=/dev/nvme0n1p1 bs=64k count=128k 2>/dev/null; dmesg -c

Signed-off-by: Mauricio Faria de Oliveira 
---
 arch/powerpc/kernel/iommu.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 5f202a5..8524d91 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -65,6 +65,23 @@ static int __init setup_iommu(char *str)
 
 __setup("iommu=", setup_iommu);
 
+/*
+ * iommu_alloc_quiet: string with one driver name
+ * not to print 'iommu_alloc failed' messages for.
+ */
+#define IOMMU_ALLOC_QUIET_LEN  16 /* includes '\0' */
+static char iommu_alloc_quiet[IOMMU_ALLOC_QUIET_LEN];
+
+static int __init setup_iommu_alloc_quiet(char *str)
+{
+   strncpy(iommu_alloc_quiet, str, IOMMU_ALLOC_QUIET_LEN);
+   iommu_alloc_quiet[IOMMU_ALLOC_QUIET_LEN - 1] = '\0';
+   pr_info("iommu_alloc_quiet: driver '%s'\n", iommu_alloc_quiet);
+   return 1;
+}
+
+__setup("iommu_alloc_quiet=", setup_iommu_alloc_quiet);
+
 static DEFINE_PER_CPU(unsigned int, iommu_pool_hash);
 
 /*
@@ -479,8 +496,8 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table 
*tbl,
 
/* Handle failure */
if (unlikely(entry == DMA_ERROR_CODE)) {
-   if (!(attrs & DMA_ATTR_NO_WARN) &&
-   printk_ratelimit())
+   if (strncmp(iommu_alloc_quiet, dev->driver->name, 
IOMMU_ALLOC_QUIET_LEN) &&
+   !(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
dev_info(dev, "iommu_alloc failed, tbl %p "
 "vaddr %lx npages %lu\n", tbl, vaddr,
 npages);
@@ -777,8 +794,8 @@ dma_addr_t iommu_map_page(struct device *dev, struct 
iommu_table *tbl,
 mask >> tbl->it_page_shift, align,
 attrs);
if (dma_handle == DMA_ERROR_CODE) {
-   if (!(attrs & DMA_ATTR_NO_WARN) &&
-   printk_ratelimit())  {
+   if (strncmp(iommu_alloc_q

Re: [PATCH] powerpc: add kernel parameter iommu_alloc_quiet

2016-09-01 Thread Mauricio Faria de Oliveira

Michael / Ben,

On 09/01/2016 09:56 AM, Mauricio Faria de Oliveira wrote:

+#define IOMMU_ALLOC_QUIET_LEN  16 /* includes '\0' */


Guilherme suggested MAX_PARAM_PREFIX_LEN for this, which looks
better (a few extra bytes).

Would you mind to s/16/MAX_PARAM_PREFIX_LEN/ if you like that?
I can send a v2 too; just let me know.

Thanks,

--
Mauricio Faria de Oliveira
IBM Linux Technology Center



Re: [PATCH] powerpc: add kernel parameter iommu_alloc_quiet

2016-09-01 Thread Mauricio Faria de Oliveira

On 09/01/2016 10:39 AM, Torsten Duwe wrote:

JFYI, my strongly preferred solution would still be to just dev_dbg() the whole 
thing.
Which group of people would be interested in these messages, after all?


Certainly understandable.

Ben didn't like the idea to convert the messages to dynamic debug [1],
(with which I agree/understand nowadays) and suggested us to look for
a different approach.

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-June/144196.html

--
Mauricio Faria de Oliveira
IBM Linux Technology Center