[PATCH 04/86] powerpc/pci: use uapi/linux/pci_ids.h directly
Header moved from linux/pci_ids.h to uapi/linux/pci_ids.h, use the new header directly so we can drop the wrapper in include/linux/pci_ids.h. Signed-off-by: Michael S. Tsirkin --- arch/powerpc/platforms/embedded6xx/mpc10x.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/embedded6xx/mpc10x.h b/arch/powerpc/platforms/embedded6xx/mpc10x.h index b290b63..4175800 100644 --- a/arch/powerpc/platforms/embedded6xx/mpc10x.h +++ b/arch/powerpc/platforms/embedded6xx/mpc10x.h @@ -13,7 +13,7 @@ #ifndef __PPC_KERNEL_MPC10X_H #define __PPC_KERNEL_MPC10X_H -#include +#include #include /* @@ -35,7 +35,7 @@ /* * Define the vendor/device IDs for the various bridges--should be added to - * + * */ #defineMPC10X_BRIDGE_106 ((PCI_DEVICE_ID_MOTOROLA_MPC106 << 16) | \ PCI_VENDOR_ID_MOTOROLA) -- MST ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 33/86] edac/pasemi: use uapi/linux/pci_ids.h directly
Header moved from linux/pci_ids.h to uapi/linux/pci_ids.h, use the new header directly so we can drop the wrapper in include/linux/pci_ids.h. Signed-off-by: Michael S. Tsirkin --- drivers/edac/pasemi_edac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/edac/pasemi_edac.c b/drivers/edac/pasemi_edac.c index 9c971b5..cd0da08 100644 --- a/drivers/edac/pasemi_edac.c +++ b/drivers/edac/pasemi_edac.c @@ -24,7 +24,7 @@ #include #include #include -#include +#include #include #include "edac_core.h" -- MST ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 43/86] macintosh: use uapi/linux/pci_ids.h directly
Header moved from linux/pci_ids.h to uapi/linux/pci_ids.h, use the new header directly so we can drop the wrapper in include/linux/pci_ids.h. Signed-off-by: Michael S. Tsirkin --- drivers/macintosh/macio_asic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/macintosh/macio_asic.c b/drivers/macintosh/macio_asic.c index 4f12c6f..ccf9547 100644 --- a/drivers/macintosh/macio_asic.c +++ b/drivers/macintosh/macio_asic.c @@ -20,7 +20,7 @@ #include #include #include -#include +#include #include #include #include -- MST ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 51/86] cxl: use uapi/linux/pci_ids.h directly
Header moved from linux/pci_ids.h to uapi/linux/pci_ids.h, use the new header directly so we can drop the wrapper in include/linux/pci_ids.h. Signed-off-by: Michael S. Tsirkin --- drivers/misc/cxl/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index 1ef0164..8f2a6c8 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -8,7 +8,7 @@ */ #include -#include +#include #include #include #include -- MST ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [V4, 1/4] KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer
Michael Ellerman writes: > On Thu, 2015-26-03 at 13:13:39 UTC, "Aneesh Kumar K.V" wrote: >> pte can get updated from other CPUs as part of multiple activities >> like THP split, huge page collapse, unmap. We need to make sure we >> don't reload the pte value again and again for different checks. >> >> --- >> arch/powerpc/include/asm/kvm_book3s_64.h | 5 - >> arch/powerpc/kvm/e500_mmu_host.c | 20 >> 2 files changed, 16 insertions(+), 9 deletions(-) > > So this series is partly KVM but mostly powerpc. > > I assume you can't split it into two separate series easily? Yes, will do. But before that I was looking for feedback from Ben or Paul. > > You haven't sent it to the KVM lists or to Alex AFAICS. You'll need to do that > for the KVM pieces at least. We can probably take it all via powerpc, but I'll > need an ACK from Alex at least. > If I can get an ack from Ben, that makes it easy for the kvm list. -aneesh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
FW: Finally Working RadeonHD 6570 2Gb on Quad G5
just need to share to developer the success we face today the Tursk is running in full 2D and 3D on Linux PPC (lubuntu 14.04.2 kernel 4.0.0-rc3) on Quad G5 Need to say a big thankyou to Christian (Xeno74) Zigotzky I share a video for the community https://www.youtube.com/watch?v=MYgFfRLYhLU&list=UUkb4bw4N19d-x_tn2FXLojQ Great Job to all the devs! Luigi ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [V4,1/4] KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer
On Sun, 2015-03-29 at 22:16 +0530, Aneesh Kumar K.V wrote: > Michael Ellerman writes: > > > On Thu, 2015-26-03 at 13:13:39 UTC, "Aneesh Kumar K.V" wrote: > >> pte can get updated from other CPUs as part of multiple activities > >> like THP split, huge page collapse, unmap. We need to make sure we > >> don't reload the pte value again and again for different checks. > >> > >> --- > >> arch/powerpc/include/asm/kvm_book3s_64.h | 5 - > >> arch/powerpc/kvm/e500_mmu_host.c | 20 > >> 2 files changed, 16 insertions(+), 9 deletions(-) > > > > So this series is partly KVM but mostly powerpc. > > > > I assume you can't split it into two separate series easily? > > Yes, will do. But before that I was looking for feedback from Ben or > Paul. > > > > > You haven't sent it to the KVM lists or to Alex AFAICS. You'll need to do > > that > > for the KVM pieces at least. We can probably take it all via powerpc, but > > I'll > > need an ACK from Alex at least. > > > > If I can get an ack from Ben, that makes it easy for the kvm list. Ack. Using ACCESS_ONCE in a lockless access of the PTE that does multiple checks makes sense. Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 3/3] selftests/powerpc: Add transactional syscall test
On 24/03/15 13:02, Michael Ellerman wrote: > On Tue, 2015-03-24 at 12:52 +1100, Sam Bobroff wrote: >> On 20/03/15 20:25, Anshuman Khandual wrote: >>> On 03/19/2015 10:13 AM, Sam Bobroff wrote: Check that a syscall made during an active transaction will fail with the correct failure code and that one made during a suspended transaction will succeed. Signed-off-by: Sam Bobroff >>> >>> The test works. >> >> Great :-) >> + +int tm_syscall(void) +{ + SKIP_IF(!((long)get_auxv_entry(AT_HWCAP2) & PPC_FEATURE2_HTM)); + setbuf(stdout, 0); + FAIL_IF(!t_active_getppid_test()); + printf("%d active transactions correctly aborted.\n", TM_TEST_RUNS); + FAIL_IF(!t_suspended_getppid_test()); + printf("%d suspended transactions succeeded.\n", TM_TEST_RUNS); + return 0; +} + +int main(void) +{ + return test_harness(tm_syscall, "tm_syscall"); +} + >>> >>> There is an extra blank line at the end of this file. Interchanging return >>> codes of 0 and 1 for various functions make it very confusing along with >>> negative FAIL_IF checks in the primary test function. Control flow >>> structures >>> like these can use some in-code documentation for readability. >>> >>> + for (i = 0; i < TM_RETRIES; i++) { >>> + if (__builtin_tbegin(0)) { >>> + getppid(); >>> + __builtin_tend(0); >>> + return 1; >>> + } >>> + if (t_failure_persistent()) >>> + return 0; >>> >>> or >>> >>> + if (__builtin_tbegin(0)) { >>> + __builtin_tsuspend(); >>> + getppid(); >>> + __builtin_tresume(); >>> + __builtin_tend(0); >>> + return 1; >>> + } >>> + if (t_failure_persistent()) >>> + return 0; >>> >> >> Good points. I'll remove the blank line and comment the code. >> >> I'm not sure I can do any better with the FAIL_IF() macro: I wanted it >> to read "fail if the test failed", but I can see what you mean about a >> double negative. Maybe it would be better to introduce a different >> macro, more like a standard assert: TEST(XXX) which fails if XXX is >> false. However, I think "TEST" would be too generic a name and I'm not >> should what would be better. Any comments/suggestions? > > FAIL_IF() is designed for things that return 0 for OK and !0 for failure. Like > most things in C. > > So I think it would be improved if you inverted your return codes in your test > routines. > > Even better to return ESOMETHING in the error cases, and zero otherwise. > > cheers Fair enough. I think the *_test() functions I added for "clarity" were just making it more confusing, so I've dropped them. Moving the code around, even a little, has also exposed the fact that transactions are very sensitive to how the code is compiled so I'm going to move the transaction blocks out into a separate assembly file where I can control exactly what instructions get used. This will also mean that it's no longer dependent on using linker magic (or some other trick) to avoid lazy symbol loading. I'll repost the series. Thanks for the review! Cheers, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3] powerpc: Use PFN_PHYS() to avoid truncating the physical address
On Fri, 2015-03-27 at 13:07 -0500, Scott Wood wrote: > On Fri, 2015-03-27 at 10:45 +1100, Michael Ellerman wrote: > > On Thu, 2015-03-26 at 10:31 -0500, Emil Medve wrote: > > > Hello Kumar, > > > > > > > > > On 03/26/2015 10:18 AM, Kumar Gala wrote: > > > > Why no commit message with what issue this change was trying to fix? > > > > > > A while back, when I attempted to remove bootmem (in favor of just plain > > > memblock as in powerpc land bootmem was just a wrapper to memblock > > > anyway) I run at some point into a problem with an intermediate address > > > value because of this '<< PAGE_SHIFT' on the wrong width variable. Using > > > PFN_PHYS() took care of it (it has a cast) so I decided to get this > > > defensive patch applied. Since, I dropped my bootmem/memblock patches in > > > favor to Anton's (Blanchard) work so my concrete issue example is > > > somewhat gone > > > > I'm not a big fan of it unless it's actually fixing an issue. It's a lot of > > churn and the end result is less readable IMHO. > > It is fixing an issue -- the issue is that there are overflow errors in > the code. Some of the places Emil fixed are only for platforms that > don't have physical addresses larger than pointers, or have the needed > casts, or are known to be dealing with lowmem, but others aren't. E.g. > page_is_ram() and devmem_is_allowed() are buggy on 32-bit with 64-bit > physical. Right. So obviously I'm fine with all the cases where it fixes an actual bug. > flush_dcache_icache_page() is buggy on mpc86xx with more than 4 GiB RAM > -- though that would still be buggy even with this change, due to > __flush_dcache_icache_phys taking unsigned long. The entire concept of > that function doesn't work for sizeof(phys_addr_t) > sizeof(void *), so > in this case 86xx should be using the booke code instead. > > Even in the places where overflow can't happen due to the above > circumstances (other than having the needed cast), it's setting a bad > example that can be copied to places where it will break, or the > circumstances of the code could change (e.g. currently 64-bit-only code > being used on 32-bit). I agree with that in principle, but it looks like in a bunch of places this patch ends up assigning the result to unsigned long anyway. So those cases are just churn IMHO. The end result doesn't work with 64-bit phys if the code is ever used there, even though it looks like maybe it should, and it's also not setting a good example for other code. Those cases should either be left alone, or fixed properly to use phys_addr_t. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 1/3] powerpc/powernv: convert codes returned by OPAL calls
On Fri, 2015-03-27 at 17:39 +0100, Cédric Le Goater wrote: > OPAL has its own list of return codes. The patch provides a translation > of such codes in errnos for the opal_sensor_read call, and possibly > others if needed. > > Index: linux.git/arch/powerpc/platforms/powernv/opal.c > === > --- linux.git.orig/arch/powerpc/platforms/powernv/opal.c > +++ linux.git/arch/powerpc/platforms/powernv/opal.c > @@ -894,6 +894,23 @@ void opal_free_sg_list(struct opal_sg_li > } > } > > +int opal_error_code(int rc) > +{ > + switch (rc) { > + case OPAL_SUCCESS: return 0; Obviously correct. > + case OPAL_PARAMETER:return -EINVAL; Yep. > + case OPAL_UNSUPPORTED: return -ENOSYS; You shouldn't use ENOSYS here, that should only ever mean "no such syscall", otherwise you get very confusing results like read() returning ENOSYS. > + case OPAL_ASYNC_COMPLETION: return -EAGAIN; EAGAIN means "try what you did again", I don't think that's what ASYNC_COMPLETION means, is it? It looks like it means, "don't try again, but you need to wait for the result to be ready". I'm not sure it maps well to any of the Linux codes, maybe EINPROGRESS ? > + case OPAL_BUSY_EVENT: return -EBUSY; Yep. > + case OPAL_NO_MEM: return -ENOMEM; Yep. > + case OPAL_HARDWARE: return -ENOENT; This is another one which I think you shouldn't use as it can lead to confusing results at user level. eg: $ cat /sysfs/some/file Error: No such file or directory Huh? Looking at the skiboot code this looks like EIO is a good match. > + case OPAL_INTERNAL_ERROR: return -EIO; Yeah as good as anything I guess. > + default: > + pr_err("%s: unexpected OPAL error %d\n", __func__, rc); > + return -ERANGE; I'm not sure about this one honestly, it means "Math result not representable". I suspect the reason RTAS chose it was just that it's not EINVAL. This should probably also just be EIO. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 3/3] powerpc/powernv: remove opal_sensor_mutex
On Fri, 2015-03-27 at 17:39 +0100, Cédric Le Goater wrote: > The opal sensor mutex protects the opal_sensor_read call which > can return a OPAL_BUSY code on IBM Power systems if a previous > request is in progress. > > This can be handled at user level with a retry. It can, but how does it actually look in practice? It looks like the only use of opal_get_sensor_data() is show_sensor() in drivers/hwmon/ibmpowernv.c. Because that's a sysfs attribute folks will be generally just dumping that with cat, or reading it in a shell script, neither of which will cope nicely with EBUSY I think? cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/2] kernel/reboot.c: Add orderly_reboot for graceful reboot
The kernel has orderly_poweroff which allows the kernel to initiate a graceful shutdown of userspace, by running /sbin/poweroff. This adds orderly_reboot that will cause userspace to shut itself down by calling /sbin/reboot. This will be used for shutdown initiated by a system controller on platforms that do not use ACPI. Signed-off-by: Joel Stanley --- include/linux/reboot.h | 1 + kernel/reboot.c| 51 +++--- 2 files changed, 49 insertions(+), 3 deletions(-) diff --git a/include/linux/reboot.h b/include/linux/reboot.h index 48bf152..a4ffcd9 100644 --- a/include/linux/reboot.h +++ b/include/linux/reboot.h @@ -68,6 +68,7 @@ void ctrl_alt_del(void); extern char poweroff_cmd[POWEROFF_CMD_PATH_LEN]; extern int orderly_poweroff(bool force); +extern int orderly_reboot(void); /* * Emergency restart, callable from an interrupt handler. diff --git a/kernel/reboot.c b/kernel/reboot.c index a3a9e24..d0aa1ec 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -306,8 +306,9 @@ void ctrl_alt_del(void) } char poweroff_cmd[POWEROFF_CMD_PATH_LEN] = "/sbin/poweroff"; +char reboot_cmd[POWEROFF_CMD_PATH_LEN] = "/sbin/reboot"; -static int __orderly_poweroff(bool force) +static int run_cmd(const char *cmd) { char **argv; static char *envp[] = { @@ -316,8 +317,7 @@ static int __orderly_poweroff(bool force) NULL }; int ret; - - argv = argv_split(GFP_KERNEL, poweroff_cmd, NULL); + argv = argv_split(GFP_KERNEL, cmd, NULL); if (argv) { ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC); argv_free(argv); @@ -325,8 +325,33 @@ static int __orderly_poweroff(bool force) ret = -ENOMEM; } + return ret; +} + +static int __orderly_reboot(void) +{ + int ret; + + ret = run_cmd(reboot_cmd); + + if (ret) { + pr_warn("Failed to start orderly reboot: forcing the issue\n"); + emergency_sync(); + kernel_restart(NULL); + } + + return ret; +} + +static int __orderly_poweroff(bool force) +{ + int ret; + + ret = run_cmd(reboot_cmd); + if (ret && force) { pr_warn("Failed to start orderly shutdown: forcing the issue\n"); + /* * I guess this should try to kick off some daemon to sync and * poweroff asap. Or not even bother syncing if we're doing an @@ -364,6 +389,26 @@ int orderly_poweroff(bool force) } EXPORT_SYMBOL_GPL(orderly_poweroff); +static void reboot_work_func(struct work_struct *work) +{ + __orderly_reboot(); +} + +static DECLARE_WORK(reboot_work, reboot_work_func); + +/** + * orderly_reboot - Trigger an orderly system reboot + * + * This may be called from any context to trigger a system reboot. + * If the orderly reboot fails, it will force an immediate reboot. + */ +int orderly_reboot(void) +{ + schedule_work(&reboot_work); + return 0; +} +EXPORT_SYMBOL_GPL(orderly_reboot); + static int __init reboot_setup(char *str) { for (;;) { -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 2/2] powerpc/powernv: Reboot when requested by firmware
Use orderly_reboot so userspace will to shut itself down via the reboot path. This is required for graceful reboot initiated by the BMC, such as when a user uses ipmitool to issue a 'chassis power cycle' command. Signed-off-by: Joel Stanley --- arch/powerpc/platforms/powernv/opal-power.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/powernv/opal-power.c b/arch/powerpc/platforms/powernv/opal-power.c index 48bf5b0..ac46c2c 100644 --- a/arch/powerpc/platforms/powernv/opal-power.c +++ b/arch/powerpc/platforms/powernv/opal-power.c @@ -29,8 +29,9 @@ static int opal_power_control_event(struct notifier_block *nb, switch (type) { case SOFT_REBOOT: - /* Fall through. The service processor is responsible for -* bringing the machine back up */ + pr_info("OPAL: reboot requested\n"); + orderly_reboot(); + break; case SOFT_OFF: pr_info("OPAL: poweroff requested\n"); orderly_poweroff(true); -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc: Replace mem_init_done with slab_is_available()
We have a powerpc specific global called mem_init_done which is "set on boot once kmalloc can be called". But that's not *quite* true. We set it at the bottom of mem_init(), and rely on the fact that mm_init() calls kmem_cache_init() immediately after that, and nothing is running in parallel. So replace it with the generic and 100% correct slab_is_available(). Signed-off-by: Michael Ellerman --- arch/powerpc/include/asm/setup.h | 1 - arch/powerpc/kernel/pci-common.c | 2 +- arch/powerpc/kernel/rtas.c | 4 ++-- arch/powerpc/lib/alloc.c | 2 +- arch/powerpc/mm/mem.c| 3 --- arch/powerpc/mm/pgtable_32.c | 9 - arch/powerpc/mm/pgtable_64.c | 4 ++-- arch/powerpc/platforms/cell/celleb_pci.c | 2 +- 8 files changed, 11 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index fbdf18cf954c..e9d384cbd021 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -7,7 +7,6 @@ extern void ppc_printk_progress(char *s, unsigned short hex); extern unsigned int rtas_data; -extern int mem_init_done; /* set on boot once kmalloc can be called */ extern unsigned long long memory_limit; extern unsigned long klimit; extern void *zalloc_maybe_bootmem(size_t size, gfp_t mask); diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index 2a525c938158..bcf618bfff1e 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -76,7 +76,7 @@ struct pci_controller *pcibios_alloc_controller(struct device_node *dev) list_add_tail(&phb->list_node, &hose_list); spin_unlock(&hose_spinlock); phb->dn = dev; - phb->is_dynamic = mem_init_done; + phb->is_dynamic = slab_is_available(); #ifdef CONFIG_PPC64 if (dev) { int nid = of_node_to_nid(dev); diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 21c45a2d0706..fffdef2962eb 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -401,7 +401,7 @@ static char *__fetch_rtas_last_error(char *altbuf) buf = altbuf; } else { buf = rtas_err_buf; - if (mem_init_done) + if (slab_is_available()) buf = kmalloc(RTAS_ERROR_LOG_MAX, GFP_ATOMIC); } if (buf) @@ -461,7 +461,7 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...) if (buff_copy) { log_error(buff_copy, ERR_TYPE_RTAS_LOG, 0); - if (mem_init_done) + if (slab_is_available()) kfree(buff_copy); } return ret; diff --git a/arch/powerpc/lib/alloc.c b/arch/powerpc/lib/alloc.c index 4a6c2cf890d9..60b0b3fc8fc1 100644 --- a/arch/powerpc/lib/alloc.c +++ b/arch/powerpc/lib/alloc.c @@ -10,7 +10,7 @@ void * __init_refok zalloc_maybe_bootmem(size_t size, gfp_t mask) { void *p; - if (mem_init_done) + if (slab_is_available()) p = kzalloc(size, mask); else { p = memblock_virt_alloc(size, 0); diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index b7285a5870f8..45fda71feb27 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -61,7 +61,6 @@ #define CPU_FTR_NOEXECUTE 0 #endif -int mem_init_done; unsigned long long memory_limit; #ifdef CONFIG_HIGHMEM @@ -377,8 +376,6 @@ void __init mem_init(void) pr_info(" * 0x%08lx..0x%08lx : vmalloc & ioremap\n", VMALLOC_START, VMALLOC_END); #endif /* CONFIG_PPC32 */ - - mem_init_done = 1; } void free_initmem(void) diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c index 03b1a3b0fbd5..57b41f8b7053 100644 --- a/arch/powerpc/mm/pgtable_32.c +++ b/arch/powerpc/mm/pgtable_32.c @@ -110,9 +110,8 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd) __init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address) { pte_t *pte; - extern int mem_init_done; - if (mem_init_done) { + if (slab_is_available()) { pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO); } else { pte = __va(memblock_alloc(PAGE_SIZE, PAGE_SIZE)); @@ -219,7 +218,7 @@ __ioremap_caller(phys_addr_t addr, unsigned long size, unsigned long flags, * Don't allow anybody to remap normal RAM that we're using. * mem_init() sets high_memory so only do the check after that. */ - if (mem_init_done && (p < virt_to_phys(high_memory)) && + if (slab_is_available() && (p < virt_to_phys(high_memory)) && !(__allow_ioremap_reserved && memblock_is_region_reserved(p, size))) { printk("__ioremap(): phys addr 0x%llx is RAM lr %pf\n",
[PATCH v2 3/4] selftests/powerpc: Add transactional syscall test
Check that a syscall made during an active transaction will fail with the correct failure code and that one made during a suspended transaction will succeed. Signed-off-by: Sam Bobroff --- v2: Further testing has shown that the success or failure of the transactions was affected by minor changes to the code, compiler optimisation and linker settings. To address this, I've moved the transactional part of the test to a separate function, written in assembly. I've also extended the test to as many transactions as it can fit into ten seconds, to better test for failures that occur more rarely. This has stabilised the results, and it's no longer necessary to use special compiler or linker flags. tools/testing/selftests/powerpc/tm/.gitignore |1 + tools/testing/selftests/powerpc/tm/Makefile|4 +- .../testing/selftests/powerpc/tm/tm-syscall-asm.S | 27 + .../testing/selftests/powerpc/tm/tm-syscall-asm.h |2 + tools/testing/selftests/powerpc/tm/tm-syscall.c| 109 5 files changed, 142 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.S create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.h create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c diff --git a/tools/testing/selftests/powerpc/tm/.gitignore b/tools/testing/selftests/powerpc/tm/.gitignore index 33d02cc..2699635d 100644 --- a/tools/testing/selftests/powerpc/tm/.gitignore +++ b/tools/testing/selftests/powerpc/tm/.gitignore @@ -1 +1,2 @@ tm-resched-dscr +tm-syscall diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index 2cede23..93bbff3 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -1,8 +1,10 @@ -PROGS := tm-resched-dscr +PROGS := tm-resched-dscr tm-syscall +CFLAGS:=$(CFLAGS) -mhtm all: $(PROGS) $(PROGS): ../harness.c +tm-syscall: tm-syscall-asm.S run_tests: all @-for PROG in $(PROGS); do \ diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S new file mode 100644 index 000..2b2daa7 --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S @@ -0,0 +1,27 @@ +#include +#include + + .text +FUNC_START(getppid_tm_active_impl) + tbegin. + beq 1f + li r0, __NR_getppid + sc + tend. + blr +1: + li r3, -1 + blr + +FUNC_START(getppid_tm_suspended_impl) + tbegin. + beq 1f + li r0, __NR_getppid + tsuspend. + sc + tresume. + tend. + blr +1: + li r3, -1 + blr diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.h b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.h new file mode 100644 index 000..6136328 --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.h @@ -0,0 +1,2 @@ +extern int getppid_tm_active_impl(void); +extern int getppid_tm_suspended_impl(void); diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall.c b/tools/testing/selftests/powerpc/tm/tm-syscall.c new file mode 100644 index 000..ff3b15c --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-syscall.c @@ -0,0 +1,109 @@ +/* Test the kernel's system call code to ensure that a system call + * made from within an active HTM transaction is aborted with the + * correct failure code. + * Conversely, ensure that a system call made from within a + * suspended transaction can succeed. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "utils.h" +#include "tm-syscall-asm.h" + +unsigned retries = 0; + +#define TEST_DURATION 10 /* seconds */ +#define TM_RETRIES 100 + +long failure_code(void) +{ + return __builtin_get_texasr() >> 56; +} + +bool failure_is_persistent(void) +{ + return (failure_code() & TM_CAUSE_PERSISTENT) == TM_CAUSE_PERSISTENT; +} + +bool failure_is_syscall(void) +{ + return (failure_code() & TM_CAUSE_SYSCALL) == TM_CAUSE_SYSCALL; +} + +pid_t getppid_tm(bool suspend) +{ + int i; + pid_t pid; + + for (i = 0; i < TM_RETRIES; i++) { + if (suspend) + pid = getppid_tm_suspended_impl(); + else + pid = getppid_tm_active_impl(); + if (pid >= 0) + return pid; + if (failure_is_persistent()) { + if (failure_is_syscall()) + return -1; + printf("Unexpected persistent transaction failure.\n"); + printf("TEXASR 0x%016lx, TFIAR 0x%016lx.\n", + __builtin_get_texasr(), __builtin_get_tfiar()); + exit(-1); + } + retries++; + } + printf("Exceeded limit of %d temporary transaction failures.\n",
[PATCH v2 1/4] powerpc/tm: Abort syscalls in active transactions
This patch changes the syscall handler to doom (tabort) active transactions when a syscall is made and return immediately without performing the syscall. Currently, the system call instruction automatically suspends an active transaction which causes side effects to persist when an active transaction fails. This does change the kernel's behaviour, but in a way that was documented as unsupported. It doesn't reduce functionality because syscalls will still be performed after tsuspend. It also provides a consistent interface and makes the behaviour of user code substantially the same across powerpc and platforms that do not support suspended transactions (e.g. x86 and s390). Performance measurements using http://ozlabs.org/~anton/junkcode/null_syscall.c indicate the cost of a system call increases by about 0.5%. Signed-off-by: Sam Bobroff --- v2: Also update the failure code table. Documentation/powerpc/transactional_memory.txt | 32 arch/powerpc/include/uapi/asm/tm.h |2 +- arch/powerpc/kernel/entry_64.S | 19 ++ 3 files changed, 36 insertions(+), 17 deletions(-) diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt index 9791e98..98b39af 100644 --- a/Documentation/powerpc/transactional_memory.txt +++ b/Documentation/powerpc/transactional_memory.txt @@ -74,22 +74,23 @@ Causes of transaction aborts Syscalls -Performing syscalls from within transaction is not recommended, and can lead -to unpredictable results. +Syscalls made from within an active transaction will not be performed and the +transaction will be doomed by the kernel with the failure code TM_CAUSE_SYSCALL +| TM_CAUSE_PERSISTENT. -Syscalls do not by design abort transactions, but beware: The kernel code will -not be running in transactional state. The effect of syscalls will always -remain visible, but depending on the call they may abort your transaction as a -side-effect, read soon-to-be-aborted transactional data that should not remain -invisible, etc. If you constantly retry a transaction that constantly aborts -itself by calling a syscall, you'll have a livelock & make no progress. +Syscalls made from within a suspended transaction are performed as normal and +the transaction is not explicitly doomed by the kernel. However, what the +kernel does to perform the syscall may result in the transaction being doomed +by the hardware. The syscall is performed in suspended mode so any side +effects will be persistent, independent of transaction success or failure. No +guarantees are provided by the kernel about which syscalls will affect +transaction success. -Simple syscalls (e.g. sigprocmask()) "could" be OK. Even things like write() -from, say, printf() should be OK as long as the kernel does not access any -memory that was accessed transactionally. - -Consider any syscalls that happen to work as debug-only -- not recommended for -production use. Best to queue them up till after the transaction is over. +Care must be taken when relying on syscalls to abort during active transactions +if the calls are made via a library. Libraries may cache values (which may +give the appearance of success) or perform operations that cause transaction +failure before entering the kernel (which may produce different failure codes). +Examples are glibc's getpid() and lazy symbol resolution. Signals @@ -176,8 +177,7 @@ kernel aborted a transaction: TM_CAUSE_RESCHED Thread was rescheduled. TM_CAUSE_TLBI Software TLB invalide. TM_CAUSE_FAC_UNAV FP/VEC/VSX unavailable trap. - TM_CAUSE_SYSCALL Currently unused; future syscalls that must abort -transactions for consistency will use this. + TM_CAUSE_SYSCALL Syscall from active transaction. TM_CAUSE_SIGNALSignal delivered. TM_CAUSE_MISC Currently unused. TM_CAUSE_ALIGNMENT Alignment fault. diff --git a/arch/powerpc/include/uapi/asm/tm.h b/arch/powerpc/include/uapi/asm/tm.h index 5d836b7..5047659 100644 --- a/arch/powerpc/include/uapi/asm/tm.h +++ b/arch/powerpc/include/uapi/asm/tm.h @@ -11,7 +11,7 @@ #define TM_CAUSE_RESCHED 0xde #define TM_CAUSE_TLBI 0xdc #define TM_CAUSE_FAC_UNAV 0xda -#define TM_CAUSE_SYSCALL 0xd8 /* future use */ +#define TM_CAUSE_SYSCALL 0xd8 #define TM_CAUSE_MISC 0xd6 /* future use */ #define TM_CAUSE_SIGNAL0xd4 #define TM_CAUSE_ALIGNMENT 0xd2 diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index d180caf2..85bf81d 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -34,6 +34,7 @@ #include #include #include +#include /* * System calls. @@ -145,6 +146,24 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR) andi. r11,r10,_TIF_SYSCALL_DOTRACE bne syscall_dotrace .Lsyscall_dotrace_cont: +#ifdef CONF
[PATCH v2 2/4] selftests/powerpc: Move get_auxv_entry() to harness.c
Move get_auxv_entry() from pmu/lib.c up to harness.c in order to make it available to other tests. Signed-off-by: Sam Bobroff --- tools/testing/selftests/powerpc/harness.c | 47 + tools/testing/selftests/powerpc/pmu/lib.c | 47 - tools/testing/selftests/powerpc/pmu/lib.h |1 - tools/testing/selftests/powerpc/utils.h |2 +- 4 files changed, 48 insertions(+), 49 deletions(-) diff --git a/tools/testing/selftests/powerpc/harness.c b/tools/testing/selftests/powerpc/harness.c index 8ebc58a..f7997af 100644 --- a/tools/testing/selftests/powerpc/harness.c +++ b/tools/testing/selftests/powerpc/harness.c @@ -11,6 +11,10 @@ #include #include #include +#include +#include +#include +#include #include "subunit.h" #include "utils.h" @@ -112,3 +116,46 @@ int test_harness(int (test_function)(void), char *name) return rc; } + +static char auxv[4096]; + +void *get_auxv_entry(int type) +{ + ElfW(auxv_t) *p; + void *result; + ssize_t num; + int fd; + + fd = open("/proc/self/auxv", O_RDONLY); + if (fd == -1) { + perror("open"); + return NULL; + } + + result = NULL; + + num = read(fd, auxv, sizeof(auxv)); + if (num < 0) { + perror("read"); + goto out; + } + + if (num > sizeof(auxv)) { + printf("Overflowed auxv buffer\n"); + goto out; + } + + p = (ElfW(auxv_t) *)auxv; + + while (p->a_type != AT_NULL) { + if (p->a_type == type) { + result = (void *)p->a_un.a_val; + break; + } + + p++; + } +out: + close(fd); + return result; +} diff --git a/tools/testing/selftests/powerpc/pmu/lib.c b/tools/testing/selftests/powerpc/pmu/lib.c index 9768dea..a07104c 100644 --- a/tools/testing/selftests/powerpc/pmu/lib.c +++ b/tools/testing/selftests/powerpc/pmu/lib.c @@ -5,15 +5,10 @@ #define _GNU_SOURCE/* For CPU_ZERO etc. */ -#include #include -#include -#include #include #include #include -#include -#include #include #include "utils.h" @@ -256,45 +251,3 @@ out: return rc; } -static char auxv[4096]; - -void *get_auxv_entry(int type) -{ - ElfW(auxv_t) *p; - void *result; - ssize_t num; - int fd; - - fd = open("/proc/self/auxv", O_RDONLY); - if (fd == -1) { - perror("open"); - return NULL; - } - - result = NULL; - - num = read(fd, auxv, sizeof(auxv)); - if (num < 0) { - perror("read"); - goto out; - } - - if (num > sizeof(auxv)) { - printf("Overflowed auxv buffer\n"); - goto out; - } - - p = (ElfW(auxv_t) *)auxv; - - while (p->a_type != AT_NULL) { - if (p->a_type == type) { - result = (void *)p->a_un.a_val; - break; - } - - p++; - } -out: - close(fd); - return result; -} diff --git a/tools/testing/selftests/powerpc/pmu/lib.h b/tools/testing/selftests/powerpc/pmu/lib.h index 0f0339c..ca5d72a 100644 --- a/tools/testing/selftests/powerpc/pmu/lib.h +++ b/tools/testing/selftests/powerpc/pmu/lib.h @@ -29,7 +29,6 @@ extern int notify_parent(union pipe write_pipe); extern int notify_parent_of_error(union pipe write_pipe); extern pid_t eat_cpu(int (test_function)(void)); extern bool require_paranoia_below(int level); -extern void *get_auxv_entry(int type); struct addr_range { uint64_t first, last; diff --git a/tools/testing/selftests/powerpc/utils.h b/tools/testing/selftests/powerpc/utils.h index a93777a..64f53cd 100644 --- a/tools/testing/selftests/powerpc/utils.h +++ b/tools/testing/selftests/powerpc/utils.h @@ -19,7 +19,7 @@ typedef uint8_t u8; int test_harness(int (test_function)(void), char *name); - +extern void *get_auxv_entry(int type); /* Yes, this is evil */ #define FAIL_IF(x) \ -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 4/4] powerpc/tm: Correct minor documentation typos
Signed-off-by: Sam Bobroff --- v2: Discovered some typos while updating the documentation. Documentation/powerpc/transactional_memory.txt |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt index 98b39af..ba0a2a4 100644 --- a/Documentation/powerpc/transactional_memory.txt +++ b/Documentation/powerpc/transactional_memory.txt @@ -175,7 +175,7 @@ These are defined in , and distinguish different reasons why the kernel aborted a transaction: TM_CAUSE_RESCHED Thread was rescheduled. - TM_CAUSE_TLBI Software TLB invalide. + TM_CAUSE_TLBI Software TLB invalid. TM_CAUSE_FAC_UNAV FP/VEC/VSX unavailable trap. TM_CAUSE_SYSCALL Syscall from active transaction. TM_CAUSE_SIGNALSignal delivered. @@ -185,7 +185,7 @@ kernel aborted a transaction: These can be checked by the user program's abort handler as TEXASR[0:7]. If bit 7 is set, it indicates that the error is consider persistent. For example -a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not.q +a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not. GDB === -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 0/4] powerpc/tm: Abort syscalls in active transactions
See the first patch for a description of the reasoning behind this change. This set includes the change, a kernel selftest for it and some slight refactoring of the selftest code. v2: Patch 1/4: powerpc/tm: Abort syscalls in active transactions Also update the failure code table. Patch 3/4: selftests/powerpc: Add transactional syscall test Further testing has shown that the success or failure of the transactions was affected by minor changes to the code, compiler optimisation and linker settings. To address this, I've moved the transactional part of the test to a separate function, written in assembly. I've also extended the test to as many transactions as it can fit into ten seconds, to better test for failures that occur more rarely. This has stabilised the results, and it's no longer necessary to use special compiler or linker flags. Patch 4/4: powerpc/tm: Correct minor documentation typos Discovered some typos while updating the documentation. Sam Bobroff (4): powerpc/tm: Abort syscalls in active transactions selftests/powerpc: Move get_auxv_entry() to harness.c selftests/powerpc: Add transactional syscall test powerpc/tm: Correct minor documentation typos Documentation/powerpc/transactional_memory.txt | 36 +++ arch/powerpc/include/uapi/asm/tm.h |2 +- arch/powerpc/kernel/entry_64.S | 19 tools/testing/selftests/powerpc/harness.c | 47 + tools/testing/selftests/powerpc/pmu/lib.c | 47 - tools/testing/selftests/powerpc/pmu/lib.h |1 - tools/testing/selftests/powerpc/tm/.gitignore |1 + tools/testing/selftests/powerpc/tm/Makefile|4 +- .../testing/selftests/powerpc/tm/tm-syscall-asm.S | 27 + .../testing/selftests/powerpc/tm/tm-syscall-asm.h |2 + tools/testing/selftests/powerpc/tm/tm-syscall.c| 109 tools/testing/selftests/powerpc/utils.h|2 +- 12 files changed, 228 insertions(+), 69 deletions(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.S create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.h create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v7 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock
On Wed, 2015-03-25 at 13:34 -0400, Sowmini Varadhan wrote: > Investigation of multithreaded iperf experiments on an ethernet > interface show the iommu->lock as the hottest lock identified by > lockstat, with something of the order of 21M contentions out of > 27M acquisitions, and an average wait time of 26 us for the lock. > This is not efficient. A more scalable design is to follow the ppc > model, where the iommu_table has multiple pools, each stretching > over a segment of the map, and with a separate lock for each pool. > This model allows for better parallelization of the iommu map search. > .../... > > diff --git a/include/linux/iommu-common.h b/include/linux/iommu-common.h > new file mode 100644 > index 000..197111b > --- /dev/null > +++ b/include/linux/iommu-common.h > @@ -0,0 +1,48 @@ > +#ifndef _LINUX_IOMMU_COMMON_H > +#define _LINUX_IOMMU_COMMON_H > + > +#include > +#include > +#include > + > +#define IOMMU_POOL_HASHBITS 4 > +#define IOMMU_NR_POOLS (1 << IOMMU_POOL_HASHBITS) I don't like those macros. You changed the value from what we had on powerpc. It could be that the new values are as good for us but I'd like to do a bit differently. Can you make the bits a variable ? Or at least an arch-provided macro which we can change later if needed ? > +struct iommu_pool { > + unsigned long start; > + unsigned long end; > + unsigned long hint; > + spinlock_t lock; > +}; > + > +struct iommu_table { Let's make it clear that this is for allocation of DMA space only, it would thus make my life easier when adapting powerpc to use a different names, something like "struct iommu_area" works for me, or "iommu alloc_region" .. whatever you fancy the most. > + unsigned long page_table_map_base; > + unsigned long page_table_shift; Minor nit but I'm not fan of the naming here, maybe just use entry_shift ? I don't like too long identifiers :) Same for base, could just be "base" or "table_base". > + unsigned long nr_pools; > + void(*flush_all)(struct iommu_table *); Call this "lazy_flush", document that it is optional and when it is called. > + unsigned long poolsize; > + struct iommu_pool arena_pool[IOMMU_NR_POOLS]; Why adding the 'arena' prefix ? What was wrong with "pools" in the powerpc imlementation ? > + u32 flags; > +#define IOMMU_HAS_LARGE_POOL0x0001 > +#define IOMMU_NO_SPAN_BOUND 0x0002 > + struct iommu_pool large_pool; > + unsigned long *map; > +}; > + > +extern void iommu_tbl_pool_init(struct iommu_table *iommu, > + unsigned long num_entries, > + u32 page_table_shift, > + void (*flush_all)(struct iommu_table *), > + bool large_pool, u32 npools, > + bool skip_span_boundary_check); > + > +extern unsigned long iommu_tbl_range_alloc(struct device *dev, > +struct iommu_table *iommu, > +unsigned long npages, > +unsigned long *handle); > + > +extern void iommu_tbl_range_free(struct iommu_table *iommu, > + u64 dma_addr, unsigned long npages, > + unsigned long entry); > + > +#endif > diff --git a/lib/Makefile b/lib/Makefile > index 3c3b30b..0ea2ac6 100644 > --- a/lib/Makefile > +++ b/lib/Makefile > @@ -102,7 +102,7 @@ obj-$(CONFIG_AUDIT_GENERIC) += audit.o > obj-$(CONFIG_AUDIT_COMPAT_GENERIC) += compat_audit.o > > obj-$(CONFIG_SWIOTLB) += swiotlb.o > -obj-$(CONFIG_IOMMU_HELPER) += iommu-helper.o > +obj-$(CONFIG_IOMMU_HELPER) += iommu-helper.o iommu-common.o > obj-$(CONFIG_FAULT_INJECTION) += fault-inject.o > obj-$(CONFIG_NOTIFIER_ERROR_INJECTION) += notifier-error-inject.o > obj-$(CONFIG_CPU_NOTIFIER_ERROR_INJECT) += cpu-notifier-error-inject.o > diff --git a/lib/iommu-common.c b/lib/iommu-common.c > new file mode 100644 > index 000..bb7e706 > --- /dev/null > +++ b/lib/iommu-common.c > @@ -0,0 +1,235 @@ > +/* > + * IOMMU mmap management and range allocation functions. > + * Based almost entirely upon the powerpc iommu allocator. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define IOMMU_LARGE_ALLOC15 Make that a variable, here too, the arch might want to tweak it. I think 15 is actually not a good value for powerpc with 4k iommu pages and 64k PAGE_SIZE, we should make the above some kind of factor of PAGE_SHIFT - iommu_page_shift. > +static DEFINE_PER_CPU(unsigned int, iommu_pool_hash); > + > +static void setup_iommu_pool_hash(void) > +{ > + unsigned int i; > + static bool do_once; > + > + if (do_once) > + return; > + do_once = true; > + for_each_possible_cpu(i)
[PATCH 2/2] KVM: PPC: Remove page table walk helpers
This patch remove helpers which we had used only once in the code. Limiting page table walk variants help in ensuring that we won't end up with code walking page table with wrong assumptions. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/pgtable.h | 21 - arch/powerpc/kvm/book3s_hv_rm_mmu.c | 62 - arch/powerpc/kvm/e500_mmu_host.c| 2 +- 3 files changed, 28 insertions(+), 57 deletions(-) diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h index 9835ac4173b7..92fe01c355a9 100644 --- a/arch/powerpc/include/asm/pgtable.h +++ b/arch/powerpc/include/asm/pgtable.h @@ -249,27 +249,6 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, #endif pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift); - -static inline pte_t *lookup_linux_ptep(pgd_t *pgdir, unsigned long hva, -unsigned long *pte_sizep) -{ - pte_t *ptep; - unsigned long ps = *pte_sizep; - unsigned int shift; - - ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift); - if (!ptep) - return NULL; - if (shift) - *pte_sizep = 1ul << shift; - else - *pte_sizep = PAGE_SIZE; - - if (ps > *pte_sizep) - return NULL; - - return ptep; -} #endif /* __ASSEMBLY__ */ #endif /* __KERNEL__ */ diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 625407e4d3b0..73e083cb9f7e 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -131,25 +131,6 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index, unlock_rmap(rmap); } -static pte_t lookup_linux_pte_and_update(pgd_t *pgdir, unsigned long hva, - int writing, unsigned long *pte_sizep) -{ - pte_t *ptep; - unsigned long ps = *pte_sizep; - unsigned int hugepage_shift; - - ptep = find_linux_pte_or_hugepte(pgdir, hva, &hugepage_shift); - if (!ptep) - return __pte(0); - if (hugepage_shift) - *pte_sizep = 1ul << hugepage_shift; - else - *pte_sizep = PAGE_SIZE; - if (ps > *pte_sizep) - return __pte(0); - return kvmppc_read_update_linux_pte(ptep, writing, hugepage_shift); -} - static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v) { asm volatile(PPC_RELEASE_BARRIER "" : : : "memory"); @@ -166,10 +147,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags, struct revmap_entry *rev; unsigned long g_ptel; struct kvm_memory_slot *memslot; - unsigned long pte_size; + unsigned hpage_shift; unsigned long is_io; unsigned long *rmap; - pte_t pte; + pte_t *ptep; unsigned int writing; unsigned long mmu_seq; unsigned long rcbits; @@ -208,22 +189,33 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags, /* Translate to host virtual address */ hva = __gfn_to_hva_memslot(memslot, gfn); + ptep = find_linux_pte_or_hugepte(pgdir, hva, &hpage_shift); + if (ptep) { + pte_t pte; + unsigned int host_pte_size; - /* Look up the Linux PTE for the backing page */ - pte_size = psize; - pte = lookup_linux_pte_and_update(pgdir, hva, writing, &pte_size); - if (pte_present(pte) && !pte_protnone(pte)) { - if (writing && !pte_write(pte)) - /* make the actual HPTE be read-only */ - ptel = hpte_make_readonly(ptel); - is_io = hpte_cache_bits(pte_val(pte)); - pa = pte_pfn(pte) << PAGE_SHIFT; - pa |= hva & (pte_size - 1); - pa |= gpa & ~PAGE_MASK; - } + if (hpage_shift) + host_pte_size = 1ul << hpage_shift; + else + host_pte_size = PAGE_SIZE; + /* +* We should always find the guest page size +* to <= host page size, if host is using hugepage +*/ + if (host_pte_size < psize) + return H_PARAMETER; - if (pte_size < psize) - return H_PARAMETER; + pte = kvmppc_read_update_linux_pte(ptep, writing, hpage_shift); + if (pte_present(pte) && !pte_protnone(pte)) { + if (writing && !pte_write(pte)) + /* make the actual HPTE be read-only */ + ptel = hpte_make_readonly(ptel); + is_io = hpte_cache_bits(pte_val(pte)); + pa = pte_pfn(pte) << PAGE_SHIFT; + pa |= hva & (host_pte_size - 1); + pa |= gpa & ~PA
[PATCH 1/2] KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer
pte can get updated from other CPUs as part of multiple activities like THP split, huge page collapse, unmap. We need to make sure we don't reload the pte value again and again for different checks. Signed-off-by: Aneesh Kumar K.V --- Note: This is posted previously as part of http://article.gmane.org/gmane.linux.ports.ppc.embedded/79278 arch/powerpc/include/asm/kvm_book3s_64.h | 5 - arch/powerpc/kvm/e500_mmu_host.c | 20 2 files changed, 16 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index cc073a7ac2b7..f06820c67175 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -290,7 +290,10 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing, pte_t old_pte, new_pte = __pte(0); while (1) { - old_pte = *ptep; + /* +* Make sure we don't reload from ptep +*/ + old_pte = READ_ONCE(*ptep); /* * wait until _PAGE_BUSY is clear then set it atomically */ diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index cc536d4a75ef..5840d546aa03 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -469,14 +469,18 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, pgdir = vcpu_e500->vcpu.arch.pgdir; ptep = lookup_linux_ptep(pgdir, hva, &tsize_pages); - if (pte_present(*ptep)) - wimg = (*ptep >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK; - else { - if (printk_ratelimit()) - pr_err("%s: pte not present: gfn %lx, pfn %lx\n", - __func__, (long)gfn, pfn); - ret = -EINVAL; - goto out; + if (ptep) { + pte_t pte = READ_ONCE(*ptep); + + if (pte_present(pte)) + wimg = (pte_val(pte) >> PTE_WIMGE_SHIFT) & + MAS2_WIMGE_MASK; + else { + pr_err_ratelimited("%s: pte not present: gfn %lx,pfn %lx\n", + __func__, (long)gfn, pfn); + ret = -EINVAL; + goto out; + } } kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg); -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH V5 1/2] powerpc/mm/thp: Make page table walk safe against thp split/collapse
We can disable a THP split or a hugepage collapse by disabling irq. We do send IPI to all the cpus in the early part of split/collapse, and disabling local irq ensure we don't make progress with split/collapse. If the THP is getting split we return NULL from find_linux_pte_or_hugepte(). For all the current callers it should be ok. We need to be careful if we want to use returned pte_t pointer outside the irq disabled region. W.r.t to THP split, the pfn remains the same, but then a hugepage collapse will result in a pfn change. There are few steps we can take to avoid a hugepage collapse.One way is to take page reference inside the irq disable region. Other option is to take mmap_sem so that a parallel collapse will not happen. We can also disable collapse by taking pmd_lock. Another method used by kvm subsystem is to check whether we had a mmu_notifer update in between using mmu_notifier_retry(). Signed-off-by: Aneesh Kumar K.V --- Changes from V4: * Update function comment for __find_linux_pte_or_hugepte arch/powerpc/include/asm/pgtable.h | 11 ++- arch/powerpc/kernel/eeh.c| 6 -- arch/powerpc/kernel/io-workarounds.c | 10 +- arch/powerpc/kvm/book3s_64_mmu_hv.c | 5 +++-- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 30 +++--- arch/powerpc/kvm/e500_mmu_host.c | 14 -- arch/powerpc/mm/hash_utils_64.c | 2 +- arch/powerpc/mm/hugetlbpage.c| 22 -- arch/powerpc/perf/callchain.c| 24 ++-- 9 files changed, 88 insertions(+), 36 deletions(-) diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h index 92fe01c355a9..11a38635dd65 100644 --- a/arch/powerpc/include/asm/pgtable.h +++ b/arch/powerpc/include/asm/pgtable.h @@ -247,8 +247,17 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, #define pmd_large(pmd) 0 #define has_transparent_hugepage() 0 #endif -pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, +pte_t *__find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift); +static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, + unsigned *shift) +{ + if (!arch_irqs_disabled()) { + pr_info("%s called with irq enabled\n", __func__); + dump_stack(); + } + return __find_linux_pte_or_hugepte(pgdir, ea, shift); +} #endif /* __ASSEMBLY__ */ #endif /* __KERNEL__ */ diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 3b2252e7731b..8424b232e598 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -330,9 +330,11 @@ static inline unsigned long eeh_token_to_phys(unsigned long token) int hugepage_shift; /* -* We won't find hugepages here, iomem +* We won't find hugepages here(this is iomem). Hence we are not +* worried about _PAGE_SPLITTING/collapse. Also we will not hit +* page table free, because of init_mm. */ - ptep = find_linux_pte_or_hugepte(init_mm.pgd, token, &hugepage_shift); + ptep = __find_linux_pte_or_hugepte(init_mm.pgd, token, &hugepage_shift); if (!ptep) return token; WARN_ON(hugepage_shift); diff --git a/arch/powerpc/kernel/io-workarounds.c b/arch/powerpc/kernel/io-workarounds.c index 24b968f8e4d8..63d9cc4d7366 100644 --- a/arch/powerpc/kernel/io-workarounds.c +++ b/arch/powerpc/kernel/io-workarounds.c @@ -71,15 +71,15 @@ struct iowa_bus *iowa_mem_find_bus(const PCI_IO_ADDR addr) vaddr = (unsigned long)PCI_FIX_ADDR(addr); if (vaddr < PHB_IO_BASE || vaddr >= PHB_IO_END) return NULL; - - ptep = find_linux_pte_or_hugepte(init_mm.pgd, vaddr, + /* +* We won't find huge pages here (iomem). Also can't hit +* a page table free due to init_mm +*/ + ptep = __find_linux_pte_or_hugepte(init_mm.pgd, vaddr, &hugepage_shift); if (ptep == NULL) paddr = 0; else { - /* -* we don't have hugepages backing iomem -*/ WARN_ON(hugepage_shift); paddr = pte_pfn(*ptep) << PAGE_SHIFT; } diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 534acb3c6c3d..26df3864d85a 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -539,12 +539,13 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, if (!writing && hpte_is_writable(r)) { unsigned int hugepage_shift; pte_t *ptep, pte
[PATCH V5 2/2] powerpc/mm/thp: Return pte address if we find trans_splitting.
For THP that is marked trans splitting, we return the pte. This require the callers to handle the pmd_trans_splitting scenario, if they care. All the current callers are either looking at pfn or write_ok, hence we don't need to update them. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/kvm_book3s_64.h | 12 ++-- arch/powerpc/kvm/book3s_64_mmu_hv.c | 9 +++-- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +- arch/powerpc/mm/hugetlbpage.c| 9 - 4 files changed, 10 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index f06820c67175..5233a35d80e2 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -281,11 +281,9 @@ static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type) /* * If it's present and writable, atomically set dirty and referenced bits and - * return the PTE, otherwise return 0. If we find a transparent hugepage - * and if it is marked splitting we return 0; + * return the PTE, otherwise return 0. */ -static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing, -unsigned int hugepage) +static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing) { pte_t old_pte, new_pte = __pte(0); @@ -301,12 +299,6 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing, cpu_relax(); continue; } -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - /* If hugepage and is trans splitting return None */ - if (unlikely(hugepage && -pmd_trans_splitting(pte_pmd(old_pte - return __pte(0); -#endif /* If pte is not present return None */ if (unlikely(!(pte_val(old_pte) & _PAGE_PRESENT))) return __pte(0); diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 26df3864d85a..0fe9c92e78ed 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -537,20 +537,17 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, } /* if the guest wants write access, see if that is OK */ if (!writing && hpte_is_writable(r)) { - unsigned int hugepage_shift; pte_t *ptep, pte; unsigned long flags; - /* * We need to protect against page table destruction -* while looking up and updating the pte. +* hugepage split and collapse. */ local_irq_save(flags); ptep = find_linux_pte_or_hugepte(current->mm->pgd, -hva, &hugepage_shift); +hva, NULL); if (ptep) { - pte = kvmppc_read_update_linux_pte(ptep, 1, - hugepage_shift); + pte = kvmppc_read_update_linux_pte(ptep, 1); if (pte_write(pte)) write_ok = 1; } diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index f559b25de173..d839f08cb903 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -219,7 +219,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags, local_irq_restore(flags); return H_PARAMETER; } - pte = kvmppc_read_update_linux_pte(ptep, writing, hpage_shift); + pte = kvmppc_read_update_linux_pte(ptep, writing); if (pte_present(pte) && !pte_protnone(pte)) { if (writing && !pte_write(pte)) /* make the actual HPTE be read-only */ diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index f9a792829d4b..5349fa87e21d 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -1013,12 +1013,11 @@ pte_t *__find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, * A hugepage collapse is captured by pmd_none, because * it mark the pmd none and do a hpte invalidate. * -* A hugepage split is captured by pmd_trans_splitting -* because we mark the pmd trans splitting and do a -* hpte invalidate -* +
Re: [PATCH 2/2] KVM: PPC: Remove page table walk helpers
On Mon, 2015-03-30 at 10:39 +0530, Aneesh Kumar K.V wrote: > This patch remove helpers which we had used only once in the code. > Limiting page table walk variants help in ensuring that we won't > end up with code walking page table with wrong assumptions. > > Signed-off-by: Aneesh Kumar K.V Alex, it would be preferable to have this (and the previous one) in the powerpc tree due to dependencies with further fixes to our page table walking vs. THP, so if you're happy, just give us an ack. Cheers, Ben. > --- > arch/powerpc/include/asm/pgtable.h | 21 - > arch/powerpc/kvm/book3s_hv_rm_mmu.c | 62 > - > arch/powerpc/kvm/e500_mmu_host.c| 2 +- > 3 files changed, 28 insertions(+), 57 deletions(-) > > diff --git a/arch/powerpc/include/asm/pgtable.h > b/arch/powerpc/include/asm/pgtable.h > index 9835ac4173b7..92fe01c355a9 100644 > --- a/arch/powerpc/include/asm/pgtable.h > +++ b/arch/powerpc/include/asm/pgtable.h > @@ -249,27 +249,6 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, > unsigned long addr, > #endif > pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, >unsigned *shift); > - > -static inline pte_t *lookup_linux_ptep(pgd_t *pgdir, unsigned long hva, > - unsigned long *pte_sizep) > -{ > - pte_t *ptep; > - unsigned long ps = *pte_sizep; > - unsigned int shift; > - > - ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift); > - if (!ptep) > - return NULL; > - if (shift) > - *pte_sizep = 1ul << shift; > - else > - *pte_sizep = PAGE_SIZE; > - > - if (ps > *pte_sizep) > - return NULL; > - > - return ptep; > -} > #endif /* __ASSEMBLY__ */ > > #endif /* __KERNEL__ */ > diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c > b/arch/powerpc/kvm/book3s_hv_rm_mmu.c > index 625407e4d3b0..73e083cb9f7e 100644 > --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c > +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c > @@ -131,25 +131,6 @@ static void remove_revmap_chain(struct kvm *kvm, long > pte_index, > unlock_rmap(rmap); > } > > -static pte_t lookup_linux_pte_and_update(pgd_t *pgdir, unsigned long hva, > - int writing, unsigned long *pte_sizep) > -{ > - pte_t *ptep; > - unsigned long ps = *pte_sizep; > - unsigned int hugepage_shift; > - > - ptep = find_linux_pte_or_hugepte(pgdir, hva, &hugepage_shift); > - if (!ptep) > - return __pte(0); > - if (hugepage_shift) > - *pte_sizep = 1ul << hugepage_shift; > - else > - *pte_sizep = PAGE_SIZE; > - if (ps > *pte_sizep) > - return __pte(0); > - return kvmppc_read_update_linux_pte(ptep, writing, hugepage_shift); > -} > - > static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v) > { > asm volatile(PPC_RELEASE_BARRIER "" : : : "memory"); > @@ -166,10 +147,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long > flags, > struct revmap_entry *rev; > unsigned long g_ptel; > struct kvm_memory_slot *memslot; > - unsigned long pte_size; > + unsigned hpage_shift; > unsigned long is_io; > unsigned long *rmap; > - pte_t pte; > + pte_t *ptep; > unsigned int writing; > unsigned long mmu_seq; > unsigned long rcbits; > @@ -208,22 +189,33 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long > flags, > > /* Translate to host virtual address */ > hva = __gfn_to_hva_memslot(memslot, gfn); > + ptep = find_linux_pte_or_hugepte(pgdir, hva, &hpage_shift); > + if (ptep) { > + pte_t pte; > + unsigned int host_pte_size; > > - /* Look up the Linux PTE for the backing page */ > - pte_size = psize; > - pte = lookup_linux_pte_and_update(pgdir, hva, writing, &pte_size); > - if (pte_present(pte) && !pte_protnone(pte)) { > - if (writing && !pte_write(pte)) > - /* make the actual HPTE be read-only */ > - ptel = hpte_make_readonly(ptel); > - is_io = hpte_cache_bits(pte_val(pte)); > - pa = pte_pfn(pte) << PAGE_SHIFT; > - pa |= hva & (pte_size - 1); > - pa |= gpa & ~PAGE_MASK; > - } > + if (hpage_shift) > + host_pte_size = 1ul << hpage_shift; > + else > + host_pte_size = PAGE_SIZE; > + /* > + * We should always find the guest page size > + * to <= host page size, if host is using hugepage > + */ > + if (host_pte_size < psize) > + return H_PARAMETER; > > - if (pte_size < psize) > - return H_PARAMETER; > + pte = kvmppc_read_update_linux_pte(ptep, writing, hpage_shift); > + if (pte_present(pte) && !pte_protnone(pte)) { > + i
Re: [PATCH 2/2] powerpc/powernv: Reboot when requested by firmware
On Mon, 2015-03-30 at 12:45 +1030, Joel Stanley wrote: > Use orderly_reboot so userspace will to shut itself down via the reboot > path. This is required for graceful reboot initiated by the BMC, such > as when a user uses ipmitool to issue a 'chassis power cycle' command. > > Signed-off-by: Joel Stanley > --- > arch/powerpc/platforms/powernv/opal-power.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/opal-power.c > b/arch/powerpc/platforms/powernv/opal-power.c > index 48bf5b0..ac46c2c 100644 > --- a/arch/powerpc/platforms/powernv/opal-power.c > +++ b/arch/powerpc/platforms/powernv/opal-power.c > @@ -29,8 +29,9 @@ static int opal_power_control_event(struct notifier_block > *nb, > > switch (type) { > case SOFT_REBOOT: > - /* Fall through. The service processor is responsible for > - * bringing the machine back up */ > + pr_info("OPAL: reboot requested\n"); > + orderly_reboot(); > + break; Acked-by: Michael Ellerman Andrew, do you want to take these 2 via your tree? Assuming folks are OK with patch 1. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 1/3] powerpc/powernv: convert codes returned by OPAL calls
On 03/30/2015 04:05 AM, Michael Ellerman wrote: > On Fri, 2015-03-27 at 17:39 +0100, Cédric Le Goater wrote: >> OPAL has its own list of return codes. The patch provides a translation >> of such codes in errnos for the opal_sensor_read call, and possibly >> others if needed. >> >> Index: linux.git/arch/powerpc/platforms/powernv/opal.c >> === >> --- linux.git.orig/arch/powerpc/platforms/powernv/opal.c >> +++ linux.git/arch/powerpc/platforms/powernv/opal.c >> @@ -894,6 +894,23 @@ void opal_free_sg_list(struct opal_sg_li >> } >> } >> >> +int opal_error_code(int rc) >> +{ >> +switch (rc) { >> +case OPAL_SUCCESS: return 0; > > Obviously correct. He. Initially, I didn't put a case for SUCCESS, but we have code doing : ret = be64_to_cpu(msg.params[1]); >> +case OPAL_PARAMETER:return -EINVAL; > > Yep. > >> +case OPAL_UNSUPPORTED: return -ENOSYS; > > You shouldn't use ENOSYS here, that should only ever mean "no such syscall", > otherwise you get very confusing results like read() returning ENOSYS. Indeed. How about ENODEV then ? >> +case OPAL_ASYNC_COMPLETION: return -EAGAIN; > > EAGAIN means "try what you did again", I don't think that's what > ASYNC_COMPLETION means, is it? It looks like it means, "don't try again, but > you need to wait for the result to be ready". > > I'm not sure it maps well to any of the Linux codes, maybe EINPROGRESS ? Yes. This is better. >> +case OPAL_BUSY_EVENT: return -EBUSY; > > Yep. > >> +case OPAL_NO_MEM: return -ENOMEM; > > Yep. > >> +case OPAL_HARDWARE: return -ENOENT; > > This is another one which I think you shouldn't use as it can lead to > confusing > results at user level. eg: > > $ cat /sysfs/some/file > Error: No such file or directory > > Huh? > > Looking at the skiboot code this looks like EIO is a good match. ok. >> +case OPAL_INTERNAL_ERROR: return -EIO; > > Yeah as good as anything I guess. > >> +default: >> +pr_err("%s: unexpected OPAL error %d\n", __func__, rc); >> +return -ERANGE; > > I'm not sure about this one honestly, it means "Math result not > representable". > > I suspect the reason RTAS chose it was just that it's not EINVAL. > > This should probably also just be EIO. ok. I will change it. Thanks, C. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc: Reword the "returning from prom_init" message
We get way too many bug reports that say "the kernel is hung in prom_init", which stems from the fact that the last piece of output people see is "returning from prom_init". The kernel is almost never hung in prom_init(), it's just that it's crashed somewhere after prom_init() but prior to the console coming up. The existing message should give a clue to that, ie. "returning from" indicates that prom_init() has finished, but it doesn't seem to work. Let's try something different. This prints: Quiescing Open Firmware ... Booting Linux via __start() ... Which hopefully makes it clear that prom_init() is not the problem, and although __start() probably isn't either, it's at least the right place to begin looking. Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/prom_init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index 1a85d8f96739..fd1fe4c37599 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -2898,7 +2898,7 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, * Call OF "quiesce" method to shut down pending DMA's from * devices etc... */ - prom_printf("Calling quiesce...\n"); + prom_printf("Quiescing Open Firmware ...\n"); call_prom("quiesce", 0, 0); /* @@ -2910,7 +2910,7 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, /* Don't print anything after quiesce under OPAL, it crashes OFW */ if (of_platform != PLATFORM_OPAL) { - prom_printf("returning from prom_init\n"); + prom_printf("Booting Linux via __start() ...\n"); prom_debug("->dt_header_start=0x%x\n", hdr); } -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 3/3] powerpc/powernv: remove opal_sensor_mutex
On 03/30/2015 04:09 AM, Michael Ellerman wrote: > On Fri, 2015-03-27 at 17:39 +0100, Cédric Le Goater wrote: >> The opal sensor mutex protects the opal_sensor_read call which >> can return a OPAL_BUSY code on IBM Power systems if a previous >> request is in progress. >> >> This can be handled at user level with a retry. > > It can, but how does it actually look in practice? > > It looks like the only use of opal_get_sensor_data() is show_sensor() in > drivers/hwmon/ibmpowernv.c. > > Because that's a sysfs attribute folks will be generally just dumping > that with cat, or reading it in a shell script, neither of which will > cope nicely with EBUSY I think? It won't, I agree but it should only happen when running concurrent cat commands on the hwmon sysfs files. The event should be rare enough. Anyhow, this is not a big issue. We can drop that patch. The real "issue" is the time it takes to get some values back from the FSP. This is what user space has been most surprised about. Thanks, C. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 1/3] powerpc/powernv: convert codes returned by OPAL calls
On Mon, 2015-03-30 at 08:37 +0200, Cedric Le Goater wrote: > On 03/30/2015 04:05 AM, Michael Ellerman wrote: > > On Fri, 2015-03-27 at 17:39 +0100, Cédric Le Goater wrote: > >> OPAL has its own list of return codes. The patch provides a translation > >> of such codes in errnos for the opal_sensor_read call, and possibly > >> others if needed. > >> > >> + case OPAL_UNSUPPORTED: return -ENOSYS; > > > > You shouldn't use ENOSYS here, that should only ever mean "no such syscall", > > otherwise you get very confusing results like read() returning ENOSYS. > > Indeed. How about ENODEV then ? That can also be confusing from userspace. I think it's probably best just to use EIO, as far as userspace is concerned if the kernel lets it call an unsupported OPAL routine that is more or less a kernel bug. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 1/3] powerpc/powernv: convert codes returned by OPAL calls
On 03/30/2015 08:54 AM, Michael Ellerman wrote: > On Mon, 2015-03-30 at 08:37 +0200, Cedric Le Goater wrote: >> On 03/30/2015 04:05 AM, Michael Ellerman wrote: >>> On Fri, 2015-03-27 at 17:39 +0100, Cédric Le Goater wrote: OPAL has its own list of return codes. The patch provides a translation of such codes in errnos for the opal_sensor_read call, and possibly others if needed. + case OPAL_UNSUPPORTED: return -ENOSYS; >>> >>> You shouldn't use ENOSYS here, that should only ever mean "no such syscall", >>> otherwise you get very confusing results like read() returning ENOSYS. >> >> Indeed. How about ENODEV then ? > > That can also be confusing from userspace. > > I think it's probably best just to use EIO, as far as userspace is concerned > if > the kernel lets it call an unsupported OPAL routine that is more or less a > kernel bug. OK. Will do. Thanks, C. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev