Re: Re: [RFT PATCH -next ] [BUGFIX] kprobes: Fix "Failed to find blacklist" error on ia64 and ppc64
(2014/05/07 20:59), Masami Hiramatsu wrote:
> Hi Tony, Benjamin and Paul,
>
> I've tried to fix this bug, but since I have neither ppc64 nor ia64,
> this patch is not tested on those archs. Please review and test it on
> those machines.

Ping? I need your help since I don't have a test environment.

Thank you,

>
> Thank you,
>
> (2014/05/07 20:55), Masami Hiramatsu wrote:
>> On ia64 and ppc64, a function pointer does not point to the
>> entry address of the function, but to the address of a function
>> descriptor (which contains the entry address and misc
>> data). Since kprobes passes the function pointer stored
>> by NOKPROBE_SYMBOL() to kallsyms_lookup_size_offset() for
>> initializing its blacklist, the lookup fails and reports many
>> errors as below.
>>
>>   Failed to find blacklist 000101316830
>>   Failed to find blacklist 0001013000f0a000
>>   Failed to find blacklist 000101315f70a000
>>   Failed to find blacklist 000101324c80a000
>>   Failed to find blacklist 0001013063f0a000
>>   Failed to find blacklist 000101327800a000
>>   Failed to find blacklist 0001013277f0a000
>>   Failed to find blacklist 000101315a70a000
>>   Failed to find blacklist 0001013277e0a000
>>   Failed to find blacklist 000101305a20a000
>>   Failed to find blacklist 0001013277d0a000
>>   Failed to find blacklist 00010130bdc0a000
>>   Failed to find blacklist 00010130dc20a000
>>   Failed to find blacklist 000101309a00a000
>>   Failed to find blacklist 0001013277c0a000
>>   Failed to find blacklist 0001013277b0a000
>>   Failed to find blacklist 0001013277a0a000
>>   Failed to find blacklist 000101327790a000
>>   Failed to find blacklist 000101303140a000
>>   Failed to find blacklist 0001013a3280a000
>>
>> To fix this bug, this introduces a function_entry() macro to
>> retrieve the entry address from the given function pointer,
>> and uses it in NOKPROBE_SYMBOL().
>>
>>
>> Signed-off-by: Masami Hiramatsu
>> Reported-by: Tony Luck
>> Cc: Tony Luck
>> Cc: Fenghua Yu
>> Cc: Benjamin Herrenschmidt
>> Cc: Paul Mackerras
>> Cc: Ananth N Mavinakayanahalli
>> Cc: Kevin Hao
>> Cc: linux-i...@vger.kernel.org
>> Cc: linux-ker...@vger.kernel.org
>> Cc: linuxppc-dev@lists.ozlabs.org
>> ---
>>  arch/ia64/include/asm/types.h    |  2 ++
>>  arch/powerpc/include/asm/types.h | 11 +++
>>  include/linux/kprobes.h          |  3 ++-
>>  include/linux/types.h            |  4
>>  4 files changed, 19 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/ia64/include/asm/types.h b/arch/ia64/include/asm/types.h
>> index 4c351b1..6ab7b6c 100644
>> --- a/arch/ia64/include/asm/types.h
>> +++ b/arch/ia64/include/asm/types.h
>> @@ -27,5 +27,7 @@ struct fnptr {
>>          unsigned long gp;
>>  };
>>
>> +#define constant_function_entry(fn) (((struct fnptr *)(fn))->ip)
>> +
>>  #endif /* !__ASSEMBLY__ */
>>  #endif /* _ASM_IA64_TYPES_H */
>> diff --git a/arch/powerpc/include/asm/types.h b/arch/powerpc/include/asm/types.h
>> index bfb6ded..fd297b8 100644
>> --- a/arch/powerpc/include/asm/types.h
>> +++ b/arch/powerpc/include/asm/types.h
>> @@ -25,6 +25,17 @@ typedef struct {
>>          unsigned long env;
>>  } func_descr_t;
>>
>> +#if defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF == 1)
>> +/*
>> + * On PPC64 ABIv1 the function pointer actually points to the
>> + * function's descriptor. The first entry in the descriptor is the
>> + * address of the function text.
>> + */
>> +#define constant_function_entry(fn) (((func_descr_t *)(fn))->entry)
>> +#else
>> +#define constant_function_entry(fn) ((unsigned long)(fn))
>> +#endif
>> +
>>  #endif /* __ASSEMBLY__ */
>>
>>  #endif /* _ASM_POWERPC_TYPES_H */
>> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
>> index e059507..637eafe 100644
>> --- a/include/linux/kprobes.h
>> +++ b/include/linux/kprobes.h
>> @@ -40,6 +40,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>
>>  #ifdef CONFIG_KPROBES
>>  #include
>> @@ -485,7 +486,7 @@ static inline int enable_jprobe(struct jprobe *jp)
>>  #define __NOKPROBE_SYMBOL(fname)                        \
>>  static unsigned long __used                             \
>>          __attribute__((section("_kprobe_blacklist")))   \
>> -        _kbl_addr_##fname = (unsigned long)fname;
>> +        _kbl_addr_##fname = constant_function_entry(fname);
>>  #define NOKPROBE_SYMBOL(fname) __NOKPROBE_SYMBOL(fname)
>>  #else
>>  #define NOKPROBE_SYMBOL(fname)
>> diff --git a/include/linux/types.h b/include/linux/types.h
>> index 4d118ba..78e2d7d 100644
>> --- a/include/linux/types.h
>> +++ b/include/linux/types.h
>> @@ -212,5 +212,9 @@ struct callback_head {
>>  };
>>  #define rcu_head callback_head
>>
>> +#ifndef constant_function_entry
>> +#define constant_function_entry(fn) ((unsigned long)(fn))
>> +#endif
>> +
>>  #endif /* __ASSEMBLY__ */
>>  #endif /* _LINUX_TYPES_H */
>>
>>
>>
>
>

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiram
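To make the descriptor issue above concrete, here is a minimal userspace sketch in C. It is not kernel code: `struct demo_descr`, `demo_entry_from_descr()` and `demo_entry_plain()` are hypothetical names, and the descriptor layout is only modelled on the `func_descr_t` fields quoted in the patch. It shows why casting the pointer straight to `unsigned long` yields the descriptor's address rather than the entry address.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for an ABI function descriptor, modelled on the
 * func_descr_t quoted in the patch (entry, toc, env). Illustration only.
 */
struct demo_descr {
	uintptr_t entry;	/* address of the function text */
	uintptr_t toc;		/* misc data (TOC pointer) */
	uintptr_t env;		/* misc data (environment) */
};

/*
 * Descriptor-based ABIs: the "function pointer" is the address of the
 * descriptor, so the entry address has to be loaded from it.
 */
static inline uintptr_t demo_entry_from_descr(const void *fn)
{
	return ((const struct demo_descr *)fn)->entry;
}

/* Other ABIs: the pointer value already is the entry address. */
static inline uintptr_t demo_entry_plain(const void *fn)
{
	return (uintptr_t)fn;
}
```

With a descriptor whose `entry` field is 0x1000, `demo_entry_from_descr()` returns 0x1000, while `demo_entry_plain()` returns the address of the descriptor itself, which is the kind of value a symbol lookup would fail to resolve.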
Re: [PATCH] pci-scan: Fix setting the limit
Alexey Kardashevskiy writes:

> PCI spec says that lower 20 bits are assumed 0xF. The existing code
> seems to get it right in pci-bridge-set-mem-limit.
>
> However pci-bridge-set-mem-base does not account 0xF and poison
> the limit. Since the limit is not stored anywhere in SLOF and only
> besides in the config space, it remains broken.
>
> This fixes pci-bridge-set-mem-base.
>
> Signed-off-by: Alexey Kardashevskiy
> ---
>
> I have doubts this is the right fix as I tried to "fix"
> pci-bridge-set-mmio-base (while I am here) and it broke the guest.
>
> The problem I am fixing by this is that QEMU started as below is
> unable to initialize virtio-net device because there are overlapping
> virtio's BAR and bridge's "ranges" property. Note that virtio-net is
> attached to the PHB, not that additional bridge.
>
> /home/aik/qemu-system-ppc64 \
> -enable-kvm \
> -m 1024 \
> -machine pseries \
> -nographic \
> -vga none \
> -device pci-bridge,id=id0,bus=pci.0,addr=5.0,chassis_nr=7 \
> -netdev tap,id=id1,ifname=tap1,script=ifup.sh,downscript=ifdown.sh \
> -device virtio-net-pci,id=id2,netdev=id1 \
> -initrd 1.cpio \
> -kernel vml315rc3 \

The problem that I saw here is that the bridge device does not have a
downstream pci device. Before probing, we set the "ranges" property
to the max limit, and then the probe is done. Once the probe is over we
would normally update the "ranges" property. In this particular case,
when we come to update the "ranges" property, we do not have any range
and we skip updating it. Because of this the old max range property
remains there, which is not correct.

Something like the below solves this particular problem. But what would
happen when someone tries to hotplug a pci device to this bridge, will
the pci-hotplug code take care of updating the ranges property?

diff --git a/slof/fs/pci-properties.fs b/slof/fs/pci-properties.fs
index f88a571..f5e934d 100644
--- a/slof/fs/pci-properties.fs
+++ b/slof/fs/pci-properties.fs
@@ -410,6 +410,7 @@
         dup IF                          \ IF any space present (propsize>0)
                 s" ranges" property     \ | write it into the device tree
         ELSE                            \ ELSE
+                s" " s" ranges" property
                 2drop                   \ | forget the properties
         THEN                            \ FI
         drop                            \ forget the address

So I do not see the problem when there is a device allocated downstream
to the pci-bridge.

Regards
Nikunj

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
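For readers unfamiliar with the register layout under discussion, the following C sketch decodes the 16-bit Memory Base and Memory Limit registers of a PCI-to-PCI bridge. It is an illustration, not SLOF code; the function names are made up here. The point from the quoted patch description is that the low 20 address bits are implied: all zeros for the base, all ones (0xFFFFF) for the limit.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Memory Base register: bits 15:4 hold address bits 31:20; the low
 * 20 bits of the base address are assumed to be zero.
 */
static inline uint32_t bridge_mem_base(uint16_t reg)
{
	return (uint32_t)(reg & 0xfff0) << 16;
}

/*
 * Memory Limit register: same layout, but the low 20 bits of the
 * limit address are assumed to be all ones (0xFFFFF).
 */
static inline uint32_t bridge_mem_limit(uint16_t reg)
{
	return ((uint32_t)(reg & 0xfff0) << 16) | 0xfffff;
}

/*
 * By convention a window whose base lies above its limit forwards
 * nothing, which is how a bridge with no downstream devices is
 * usually programmed.
 */
static inline bool bridge_mem_window_open(uint16_t base_reg, uint16_t limit_reg)
{
	return bridge_mem_base(base_reg) <= bridge_mem_limit(limit_reg);
}
```

For example, register values 0x8000 (base) and 0x80f0 (limit) describe the window 0x80000000..0x80ffffff, while base 0xfff0 with limit 0x0000 describes a closed window.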
Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
On 05/14/14 06:46, Anshuman Khandual wrote:
> On 05/13/2014 10:43 PM, Pedro Alves wrote:
>> On 05/05/14 08:54, Anshuman Khandual wrote:
>>> This patch enables get and set of transactional memory related register
>>> sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing
>>> four new powerpc specific register sets i.e REGSET_TM_SPR, REGSET_TM_CGPR,
>>> REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new
>>> ELF core note types added previously in this regard.
>>>
>>> (1) NT_PPC_TM_SPR
>>> (2) NT_PPC_TM_CGPR
>>> (3) NT_PPC_TM_CFPR
>>> (4) NT_PPC_TM_CVMX
>>
>> Sorry that I couldn't tell this from the code, but, what does the
>> kernel return when the ptracer requests these registers and the
>> program is not in a transaction? Specifically I'm wondering whether
>> this follows the same semantics as the s390 port.
>>
>
> Right now, it still returns the saved state of the registers from thread
> struct. I had assumed that the user must know the state of the transaction
> before initiating the ptrace request. I guess it's better to check for
> the transaction status before processing the request. In case TM is not
> active on that thread, we should return -EINVAL.

I think s390 returns ENODATA in that case.

https://sourceware.org/ml/gdb-patches/2013-06/msg00273.html

We'll want some way to tell whether the system actually supports
this. That could be ENODATA vs something-else (EINVAL or perhaps
better EIO for "request is invalid"). s390 actually screwed that,
though it got away because there's a bit in HWCAP to signal
transactions support. See:

https://sourceware.org/ml/gdb-patches/2013-11/msg00080.html

Are you adding something to HWCAP too?

>
> I am not familiar with the s390 side of code. But if we look at the
> s390_tdb_get function it checks for (regs->int_code & 0x200) before
> processing the request. Not sure what 0x200 signifies though.

--
Pedro Alves
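As a rough sketch of the distinction being asked for above, a debugger could classify the result of a TM regset request as follows. This encodes only the proposal in this thread (ENODATA for "no active transaction", EINVAL or EIO for "not supported"); it is not established kernel behaviour, and the enum and function names are hypothetical.

```c
#include <errno.h>

/* Hypothetical outcome categories for a TM regset request. */
enum tm_regset_status {
	TM_REGSET_OK,			/* registers were returned */
	TM_REGSET_NO_TRANSACTION,	/* supported, but thread not in a transaction */
	TM_REGSET_UNSUPPORTED,		/* kernel/hardware lacks the regset */
	TM_REGSET_OTHER_ERROR		/* anything else, e.g. ESRCH */
};

/*
 * Classify the return value and errno of a PTRACE_GETREGSET call,
 * assuming the errno convention proposed in this thread.
 */
static enum tm_regset_status classify_tm_regset_result(long ret, int err)
{
	if (ret == 0)
		return TM_REGSET_OK;

	switch (err) {
	case ENODATA:
		return TM_REGSET_NO_TRANSACTION;
	case EINVAL:
	case EIO:
		return TM_REGSET_UNSUPPORTED;
	default:
		return TM_REGSET_OTHER_ERROR;
	}
}
```

If both cases returned the same errno, the debugger would have to fall back on a capability bit (such as the HWCAP bit mentioned above) to tell them apart.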
Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
On 05/14/14 12:18, Michael Neuling wrote:
>
>> s390 actually screwed that, though it got away because
>> there's a bit in HWCAP to signal transactions support. See:
>>
>> https://sourceware.org/ml/gdb-patches/2013-11/msg00080.html
>>
>> Are you adding something to HWCAP too?
>
> Yes but it's in HWCAP2

That's fine.

--
Pedro Alves
Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
> s390 actually screwed that, though it got away because
> there's a bit in HWCAP to signal transactions support. See:
>
> https://sourceware.org/ml/gdb-patches/2013-11/msg00080.html
>
> Are you adding something to HWCAP too?

Yes but it's in HWCAP2

Mikey
Re: powerpc/ppc64: Allow allmodconfig to build (finally !)
On Wed, May 14, 2014 at 03:22:19PM +0930, Alan Modra wrote:
> On Tue, May 13, 2014 at 10:16:51PM -0700, Guenter Roeck wrote:
> > any idea what might cause this one, by any chance ?
> >
> > arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
> > (.text+0x165ee): relocation truncated to fit: R_PPC64_ADDR16_HI against
> > symbol `interrupt_base_book3e' defined in .text section in
> > arch/powerpc/kernel/built-in.o
> > arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
> > (.text+0x16602): relocation truncated to fit: R_PPC64_ADDR16_HI against
> > symbol `interrupt_end_book3e' defined in .text section in
> > arch/powerpc/kernel/built-in.o
> > arch/powerpc/kernel/built-in.o: In function `exc_debug_debug_book3e':
> >
> > I see this if I try to build powerpc:ppc64e_defconfig or
> > powerpc:chroma_defconfig
> > with gcc 4.8.2 and binutils 2.24.
>
> Blame me. I changed the ABI, something that had to be done but
> unfortunately happens to break the booke kernel code. When building
> up a 64-bit value with lis, ori, shl, oris, ori or similar sequences,
> you now should use @high and @higha in place of @h and @ha. @h and
> @ha (and their associated relocs R_PPC64_ADDR16_HI and
> R_PPC64_ADDR16_HA) now report overflow if the value is out of 32-bit
> signed range. ie. @h and @ha assume you're building a 32-bit value.
> This is needed to report out-of-range -mcmodel=medium toc pointer
> offsets in @toc@h and @toc@ha expressions, and for consistency I did
> the same for all other @h and @ha relocs.
>

Bummer.

Confirmed: if I replace "@h" with "@high" in just one place, the builds
pass with binutils 2.24. Unfortunately the same builds then fail with
binutils 2.23.

Any idea how to get it to compile with both old and new versions ?
Is there some predefined constant which I could possibly use for
something like

.if as_version_below_2.24_
	oris	reg,reg,(expr)@h;
.else
	oris	reg,reg,(expr)@high;
.endif

Thanks,
Guenter
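The arithmetic behind the operators discussed above can be sketched in C. This is an illustration of the usual definitions of the 16-bit address pieces, not assembler or linker code; the function names are invented here, and the range check is a simplified reading of "out of 32-bit signed range" from Alan's description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 0..15 of the value (@l). */
static inline uint16_t ppc_lo(uint64_t v)      { return v & 0xffff; }

/* Bits 16..31 (@high; also the result of @h when the value is in range). */
static inline uint16_t ppc_high(uint64_t v)    { return (v >> 16) & 0xffff; }

/* Bits 16..31, adjusted for a sign-extended low half (@higha / @ha). */
static inline uint16_t ppc_higha(uint64_t v)   { return ((v + 0x8000) >> 16) & 0xffff; }

/* Bits 32..47 (@higher) and 48..63 (@highest). */
static inline uint16_t ppc_higher(uint64_t v)  { return (v >> 32) & 0xffff; }
static inline uint16_t ppc_highest(uint64_t v) { return (v >> 48) & 0xffff; }

/*
 * Simplified model of the new check: @h and @ha are only accepted
 * when the value fits in a signed 32-bit range.
 */
static inline bool ppc_fits_s32(uint64_t v)
{
	int64_t s = (int64_t)v;

	return s >= INT32_MIN && s <= INT32_MAX;
}
```

A typical 64-bit kernel address such as 0xc000000000012345 does not fit in signed 32 bits, so under the new rules @h is rejected for it even though only bits 16..31 are being extracted; @high extracts the same bits without the range check.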
Re: [PATCH] powerpc/powernv: hwmon driver for power values, fan rpm and temperature
On Wed, May 14, 2014 at 11:31:53AM +0530, Neelesh Gupta wrote:
> This patch adds basic kernel enablement for reading power values, fan
> speed rpm and temperature values on powernv platforms which will
> be exported to user space through sysfs interface.
>
> Test results:
> -
> [root@tul163p1 ~]# sensors
> ibmpowernv-isa-
> Adapter: ISA adapter
> fan1:        5294 RPM  (min =    0 RPM)
> fan2:        4945 RPM  (min =    0 RPM)
> fan3:        5831 RPM  (min =    0 RPM)
> fan4:        5212 RPM  (min =    0 RPM)
> fan5:           0 RPM  (min =    0 RPM)
> fan6:           0 RPM  (min =    0 RPM)
> fan7:        7472 RPM  (min =    0 RPM)
> fan8:        7920 RPM  (min =    0 RPM)
> temp1:        +39.0°C  (high =  +0.0°C)
> power1:      192.00 W
>
> [root@tul163p1 ~]#
> [root@tul163p1 ~]# ls /sys/devices/platform/ibmpowernv.0/
> driver      fan2_min    fan4_min    fan6_min    fan8_min   modalias      uevent
> fan1_fault  fan3_fault  fan5_fault  fan7_fault  hwmon      name
> fan1_input  fan3_input  fan5_input  fan7_input  in1_fault  power1_input
> fan1_min    fan3_min    fan5_min    fan7_min    in2_fault  subsystem
> fan2_fault  fan4_fault  fan6_fault  fan8_fault  in3_fault  temp1_input
> fan2_input  fan4_input  fan6_input  fan8_input  in4_fault  temp1_max
> [root@tul163p1 ~]#
> [root@tul163p1 ~]# ls /sys/class/hwmon/hwmon0/device/
> driver      fan2_min    fan4_min    fan6_min    fan8_min   modalias      uevent
> fan1_fault  fan3_fault  fan5_fault  fan7_fault  hwmon      name
> fan1_input  fan3_input  fan5_input  fan7_input  in1_fault  power1_input
> fan1_min    fan3_min    fan5_min    fan7_min    in2_fault  subsystem
> fan2_fault  fan4_fault  fan6_fault  fan8_fault  in3_fault  temp1_input
> fan2_input  fan4_input  fan6_input  fan8_input  in4_fault  temp1_max
> [root@tul163p1 ~]#
>
> Signed-off-by: Shivaprasad G Bhat
> Signed-off-by: Neelesh Gupta
> ---
>  drivers/hwmon/Kconfig      |   8 +
>  drivers/hwmon/Makefile     |   1
>  drivers/hwmon/ibmpowernv.c | 386
>  3 files changed, 395 insertions(+)
>  create mode 100644 drivers/hwmon/ibmpowernv.c
>
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index bc196f4..3e308fa 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -554,6 +554,14 @@ config SENSORS_IBMPEX
>            This driver can also be built as a module. If so, the module
>            will be called ibmpex.
>
> +config SENSORS_IBMPOWERNV
> +        tristate "IBM POWERNV platform sensors"
> +        depends on PPC_POWERNV
> +        default y
> +        help
> +          If you say yes here you get support for the temperature/fan/power
> +          sensors on your platform.
> +
>  config SENSORS_IIO_HWMON
>          tristate "Hwmon driver that uses channels specified via iio maps"
>          depends on IIO
> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> index c48f987..199c401 100644
> --- a/drivers/hwmon/Makefile
> +++ b/drivers/hwmon/Makefile
> @@ -71,6 +71,7 @@ obj-$(CONFIG_SENSORS_ULTRA45) += ultra45_env.o
>  obj-$(CONFIG_SENSORS_I5K_AMB)    += i5k_amb.o
>  obj-$(CONFIG_SENSORS_IBMAEM)     += ibmaem.o
>  obj-$(CONFIG_SENSORS_IBMPEX)     += ibmpex.o
> +obj-$(CONFIG_SENSORS_IBMPOWERNV) += ibmpowernv.o
>  obj-$(CONFIG_SENSORS_IIO_HWMON)  += iio_hwmon.o
>  obj-$(CONFIG_SENSORS_INA209)     += ina209.o
>  obj-$(CONFIG_SENSORS_INA2XX)     += ina2xx.o
> diff --git a/drivers/hwmon/ibmpowernv.c b/drivers/hwmon/ibmpowernv.c
> new file mode 100644
> index 000..e5cffce
> --- /dev/null
> +++ b/drivers/hwmon/ibmpowernv.c
> @@ -0,0 +1,386 @@
> +/*
> + * IBM PowerNV platform sensors for temperature/fan/power
> + * Copyright (C) 2014 IBM
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Please drop the FSF address; it can change, and we don't want to update
the driver each time it does.

> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include
> +#include
> +#include
> +
> +#define DRVNAME "ibmpowernv"
> +#define MAX_ATTR_LEN 32
> +
> +/* Sensor suffix name from DT */
> +#define DT_FAULT_ATTR_SUFFIX "faulted"
> +#define DT_DATA_ATTR_SUFFIX "data"
> +#define DT_THRESHOLD_ATTR_SUFFIX "thrs"
> +
> +/* Enumerates all the sensors
Re: [PATCH v2] powerpc: Add cpu family documentation
On Wed, 30 Apr 2014, Scott Wood wrote:
> On Wed, 2014-04-30 at 16:45 +1000, Michael Ellerman wrote:
> > On Tue, 2014-02-04 at 16:43 -0600, Scott Wood wrote:
> > > > +Motorola/Freescale 8xx
> > > > +--
> > > > +
> > > > + - Software loaded with hardware assist.
> > > > + - All 32 bit
> > > > +
> > > > +   +--+
> > > > +   | 8xx |
> > > > +   +--+
> > > > +     |
> > > > +     |
> > > > +     v
> > > > +   +--+
> > > > +   | 850 |
> > > > +   +--+
> > >
> > > Is the core of MPC850 different from other MPC8xx?
> >
> > Dunno, maybe someone who works at Freescale knows ;)
>
> I think they're the same -- I was just wondering if you had some
> difference in mind that led you to single it out.

They are the same. There should not be a separate box that singles out
850. (Still don't know why the diagram was drawn to single out 850 in
the first place.)

The CPU core should be called "MPC8xx Core".
[PATCH 02/27] powerpc: Override defaults from generic/tlb.h
Make sure to not conflict with the defaults provided by generic/tlb.h.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Richard Weinberger
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Richard Weinberger
---
 arch/powerpc/include/asm/pgalloc.h | 1 -
 arch/powerpc/include/asm/tlb.h     | 4
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pgalloc.h b/arch/powerpc/include/asm/pgalloc.h
index e9a9f60..5fddba7 100644
--- a/arch/powerpc/include/asm/pgalloc.h
+++ b/arch/powerpc/include/asm/pgalloc.h
@@ -3,7 +3,6 @@
 #ifdef __KERNEL__

 #include
-#include

 #ifdef CONFIG_PPC_BOOK3E
 extern void tlb_flush_pgtable(struct mmu_gather *tlb, unsigned long address);
diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index e2b428b..392e5ef 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -30,6 +30,10 @@

 extern void tlb_flush(struct mmu_gather *tlb);

+/* These defines are needed to override the defaults from asm-generic/tlb.h */
+#define tlb_flush tlb_flush
+#define __tlb_remove_tlb_entry __tlb_remove_tlb_entry
+
 /* Get the generic bits... */
 #include
--
1.8.4.2
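The `#define tlb_flush tlb_flush` lines in this patch rely on a common preprocessor idiom: an arch header defines a macro with the same name as its function, and the generic header only supplies a default under `#ifndef`. A self-contained sketch of the idiom follows; the `demo_*` names are hypothetical and the two "headers" are collapsed into one file for illustration.

```c
/* --- what an arch header would contain (hypothetical) --- */

/* Arch-specific implementation. */
static inline int demo_flush(void)
{
	return 1;
}
/* Self-referential define: tells the generic header an override exists. */
#define demo_flush demo_flush

/* --- what the generic header would contain (hypothetical) --- */

#ifndef demo_flush
/* Default, compiled only when the arch did not provide its own. */
static inline int demo_flush(void)
{
	return 0;
}
#endif

#ifndef demo_remove_entry
/* No arch override here, so the generic default is used. */
static inline int demo_remove_entry(void)
{
	return 0;
}
#endif
```

Because a macro is not re-expanded inside its own expansion, `demo_flush()` still calls the arch function; the define only serves as a marker that `#ifndef` can test. Without it, both definitions would be compiled and conflict.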
[PATCH 20/27] powerpc: Use common bits from generic tlb.h
It is no longer needed to define them on our own.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Richard Weinberger
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Richard Weinberger
---
 arch/powerpc/include/asm/tlb.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index 392e5ef..bdea7f5 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -25,9 +25,6 @@

 #include

-#define tlb_start_vma(tlb, vma) do { } while (0)
-#define tlb_end_vma(tlb, vma)   do { } while (0)
-
 extern void tlb_flush(struct mmu_gather *tlb);

 /* These defines are needed to override the defaults from asm-generic/tlb.h */
--
1.8.4.2
roundup_pow_of_two() may not handle 64-bit integers
Hi,

I'm looking to use roundup_pow_of_two() (actually, order_base_2()) from , but
it seems that it only supports 64-bit integers if your toolchain uses a
64-bit 'unsigned long' type. This is strange, considering that ilog2() is
explicitly designed for 32-bit or 64-bit compatibility.

I also note that there is at least one location in which this limitation
currently might be problematic: in pnv_pci_ioda2_set_bypass()
(arch/powerpc/platforms/powernv/pci-ioda.c). It looks like this could be a
problem if using large amounts of DRAM on a 32-bit PPC build, with 64-bit
physical addresses. (There may be other cases like this one, but I haven't
closely studied all callers of roundup_pow_of_two().)

I'm thinking of cooking a patch to improve roundup_pow_of_two() (and thus
order_base_2()), but I'd like to solicit comments on the basic problem
first.

Regards,
Brian

P.S. And of course, rounddown_pow_of_two() has the same issue.
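The concern above can be illustrated with a small userspace sketch. This is not the kernel's implementation and not a proposed patch: `demo_roundup_pow2_u64()` is a hypothetical helper using a plain loop, and the truncation to `uint32_t` stands in for what happens when a 64-bit value is passed through a 32-bit `unsigned long`.

```c
#include <stdint.h>

/*
 * Round n up to the next power of two using 64-bit arithmetic.
 * Returns 1 for n <= 1, and 0 if the result would not fit in
 * 64 bits (n > 2^63). Plain loop, no compiler builtins.
 */
static inline uint64_t demo_roundup_pow2_u64(uint64_t n)
{
	uint64_t p = 1;

	while (p != 0 && p < n)
		p <<= 1;	/* becomes 0 once bit 63 is shifted out */

	return p;
}

/*
 * Same computation after the argument has been narrowed to 32 bits,
 * modelling a build where 'unsigned long' is only 32 bits wide.
 */
static inline uint64_t demo_roundup_pow2_truncated(uint64_t n)
{
	return demo_roundup_pow2_u64((uint32_t)n);
}
```

For a size just above 4 GiB such as 0x100000001, the 64-bit helper yields 0x200000000, whereas the truncated variant sees only the low 32 bits (the value 1) and returns 1.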
Re: [PATCH RFC v3 0/8] EEH Support for VFIO PCI device
Hi Gavin,

Can error injection be done if EEH is not enabled?

Thanks
Mike

On 05/14/2014 12:11 PM, Gavin Shan wrote:

The series of patches intends to support EEH for PCI devices, which are
passed through to a PowerKVM based guest via VFIO. The implementation is
straightforward, based on the issues or problems we have to resolve to
support EEH for a PowerKVM based guest.

- Emulation for EEH RTAS requests. All EEH RTAS requests go to QEMU
  first. If QEMU can't handle it, the request will be sent to the host
  via a newly introduced VFIO container IOCTL command (VFIO_EEH_INFO)
  and gets handled in the host kernel.

- The error injection infrastructure needs to support requests from the
  userland utility "errinjct" and from a PowerKVM based guest. The
  userland utility "errinjct" works well on the pSeries platform with a
  dedicated syscall, which helps invoke the RTAS service to fulfil error
  injection in the kernel. From that perspective, it's reasonable to
  extend the syscall to support the PowerNV platform so that an OPAL
  call can be invoked in the host kernel for injecting errors. The data
  transported between userland and kernel still follows
  "struct rtas_args" for both cases of PowerNV (OPAL) and pSeries (RTAS).

The series of patches requires corresponding firmware changes from Mike
Qiu to support error injection and QEMU changes to support EEH for the
guest. The QEMU patchset will be sent separately.

Change log
==
v1 -> v2:
	* EEH RTAS requests are routed to QEMU, and then possibly to the
	  host kernel. The mechanism of KVM in-kernel handling is dropped.
	* Error injection is reimplemented based on syscall, instead of KVM
	  in-kernel handling. The logic for error injection token management
	  is moved to QEMU. The error injection request is routed to QEMU
	  and then possibly to the host kernel.
v2 -> v3:
	* Make the fields in struct eeh_vfio_pci_addr, struct vfio_eeh_info
	  based on the comments from Alexey.
	* Define macros for EEH VFIO operations (Alexey).
	* Clear frozen state after successful PE reset.
	* Merge original [PATCH 1/2/3] to one.

Testing on P7
=
- Emulex adapter

Testing on P8
=
- Need more testing after design is finalized.

-

Gavin Shan (8):
  drivers/vfio: Introduce CONFIG_VFIO_EEH
  powerpc/eeh: Info to trace passed devices
  drivers/vfio: New IOCTL command VFIO_EEH_INFO
  powerpc/eeh: Avoid event on passed PE
  powerpc/powernv: Sync OPAL header file with firmware
  powerpc: Extend syscall ppc_rtas()
  powerpc/powernv: Implement ppc_call_opal()
  powerpc/powernv: Error injection infrastructure

 arch/powerpc/include/asm/eeh.h                 |  52 +++
 arch/powerpc/include/asm/opal.h                |  74 ++-
 arch/powerpc/include/asm/rtas.h                |  10 +-
 arch/powerpc/include/asm/syscalls.h            |   2 +-
 arch/powerpc/include/asm/systbl.h              |   2 +-
 arch/powerpc/include/uapi/asm/unistd.h         |   2 +-
 arch/powerpc/kernel/eeh.c                      |   8 +
 arch/powerpc/kernel/eeh_pe.c                   |  80
 arch/powerpc/kernel/rtas.c                     |  57 +--
 arch/powerpc/kernel/syscalls.c                 |  50 +++
 arch/powerpc/platforms/powernv/Makefile        |   3 +-
 arch/powerpc/platforms/powernv/eeh-ioda.c      |   3 +-
 arch/powerpc/platforms/powernv/eeh-vfio.c      | 593 +
 arch/powerpc/platforms/powernv/errinject.c     | 224 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/opal.c          |  93
 drivers/vfio/Kconfig                           |   6 +
 drivers/vfio/vfio_iommu_spapr_tce.c            |  12 +
 include/uapi/linux/vfio.h                      |  57 +++
 kernel/sys_ni.c                                |   2 +-
 20 files changed, 1278 insertions(+), 53 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c
 create mode 100644 arch/powerpc/platforms/powernv/errinject.c

Thanks,
Gavin