Re: [PATCH 0/3] Enable multiple MSI feature in pSeries
On 2013/5/21 22:45, Alexander Gordeev wrote:
> On Tue, Jan 15, 2013 at 03:38:53PM +0800, Mike Qiu wrote:
>> The test results are shown by 'cat /proc/interrupts':
>>
>>       CPU0    CPU1    CPU2    CPU3
>>  16: 240458 261601 226310 200425 XICS Level IPI
>>  17: 0 0 0 0 XICS Level RAS_EPOW
>>  18: 10 0 3 2 XICS Level hvc_console
>>  19: 122182 28481 28527 28864 XICS Level ibmvscsi
>>  20:5067388226108118 XICS Level eth0
>>  21: 6 5 5 5 XICS Level host1-0
>>  22:817814816813 XICS Level host1-1
>
> Hi Mike,
>
> I am curious if pSeries firmware allows changing affinity masks
> independently for multiple MSIs? I.e. in your example, would it be
> possible to assign IRQ21 and IRQ22 to different CPUs?

Yes, as Ben says, this is very different from other firmware :)

Thanks
Mike

> Thanks!
>
>> LOC: 398077 316725 231882 203049 Local timer interrupts
>> SPU: 1659919961903 Spurious interrupts
>> CNT: 0 0 0 0 Performance monitoring interrupts
>> MCE: 0 0 0 0 Machine check exceptions

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/3] Enable multiple MSI feature in pSeries
On 2013/5/22 8:15, Benjamin Herrenschmidt wrote:
> On Tue, 2013-05-21 at 16:45 +0200, Alexander Gordeev wrote:
>> On Tue, Jan 15, 2013 at 03:38:53PM +0800, Mike Qiu wrote:
>>> The test results are shown by 'cat /proc/interrupts':
>>>
>>>       CPU0    CPU1    CPU2    CPU3
>>>  16: 240458 261601 226310 200425 XICS Level IPI
>>>  17: 0 0 0 0 XICS Level RAS_EPOW
>>>  18: 10 0 3 2 XICS Level hvc_console
>>>  19: 122182 28481 28527 28864 XICS Level ibmvscsi
>>>  20:5067388226108118 XICS Level eth0
>>>  21: 6 5 5 5 XICS Level host1-0
>>>  22:817814816813 XICS Level host1-1
>>
>> Hi Mike,
>>
>> I am curious if pSeries firmware allows changing affinity masks
>> independently for multiple MSIs? I.e. in your example, would it be
>> possible to assign IRQ21 and IRQ22 to different CPUs?
>
> Yes. Each interrupt has its own affinity, whether it's an MSI or not,
> the affinity is not driven by the address.
>
> Cheers,
> Ben.

Hi Ben,

Can this patch be accepted? If so, I will send out the 3.9 version.

Michael Ellerman said he wants to see performance data, but that
depends on the driver. It is something like MSI, and the driver can use
more than one MSI. That is to say, the driver has more interrupt
resources available, but whether the driver makes full use of them is
outside this patch's control.

I tested this patch with the ipr driver, to which others have added
multiple MSI support, and it works.

Thanks
Mike

>> Thanks!
>>
>>> LOC: 398077 316725 231882 203049 Local timer interrupts
>>> SPU: 1659919961903 Spurious interrupts
>>> CNT: 0 0 0 0 Performance monitoring interrupts
>>> MCE: 0 0 0 0 Machine check exceptions
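
Since each interrupt has its own affinity, IRQ21 and IRQ22 from the example could in principle be steered to different CPUs through the standard /proc/irq/<n>/smp_affinity interface. The following is only a minimal userspace sketch: the helper names are made up for illustration, only CPUs 0..31 are handled, and writing the file needs root.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format the hex bitmask that selects one CPU.
 * Only CPUs 0..31 are handled to keep the sketch simple. */
int cpu_to_affinity_mask(unsigned int cpu, char *buf, size_t len)
{
	assert(buf != NULL);
	if (cpu >= 32)
		return -1;
	return snprintf(buf, len, "%x", 1u << cpu);
}

/* Hypothetical helper: write the mask to /proc/irq/<irq>/smp_affinity.
 * Returns 0 on success, -1 on any failure. */
int set_irq_affinity(unsigned int irq, unsigned int cpu)
{
	char path[64], mask[16];
	FILE *f;

	if (cpu_to_affinity_mask(cpu, mask, sizeof(mask)) < 0)
		return -1;
	snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%s\n", mask);
	return fclose(f) == 0 ? 0 : -1;
}
```

With this, set_irq_affinity(21, 0) followed by set_irq_affinity(22, 1) would request separate CPUs for the two MSIs; whether the request is honoured still depends on the platform.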
Re: [PATCH 1/1] powerpc: Force 32 bit MSIs on systems lacking firmware support
On 2013/5/22 5:54, Brian King wrote:
> Recent commit e61133dda480062d221f09e4fc18f66763f8ecd0 added support
> for a new firmware feature to force an adapter to use 32 bit MSIs.
> However, this firmware is not available for all systems. The hack below
> allows devices needing 32 bit MSIs to work on these systems as well.
> It is careful to only enable this on Gen2 slots, which should limit
> this to configurations where this hack is needed and tested to work.
>
> Signed-off-by: Brian King
> ---
>
>  arch/powerpc/platforms/pseries/msi.c | 31 +++
>  1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff -puN arch/powerpc/platforms/pseries/msi.c~powerpc_32bit_msi_hack_on_papr arch/powerpc/platforms/pseries/msi.c
> --- linux/arch/powerpc/platforms/pseries/msi.c~powerpc_32bit_msi_hack_on_papr	2013-05-15 10:44:46.0 -0500
> +++ linux-bjking1/arch/powerpc/platforms/pseries/msi.c	2013-05-20 15:24:52.0 -0500
> @@ -397,10 +397,11 @@ static int check_msix_entries(struct pci
>  static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
>  {
>  	struct pci_dn *pdn;
> -	int hwirq, virq, i, rc;
> +	int hwirq, virq, i, rc = -1;
>  	struct msi_desc *entry;
>  	struct msi_msg msg;
>  	int nvec = nvec_in;
> +	int use_32bit_msi_hack = 0;
>
>  	pdn = get_pdn(pdev);
>  	if (!pdn)
> @@ -428,15 +429,37 @@ static int rtas_setup_msi_irqs(struct pc
>  	 */
>  again:
>  	if (type == PCI_CAP_ID_MSI) {
> -		if (pdn->force_32bit_msi)
> +		if (pdn->force_32bit_msi) {
>  			rc = rtas_change_msi(pdn, RTAS_CHANGE_32MSI_FN, nvec);
> -		else
> +			if (rc < 0) {
> +				/* We only want to run the 32 bit MSI hack below if
> +				   the max bus speed is Gen2 speed. */
> +				if (pdev->bus->max_bus_speed != PCIE_SPEED_5_0GT)
> +					return rc;
> +
> +				use_32bit_msi_hack = 1;
> +			}
> +		}
> +
> +		if (rc < 0)
>  			rc = rtas_change_msi(pdn, RTAS_CHANGE_MSI_FN, nvec);
>
> -		if (rc < 0 && !pdn->force_32bit_msi) {
> +		if (rc < 0) {
>  			pr_debug("rtas_msi: trying the old firmware call.\n");
>  			rc = rtas_change_msi(pdn, RTAS_CHANGE_FN, nvec);
>  		}
> +
> +		if (use_32bit_msi_hack && rc > 0) {
> +			int pos;
> +			u32 addr_hi, addr_lo;
> +
> +			dev_info(&pdev->dev, "rtas_msi: No 32 bit MSI firmware support, forcing 32 bit MSI\n");
> +			pos = pci_find_capability(pdev, PCI_CAP_ID_MSI);
> +			pci_read_config_dword(pdev, pos + PCI_MSI_ADDRESS_HI, &addr_hi);
> +			addr_lo = 0x | ((addr_hi >> (48 - 32)) << 4);
> +			pci_write_config_dword(pdev, pos + PCI_MSI_ADDRESS_LO, addr_lo);
> +			pci_write_config_dword(pdev, pos + PCI_MSI_ADDRESS_HI, 0);

I think you could use the cached dev->msi_cap here instead of looking
the capability up again.

Thanks
Mike

> +		}
>  	} else
>  		rc = rtas_change_msi(pdn, RTAS_CHANGE_MSIX_FN, nvec);
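
The address rewrite in the hack above is plain bit arithmetic: the upper half of the 64-bit MSI address is read, shifted down by (48 - 32), shifted up by 4, and OR-ed into a base value before the high word is cleared. A standalone sketch of that arithmetic follows. Note that the base constant is garbled in the archived diff (it appears only as "0x"), so the value used here is an assumption chosen for illustration, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: the real base constant is not recoverable from the archived
 * patch text; 0xffff0000 is used here only so the sketch compiles. */
#define MSI32_BASE_ASSUMED 0xffff0000u

/* Hypothetical helper mirroring the expression in the patch:
 *   addr_lo = BASE | ((addr_hi >> (48 - 32)) << 4);
 * Bits 48..63 of the 64-bit address are bits 16..31 of addr_hi. */
uint32_t msi32_addr_lo(uint32_t addr_hi)
{
	uint32_t sel = addr_hi >> (48 - 32);

	return MSI32_BASE_ASSUMED | (sel << 4);
}
```

After computing addr_lo this way, the patch writes it to PCI_MSI_ADDRESS_LO and writes 0 to PCI_MSI_ADDRESS_HI, so the device only ever issues a 32 bit address.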
Re: [RFC 08/10] irqdomain: Refactor irq_domain_associate_many()
On 2013/6/10 8:49, Grant Likely wrote:
> Originally, irq_domain_associate_many() was designed to unwind the
> mapped irqs on a failure of any individual association. However, that
> proved to be a problem with certain IRQ controllers. Some of them only
> support a subset of irqs, and will fail when attempting to map a
> reserved IRQ. In those cases we want to map as many IRQs as possible, so
> instead it is better for irq_domain_associate_many() to make a
> best-effort attempt to map irqs, but not fail if any or all of them
> don't succeed. If a caller really cares about how many irqs got
> associated, then it should instead go back and check that all of the
> irqs it cares about were mapped.
>
> The original design open-coded the individual association code into the
> body of irq_domain_associate_many(), but with no longer needing to
> unwind associations, the code becomes simpler to split out
> irq_domain_associate() to contain the bulk of the logic, and
> irq_domain_associate_many() to be a simple loop wrapper.
>
> This patch also adds a new error check to the associate path to make
> sure it isn't called for an irq larger than the controller can handle,
> and adds locking so that the irq_domain_mutex is held while setting up a
> new association.
>
> Signed-off-by: Grant Likely
> ---
>  include/linux/irqdomain.h |  22 +++---
>  kernel/irq/irqdomain.c    | 185 +++---
>  2 files changed, 101 insertions(+), 106 deletions(-)
>
> diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
> index fd4b26f..f9e8e06 100644
> --- a/include/linux/irqdomain.h
> +++ b/include/linux/irqdomain.h
> @@ -103,6 +103,7 @@ struct irq_domain {
>  	struct irq_domain_chip_generic *gc;
>
>  	/* reverse map data. The linear map gets appended to the irq_domain */
> +	irq_hw_number_t hwirq_max;
>  	unsigned int revmap_direct_max_irq;
>  	unsigned int revmap_size;
>  	struct radix_tree_root revmap_tree;
> @@ -110,8 +111,8 @@ struct irq_domain {
>  };
>
>  #ifdef CONFIG_IRQ_DOMAIN
> -struct irq_domain *__irq_domain_add(struct device_node *of_node,
> -				    int size, int direct_max,
> +struct irq_domain *__irq_domain_add(struct device_node *of_node, int size,
> +				    irq_hw_number_t hwirq_max, int direct_max,
>  				    const struct irq_domain_ops *ops,
>  				    void *host_data);
>  struct irq_domain *irq_domain_add_simple(struct device_node *of_node,
> @@ -140,14 +141,14 @@ static inline struct irq_domain *irq_domain_add_linear(struct device_node *of_no
>  					 const struct irq_domain_ops *ops,
>  					 void *host_data)
>  {
> -	return __irq_domain_add(of_node, size, 0, ops, host_data);
> +	return __irq_domain_add(of_node, size, size, 0, ops, host_data);
>  }
>  static inline struct irq_domain *irq_domain_add_nomap(struct device_node *of_node,
>  					 unsigned int max_irq,
>  					 const struct irq_domain_ops *ops,
>  					 void *host_data)
>  {
> -	return __irq_domain_add(of_node, 0, max_irq, ops, host_data);
> +	return __irq_domain_add(of_node, 0, max_irq, max_irq, ops, host_data);
>  }
>  static inline struct irq_domain *irq_domain_add_legacy_isa(
>  				struct device_node *of_node,
> @@ -166,14 +167,11 @@ static inline struct irq_domain *irq_domain_add_tree(struct device_node *of_node
>
>  extern void irq_domain_remove(struct irq_domain *host);
>
> -extern int irq_domain_associate_many(struct irq_domain *domain,
> -				     unsigned int irq_base,
> -				     irq_hw_number_t hwirq_base, int count);
> -static inline int irq_domain_associate(struct irq_domain *domain, unsigned int irq,
> -					irq_hw_number_t hwirq)
> -{
> -	return irq_domain_associate_many(domain, irq, hwirq, 1);
> -}
> +extern int irq_domain_associate(struct irq_domain *domain, unsigned int irq,
> +				irq_hw_number_t hwirq);
> +extern void irq_domain_associate_many(struct irq_domain *domain,
> +				      unsigned int irq_base,
> +				      irq_hw_number_t hwirq_base, int count);
>
>  extern unsigned int irq_create_mapping(struct irq_domain *host,
>  				       irq_hw_number_t hwirq);
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index 280b804..80e9249 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -35,8 +35,8 @@ static struct irq_domain *irq_default_domain;
>   * register allocated irq_domain with irq_domain_register().  Returns pointer
>   * to IRQ domain, or NULL on failure.
>   */
> -struct irq_domain *__irq_domain_add(struct device_node *of_node,
> -				    int size, int direct_max,
> +struct irq_domain *__irq_domain_add(struct devi
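
The best-effort behaviour described in the commit message can be illustrated outside the kernel. The sketch below is not the irqdomain code: the toy_* names, the fixed-size table and the "reserved" flag are all invented for illustration. It shows a single-association function that can fail, a loop wrapper that ignores individual failures and never unwinds, and a caller-side check of how many entries actually got mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_HWIRQ_MAX 8	/* hypothetical controller size */

struct toy_domain {
	bool reserved[TOY_HWIRQ_MAX];	/* hwirqs the controller refuses to map */
	int virq_of[TOY_HWIRQ_MAX];	/* 0 means "not mapped" */
};

/* Associate one virq with one hwirq. Fails for out-of-range,
 * reserved or already-mapped hwirqs. */
int toy_associate(struct toy_domain *d, int virq, int hwirq)
{
	assert(d != NULL);
	if (hwirq < 0 || hwirq >= TOY_HWIRQ_MAX)
		return -1;
	if (d->reserved[hwirq] || d->virq_of[hwirq] != 0)
		return -1;
	d->virq_of[hwirq] = virq;
	return 0;
}

/* Best effort: keep going after a failure and never unwind. */
void toy_associate_many(struct toy_domain *d, int virq_base,
			int hwirq_base, int count)
{
	for (int i = 0; i < count; i++)
		toy_associate(d, virq_base + i, hwirq_base + i);
}

/* A caller that cares goes back and checks what was mapped. */
int toy_count_mapped(const struct toy_domain *d, int hwirq_base, int count)
{
	int n = 0;

	for (int i = 0; i < count; i++) {
		int hw = hwirq_base + i;

		if (hw >= 0 && hw < TOY_HWIRQ_MAX && d->virq_of[hw] != 0)
			n++;
	}
	return n;
}
```

With hwirq 2 marked reserved, associating four interrupts starting at hwirq 0 maps three of them and leaves the reserved one untouched, which is the outcome the refactoring aims for.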
Re: [PATCH 05/31] powerpc/eeh: Trace PCI bus from PE
On 2013/6/18 16:33, Gavin Shan wrote:
> Several types of PEs are supported for now: PHB, Bus and Device
> dependent PE. For PCI bus dependent PE, tracing the corresponding PCI
> bus from PE (struct eeh_pe) would make the code more efficient. The
> patch also enables the retrieval of PCI bus based on the PCI bus
> dependent PE.
>
> Signed-off-by: Gavin Shan
> ---
>  arch/powerpc/include/asm/eeh.h |    1 +
>  arch/powerpc/kernel/eeh_pe.c   |   22 ++
>  2 files changed, 23 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index acdfcaa..f3b49d6 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -59,6 +59,7 @@ struct eeh_pe {
>  	int config_addr;		/* Traditional PCI address	*/
>  	int addr;			/* PE configuration address	*/
>  	struct pci_controller *phb;	/* Associated PHB		*/
> +	struct pci_bus *bus;		/* Top PCI bus for bus PE	*/
>  	int check_count;		/* Times of ignored error	*/
>  	int freeze_count;		/* Times of froze up		*/
>  	int false_positives;		/* Times of reported #ff's	*/
> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
> index 3d2dcf5..5bd1637 100644
> --- a/arch/powerpc/kernel/eeh_pe.c
> +++ b/arch/powerpc/kernel/eeh_pe.c
> @@ -304,6 +304,7 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
>  int eeh_add_to_parent_pe(struct eeh_dev *edev)
>  {
>  	struct eeh_pe *pe, *parent;
> +	struct eeh_dev *first_edev;
>
>  	eeh_lock();
>
> @@ -326,6 +327,21 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
>  			pe->type = EEH_PE_BUS;
>  			edev->pe = pe;
>
> +			/*
> +			 * For PCI bus sensitive PE, we can reset the parent
> +			 * bridge in order for hot-reset. However, the PCI
> +			 * devices including the associated EEH devices might
> +			 * be removed when EEH core is doing recovery. So that
> +			 * won't safe to retrieve the bridge through downstream
> +			 * EEH device. We have to trace the parent PCI bus, then
> +			 * the parent bridge explicitly.
> +			 */
> +			if (eeh_probe_mode_dev() && !pe->bus) {
> +				first_edev = list_first_entry(&pe->edevs,
> +						struct eeh_dev, list);
> +				pe->bus = eeh_dev_to_pci_dev(first_edev)->bus;
> +			}

Hi Gavin,

I have a question: can we also keep pe->bus for a device PE? The value
would be the bus that the edev belongs to, which would make the code
more efficient for device PEs as well.

I don't know whether this would cause any side effects.

Thanks
Mike

> +
>  			/* Put the edev to PE */
>  			list_add_tail(&edev->list, &pe->edevs);
>  			eeh_unlock();
> @@ -641,12 +657,18 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
>  		bus = pe->phb->bus;
>  	} else if (pe->type & EEH_PE_BUS ||
>  		   pe->type & EEH_PE_DEVICE) {
> +		if (pe->bus) {
> +			bus = pe->bus;
> +			goto out;
> +		}
> +
>  		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
>  		pdev = eeh_dev_to_pci_dev(edev);
>  		if (pdev)
>  			bus = pdev->bus;
>  	}
>
> +out:
>  	eeh_unlock();
>
>  	return bus;
[PATCH] powerpc/eeh: Fix undefined variable
'pe_no' is not defined; it is a typo and should be 'frozen_pe_no'.
Also, '__func__' should be added to the IODA_EEH_DBG() call.

Signed-off-by: Mike Qiu
---
 arch/powerpc/platforms/powernv/eeh-ioda.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 0cd1c4a..a49bee7 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -843,7 +843,8 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 		 * specific PHB.
 		 */
 		IODA_EEH_DBG("%s: Error (%d, %d, %d) on PHB#%x\n",
-			err_type, severity, pe_no, hose->global_number);
+			__func__, err_type, severity,
+			frozen_pe_no, hose->global_number);
 		switch (err_type) {
 		case OPAL_EEH_IOC_ERROR:
 			if (severity == OPAL_EEH_SEV_IOC_DEAD) {
-- 
1.8.2.1
[PATCH] powerpc/eeh: Add procfs entry for PowerNV
The procfs entry for global statistics is missing on the PowerNV
platform; this patch adds it.

Signed-off-by: Mike Qiu
---
 arch/powerpc/kernel/eeh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 39954fe..0e12bb1 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1063,7 +1063,7 @@ static const struct file_operations proc_eeh_operations = {

 static int __init eeh_init_proc(void)
 {
-	if (machine_is(pseries))
+	if (machine_is(pseries) || machine_is(powernv))
 		proc_create("powerpc/eeh", 0, NULL, &proc_eeh_operations);
 	return 0;
 }
-- 
1.8.2.1
Re: [PATCH] powerpc/eeh: Fix undefined variable
On 2013/8/7 13:25, Gavin Shan wrote:
> On Wed, Aug 07, 2013 at 03:11:24PM +1000, Michael Ellerman wrote:
>> On Tue, Aug 06, 2013 at 10:24:46PM -0400, Mike Qiu wrote:
>>> 'pe_no' is not defined; it is a typo and should be 'frozen_pe_no'.
>>> Also, '__func__' should be added to the IODA_EEH_DBG() call.
>>>
>>> Signed-off-by: Mike Qiu
>>> ---
>>>  arch/powerpc/platforms/powernv/eeh-ioda.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
>>> index 0cd1c4a..a49bee7 100644
>>> --- a/arch/powerpc/platforms/powernv/eeh-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
>>> @@ -843,7 +843,8 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
>>>  		 * specific PHB.
>>>  		 */
>>>  		IODA_EEH_DBG("%s: Error (%d, %d, %d) on PHB#%x\n",
>>> -			err_type, severity, pe_no, hose->global_number);
>>> +			__func__, err_type, severity,
>>> +			frozen_pe_no, hose->global_number);
>>
>> Why is it using a custom macro?
>>
>> If you use pr_devel() or similar you avoid these bugs, because the
>> argument list is always expanded.
>
> The custom macro at least can save some CPU cycles, but that's not safe
> as you mentioned. It's reasonable to use pr_devel() here.
>
> Mike, could you help to replace IODA_EEH_DBG() with pr_devel() as
> Michael suggested?

OK, I will change the patch in V2.

Thanks
Mike

> Thanks,
> Gavin
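
Michael's point about the argument list can be reproduced in a few lines of ordinary C. The sketch below uses two made-up macros, DBG_EMPTY and DBG_CHECKED, and printf() in place of the kernel's printing helpers. The first macro expands to nothing, so a misspelled variable in its arguments is never seen by the compiler; the second always compiles its arguments but places the call in dead code, which is the general idea behind keeping a disabled debug print type-checked.

```c
#include <assert.h>
#include <stdio.h>

/* Style 1: expands to nothing, so its arguments are never compiled. */
#define DBG_EMPTY(...)

/* Style 2: arguments are always compiled, but the call is unreachable
 * and can be dropped by the optimizer. */
#define DBG_CHECKED(...)			\
	do {					\
		if (0)				\
			printf(__VA_ARGS__);	\
	} while (0)

/* Hypothetical function standing in for the error-reporting path. */
int report_error(int err_type, int frozen_pe_no)
{
	/* Compiles even though 'pe_no' does not exist anywhere. */
	DBG_EMPTY("error %d on PE %d\n", err_type, pe_no);

	/* Writing 'pe_no' here would be a compile error, so the
	 * correct name has to be used. */
	DBG_CHECKED("error %d on PE %d\n", err_type, frozen_pe_no);

	return err_type + frozen_pe_no;
}
```

This is why the undefined 'pe_no' went unnoticed while IODA_EEH_DBG_ON was not defined: the offending argument only reached the compiler in debug builds.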
[PATCH v2] powerpc/eeh: powerpc/eeh: Fix undefined variable
'pe_no' is not defined; it is a typo and should be 'frozen_pe_no'.
Also, '__func__' is missing from the IODA_EEH_DBG() call.

For safety, use pr_info() directly instead of IODA_EEH_DBG().

Signed-off-by: Mike Qiu
---
 arch/powerpc/platforms/powernv/eeh-ioda.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 0cd1c4a..8bc19c8 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -36,13 +36,6 @@
 #include "powernv.h"
 #include "pci.h"

-/* Debugging option */
-#ifdef IODA_EEH_DBG_ON
-#define IODA_EEH_DBG(args...)	pr_info(args)
-#else
-#define IODA_EEH_DBG(args...)
-#endif
-
 static char *hub_diag = NULL;
 static int ioda_eeh_nb_init = 0;

@@ -823,17 +816,17 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)

 		/* If OPAL API returns error, we needn't proceed */
 		if (rc != OPAL_SUCCESS) {
-			IODA_EEH_DBG("%s: Invalid return value on "
-				     "PHB#%x (0x%lx) from opal_pci_next_error",
-				     __func__, hose->global_number, rc);
+			pr_info("%s: Invalid return value on "
+				"PHB#%x (0x%lx) from opal_pci_next_error",
+				__func__, hose->global_number, rc);
 			continue;
 		}

 		/* If the PHB doesn't have error, stop processing */
 		if (err_type == OPAL_EEH_NO_ERROR ||
 		    severity == OPAL_EEH_SEV_NO_ERROR) {
-			IODA_EEH_DBG("%s: No error found on PHB#%x\n",
-				     __func__, hose->global_number);
+			pr_info("%s: No error found on PHB#%x\n",
+				__func__, hose->global_number);
 			continue;
 		}

@@ -842,8 +835,9 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 		 * highest priority reported upon multiple errors on the
 		 * specific PHB.
 		 */
-		IODA_EEH_DBG("%s: Error (%d, %d, %d) on PHB#%x\n",
-			err_type, severity, pe_no, hose->global_number);
+		pr_info("%s: Error (%d, %d, %d) on PHB#%x\n",
+			__func__, err_type, severity,
+			frozen_pe_no, hose->global_number);
 		switch (err_type) {
 		case OPAL_EEH_IOC_ERROR:
 			if (severity == OPAL_EEH_SEV_IOC_DEAD) {
-- 
1.8.2.1
[PATCH v3] powerpc/eeh: powerpc/eeh: Fix undefined variable
'pe_no' is not defined; it is a typo and should be 'frozen_pe_no'.
Also, '__func__' is missing from the IODA_EEH_DBG() call.

For safety, use pr_devel() directly instead of IODA_EEH_DBG().

Signed-off-by: Mike Qiu
---
 arch/powerpc/platforms/powernv/eeh-ioda.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 0cd1c4a..88d99ba 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -36,13 +36,6 @@
 #include "powernv.h"
 #include "pci.h"

-/* Debugging option */
-#ifdef IODA_EEH_DBG_ON
-#define IODA_EEH_DBG(args...)	pr_info(args)
-#else
-#define IODA_EEH_DBG(args...)
-#endif
-
 static char *hub_diag = NULL;
 static int ioda_eeh_nb_init = 0;

@@ -823,17 +816,17 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)

 		/* If OPAL API returns error, we needn't proceed */
 		if (rc != OPAL_SUCCESS) {
-			IODA_EEH_DBG("%s: Invalid return value on "
-				     "PHB#%x (0x%lx) from opal_pci_next_error",
-				     __func__, hose->global_number, rc);
+			pr_devel("%s: Invalid return value on "
+				"PHB#%x (0x%lx) from opal_pci_next_error",
+				__func__, hose->global_number, rc);
 			continue;
 		}

 		/* If the PHB doesn't have error, stop processing */
 		if (err_type == OPAL_EEH_NO_ERROR ||
 		    severity == OPAL_EEH_SEV_NO_ERROR) {
-			IODA_EEH_DBG("%s: No error found on PHB#%x\n",
-				     __func__, hose->global_number);
+			pr_devel("%s: No error found on PHB#%x\n",
+				__func__, hose->global_number);
 			continue;
 		}

@@ -842,8 +835,9 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 		 * highest priority reported upon multiple errors on the
 		 * specific PHB.
 		 */
-		IODA_EEH_DBG("%s: Error (%d, %d, %d) on PHB#%x\n",
-			err_type, severity, pe_no, hose->global_number);
+		pr_devel("%s: Error (%d, %d, %d) on PHB#%x\n",
+			__func__, err_type, severity,
+			frozen_pe_no, hose->global_number);
 		switch (err_type) {
 		case OPAL_EEH_IOC_ERROR:
 			if (severity == OPAL_EEH_SEV_IOC_DEAD) {
-- 
1.8.2.1
[PATCH v4] powerpc/eeh: powerpc/eeh: Fix undefined variable
Changes for V4:
	- Change the format specifier for frozen_pe_no from %d to %llu
	  in pr_devel()

'pe_no' is not defined; it is a typo and should be 'frozen_pe_no'.
Also, '__func__' is missing from the IODA_EEH_DBG() call.

For safety, use pr_devel() directly instead of IODA_EEH_DBG().

Signed-off-by: Mike Qiu
---
 arch/powerpc/platforms/powernv/eeh-ioda.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 0cd1c4a..cf42e74 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -36,13 +36,6 @@
 #include "powernv.h"
 #include "pci.h"

-/* Debugging option */
-#ifdef IODA_EEH_DBG_ON
-#define IODA_EEH_DBG(args...)	pr_info(args)
-#else
-#define IODA_EEH_DBG(args...)
-#endif
-
 static char *hub_diag = NULL;
 static int ioda_eeh_nb_init = 0;

@@ -823,17 +816,17 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)

 		/* If OPAL API returns error, we needn't proceed */
 		if (rc != OPAL_SUCCESS) {
-			IODA_EEH_DBG("%s: Invalid return value on "
-				     "PHB#%x (0x%lx) from opal_pci_next_error",
-				     __func__, hose->global_number, rc);
+			pr_devel("%s: Invalid return value on "
+				 "PHB#%x (0x%lx) from opal_pci_next_error",
+				 __func__, hose->global_number, rc);
 			continue;
 		}

 		/* If the PHB doesn't have error, stop processing */
 		if (err_type == OPAL_EEH_NO_ERROR ||
 		    severity == OPAL_EEH_SEV_NO_ERROR) {
-			IODA_EEH_DBG("%s: No error found on PHB#%x\n",
-				     __func__, hose->global_number);
+			pr_devel("%s: No error found on PHB#%x\n",
+				 __func__, hose->global_number);
 			continue;
 		}

@@ -842,8 +835,9 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 		 * highest priority reported upon multiple errors on the
 		 * specific PHB.
 		 */
-		IODA_EEH_DBG("%s: Error (%d, %d, %d) on PHB#%x\n",
-			err_type, severity, pe_no, hose->global_number);
+		pr_devel("%s: Error (%d, %d, %llu) on PHB#%x\n",
+			 __func__, err_type, severity,
+			 frozen_pe_no, hose->global_number);
 		switch (err_type) {
 		case OPAL_EEH_IOC_ERROR:
 			if (severity == OPAL_EEH_SEV_IOC_DEAD) {
-- 
1.8.2.1
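
The V4 change from %d to %llu matters because a printf-style conversion has to match the width and signedness of the value passed. A small userspace sketch follows; it assumes that frozen_pe_no is a 64-bit unsigned quantity (its declaration is not shown in this thread), and the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a PE number held in a 64-bit unsigned
 * variable. The cast makes the argument match %llu on every ABI. */
int format_pe_no(char *buf, size_t len, uint64_t frozen_pe_no)
{
	assert(buf != NULL);
	return snprintf(buf, len, "PE#%llu",
			(unsigned long long)frozen_pe_no);
}
```

Passing the same value to %d would be undefined behaviour in C and typically prints a truncated or unrelated number, which is also what the compiler's format checking warns about once the arguments are actually compiled.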
[PATCH 0/3] Enable multiple MSI feature in pSeries
Currently, the multiple MSI feature is not enabled on pSeries. These
patches try to enable it.

The patches have been tested with the ipr driver, using the driver patch
made by Wen Xiong:

[PATCH 0/7] Add support for new IBM SAS controllers

Test platform: one pSeries partition with one CPU core (4 SMT threads)
    and RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) in POWER7
OS version: SUSE Linux Enterprise Server 11 SP2 (ppc64) with 3.8-rc3 kernel

IRQs 21 and 22 are assigned to the ipr device, which supports two
multiple MSI vectors.

The test results are shown by 'cat /proc/interrupts':

      CPU0    CPU1    CPU2    CPU3
 16: 240458 261601 226310 200425 XICS Level IPI
 17: 0 0 0 0 XICS Level RAS_EPOW
 18: 10 0 3 2 XICS Level hvc_console
 19: 122182 28481 28527 28864 XICS Level ibmvscsi
 20:5067388226108118 XICS Level eth0
 21: 6 5 5 5 XICS Level host1-0
 22:817814816813 XICS Level host1-1
LOC: 398077 316725 231882 203049 Local timer interrupts
SPU: 1659919961903 Spurious interrupts
CNT: 0 0 0 0 Performance monitoring interrupts
MCE: 0 0 0 0 Machine check exceptions

Mike Qiu (3):
  irq: Set multiple MSI descriptor data for multiple IRQs
  irq: Add hw continuous IRQs map to virtual continuous IRQs support
  powerpc/pci: Enable pSeries multiple MSI feature

 arch/powerpc/kernel/msi.c            |    4 --
 arch/powerpc/platforms/pseries/msi.c |   62 -
 include/linux/irq.h                  |    4 ++
 include/linux/irqdomain.h            |    3 ++
 kernel/irq/chip.c                    |   40 -
 kernel/irq/irqdomain.c               |   61 +
 6 files changed, 158 insertions(+), 16 deletions(-)

-- 
1.7.7.6
[PATCH 2/3] irq: Add hw continuous IRQs map to virtual continuous IRQs support
Add a function, irq_create_mapping_many(), which can associate
multiple MSIs with a continuous irq mapping.

This is needed to enable multiple MSI support for pSeries.

Signed-off-by: Mike Qiu
---
 include/linux/irq.h       |    2 +
 include/linux/irqdomain.h |    3 ++
 kernel/irq/irqdomain.c    |   61 +
 3 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 60ef45b..e00a7ec 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -592,6 +592,8 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,

 #define irq_alloc_desc_from(from, node)		\
 	irq_alloc_descs(-1, from, 1, node)
+#define irq_alloc_desc_n(nevc, node)		\
+	irq_alloc_descs(-1, 0, nevc, node)

 void irq_free_descs(unsigned int irq, unsigned int cnt);
 int irq_reserve_irqs(unsigned int from, unsigned int cnt);
diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 0d5b17b..831dded 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -168,6 +168,9 @@ extern int irq_create_strict_mappings(struct irq_domain *domain,
 				      unsigned int irq_base,
 				      irq_hw_number_t hwirq_base, int count);

+extern int irq_create_mapping_many(struct irq_domain *domain,
+				   irq_hw_number_t hwirq_base, int count);
+
 static inline int irq_create_identity_mapping(struct irq_domain *host,
 					      irq_hw_number_t hwirq)
 {
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 96f3a1d..38648e6 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -636,6 +636,67 @@ int irq_create_strict_mappings(struct irq_domain *domain, unsigned int irq_base,
 }
 EXPORT_SYMBOL_GPL(irq_create_strict_mappings);

+/**
+ * irq_create_mapping_many - Map a range of hw IRQs to a range of virtual IRQs
+ * @domain: domain owning the interrupt range
+ * @hwirq_base: beginning of continuous hardware IRQ range
+ * @count: Number of interrupts to map
+ *
+ * This routine is used for allocating and mapping a range of hardware
+ * irqs to virtual IRQs where the virtual irq numbers are not at pre-defined
+ * locations.
+ *
+ * Greater than 0 is returned upon success, while any failure to establish a
+ * static mapping is treated as an error.
+ */
+int irq_create_mapping_many(struct irq_domain *domain,
+		irq_hw_number_t hwirq_base, int count)
+{
+	int ret, irq_base;
+	int virq, i;
+
+	pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq_base);
+
+	/* Look for default domain if nececssary */
+	if (!domain)
+		domain = irq_default_domain;
+	if (!domain) {
+		pr_warn("irq_create_mapping called for NULL domain, hwirq=%lx\n"
+			, hwirq_base);
+		WARN_ON(1);
+		return 0;
+	}
+	pr_debug("-> using domain @%p\n", domain);
+
+	/* For IRQ_DOMAIN_MAP_LEGACY, get the first virtual interrupt number */
+	if (domain->revmap_type == IRQ_DOMAIN_MAP_LEGACY)
+		return irq_domain_legacy_revmap(domain, hwirq_base);
+
+	/* Check if mapping already exists */
+	for (i = 0; i < count; i++) {
+		virq = irq_find_mapping(domain, hwirq_base+i);
+		if (virq) {
+			pr_debug("existing mapping on virq %d,"
+					" now dispose it first\n", virq);
+			irq_dispose_mapping(virq);
+		}
+	}
+
+	/* Allocate the continuous virtual interrupt numbers */
+	irq_base = irq_alloc_desc_n(count, of_node_to_nid(domain->of_node));
+	if (unlikely(irq_base < 0))
+		return irq_base;
+
+	ret = irq_domain_associate_many(domain, irq_base, hwirq_base, count);
+	if (unlikely(ret < 0)) {
+		irq_free_descs(irq_base, count);
+		return ret;
+	}
+
+	return irq_base;
+}
+EXPORT_SYMBOL_GPL(irq_create_mapping_many);
+
 unsigned int irq_create_of_mapping(struct device_node *controller,
 				   const u32 *intspec, unsigned int intsize)
 {
-- 
1.7.7.6
[PATCH 3/3] powerpc/pci: Enable pSeries multiple MSI feature
PCI devices support MSI and MSI-X, as well as multiple MSI, but
pSeries does not support multiple MSI yet.

This patch enables the multiple MSI feature on pSeries.

Signed-off-by: Mike Qiu
---
 arch/powerpc/kernel/msi.c            |    4 --
 arch/powerpc/platforms/pseries/msi.c |   62 -
 2 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/msi.c b/arch/powerpc/kernel/msi.c
index 8bbc12d..46b1470 100644
--- a/arch/powerpc/kernel/msi.c
+++ b/arch/powerpc/kernel/msi.c
@@ -20,10 +20,6 @@ int arch_msi_check_device(struct pci_dev* dev, int nvec, int type)
 		return -ENOSYS;
 	}

-	/* PowerPC doesn't support multiple MSI yet */
-	if (type == PCI_CAP_ID_MSI && nvec > 1)
-		return 1;
-
 	if (ppc_md.msi_check_device) {
 		pr_debug("msi: Using platform check routine.\n");
 		return ppc_md.msi_check_device(dev, nvec, type);
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index e5b0847..6633b18 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -132,13 +132,17 @@ static int rtas_query_irq_number(struct pci_dn *pdn, int offset)
 static void rtas_teardown_msi_irqs(struct pci_dev *pdev)
 {
 	struct msi_desc *entry;
+	int nvec, i;

 	list_for_each_entry(entry, &pdev->msi_list, list) {
 		if (entry->irq == NO_IRQ)
 			continue;

 		irq_set_msi_desc(entry->irq, NULL);
-		irq_dispose_mapping(entry->irq);
+		nvec = entry->msi_attrib.is_msix ? 1 : 1 <<
+			entry->msi_attrib.multiple;
+		for (i = 0; i < nvec; i++)
+			irq_dispose_mapping(entry->irq + i);
 	}

 	rtas_disable_msi(pdev);
@@ -392,6 +396,55 @@ static int check_msix_entries(struct pci_dev *pdev)
 	return 0;
 }

+static int setup_multiple_msi_irqs(struct pci_dev *pdev, int nvec)
+{
+	struct pci_dn *pdn;
+	int hwirq, virq_base, i, hwirq_base = 0;
+	struct msi_desc *entry;
+	struct msi_msg msg;
+
+	pdn = get_pdn(pdev);
+	entry = list_entry(pdev->msi_list.next, typeof(*entry), list);
+
+	/*
+	 * Get the hardware IRQ base and ensure the retrieved
+	 * hardware IRQs are continuous
+	 */
+	for (i = 0; i < nvec; i++) {
+		hwirq = rtas_query_irq_number(pdn, i);
+		if (i == 0)
+			hwirq_base = hwirq;
+
+		if (hwirq < 0 || hwirq != (hwirq_base + i)) {
+			pr_debug("rtas_msi: Failure to get %d IRQs on"
+				"PCI device %04x:%02x:%02x.%01x\n", nvec,
+				pci_domain_nr(pdev->bus), pdev->bus->number,
+				PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
+			return hwirq;
+		}
+	}
+
+	virq_base = irq_create_mapping_many(NULL, hwirq_base, nvec);
+	if (virq_base <= 0) {
+		pr_debug("rtas_msi: Failure to map IRQs (%d, %d) "
+			"for PCI device %04x:%02x:%02x.%01x\n",
+			hwirq_base, nvec, pci_domain_nr(pdev->bus),
+			pdev->bus->number, PCI_SLOT(pdev->devfn),
+			PCI_FUNC(pdev->devfn));
+		return -ENOSPC;
+	}
+
+	entry->msi_attrib.multiple = ilog2(nvec & 0x3f);
+	irq_set_multiple_msi_desc(virq_base, nvec, entry);
+	for (i = 0; i < nvec; i++) {
+		/* Read config space back so we can restore after reset */
+		read_msi_msg(virq_base + i, &msg);
+		entry->msg = msg;
+	}
+
+	return 0;
+}
+
 static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
 {
 	struct pci_dn *pdn;
@@ -444,11 +497,16 @@ again:
 		return rc;
 	}

+	if (type == PCI_CAP_ID_MSI && nvec > 1) {
+		rc = setup_multiple_msi_irqs(pdev, nvec);
+		return rc;
+	}
+
 	i = 0;
 	list_for_each_entry(entry, &pdev->msi_list, list) {
 		hwirq = rtas_query_irq_number(pdn, i++);
 		if (hwirq < 0) {
-			pr_debug("rtas_msi: error (%d) getting hwirq\n", rc);
+			pr_debug("rtas_msi: error (%d) getting hwirq\n", nvec);
 			return hwirq;
 		}

-- 
1.7.7.6
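
Two small pieces of arithmetic carry this patch: the vector count of a descriptor (one for MSI-X, 2^multiple for MSI) and the check that the hardware IRQ numbers returned by firmware are consecutive. The standalone sketch below restates both with hypothetical function names. Unlike the patch, which returns the last queried hwirq on failure, the sketch returns -1 for every failure; that simplification is deliberate and not taken from the source.

```c
#include <assert.h>
#include <stddef.h>

/* Number of vectors behind one descriptor: MSI-X entries describe a
 * single vector, MSI entries describe 2^multiple vectors. */
int msi_nvec(int is_msix, unsigned int multiple)
{
	return is_msix ? 1 : 1 << multiple;
}

/* Returns the base hwirq if hwirq[0..n) are valid and consecutive,
 * otherwise -1. */
int contiguous_base(const int *hwirq, int n)
{
	assert(hwirq != NULL && n > 0);
	for (int i = 0; i < n; i++)
		if (hwirq[i] < 0 || hwirq[i] != hwirq[0] + i)
			return -1;
	return hwirq[0];
}
```

For the test system in the cover letter, two vectors with hwirqs that map to IRQ 21 and 22 would pass the contiguity check, and teardown would dispose of msi_nvec(0, 1) == 2 mappings.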
[PATCH 1/3] irq: Set multiple MSI descriptor data for multiple IRQs
Multiple MSI only requires the IRQ in msi_desc entry to be set as the value
of irq_base. This patch implements the above mentioned technique.

Signed-off-by: Mike Qiu
---
 include/linux/irq.h |    2 ++
 kernel/irq/chip.c   |   40 ++--
 2 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index fdf2c4a..60ef45b 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -528,6 +528,8 @@ extern int irq_set_handler_data(unsigned int irq, void *data);
 extern int irq_set_chip_data(unsigned int irq, void *data);
 extern int irq_set_irq_type(unsigned int irq, unsigned int type);
 extern int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry);
+extern int irq_set_multiple_msi_desc(unsigned int irq_base, unsigned int nvec,
+				struct msi_desc *entry);
 extern struct irq_data *irq_get_irq_data(unsigned int irq);
 
 static inline struct irq_chip *irq_get_chip(unsigned int irq)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 3aca9f2..c4c39d3 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -90,6 +90,35 @@ int irq_set_handler_data(unsigned int irq, void *data)
 EXPORT_SYMBOL(irq_set_handler_data);
 
 /**
+ * irq_set_multiple_msi_desc - set Multiple MSI descriptor data
+ * for multiple IRQs
+ * @irq_base:	Interrupt number base
+ * @nvec:	The number of interrupts
+ * @entry:	Pointer to MSI descriptor data
+ *
+ * Set IRQ descriptors for multiple MSIs
+ */
+int irq_set_multiple_msi_desc(unsigned int irq_base, unsigned int nvec,
+				struct msi_desc *entry)
+{
+	unsigned long flags, i;
+	struct irq_desc *desc;
+
+	for (i = 0; i < nvec; i++) {
+		desc = irq_get_desc_lock(irq_base + i, &flags,
+					IRQ_GET_DESC_CHECK_GLOBAL);
+		if (!desc)
+			return -EINVAL;
+		desc->irq_data.msi_desc = entry;
+		if (i == 0 && entry)
+			entry->irq = irq_base;
+		irq_put_desc_unlock(desc, flags);
+	}
+
+	return 0;
+}
+
+/**
  * irq_set_msi_desc - set MSI descriptor data for an irq
  * @irq:	Interrupt number
  * @entry:	Pointer to MSI descriptor data
@@ -98,16 +127,7 @@ EXPORT_SYMBOL(irq_set_handler_data);
  */
 int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry)
 {
-	unsigned long flags;
-	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, IRQ_GET_DESC_CHECK_GLOBAL);
-
-	if (!desc)
-		return -EINVAL;
-	desc->irq_data.msi_desc = entry;
-	if (entry)
-		entry->irq = irq;
-	irq_put_desc_unlock(desc, flags);
-	return 0;
+	return irq_set_multiple_msi_desc(irq, 1, entry);
 }
 
 /**
-- 
1.7.7.6
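A minimal user-space sketch of what the new helper does, assuming a small fixed table in place of the kernel's irq_desc lookup and locking. The names irq_table, struct msi_desc_model, struct irq_desc_model, NR_IRQS_MODEL and set_multiple_msi_desc_model are hypothetical stand-ins, not kernel symbols. Unlike the patch, which checks each descriptor inside the loop, this model validates the range up front so that a failure leaves no descriptor partially updated:

```c
#include <errno.h>
#include <stddef.h>

#define NR_IRQS_MODEL 32u	/* hypothetical table size for this model */

/* Stand-in for struct msi_desc: only the field the patch touches. */
struct msi_desc_model {
	unsigned int irq;
};

/* Stand-in for struct irq_desc / irq_data. */
struct irq_desc_model {
	struct msi_desc_model *msi_desc;
};

struct irq_desc_model irq_table[NR_IRQS_MODEL];

/*
 * Point nvec consecutive descriptors at one shared MSI entry and record
 * the first interrupt number in that entry, as the patch description says.
 */
int set_multiple_msi_desc_model(unsigned int irq_base, unsigned int nvec,
				struct msi_desc_model *entry)
{
	unsigned int i;

	/* Reject ranges that fall outside the table before changing anything. */
	if (irq_base >= NR_IRQS_MODEL || nvec > NR_IRQS_MODEL - irq_base)
		return -EINVAL;

	for (i = 0; i < nvec; i++)
		irq_table[irq_base + i].msi_desc = entry;

	if (entry && nvec > 0)
		entry->irq = irq_base;

	return 0;
}
```

With nvec set to 1 this reduces to the single-IRQ case, which is how the patch reimplements irq_set_msi_desc().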
Re: [PATCH] powerpc/mm: Fix hash computation function
With the fix, the machine can boot up successfully.

Tested-by: Mike Qiu

On 2013/1/30 13:40, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> The ASM version of hash computation function was truncating the upper bit.
> Make the ASM version similar to hpt_hash function. Remove masking vsid bits.
> Without this patch, we observed hang during bootup due to not satisfying page
> fault request correctly. The fault handler used wrong hash values to update
> the HPTE. Hence we kept looping with page fault.
>
> hash_page(ea=01003e260008, access=203, trap=300 ip=3fff91787134 dsisr
> 4200
> The computed value of hash 0f22f390
> update: avpnv=4003e46054003e00, hash=0722f390, f=80000006, psize: 2
> ...
>
> Reported-by: Mike Qiu
> Signed-off-by: Aneesh Kumar K.V
> ---
>  arch/powerpc/mm/hash_low_64.S |   62
> +++--
>  1 file changed, 35 insertions(+), 27 deletions(-)
>
> diff --git a/arch/powerpc/mm/hash_low_64.S b/arch/powerpc/mm/hash_low_64.S
> index 5658508..7443481 100644
> --- a/arch/powerpc/mm/hash_low_64.S
> +++ b/arch/powerpc/mm/hash_low_64.S
> @@ -115,11 +115,13 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
>  	sldi	r29,r5,SID_SHIFT - VPN_SHIFT
>  	rldicl  r28,r3,64 - VPN_SHIFT,64 - (SID_SHIFT - VPN_SHIFT)
>  	or	r29,r28,r29
> -
> -	/* Calculate hash value for primary slot and store it in r28 */
> -	rldicl	r5,r5,0,25		/* vsid & 0x007f */
> -	rldicl	r0,r3,64-12,48		/* (ea >> 12) & 0x */
> -	xor	r28,r5,r0
> +	/*
> +	 * Calculate hash value for primary slot and store it in r28
> +	 * r3 = va, r5 = vsid
> +	 * r0 = (va >> 12) & ((1ul << (28 - 12)) -1)
> +	 */
> +	rldicl	r0,r3,64-12,48
> +	xor	r28,r5,r0		/* hash */
>  	b	4f
>
>  3:	/* Calc vpn and put it in r29 */
> @@ -130,11 +132,12 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
>  	/*
>  	 * calculate hash value for primary slot and
>  	 * store it in r28 for 1T segment
> +	 * r3 = va, r5 = vsid
>  	 */
> -	rldic	r28,r5,25,25	/* (vsid << 25) & 0x7f */
> -	clrldi	r5,r5,40	/* vsid & 0xff */
> -	rldicl	r0,r3,64-12,36	/* (ea >> 12) & 0xfff */
> -	xor	r28,r28,r5
> +	sldi	r28,r5,25		/* vsid << 25 */
> +	/* r0 = (va >> 12) & ((1ul << (40 - 12)) -1) */
> +	rldicl	r0,r3,64-12,36
> +	xor	r28,r28,r5		/* vsid ^ ( vsid << 25) */
>  	xor	r28,r28,r0		/* hash */
>
>  	/* Convert linux PTE bits into HW equivalents */
> @@ -407,11 +410,13 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
>  	 */
>  	rldicl  r28,r3,64 - VPN_SHIFT,64 - (SID_SHIFT - VPN_SHIFT)
>  	or	r29,r28,r29
> -
> -	/* Calculate hash value for primary slot and store it in r28 */
> -	rldicl	r5,r5,0,25		/* vsid & 0x007f */
> -	rldicl	r0,r3,64-12,48		/* (ea >> 12) & 0x */
> -	xor	r28,r5,r0
> +	/*
> +	 * Calculate hash value for primary slot and store it in r28
> +	 * r3 = va, r5 = vsid
> +	 * r0 = (va >> 12) & ((1ul << (28 - 12)) -1)
> +	 */
> +	rldicl	r0,r3,64-12,48
> +	xor	r28,r5,r0		/* hash */
>  	b	4f
>
>  3:	/* Calc vpn and put it in r29 */
> @@ -426,11 +431,12 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
>  	/*
>  	 * Calculate hash value for primary slot and
>  	 * store it in r28 for 1T segment
> +	 * r3 = va, r5 = vsid
>  	 */
> -	rldic	r28,r5,25,25	/* (vsid << 25) & 0x7f */
> -	clrldi	r5,r5,40	/* vsid & 0xff */
> -	rldicl	r0,r3,64-12,36	/* (ea >> 12) & 0xfff */
> -	xor	r28,r28,r5
> +	sldi	r28,r5,25		/* vsid << 25 */
> +	/* r0 = (va >> 12) & ((1ul << (40 - 12)) -1) */
> +	rldicl	r0,r3,64-12,36
> +	xor	r28,r28,r5		/* vsid ^ ( vsid << 25) */
>  	xor	r28,r28,r0		/* hash */
>
>  	/* Convert linux PTE bits into HW equivalents */
> @@ -752,25 +758,27 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
>  	rldicl  r28,r3,64 - VPN_SHIFT,64 - (SID_SHIFT - VPN_SHIFT)
>  	or	r29,r28,r29
>
> -	/* Calculate hash value for primary slot and store it in r28 */
> -	rldicl	r5,r5,0,25		/* vsid & 0x007f */
> -	rldicl	r0,r3,64-
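The two formulas given in the new comments of the quoted patch can be written out in C as a rough sketch. The function names hash_256m and hash_1t are hypothetical; the shift of 12 and the widths 28 and 40 are taken from the patch comments. The sketch covers only the steps shown in the quoted hunks and, like the fixed assembly, applies no mask to vsid:

```c
#include <stdint.h>

/*
 * 256M segment, per the patch comment:
 *   r0   = (va >> 12) & ((1ul << (28 - 12)) - 1)
 *   hash = vsid ^ r0
 */
uint64_t hash_256m(uint64_t vsid, uint64_t va)
{
	uint64_t page = (va >> 12) & ((UINT64_C(1) << (28 - 12)) - 1);

	return vsid ^ page;
}

/*
 * 1T segment, per the patch comment:
 *   r0   = (va >> 12) & ((1ul << (40 - 12)) - 1)
 *   hash = vsid ^ (vsid << 25) ^ r0
 */
uint64_t hash_1t(uint64_t vsid, uint64_t va)
{
	uint64_t page = (va >> 12) & ((UINT64_C(1) << (40 - 12)) - 1);

	return vsid ^ (vsid << 25) ^ page;
}
```

Because vsid is used unmasked, its upper bits pass through to the result, which is the behaviour the commit message says the old assembly lost.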
Re: [PATCH 0/3] Enable multiple MSI feature in pSeries
> On Tue, 2013-01-15 at 15:38 +0800, Mike Qiu wrote:
>> Currently, multiple MSI feature hasn't been enabled in pSeries,
>> These patches try to enbale this feature.
>
> Hi Mike,
>
>> These patches have been tested by using ipr driver, and the driver patch
>> has been made by Wen Xiong :
>
> So who wrote these patches? Normally we would expect the original author
> to post the patches if at all possible.

Hi Michael,

I wrote these Multiple MSI patches myself. This feature has not been enabled
before, and it needs a device driver to test whether it works properly, so I
tested my patches with Wen Xiong's ipr patches, which have been sent to the
mailing list. I'm the original author :)

>> [PATCH 0/7] Add support for new IBM SAS controllers
>
> I would like to see the full series, including the driver enablement.

Yes, but the driver patches were written by Wen Xiong and have already been
sent out. I only used her patches to test mine. Any device that supports
Multiple MSI can use this feature, not only IBM SAS controllers. I also tested
my patches with the Broadcom tg3 network card, and that works too.

>> Test platform: One partition of pSeries with one cpu core(4 SMTs) and
>> RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) in POWER7
>> OS version: SUSE Linux Enterprise Server 11 SP2 (ppc64) with 3.8-rc3 kernel
>>
>> IRQ 21 and 22 are assigned to the ipr device which support 2 mutiple MSI.
>>
>> The test results is shown by 'cat /proc/interrups':
>>        CPU0  CPU1  CPU2  CPU3
>> 21:       6     5     5     5   XICS Level host1-0
>> 22:817814816813                 XICS Level host1-1
>
> This shows that you are correctly configuring two MSIs.
>
> But the key advantage of using multiple interrupts is to distribute load
> across CPUs and improve performance. So I would like to see some
> performance numbers that show that there is a real benefit for all the
> extra complexity in the code.

Yes, the system only supports two MSIs. Anyway, I will try to run some
performance tests to show the real benefit, but that actually depends on the
driver.

As the data above shows, there seems to be a problem with how the interrupts
are used: IRQ 21 is used very little and most of the load goes to IRQ 22. I
will discuss this with the driver author to find out why, and once she has
fixed it I will post the performance results.

Thanks
Mike

> cheers
Re: [PATCH 0/3] Enable multiple MSI feature in pSeries
On 2013/2/4 13:56, Michael Ellerman wrote:
> On Mon, 2013-02-04 at 11:49 +0800, Mike Qiu wrote:
>>> On Tue, 2013-01-15 at 15:38 +0800, Mike Qiu wrote:
>>>> Currently, multiple MSI feature hasn't been enabled in pSeries,
>>>> These patches try to enbale this feature.
>>>
>>> Hi Mike,
>>>
>>>> These patches have been tested by using ipr driver, and the driver patch
>>>> has been made by Wen Xiong :
>>>
>>> So who wrote these patches? Normally we would expect the original author
>>> to post the patches if at all possible.
>>
>> Hi Michael
>>
>> These Multiple MSI patches were wrote by myself, you know this feature
>> has not enabled and it need device driver to test whether it works
>> suitable. So I test my patches use Wen Xiong's ipr patches, which has
>> been send out to the maillinglist.
>>
>> I'm the original author :)
>
> Ah OK, sorry, that was more or less clear from your mail but I just
> misunderstood.
>
>>>> [PATCH 0/7] Add support for new IBM SAS controllers
>>>
>>> I would like to see the full series, including the driver enablement.
>>
>> Yep, but the driver patches were wrote by Wen Xiong and has been send out.
>
> OK, you mean this series?
>
> http://thread.gmane.org/gmane.linux.scsi/79639

Yes, exactly.

>> I just use her patches to test my patches. all device support Multiple MSI
>> can use my feature not only IBM SAS controllers, I also test my patches
>> use the broadcom wireless card tg3, and also works OK.
>
> You mean drivers/net/ethernet/broadcom/tg3.c ? I don't see where it
> calls pci_enable_msi_block() ?

Yes, I modified the driver myself to support multiple MSI.

> All devices /can/ use it, but the driver needs to be updated. Currently
> we have two drivers that do so (in Linus' tree), plus the updated IPR.

Not all devices: only devices whose hardware supports multiple MSI can use it.

>>>> Test platform: One partition of pSeries with one cpu core(4 SMTs) and
>>>> RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) in POWER7
>>>> OS version: SUSE Linux Enterprise Server 11 SP2 (ppc64) with 3.8-rc3 kernel
>>>>
>>>> IRQ 21 and 22 are assigned to the ipr device which support 2 mutiple MSI.
>>>>
>>>> The test results is shown by 'cat /proc/interrups':
>>>>        CPU0  CPU1  CPU2  CPU3
>>>> 21:       6     5     5     5   XICS Level host1-0
>>>> 22:817814816813                 XICS Level host1-1
>>>
>>> This shows that you are correctly configuring two MSIs.
>>>
>>> But the key advantage of using multiple interrupts is to distribute load
>>> across CPUs and improve performance. So I would like to see some
>>> performance numbers that show that there is a real benefit for all the
>>> extra complexity in the code.
>>
>> Yes, the system just has suport two MSIs. Anyway, I will try to do some
>> proformance test, to show the real benefit.
>> But actually it needs the driver to do so. As the data show above, it
>> seems there is some problems in use the interrupt, the irq 21 use few,
>> most use 22, I will discuss with the driver author to see why and if she
>> fixed, I will give out the proformance result.
>
> Yeah that would be good.
>
> I really dislike that we have a separate API for multi-MSI vs MSI-X, and
> pci_enable_msi_block() also pushes the contiguous power-of-2 allocation
> into the irq domain layer, which is unpleasant. So if we really must do
> multi-MSI I would like to do it differently.

Yes, but multi-MSI needs hardware support. It is an extension of MSI: a
device may support MSI and multiple MSI but not MSI-X. For such devices it is
better to use multiple MSI, which is more efficient than single MSI.

Multi-MSI can use no more than 32 interrupts.

Thanks

> cheers
Re: [PATCH 0/3] Enable multiple MSI feature in pSeries
On 2013/3/1 11:54, Michael Ellerman wrote:
> On Fri, Mar 01, 2013 at 11:08:45AM +0800, Mike wrote:
>> Hi all
>>
>> Any comments? or any questions about my patchset?
>
> You were going to get some performance numbers that show a definite
> benefit for using more than one MSI.

Yes, but my patch only enables the kernel to support this feature; whether it
is used depends on the device driver.

This feature was merged into the kernel for x86 a long time ago. See commits:

5ca72c4f7c412c2002363218901eba5516c476b1
51906e779f2b13b38f8153774c4c7163d412ffd9

I am trying to run the test, but it is difficult because the result mostly
depends on how the device driver uses this feature, and the ipr driver patch
was written by another person. I have also had no reply from her.

> cheers
Re: [PATCH 2/3] irq: Add hw continuous IRQs map to virtual continuous IRQs support
On 2013/3/5 10:23, Michael Ellerman wrote:
> On Tue, Jan 15, 2013 at 03:38:55PM +0800, Mike Qiu wrote:
>> Adding a function irq_create_mapping_many() which can associate
>> multiple MSIs to a continous irq mapping.
>>
>> This is needed to enable multiple MSI support for pSeries.
>>
>> Signed-off-by: Mike Qiu
>> ---
>>  include/linux/irq.h       |    2 +
>>  include/linux/irqdomain.h |    3 ++
>>  kernel/irq/irqdomain.c    |   61 +
>>  3 files changed, 66 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/irq.h b/include/linux/irq.h
>> index 60ef45b..e00a7ec 100644
>> --- a/include/linux/irq.h
>> +++ b/include/linux/irq.h
>> @@ -592,6 +592,8 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
>>  #define irq_alloc_desc_from(from, node) \
>>  	irq_alloc_descs(-1, from, 1, node)
>>
>> +#define irq_alloc_desc_n(nevc, node) \
>> +	irq_alloc_descs(-1, 0, nevc, node)
>
> This has been superseeded by irq_alloc_descs_from(), which is the right
> way to do it.

Yes, but irq_alloc_descs_from() is only for one irq, and if I change that
API, many of the places that call it may be affected.

>> diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
>> index 0d5b17b..831dded 100644
>> --- a/include/linux/irqdomain.h
>> +++ b/include/linux/irqdomain.h
>> @@ -168,6 +168,9 @@ extern int irq_create_strict_mappings(struct irq_domain *domain,
>>  					unsigned int irq_base,
>>  					irq_hw_number_t hwirq_base, int count);
>>
>> +extern int irq_create_mapping_many(struct irq_domain *domain,
>> +				irq_hw_number_t hwirq_base, int count);
>> +
>>  static inline int irq_create_identity_mapping(struct irq_domain *host,
>>  					irq_hw_number_t hwirq)
>>  {
>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>> index 96f3a1d..38648e6 100644
>> --- a/kernel/irq/irqdomain.c
>> +++ b/kernel/irq/irqdomain.c
>> @@ -636,6 +636,67 @@ int irq_create_strict_mappings(struct irq_domain *domain, unsigned int irq_base,
>>  }
>>  EXPORT_SYMBOL_GPL(irq_create_strict_mappings);
>>
>> +/**
>> + * irq_create_mapping_many - Map a range of hw IRQs to a range of virtual IRQs
>> + * @domain: domain owning the interrupt range
>> + * @hwirq_base: beginning of continuous hardware IRQ range
>> + * @count: Number of interrupts to map
>
> For multiple-MSI the allocated interrupt numbers must be a power-of-2,
> and must be naturally aligned. I don't /think/ that's a requirement for
> the virtual numbers, but it's probably best that we do it anyway.
>
> So this API needs to specify that it will give you back a power-of-2
> block that is naturally aligned - otherwise you can't use it for MSI.

rtas_call will return the number of hardware interrupts, and it should be a
power of 2, so I think this does not need to be specified.

>> + * This routine is used for allocating and mapping a range of hardware
>> + * irqs to virtual IRQs where the virtual irq numbers are not at pre-defined
>> + * locations.
>
> This comment doesn't make sense to me.
>
>> + *
>> + * Greater than 0 is returned upon success, while any failure to establish a
>> + * static mapping is treated as an error.
>> + */
>> +int irq_create_mapping_many(struct irq_domain *domain,
>> +		irq_hw_number_t hwirq_base, int count)
>> +{
>> +	int ret, irq_base;
>> +	int virq, i;
>> +
>> +	pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq_base);
>
> I'd like to see this whole function rewritten to reduce the duplication
> vs irq_create_mapping(). I don't see any reason why this can't be the
> core routine, and irq_create_mapping() becomes a caller of it, passing a
> count of 1 ?

That's a good suggestion.

>> +	/* Look for default domain if nececssary */
>> +	if (!domain)
>> +		domain = irq_default_domain;
>> +	if (!domain) {
>> +		pr_warn("irq_create_mapping called for NULL domain, hwirq=%lx\n"
>> +			, hwirq_base);
>> +		WARN_ON(1);
>> +		return 0;
>> +	}
>> +	pr_debug("-> using domain @%p\n", domain);
>> +
>> +	/* For IRQ_DOMAIN_MAP_LEGACY, get the first virtual interrupt number */
>> +	if (domain->revmap_type == IRQ_DOMAIN_MAP_LEGACY)
>> +		return irq_domain_legacy_revmap(domain, hwirq_base);
>
> The above doesn't work.

Why doesn't it work?

>> +	/* Check if mapping already exists */
>> +	for (i = 0; i < count; i++) {
>> +		virq = irq_find_mapping(domain, hwirq_base+i);
>> +		if (virq) {
>> +			pr_debug("existing mapping on virq %d,"
>> +					" now dispose it first\n", virq);
>> +			irq_dispose_mapping(virq);
>
> You might have just disposed of someone elses mapping, we shouldn't do
> that. It should be an error to the caller.

It's a good question. If the interrupt used for someone elses, why I can
Re: [PATCH 2/3] irq: Add hw continuous IRQs map to virtual continuous IRQs support
On 2013/3/5 10:41, Paul Mundt wrote:
> On Tue, Jan 15, 2013 at 03:38:55PM +0800, Mike Qiu wrote:
>> Adding a function irq_create_mapping_many() which can associate
>> multiple MSIs to a continous irq mapping.
>>
>> This is needed to enable multiple MSI support for pSeries.
>>
>> +int irq_create_mapping_many(struct irq_domain *domain,
>> +		irq_hw_number_t hwirq_base, int count)
>> +{
>
> Other than the other review comments already made, I think you can
> simplify this considerably by simply doing what
> irq_create_strict_mappings() does, and relaxing the irq_base
> requirements.
>
> In any event, as you are creating a new interface, I don't think you
> want to carry around half of the legacy crap that irq_create_mapping()
> has to deal with. We made the decision to avoid this with
> irq_create_strict_mappings() intentionally, too.

Yes, you are right. I will send out a V2 of my patch that addresses this, and
I hope you can review it again.

Thanks
Mike
Re: [PATCH 2/3] irq: Add hw continuous IRQs map to virtual continuous IRQs support
On 2013/3/6 11:54, Michael Ellerman wrote:
> On Tue, Mar 05, 2013 at 03:19:57PM +0800, Mike Qiu wrote:
>> On 2013/3/5 10:23, Michael Ellerman wrote:
>>> On Tue, Jan 15, 2013 at 03:38:55PM +0800, Mike Qiu wrote:
>>>> Adding a function irq_create_mapping_many() which can associate
>>>> multiple MSIs to a continous irq mapping.
>>>>
>>>> This is needed to enable multiple MSI support for pSeries.
>>>>
>>>> Signed-off-by: Mike Qiu
>>>> ---
>>>>  include/linux/irq.h       |    2 +
>>>>  include/linux/irqdomain.h |    3 ++
>>>>  kernel/irq/irqdomain.c    |   61 +
>>>>  3 files changed, 66 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/include/linux/irq.h b/include/linux/irq.h
>>>> index 60ef45b..e00a7ec 100644
>>>> --- a/include/linux/irq.h
>>>> +++ b/include/linux/irq.h
>>>> @@ -592,6 +592,8 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
>>>>  #define irq_alloc_desc_from(from, node) \
>>>>  	irq_alloc_descs(-1, from, 1, node)
>>>>
>>>> +#define irq_alloc_desc_n(nevc, node) \
>>>> +	irq_alloc_descs(-1, 0, nevc, node)
>>>
>>> This has been superseeded by irq_alloc_descs_from(), which is the right
>>> way to do it.
>>
>> Yes, but irq_alloc_descs_from() just for 1 irq
>
> No it's not, look again.
>
> #define irq_alloc_descs_from(from, cnt, node) \
> 	irq_alloc_descs(-1, from, cnt, node)

Sorry, I misread it as irq_alloc_desc_from(from, node). You are right.

>>>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>>>> index 96f3a1d..38648e6 100644
>>>> --- a/kernel/irq/irqdomain.c
>>>> +++ b/kernel/irq/irqdomain.c
>>>> @@ -636,6 +636,67 @@ int irq_create_strict_mappings(struct irq_domain *domain, unsigned int irq_base,
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(irq_create_strict_mappings);
>>>>
>>>> +/**
>>>> + * irq_create_mapping_many - Map a range of hw IRQs to a range of virtual IRQs
>>>> + * @domain: domain owning the interrupt range
>>>> + * @hwirq_base: beginning of continuous hardware IRQ range
>>>> + * @count: Number of interrupts to map
>>>
>>> For multiple-MSI the allocated interrupt numbers must be a power-of-2,
>>> and must be naturally aligned. I don't /think/ that's a requirement for
>>> the virtual numbers, but it's probably best that we do it anyway.
>>>
>>> So this API needs to specify that it will give you back a power-of-2
>>> block that is naturally aligned - otherwise you can't use it for MSI.
>>
>> rtas_call will return the numbers of hardware interrupt, and it should be
>> power-of-2, as this I think do not need to specify
>
> You're confusing hardware interrupt numbers and virtual interrupt
> numbers. My comment is about irq_create_mapping_many(), which returns
> virtual interrupt numbers.
>
> As I said I don't think there is a requirement that the virtual
> interrupt numbers are also a power-of-2 naturally aligned block, but we
> should allocate them as one anyway, to avoid any issues in future.

But the virtual interrupt numbers should be a power-of-2 naturally aligned
block, because they must be contiguous, as MSI-HOWTO.txt says:

    4.2.2 pci_enable_msi_block

    int pci_enable_msi_block(struct pci_dev *dev, int count)

    This variation on the above call allows a device driver to request
    multiple MSIs. The MSI specification only allows interrupts to be
    allocated in powers of two, up to a maximum of 2^5 (32).

    If this function returns 0, it has succeeded in allocating at least as
    many interrupts as the driver requested (it may have allocated more in
    order to satisfy the power-of-two requirement). In this case, the
    function enables MSI on this device and updates dev->irq to be the
    lowest of the new interrupts assigned to it. The other interrupts
    assigned to the device are in the range dev->irq to dev->irq + count - 1.

See the last line: it means the virtual interrupts must be a contiguous block.

> And so this API, which returns virtual interrupt numbers, must satisfy
> that specification.
>
>>>> +	/* Look for default domain if nececssary */
>>>> +	if (!domain)
>>>> +		domain = irq_default_domain;
>>>> +	if (!domain) {
>>>> +		pr_warn("irq_create_mapping called for NULL domain, hwirq=%lx\n"
>>>> +			, hwirq_base);
>>>> +		WARN_ON(1);
>>>> +		return 0;
>>>> +	}
>>>> +	pr_debug("-> using domain @%p\n", domain);
>>>> +
>>>> +	/* For IRQ_DOMAIN_MAP_LEGACY, get the first virtual interrupt number */
>>>> +	if (domain->revmap_type == IRQ_DOMAIN_MAP_LEGACY)
>>>> +		return irq_domain_legacy_revmap(domain, hwirq_base);
>>>
>>> The above doesn't work.
>>
>> Why it doesn't work ?
>
> Because irq_domain_legacy_revmap() only allocates a single interrupt
> number.

OK, you're right.

>>>> +	/* Check if mapping already exists */
>>>> +	for (i = 0; i < count; i++) {
>>>> +		virq = irq_find_mapping(domain, hwirq_base+i);
>>>> +		if (virq) {
>>>> +			pr_debug("existing mapping on virq %d,"
>>>> +					" now dispose it first\n", virq);
>>>> +			irq_dispose_mapping(virq);
Re: [PATCH 2/3] irq: Add hw continuous IRQs map to virtual continuous IRQs support
On 2013/3/6 13:42, Michael Ellerman wrote:
> On Wed, Mar 06, 2013 at 01:34:58PM +0800, Mike Qiu wrote:
>> On 2013/3/6 11:54, Michael Ellerman wrote:
>>> On Tue, Mar 05, 2013 at 03:19:57PM +0800, Mike Qiu wrote:
>>>> On 2013/3/5 10:23, Michael Ellerman wrote:
>>>>> On Tue, Jan 15, 2013 at 03:38:55PM +0800, Mike Qiu wrote:
>>>>>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>>>>>> index 96f3a1d..38648e6 100644
>>>>>> --- a/kernel/irq/irqdomain.c
>>>>>> +++ b/kernel/irq/irqdomain.c
>>>>>> @@ -636,6 +636,67 @@ int irq_create_strict_mappings(struct irq_domain *domain, unsigned int irq_base,
>>>>>>  }
>>>>>>  EXPORT_SYMBOL_GPL(irq_create_strict_mappings);
>>>>>>
>>>>>> +/**
>>>>>> + * irq_create_mapping_many - Map a range of hw IRQs to a range of virtual IRQs
>>>>>> + * @domain: domain owning the interrupt range
>>>>>> + * @hwirq_base: beginning of continuous hardware IRQ range
>>>>>> + * @count: Number of interrupts to map
>>>>>
>>>>> For multiple-MSI the allocated interrupt numbers must be a power-of-2,
>>>>> and must be naturally aligned. I don't /think/ that's a requirement for
>>>>> the virtual numbers, but it's probably best that we do it anyway.
>>>>>
>>>>> So this API needs to specify that it will give you back a power-of-2
>>>>> block that is naturally aligned - otherwise you can't use it for MSI.
>>>>
>>>> rtas_call will return the numbers of hardware interrupt, and it should be
>>>> power-of-2, as this I think do not need to specify
>>>
>>> You're confusing hardware interrupt numbers and virtual interrupt
>>> numbers. My comment is about irq_create_mapping_many(), which returns
>>> virtual interrupt numbers.
>>>
>>> As I said I don't think there is a requirement that the virtual
>>> interrupt numbers are also a power-of-2 naturally aligned block, but we
>>> should allocate them as one anyway, to avoid any issues in future.
>>
>> But for virtual interrupt numbers it should be a power-of-2 naturally
>> aligned block, because it must be continuous, as the MSI-HOWTO.txt says:
>>
>>     4.2.2 pci_enable_msi_block
>>
>>     int pci_enable_msi_block(struct pci_dev *dev, int count)
>>
>>     This variation on the above call allows a device driver to request
>>     multiple MSIs. The MSI specification only allows interrupts to be
>>     allocated in powers of two, up to a maximum of 2^5 (32).
>>
>>     If this function returns 0, it has succeeded in allocating at least as
>>     many interrupts as the driver requested (it may have allocated more in
>>     order to satisfy the power-of-two requirement). In this case, the
>>     function enables MSI on this device and updates dev->irq to be the
>>     lowest of the new interrupts assigned to it. The other interrupts
>>     assigned to the device are in the range dev->irq to
>>     dev->irq + count - 1.
>>
>> See the last line, that means for the virtual interrupts must be a
>> continuous block.
>
> In practice I think things could work if we didn't, because we are not
> using the mask routines that assume that layout. But you're right, we
> must implement the API as it's specified, so the virtual interrupt
> numbers must be a naturally aligned power-of-2.

Yes, your point is also valid. Simply because the API requires a naturally
aligned power-of-2 block of interrupt numbers, we need to implement it that
way.

>>> cheers
>
> cheers
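The two constraints discussed above (a power-of-two count of at most 32, and a naturally aligned base) can be sketched as two small checks. This is only an illustration of the arithmetic, not kernel code; msi_round_up, msi_block_ok and MSI_MAX_VECTORS are hypothetical names, and the limit of 32 comes from the MSI-HOWTO text quoted in the thread:

```c
#include <stdbool.h>

#define MSI_MAX_VECTORS 32u	/* 2^5, per the quoted MSI-HOWTO text */

/*
 * Round a requested vector count up to the next power of two.
 * Returns 0 when the request is empty or exceeds the multi-MSI limit.
 */
unsigned int msi_round_up(unsigned int requested)
{
	unsigned int n = 1;

	if (requested == 0 || requested > MSI_MAX_VECTORS)
		return 0;
	while (n < requested)
		n <<= 1;
	return n;
}

/*
 * A block is usable when its size is a power of two within the limit and
 * its base is naturally aligned, i.e. a multiple of the block size.
 */
bool msi_block_ok(unsigned int base, unsigned int count)
{
	bool pow2 = count != 0 && (count & (count - 1)) == 0;

	return pow2 && count <= MSI_MAX_VECTORS && (base & (count - 1)) == 0;
}
```

Under this check a two-interrupt block starting at an odd number, such as 21, is contiguous but not naturally aligned, while one starting at 22 is both.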
"attempt to move .org backwards" still show up
Hi all,

I get an error message when I compile the newest upstream kernel on a Power7
platform.

[root@feng linux]# make -j60
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  CC      scripts/mod/devicetable-offsets.s
  GEN     scripts/mod/devicetable-offsets.h
  HOSTCC  scripts/mod/file2alias.o
  CALL    scripts/checksyscalls.sh
  HOSTLD  scripts/mod/modpost
  CHK     include/generated/compile.h
  CALL    arch/powerpc/kernel/systbl_chk.sh
  CALL    arch/powerpc/kernel/prom_init_check.sh
  AS      arch/powerpc/kernel/head_64.o
arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:258: Error: attempt to move .org backwards
make[1]: *** [arch/powerpc/kernel/head_64.o] Error 1
make: *** [arch/powerpc/kernel] Error 2
make: *** Waiting for unfinished jobs

I see that this should have been fixed by commit
087aa036eb79f24b856893190359ba812b460f45, but the build still fails on my P7
machine.

Kernel source information:

git tree: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

[root@feng linux]# git log
commit 824282ca7d250bd7c301f221c3cd902ce906d731
Merge: f83b293 3b5e50e
Author: Linus Torvalds
Date:   Mon Apr 22 15:00:59 2013 -0700

    Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

    Pull MIPS fix from Ralf Baechle:
     "Revert the change of the definition of PAGE_MASK which was prettier
      but broke a few relativly rare platforms"

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
      Revert "MIPS: page.h: Provide more readable definition for PAGE_MASK."

commit 3b5e50edaf500f392f4a372296afc0b99ffa7e70
Author: Ralf Baechle
Date:   Mon Apr 22 17:57:54 2013 +0200

[root@feng linux]# git branch
* master
[root@feng linux]# git diff
[root@feng linux]#

That means I have not modified the kernel source at all.

Thanks
Mike
Re: "attempt to move .org backwards" still show up
On 2013/4/24 16:31, Michael Ellerman wrote:
> On Wed, Apr 24, 2013 at 04:22:53PM +0800, Mike Qiu wrote:
>> Hi all
>>
>> I get an error message when I compile the source code in Power7 platform
>> use the newest upstream kernel.
>
> Hi Mike,
>
> It depends on what your .config is.
>
> What defconfig are you building?

I just copied the config file from /boot/config.* to .config, ran
make menuconfig without changing anything manually, and saved it.

> cheers
Re: "attempt to move .org backwards" still show up
On 2013/4/24 16:31, Michael Ellerman wrote:
> On Wed, Apr 24, 2013 at 04:22:53PM +0800, Mike Qiu wrote:
>> Hi all
>>
>> I get an error message when I compile the source code in Power7 platform
>> use the newest upstream kernel.
>
> Hi Mike,
>
> It depends on what your .config is.
>
> What defconfig are you building?
>
> cheers

And I do know how to build the source code on this machine . . .

Thanks
Re: "attempt to move .org backwards" still show up
On 2013/4/25 9:05, Chen Gang wrote:
> On 2013-04-24 20:47, Mike wrote:
>> On Wed, 2013-04-24 at 20:37 +1000, Michael Neuling wrote:
>>> Mike Qiu wrote:
>>>> On 2013/4/24 16:31, Michael Ellerman wrote:
>>>>> On Wed, Apr 24, 2013 at 04:22:53PM +0800, Mike Qiu wrote:
>>>>>> Hi all
>>>>>>
>>>>>> I get an error message when I compile the source code in Power7
>>>>>> platform use the newest upstream kernel.
>>>>>
>>>>> Hi Mike,
>>>>>
>>>>> It depends on what your .config is.
>>>>>
>>>>> What defconfig are you building?
>>>>
>>>> I just copy the config file from /boot/config.* to .config
>>>> and use make menuconfig change nothing by manually, then save.
>>>
>>> Can you post the resulting config here?
>>>
>>> Do you have commit in your tree?
>>>
>>> commit 087aa036eb79f24b856893190359ba812b460f45
>>> Author: Chen Gang
>>>     powerpc: make additional room in exception vector area
>>
>> Sure, that commit certainly in my git tree.
>>
>> And I just try to remove the code and re-git clone the source code from
>> upstream, this problem still happen.
>>
>> I will post the config file as the attachment :)
>>
>> Thanks
>
> I will try, and plan to get a result within this week (2013-04-28)
>
> Thanks.

Hi,

This is blocking my work now, so I hope you can take a look as soon as
possible.

Thanks :)
Mike
Re: [PATCH] PowerPC: kernel: compiling issue, make additional room in exception vector area
On 2013/4/25 16:21, Chen Gang wrote:
> Hello Mike:
>
> Please try this patch, at least it can pass compiling with the config
> file which you provided under my cross-compiling envrionments.
>
> I do not give a running test now, so better to try to run the new kernel
> with this patch.

OK, I will try your patch and send out the result later.

Thanks
Mike

> Thanks.
>
> On 2013-04-25 16:18, Chen Gang wrote:
>> When CONFIG_KVM_BOOK3S_64_PR is enabled,
>> MASKABLE_EXCEPTION_PSERIES(0x900 ...) will includes __KVMTEST, it will
>> exceed 0x980 which STD_EXCEPTION_HV(0x980 ...) will use, it will cause
>> compiling issue.
>>
>> The related errors:
>> arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
>> arch/powerpc/kernel/exceptions-64s.S:258: Error: attempt to move .org backwards
>> make[1]: *** [arch/powerpc/kernel/head_64.o] Error 1
>>
>> Signed-off-by: Chen Gang
>> ---
>>  arch/powerpc/include/asm/kvm_asm.h   |    2 +-
>>  arch/powerpc/kernel/exceptions-64s.S |    6 +++---
>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
>> index b9dd382..2c65bae 100644
>> --- a/arch/powerpc/include/asm/kvm_asm.h
>> +++ b/arch/powerpc/include/asm/kvm_asm.h
>> @@ -86,7 +86,7 @@
>>  #define BOOK3S_INTERRUPT_PROGRAM		0x700
>>  #define BOOK3S_INTERRUPT_FP_UNAVAIL		0x800
>>  #define BOOK3S_INTERRUPT_DECREMENTER		0x900
>> -#define BOOK3S_INTERRUPT_HV_DECREMENTER	0x980
>> +#define BOOK3S_INTERRUPT_HV_DECREMENTER	0x988
>>  #define BOOK3S_INTERRUPT_SYSCALL		0xc00
>>  #define BOOK3S_INTERRUPT_TRACE		0xd00
>>  #define BOOK3S_INTERRUPT_H_DATA_STORAGE	0xe00
>> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
>> index e789ee7..bb0e677 100644
>> --- a/arch/powerpc/kernel/exceptions-64s.S
>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>> @@ -255,7 +255,7 @@ hardware_interrupt_hv:
>>  	KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0x800)
>>
>>  	MASKABLE_EXCEPTION_PSERIES(0x900, 0x900, decrementer)
>> -	STD_EXCEPTION_HV(0x980, 0x982, hdecrementer)
>> +	STD_EXCEPTION_HV(0x988, 0x982, hdecrementer)
>>
>>  	MASKABLE_EXCEPTION_PSERIES(0xa00, 0xa00, doorbell_super)
>>  	KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xa00)
>> @@ -698,7 +698,7 @@ machine_check_common:
>>
>>  	STD_EXCEPTION_COMMON_ASYNC(0x500, hardware_interrupt, do_IRQ)
>>  	STD_EXCEPTION_COMMON_ASYNC(0x900, decrementer, .timer_interrupt)
>> -	STD_EXCEPTION_COMMON(0x980, hdecrementer, .hdec_interrupt)
>> +	STD_EXCEPTION_COMMON(0x988, hdecrementer, .hdec_interrupt)
>>  #ifdef CONFIG_PPC_DOORBELL
>>  	STD_EXCEPTION_COMMON_ASYNC(0xa00, doorbell_super, .doorbell_exception)
>>  #else
>> @@ -802,7 +802,7 @@ hardware_interrupt_relon_hv:
>>  	STD_RELON_EXCEPTION_PSERIES(0x4700, 0x700, program_check)
>>  	STD_RELON_EXCEPTION_PSERIES(0x4800, 0x800, fp_unavailable)
>>  	MASKABLE_RELON_EXCEPTION_PSERIES(0x4900, 0x900, decrementer)
>> -	STD_RELON_EXCEPTION_HV(0x4980, 0x982, hdecrementer)
>> +	STD_RELON_EXCEPTION_HV(0x4988, 0x982, hdecrementer)
>>  	MASKABLE_RELON_EXCEPTION_PSERIES(0x4a00, 0xa00, doorbell_super)
>>  	STD_RELON_EXCEPTION_PSERIES(0x4b00, 0xb00, trap_0b)
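The size arithmetic behind the "attempt to move .org backwards" error can be sketched as follows, assuming 4-byte instructions and treating the handler as a simple run of instructions between two vector addresses. The names handler_fits, handler_overrun and PPC_INSN_BYTES are hypothetical, and the instruction counts in the usage note are made up for illustration; the thread does not state how many instructions the expanded macro actually emits:

```c
#include <stdbool.h>
#include <stdint.h>

#define PPC_INSN_BYTES 4u	/* assumed fixed instruction width */

/* True when n_insns instructions placed at 'start' end at or before 'next'. */
bool handler_fits(uint32_t start, uint32_t next, uint32_t n_insns)
{
	return start + n_insns * PPC_INSN_BYTES <= next;
}

/* Number of bytes by which the handler runs past 'next' (0 if it fits). */
uint32_t handler_overrun(uint32_t start, uint32_t next, uint32_t n_insns)
{
	uint32_t end = start + n_insns * PPC_INSN_BYTES;

	return end > next ? end - next : 0;
}
```

Between 0x900 and 0x980 there is room for 0x80 bytes, that is 32 instructions. Once the code emitted at 0x900 grows past that, the location counter is already beyond 0x980 when the assembler reaches the next vector, and it cannot move backwards.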
Re: "attempt to move .org backwards" still show up
On 2013/4/25 19:16, Chen Gang wrote:
> On 2013-04-25 14:25, Paul Mackerras wrote:
>> On Thu, Apr 25, 2013 at 12:05:54PM +0800, Mike Qiu wrote:
>>> This has block my work now
>>> So I hope you can take a look ASAP
>>> Thanks
>>> :)
>>>
>>> Mike
>>
>> As a quick fix, turn on CONFIG_KVM_BOOK3S_64_HV. That will eliminate
>> the immediate problem.
>
> Yes, just as my original reply to Mike to bypass it, but get no reply, I
> guess he has to face the CONFIG_KVM_BOOK3S_64_PR.
>
> Now, I am just fixing it, when I finish one patch, please help check.

Actually, the kernel compiled successfully with your patch, but when I saw
Michael Neuling's reply I stopped there and waited for your new patch :)

Now I will build with your V2 patch.

Thanks
Mike

> Thanks.
Re: [PATCH v2] PowerPC: kernel: compiling issue, make additional room in exception vector area
On 2013/4/26 9:36, Chen Gang wrote:
> On 2013-04-26 09:18, Chen Gang wrote:
>> On 2013-04-26 09:06, Chen Gang wrote:
>>>> CFAR is the Come From Register. It saves the location of the last
>>>> branch and is hence overwritten by any branch.
>>>>
>>> Do we process it just like the others do (e.g. 0x300, 0xe00, 0xe20 ...)?
>>> 	. = 0x900
>>> 	.globl decrementer_pSeries
>>> decrementer_pSeries:
>>> 	HMT_MEDIUM_PPR_DISCARD
>>> 	SET_SCRATCH0(r13)
>>> 	b decrementer_pSeries_0
>>>
>>> 	...
>>>
> Oh, it seems EXCEPTION_PROLOG_1 saves the registers related to CFAR,
> so I think we need to move EXCEPTION_PROLOG_1 to near 0x900.

I will try your diff V2, to see if the machine can boot up.

> -diff v2 begin-
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index e789ee7..f0489c4 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -254,7 +254,15 @@ hardware_interrupt_hv:
>  	STD_EXCEPTION_PSERIES(0x800, 0x800, fp_unavailable)
>  	KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0x800)
>
> -	MASKABLE_EXCEPTION_PSERIES(0x900, 0x900, decrementer)
> +	. = 0x900
> +	.globl decrementer_pSeries
> +decrementer_pSeries:
> +	HMT_MEDIUM_PPR_DISCARD
> +	SET_SCRATCH0(r13)	/* save r13 */
> +	EXCEPTION_PROLOG_0(PACA_EXGEN)
> +	EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_PR, 0x900)
> +	b decrementer_pSeries_0
> +
>  	STD_EXCEPTION_HV(0x980, 0x982, hdecrementer)
>
>  	MASKABLE_EXCEPTION_PSERIES(0xa00, 0xa00, doorbell_super)
> @@ -536,6 +544,11 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_206)
>  #endif
>
>  	.align 7
> +	/* moved from 0x900 */
> +decrementer_pSeries_0:
> +	EXCEPTION_PROLOG_PSERIES_1(decrementer_common, EXC_STD)
> +
> +	.align 7
>  	/* moved from 0xe00 */
>  	STD_EXCEPTION_HV_OOL(0xe02, h_data_storage)
>  	KVM_HANDLER_SKIP(PACA_EXGEN, EXC_HV, 0xe02)
>
> -diff v2 end---
>
>> Such as the fix below, is it OK (just like 0x300 or 0x200 has done)?
>>
>> Please check, thanks.
>>
>> ---diff begin-
>>
>> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
>> index e789ee7..a0a5ff2 100644
>> --- a/arch/powerpc/kernel/exceptions-64s.S
>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>> @@ -254,7 +254,14 @@ hardware_interrupt_hv:
>>  	STD_EXCEPTION_PSERIES(0x800, 0x800, fp_unavailable)
>>  	KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0x800)
>>
>> -	MASKABLE_EXCEPTION_PSERIES(0x900, 0x900, decrementer)
>> +	. = 0x900
>> +	.globl decrementer_pSeries
>> +decrementer_pSeries:
>> +	HMT_MEDIUM_PPR_DISCARD
>> +	SET_SCRATCH0(r13)	/* save r13 */
>> +	EXCEPTION_PROLOG_0(PACA_EXGEN)
>> +	b decrementer_pSeries_0
>> +
>>  	STD_EXCEPTION_HV(0x980, 0x982, hdecrementer)
>>
>>  	MASKABLE_EXCEPTION_PSERIES(0xa00, 0xa00, doorbell_super)
>> @@ -536,6 +543,12 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_206)
>>  #endif
>>
>>  	.align 7
>> +	/* moved from 0x900 */
>> +decrementer_pSeries_0:
>> +	EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_PR, 0x900)
>> +	EXCEPTION_PROLOG_PSERIES_1(decrementer_common, EXC_STD)
>> +
>> +	.align 7
>>  	/* moved from 0xe00 */
>>  	STD_EXCEPTION_HV_OOL(0xe02, h_data_storage)
>>  	KVM_HANDLER_SKIP(PACA_EXGEN, EXC_HV, 0xe02)
>>
>> ---diff end---
Re: [PATCH v2] PowerPC: kernel: compiling issue, make additional room in exception vector area
On 2013/4/26 10:06, Chen Gang wrote:
On 2013-04-26 10:03, Mike Qiu wrote:
On 2013/4/26 9:36, Chen Gang wrote:
On 2013-04-26 09:18, Chen Gang wrote:
On 2013-04-26 09:06, Chen Gang wrote:

CFAR is the Come From Register. It saves the location of the last branch and is hence overwritten by any branch.

Do we process it just like the others do (e.g. 0x300, 0xe00, 0xe20 ...)?

	. = 0x900
	.globl decrementer_pSeries
decrementer_pSeries:
	HMT_MEDIUM_PPR_DISCARD
	SET_SCRATCH0(r13)
	b decrementer_pSeries_0

	...

Oh, it seems EXCEPTION_PROLOG_1 saves the registers related to CFAR, so I think we need to move EXCEPTION_PROLOG_1 to near 0x900.

I will try your diff V2, to see if the machine can boot up.

OK, thanks. (hope it can work)

It seems that the machine can boot up in powernv mode, but I'm not sure whether my machine calls that module. At least my machine can boot up.

Thanks
Mike

:-)
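The ordering concern in the quoted discussion (save the CFAR before branching out of line) can be illustrated with a toy model. This is a sketch in Python for illustration only: the `ToyCPU` class, `enter_vector` helper, and the addresses are hypothetical, and the only behaviour modelled is the one stated above, namely that any branch overwrites the CFAR.

```python
# Toy model only: real CFAR behaviour is defined by the Power ISA, not by this class.
class ToyCPU:
    def __init__(self, last_branch_addr):
        self.cfar = last_branch_addr   # where the interrupted code last branched from
        self.saved_cfar = None

    def branch(self, from_addr):
        self.cfar = from_addr          # any branch overwrites CFAR

    def save_cfar(self):
        self.saved_cfar = self.cfar    # stands in for the save done in the prolog


def enter_vector(save_before_branch, last_branch_addr=0x1234, vector=0x900):
    """Return the CFAR value that ends up saved for the interrupted code."""
    cpu = ToyCPU(last_branch_addr)
    if save_before_branch:
        cpu.save_cfar()                # save first, as in diff v2
        cpu.branch(vector)             # then branch out of line
    else:
        cpu.branch(vector)             # branch first, as in the first diff
        cpu.save_cfar()                # too late: CFAR now reflects the new branch
    return cpu.saved_cfar


print(hex(enter_vector(save_before_branch=True)))   # 0x1234
print(hex(enter_vector(save_before_branch=False)))  # 0x900
```

In the model, branching before the save loses the value that described the interrupted code, which is the reason given for moving EXCEPTION_PROLOG_1 ahead of the `b decrementer_pSeries_0`.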
Re: "attempt to move .org backwards" still show up
On 2013/4/25 14:25, Paul Mackerras wrote:
On Thu, Apr 25, 2013 at 12:05:54PM +0800, Mike Qiu wrote:

This has blocked my work now, so I hope you can take a look ASAP.

Thanks :)
Mike

As a quick fix, turn on CONFIG_KVM_BOOK3S_64_HV. That will eliminate the immediate problem.

Thanks, got it. I will give it a try.

Paul.
Re: [PATCH v2] PowerPC: kernel: compiling issue, make additional room in exception vector area
On 2013/4/26 11:42, Chen Gang wrote:
On 2013-04-26 11:25, Chen Gang wrote:
On 2013-04-26 11:08, Mike Qiu wrote:
On 2013/4/26 10:06, Chen Gang wrote:
On 2013-04-26 10:03, Mike Qiu wrote:
On 2013/4/26 9:36, Chen Gang wrote:
On 2013-04-26 09:18, Chen Gang wrote:
On 2013-04-26 09:06, Chen Gang wrote:

CFAR is the Come From Register. It saves the location of the last branch and is hence overwritten by any branch.

Do we process it just like the others do (e.g. 0x300, 0xe00, 0xe20 ...)?

	. = 0x900
	.globl decrementer_pSeries
decrementer_pSeries:
	HMT_MEDIUM_PPR_DISCARD
	SET_SCRATCH0(r13)
	b decrementer_pSeries_0

	...

Oh, it seems EXCEPTION_PROLOG_1 saves the registers related to CFAR, so I think we need to move EXCEPTION_PROLOG_1 to near 0x900.

I will try your diff V2, to see if the machine can boot up.

OK, thanks. (hope it can work)

It seems that the machine can boot up in powernv mode, but I'm not sure whether my machine calls that module. At least my machine can boot up.

Please refer to commit 1707dd161349e6c54170c88d94fed012e3d224e3 (1707dd1 powerpc: Save CFAR before branching in interrupt entry paths).

What our diff v2 does is just the fix for our patch v2 (just as commit 1707dd1 did).

Please check, thanks.

:-)

I will check this evening or tomorrow; I have something else to do this afternoon.

Thank you for your information!

I have checked the disassembly with powerpc64-linux-gnu-objdump. It seems that what we have done for 0x900 is almost the same as what was originally done for 0x200.

I am just learning about the CFAR (googling it), and I plan to wait for a day. If all goes smoothly, I will send patch v3.

:-)
Re: [PATCH v2] PowerPC: kernel: compiling issue, make additional room in exception vector area
On 2013/4/27 17:28, Chen Gang F T wrote:
On 2013-04-26 11:54, Mike Qiu wrote:
On 2013/4/26 11:42, Chen Gang wrote:
On 2013-04-26 11:25, Chen Gang wrote:
On 2013-04-26 11:08, Mike Qiu wrote:
On 2013/4/26 10:06, Chen Gang wrote:
On 2013-04-26 10:03, Mike Qiu wrote:
On 2013/4/26 9:36, Chen Gang wrote:
On 2013-04-26 09:18, Chen Gang wrote:
On 2013-04-26 09:06, Chen Gang wrote:

CFAR is the Come From Register. It saves the location of the last branch and is hence overwritten by any branch.

Do we process it just like the others do (e.g. 0x300, 0xe00, 0xe20 ...)?

	. = 0x900
	.globl decrementer_pSeries
decrementer_pSeries:
	HMT_MEDIUM_PPR_DISCARD
	SET_SCRATCH0(r13)
	b decrementer_pSeries_0

	...

Oh, it seems EXCEPTION_PROLOG_1 saves the registers related to CFAR, so I think we need to move EXCEPTION_PROLOG_1 to near 0x900.

I will try your diff V2, to see if the machine can boot up.

OK, thanks. (hope it can work)

It seems that the machine can boot up in powernv mode, but I'm not sure whether my machine calls that module. At least my machine can boot up.

Please refer to commit 1707dd161349e6c54170c88d94fed012e3d224e3 (1707dd1 powerpc: Save CFAR before branching in interrupt entry paths).

What our diff v2 does is just the fix for our patch v2 (just as commit 1707dd1 did).

Please check, thanks.

:-)

I will check this evening or tomorrow; I have something else to do this afternoon.

I think the diff v2 is correct, but it is not the best fix for this issue. I prefer Paul's patch for this issue, which has better performance.

:-)

Yes, I used your patch and it works; Paul's patch works too.

Thanks.
Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error
On 2013/4/26 11:51, Paul Mackerras wrote:

Building a 64-bit powerpc kernel with PR KVM enabled currently gives this error:

  AS      arch/powerpc/kernel/head_64.o
arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:258: Error: attempt to move .org backwards
make[2]: *** [arch/powerpc/kernel/head_64.o] Error 1

This happens because the MASKABLE_EXCEPTION_PSERIES macro turns into 33 instructions, but we only have space for 32 at the decrementer interrupt vector (from 0x900 to 0x980).

In the code generated by the MASKABLE_EXCEPTION_PSERIES macro, we currently have two instances of the HMT_MEDIUM macro, which has the effect of setting the SMT thread priority to medium. One is the first instruction, and is overwritten by a no-op on processors where we save the PPR (processor priority register), that is, POWER7 or later. The other is after we have saved the PPR.

In order to reduce the code at 0x900 by one instruction, we omit the first HMT_MEDIUM. On processors without SMT this will have no effect since HMT_MEDIUM is a no-op there. On POWER5 and RS64 machines this will mean that the first few instructions take a little longer in the case where a decrementer interrupt occurs when the hardware thread is running at low SMT priority. On POWER6 and later machines, the hardware automatically boosts the thread priority when a decrementer interrupt is taken if the thread priority was below medium, so this change won't make any difference.

The alternative would be to branch out of line after saving the CFAR. However, that would incur an extra overhead on all processors, whereas the approach adopted here only adds overhead on older threaded processors.

Signed-off-by: Paul Mackerras
---
 arch/powerpc/include/asm/exception-64s.h |    2 +-
 arch/powerpc/kernel/exceptions-64s.S     |    7 ++++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 05e6d2e..8e5fae8 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -414,7 +414,6 @@ label##_relon_hv:					\
 #define SOFTEN_NOTEST_HV(vec)		_SOFTEN_TEST(EXC_HV, vec)
 #define __MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra)		\
-	HMT_MEDIUM_PPR_DISCARD;						\
 	SET_SCRATCH0(r13);	/* save r13 */				\
 	EXCEPTION_PROLOG_0(PACA_EXGEN);					\
 	__EXCEPTION_PROLOG_1(PACA_EXGEN, extra, vec);			\
@@ -427,6 +426,7 @@ label##_relon_hv:					\
 	. = loc;							\
 	.globl label##_pSeries;						\
 label##_pSeries:							\
+	HMT_MEDIUM_PPR_DISCARD;						\
 	_MASKABLE_EXCEPTION_PSERIES(vec, label,				\
 				    EXC_STD, SOFTEN_TEST_PR)
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 56bd923..574db3f 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -235,6 +235,7 @@ instruction_access_slb_pSeries:
 	.globl hardware_interrupt_hv;
 hardware_interrupt_pSeries:
 hardware_interrupt_hv:
+	HMT_MEDIUM_PPR_DISCARD
 	BEGIN_FTR_SECTION
 		_MASKABLE_EXCEPTION_PSERIES(0x502, hardware_interrupt,
 					    EXC_HV, SOFTEN_TEST_HV)
@@ -254,7 +255,11 @@ hardware_interrupt_hv:
 	STD_EXCEPTION_PSERIES(0x800, 0x800, fp_unavailable)
 	KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0x800)
-	MASKABLE_EXCEPTION_PSERIES(0x900, 0x900, decrementer)
+	. = 0x900
+	.globl decrementer_pSeries
+decrementer_pSeries:
+	_MASKABLE_EXCEPTION_PSERIES(0x900, decrementer, EXC_STD, SOFTEN_TEST_PR)
+
 	STD_EXCEPTION_HV(0x980, 0x982, hdecrementer)
 	MASKABLE_EXCEPTION_PSERIES(0xa00, 0xa00, doorbell_super)

Tested-by: Mike Qiu

It works for me, but I only used this patch to compile and boot the machine. I did not do any performance test :)
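The space constraint described in the commit message comes down to simple arithmetic. The sketch below is illustrative Python, not kernel code; it assumes the fixed 4-byte PowerPC instruction width, and the helper names are made up for this example.

```python
# Illustrative arithmetic only; helper names are hypothetical, not kernel symbols.
INSN_BYTES = 4  # PowerPC instructions are a fixed 4 bytes wide


def slot_capacity(start, next_vector):
    """Number of instructions that fit between two vector addresses."""
    return (next_vector - start) // INSN_BYTES


def overflow(start, next_vector, insn_count):
    """Instructions that spill past next_vector (0 if the code fits)."""
    return max(0, insn_count - slot_capacity(start, next_vector))


print(slot_capacity(0x900, 0x980))   # 32
print(overflow(0x900, 0x980, 33))    # 1: the macro no longer fits before 0x980
print(overflow(0x900, 0x980, 32))    # 0: fits once one HMT_MEDIUM is dropped
```

With 33 instructions the location counter is already past 0x980 when the next vector is placed, which is what the assembler reports as an attempt to move .org backwards.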
Re: [PATCH RFC v3 0/8] EEH Support for VFIO PCI device
Hi Gavin, Can error injection be done if EEH is not enbaled? Thanks Mike On 05/14/2014 12:11 PM, Gavin Shan wrote: The series of patches intends to support EEH for PCI devices, which are passed through to PowerKVM based guest via VFIO. The implementation is straightforward based on the issues or problems we have to resolve to support EEH for PowerKVM based guest. - Emulation for EEH RTAS requests. All EEH RTAS requests goes to QEMU firstly. If QEMU can't handle it, the request will be sent to host via newly introduced VFIO container IOCTL command (VFIO_EEH_INFO) and gets handled in host kernel. - The error injection infrastructure need support request from the userland utility "errinjct" and PowerKVM based guest. The userland utility "errinjct" works on pSeries platform well with dedicated syscall, which helps invoking RTAS service to fulfil error injection in kernel. From the perspective, it's reasonable to extend the syscall to support PowerNV platform so that OPAL call can be invoked in host kernel for injecting errors. The data transported between userland and kerenl is still following "struct rtas_args" for both cases of PowerNV (OPAL) and pSeries (RTAS). The series of patches requires corresponding firmware changes from Mike Qiu to support error injection and QEMU changes to support EEH for guest. QEMU patchset will be sent separately. Change log == v1 -> v2: * EEH RTAS requests are routed to QEMU, and then possiblly to host kerenl. The mechanism KVM in-kernel handling is dropped. * Error injection is reimplemented based syscall, instead of KVM in-kerenl handling. The logic for error injection token management is moved to QEMU. The error injection request is routed to QEMU and then possiblly to host kernel. v2 -> v3: * Make the fields in struct eeh_vfio_pci_addr, struct vfio_eeh_info based on the comments from Alexey. * Define macros for EEH VFIO operations (Alexey). * Clear frozen state after successful PE reset. * Merge original [PATCH 1/2/3] to one. 
Testing on P7 = - Emulex adapter Testing on P8 = - Need more testing after design is finalized. - Gavin Shan (8): drivers/vfio: Introduce CONFIG_VFIO_EEH powerpc/eeh: Info to trace passed devices drivers/vfio: New IOCTL command VFIO_EEH_INFO powerpc/eeh: Avoid event on passed PE powerpc/powernv: Sync OPAL header file with firmware powerpc: Extend syscall ppc_rtas() powerpc/powernv: Implement ppc_call_opal() powerpc/powernv: Error injection infrastructure arch/powerpc/include/asm/eeh.h | 52 +++ arch/powerpc/include/asm/opal.h| 74 ++- arch/powerpc/include/asm/rtas.h| 10 +- arch/powerpc/include/asm/syscalls.h| 2 +- arch/powerpc/include/asm/systbl.h | 2 +- arch/powerpc/include/uapi/asm/unistd.h | 2 +- arch/powerpc/kernel/eeh.c | 8 + arch/powerpc/kernel/eeh_pe.c | 80 arch/powerpc/kernel/rtas.c | 57 +-- arch/powerpc/kernel/syscalls.c | 50 +++ arch/powerpc/platforms/powernv/Makefile| 3 +- arch/powerpc/platforms/powernv/eeh-ioda.c | 3 +- arch/powerpc/platforms/powernv/eeh-vfio.c | 593 + arch/powerpc/platforms/powernv/errinject.c | 224 ++ arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + arch/powerpc/platforms/powernv/opal.c | 93 drivers/vfio/Kconfig | 6 + drivers/vfio/vfio_iommu_spapr_tce.c| 12 + include/uapi/linux/vfio.h | 57 +++ kernel/sys_ni.c| 2 +- 20 files changed, 1278 insertions(+), 53 deletions(-) create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c create mode 100644 arch/powerpc/platforms/powernv/errinject.c Thanks, Gavin ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
PowerPC Build error with patch: powerpc/ppc64: Allow allmodconfig to build (finally !)
Hi all,

I hit a build error in the linux-next git tree; see below. The platform is IBM P7.

[root@cena01 linux-next]# make -j60
  CHK     include/config/kernel.release
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  CALL    scripts/checksyscalls.sh
:1232:2: warning: #warning syscall renameat2 not implemented [-Wcpp]
  CHK     include/generated/compile.h
  CALL    arch/powerpc/kernel/systbl_chk.sh
  CALL    arch/powerpc/kernel/prom_init_check.sh
  AS      arch/powerpc/kernel/head_64.o
arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:269: Error: operand out of range (0x814c is not between 0x8000 and 0x7ffc)
arch/powerpc/kernel/exceptions-64s.S:729: Error: operand out of range (0x814c is not between 0x8000 and 0x7ffc)
make[1]: *** [arch/powerpc/kernel/head_64.o] Error 1
make[1]: *** Waiting for unfinished jobs
make: *** [arch/powerpc/kernel] Error 2
make: *** Waiting for unfinished jobs

Finally, I found that commit 0be9d8b61c0c1f3c8f86292c6e237ff26acd392d ("powerpc/ppc64: Allow allmodconfig to build (finally !)") causes this error.
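The upper bound of 0x7ffc in the assembler message is what one would expect from a PowerPC conditional branch, whose 14-bit word displacement reaches byte offsets from -0x8000 to 0x7ffc; the lower bound as printed in the log looks truncated. Assuming that is the instruction involved (the log does not say so), a minimal range check looks like this. The function names are hypothetical and the code is Python for illustration only.

```python
def fits_cond_branch(offset):
    """True if a byte offset is encodable in the 14-bit BD field of a conditional branch."""
    return offset % 4 == 0 and -0x8000 <= offset <= 0x7ffc


def fits_uncond_branch(offset):
    """True if a byte offset is encodable in the 24-bit LI field of an unconditional branch."""
    return offset % 4 == 0 and -0x2000000 <= offset <= 0x1fffffc


print(fits_cond_branch(0x814c))    # False: the offset from the error message
print(fits_cond_branch(0x7ffc))    # True: the largest forward conditional offset
print(fits_uncond_branch(0x814c))  # True: an unconditional branch would reach
```

Under that assumption, a target 0x814c bytes away is out of reach for the conditional form but well within reach of an unconditional `b`.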
Kernel build fail with "Circular xxxx <- xxxx dependency dropped"
Hi all, I recently build linux next kernel in IBM Power7 platform, use default config file copy from /boot/config-3.6.10-4.fc18.ppc64p7 [root@cena01 linux-next]# uname -a Linux cena01.austin.ibm.com 3.15.0-rc1+ #47 SMP Thu Apr 24 20:59:46 CDT 2014 ppc64 ppc64 ppc64 GNU/Linux [root@cena01 linux-next]# cat /etc/issue Fedora release 18 (Spherical Cow) Kernel \r on an \m (\l) and build error log below: [root@cena01 linux-next]# make -j60 CHK include/config/kernel.release CHK include/generated/uapi/linux/version.h CHK include/generated/utsrelease.h CALLscripts/checksyscalls.sh CHK include/generated/compile.h CALLarch/powerpc/kernel/systbl_chk.sh CALLarch/powerpc/kernel/prom_init_check.sh CHK include/generated/uapi/linux/version.h CALLarch/powerpc/relocs_check.pl Building modules, stage 2. WARNING: 1 bad relocations c1455040 R_PPC64_ADDR64uprobes_fetch_type_table make[1]: Circular arch/powerpc/boot/zImage.lds.S <- arch/powerpc/boot/zImage.lds dependency dropped. make[1]: Circular arch/powerpc/boot/zImage.coff.lds.S <- arch/powerpc/boot/zImage.lds dependency dropped. make[1]: Circular arch/powerpc/boot/zImage.coff.lds.S <- arch/powerpc/boot/zImage.coff.lds dependency dropped. make[1]: Circular arch/powerpc/boot/zImage.ps3.lds.S <- arch/powerpc/boot/zImage.lds dependency dropped. make[1]: Circular arch/powerpc/boot/zImage.ps3.lds.S <- arch/powerpc/boot/zImage.coff.lds dependency dropped. make[1]: Circular arch/powerpc/boot/zImage.ps3.lds.S <- arch/powerpc/boot/zImage.ps3.lds dependency dropped. 
WRAParch/powerpc/boot/zImage.ps3.lds.S INFO: Uncompressed kernel (size 0x14d0db8) overlaps the address of the wrapper(0x40) INFO: Fixing the link_address of wrapper to (0x150) ld: cannot open linker script file arch/powerpc/boot/zImage.lds: No such file or directory make[1]: *** [arch/powerpc/boot/zImage.ps3.lds.S] Error 1 make: *** [zImage] Error 2 make: *** Waiting for unfinished jobs MODPOST 1853 modules I use git bisect to find out the possible commits to lead this problem: 7e1c04779efd51154baf652e653ceb24ce68939b kbuild: Use relative path for $(objtree) 890676c65d699db3ad82e70cf8fb449031af kbuild: Use relative path when building in the source tree 9da0763bdd82572be243fcf5161734f11568960f kbuild: Use relative path when building in a subdir of the source tree Thanks Mike ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Kernel build fail with "Circular xxxx <- xxxx dependency dropped"
Hi Michal, after ally you patch, it also has some issue, see below: WARNING: 1 bad relocations c1455040 R_PPC64_ADDR64uprobes_fetch_type_table arch/powerpc/boot/Makefile:336: target `arch/powerpc/boot/zImage.pseries' given more than once in the same rule. gcc -m32 -Wp,-MD,arch/powerpc/boot/.zImage.lds.d -nostdinc -isystem /usr/lib/gcc/ppc64-redhat-linux/4.7.2/include -I./arch/powerpc/include -Iarch/powerpc/include/generated -Iinclude -I./arch/powe rpc/include/uapi -Iarch/powerpc/include/generated/uapi -I./include/uapi -Iinclude/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -Iarch/powerpc -E -Wp,-MD,arch/powerpc/boot/.zI mage.lds.d -P -Upowerpc \ -D__ASSEMBLY__ -DLINKER_SCRIPT -o arch/powerpc/boot/zImage.lds arch/powerpc/boot/zImage.lds.S WRAParch/powerpc/boot/zImage.pseries WRAParch/powerpc/boot/zImage.epapr MODPOST 1853 modules INFO: Uncompressed kernel (size 0x14d0db8) overlaps the address of the wrapper(0x40) INFO: Fixing the link_address of wrapper to (0x150) ld: cannot find arch/powerpc/boot/vmlinux.o: No such file or directory make[1]: *** [arch/powerpc/boot/zImage.pseries] Error 1 make[1]: *** Waiting for unfinished jobs INFO: Uncompressed kernel (size 0x14d0db8) overlaps the address of the wrapper(0x40) INFO: Fixing the link_address of wrapper to (0x150) ld: cannot find arch/powerpc/boot/vmlinux.o: No such file or directory make[1]: *** [arch/powerpc/boot/zImage.epapr] Error 1 make: *** [zImage] Error 2 make: *** Waiting for unfinished jobs Thanks Mike On 06/11/2014 08:22 PM, Michal Marek wrote: Dne 11.6.2014 14:21, Michal Marek napsal(a): On Wed, Jun 11, 2014 at 10:24:24AM +0200, Michal Marek wrote: Dne 11.6.2014 08:02, Mike Qiu napsal(a): make[1]: Circular arch/powerpc/boot/zImage.lds.S <- arch/powerpc/boot/zImage.lds dependency dropped. make[1]: Circular arch/powerpc/boot/zImage.coff.lds.S <- arch/powerpc/boot/zImage.lds dependency dropped. 
make[1]: Circular arch/powerpc/boot/zImage.coff.lds.S <- arch/powerpc/boot/zImage.coff.lds dependency dropped. make[1]: Circular arch/powerpc/boot/zImage.ps3.lds.S <- arch/powerpc/boot/zImage.lds dependency dropped. make[1]: Circular arch/powerpc/boot/zImage.ps3.lds.S <- arch/powerpc/boot/zImage.coff.lds dependency dropped. make[1]: Circular arch/powerpc/boot/zImage.ps3.lds.S <- arch/powerpc/boot/zImage.ps3.lds dependency dropped. WRAParch/powerpc/boot/zImage.ps3.lds.S INFO: Uncompressed kernel (size 0x14d0db8) overlaps the address of the wrapper(0x40) INFO: Fixing the link_address of wrapper to (0x150) ld: cannot open linker script file arch/powerpc/boot/zImage.lds: No such file or directory make[1]: *** [arch/powerpc/boot/zImage.ps3.lds.S] Error 1 make: *** [zImage] Error 2 make: *** Waiting for unfinished jobs MODPOST 1853 modules I use git bisect to find out the possible commits to lead this problem: 7e1c04779efd51154baf652e653ceb24ce68939b kbuild: Use relative path for $(objtree) 890676c65d699db3ad82e70cf8fb449031af kbuild: Use relative path when building in the source tree 9da0763bdd82572be243fcf5161734f11568960f kbuild: Use relative path when building in a subdir of the source tree Thanks for the report, I'll have a look. If I do not come up with a solution soon, I'll revert the series. I have yet to test this, but can you try the patch below? Thanks! Michal From 7f8336f4c7f2131efbe82543580dda3ec1988609 Mon Sep 17 00:00:00 2001 From: Michal Marek Date: Wed, 11 Jun 2014 13:53:48 +0200 Subject: [PATCH] powerpc: Avoid circular dependency with zImage.% The rule to create the final images uses a zImage.% pattern. Unfortunately, this also matches the names of the zImage.*.lds linker scripts, which appear as a dependency of the final images. This somehow worked when $(srctree) used to be an absolute path, but now the pattern matches too much. List only the images from $(image-y) as the target of the rule, to avoid the circular dependency. 
If merged, this should of course have a Reported-by: Mike Qiu Michal ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2] powerpc: Avoid circular dependency with zImage.%
This v2 patch is good.

Tested-by: Mike Qiu

On 06/11/2014 11:40 PM, Michal Marek wrote:

The rule to create the final images uses a zImage.% pattern. Unfortunately, this also matches the names of the zImage.*.lds linker scripts, which appear as a dependency of the final images. This somehow worked when $(srctree) used to be an absolute path, but now the pattern matches too much. List only the images from $(image-y) as the target of the rule, to avoid the circular dependency.

Signed-off-by: Michal Marek
---
v2:
 - Filter out duplicates in the target list
 - fix the platform argument to cmd_wrap

 arch/powerpc/boot/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 426dce7..ccc25ed 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -333,8 +333,8 @@ $(addprefix $(obj)/, $(initrd-y)): $(obj)/ramdisk.image.gz
 $(obj)/zImage.initrd.%: vmlinux $(wrapperbits)
 	$(call if_changed,wrap,$*,,,$(obj)/ramdisk.image.gz)
 
-$(obj)/zImage.%: vmlinux $(wrapperbits)
-	$(call if_changed,wrap,$*)
+$(addprefix $(obj)/, $(sort $(filter zImage.%, $(image-y)))): vmlinux $(wrapperbits)
+	$(call if_changed,wrap,$(subst $(obj)/zImage.,,$@))
 
 # dtbImage% - a dtbImage is a zImage with an embedded device tree blob
 $(obj)/dtbImage.initrd.%: vmlinux $(wrapperbits) $(obj)/%.dtb
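The over-matching described in the changelog can be reproduced outside of make. The sketch below is Python for illustration only: `pattern_stem` is a hypothetical helper that mimics how a single `%` in a make pattern matches any non-empty string, and the file lists are made-up stand-ins, not the real contents of $(image-y).

```python
def pattern_stem(pattern, target):
    """Return the stem if target matches a make-style pattern with one '%', else None."""
    prefix, _, suffix = pattern.partition("%")
    if (target.startswith(prefix) and target.endswith(suffix)
            and len(target) > len(prefix) + len(suffix)):
        return target[len(prefix):len(target) - len(suffix)]
    return None


# Hypothetical names in the spirit of arch/powerpc/boot; not the real $(image-y).
image_y = ["zImage.pseries", "zImage.epapr", "dtbImage.ps3", "zImage.pseries"]
prereqs = ["zImage.lds", "zImage.coff.lds"]

# Old rule: the bare zImage.% pattern also matches the linker scripts.
print([p for p in prereqs if pattern_stem("zImage.%", p)])
# ['zImage.lds', 'zImage.coff.lds']

# New rule: explicit, de-duplicated targets, like $(sort $(filter zImage.%, $(image-y))).
targets = sorted({t for t in image_y if pattern_stem("zImage.%", t)})
print(targets)
# ['zImage.epapr', 'zImage.pseries']
```

Because the linker scripts never appear in the explicit target list, the rule can no longer be selected to build them, which is how the circular dependency goes away.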
Boot failure in Power7 pSeries
Hi all, I use newest linux-next( top commit: 5f295cdf5c5dbbb0c40f10f2ddae02ff46bbf773) to boot up my Power7 machine, PowerVM mode(HypMode 01), use defualt config file in /boot/, it show error log below: OF stdout device is: /vdevice/vty@3000 Preparing to boot Linux version 3.16.0-rc1-next-20140617+ (root@shui) (gcc version 4.8.2 20131212 (Red Hat 4.8.2-7) (GCC) ) #5 SMP Tue Jun 17 05:16:21 EDT 2014 Detected machine type: 0101 Max number of cores passed to firmware: 256 (NR_CPUS = 1024) Calling ibm,client-architecture-support... done command line: BOOT_IMAGE=/vmlinux-3.16.0-rc1-next-20140617+ root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/swap rd.md=0 rd.dm=0 vconsole.keymap=us rd.luks=0 vconsole.font=latarcyrheb-sun16 rd.lvt memory layout at init: memory_limit : (16 MB aligned) alloc_bottom : 0591 alloc_top: 1000 alloc_top_hi : 1000 rmo_top : 1000 ram_top : 1000 instantiating rtas at 0x0ee8... done Querying for OPAL presence... DEFAULT CATCH!, exception-handler=fff00700 at %SRR0: 041a1c14 %SRR1: 00081002 Open Firmware exception handler entered from non-OF code Client's Fix Pt Regs: 00 042c017c 042c2ce8 04ae8d58 042c2f38 04 0369aafc 042c2f38 01adc100 042c2f38 08 04328d58 28002024 1002 0c a001 01a9fd20 041a7df8 10 041a2130 041a1e70 f821ff913d220005 01a9fd20 14 7962 0ee8 0118 0ee8 18 041a2610 0369 042c3070 041a1ce8 1c 041a1ce0 041b89f0 0003 0001 Special Regs: %IV: 0700 %CR: 48002024%XER: %DSISR: 4000 %SRR0: 041a1c14 %SRR1: 00081002 %LR: 0369aafc%CTR: %DAR: f821ff913d220035 Virtual PID = 0 ok 0 > Thanks Mike ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Boot failure in Power7 pSeries
Anyone has a idea on this issue? Thanks Mike On 06/17/2014 05:45 PM, Mike Qiu wrote: Hi all, I use newest linux-next( top commit: 5f295cdf5c5dbbb0c40f10f2ddae02ff46bbf773) to boot up my Power7 machine, PowerVM mode(HypMode 01), use defualt config file in /boot/, it show error log below: OF stdout device is: /vdevice/vty@3000 Preparing to boot Linux version 3.16.0-rc1-next-20140617+ (root@shui) (gcc version 4.8.2 20131212 (Red Hat 4.8.2-7) (GCC) ) #5 SMP Tue Jun 17 05:16:21 EDT 2014 Detected machine type: 0101 Max number of cores passed to firmware: 256 (NR_CPUS = 1024) Calling ibm,client-architecture-support... done command line: BOOT_IMAGE=/vmlinux-3.16.0-rc1-next-20140617+ root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/swap rd.md=0 rd.dm=0 vconsole.keymap=us rd.luks=0 vconsole.font=latarcyrheb-sun16 rd.lvt memory layout at init: memory_limit : (16 MB aligned) alloc_bottom : 0591 alloc_top: 1000 alloc_top_hi : 1000 rmo_top : 1000 ram_top : 1000 instantiating rtas at 0x0ee8... done Querying for OPAL presence... DEFAULT CATCH!, exception-handler=fff00700 at %SRR0: 041a1c14 %SRR1: 00081002 Open Firmware exception handler entered from non-OF code Client's Fix Pt Regs: 00 042c017c 042c2ce8 04ae8d58 042c2f38 04 0369aafc 042c2f38 01adc100 042c2f38 08 04328d58 28002024 1002 0c a001 01a9fd20 041a7df8 10 041a2130 041a1e70 f821ff913d220005 01a9fd20 14 7962 0ee8 0118 0ee8 18 041a2610 0369 042c3070 041a1ce8 1c 041a1ce0 041b89f0 0003 0001 Special Regs: %IV: 0700 %CR: 48002024%XER: %DSISR: 4000 %SRR0: 041a1c14 %SRR1: 00081002 %LR: 0369aafc%CTR: %DAR: f821ff913d220035 Virtual PID = 0 ok 0 > Thanks Mike ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Boot failure in Power7 pSeries
On 06/18/2014 03:54 PM, Michael Ellerman wrote:
On Wed, 2014-06-18 at 11:27 +0800, Mike Qiu wrote:

Does anyone have an idea about this issue?

Did it ever work? If so, which kernel version?

It works with 3.15, but fails with Linux version 3.16.0-rc1-next-20140617.

The config file can simply be taken from /boot/configxxx: run "make menuconfig" and save without changing anything (default).

Can you attach your actual .config.

You could try building without CONFIG_PPC_POWERNV.

Trying, but it should work with the default config from /boot/.

Thanks
Mike

cheers
Re: Boot failure in Power7 pSeries
On 06/19/2014 09:32 AM, Michael Ellerman wrote:
On Wed, 2014-06-18 at 17:02 +0800, Mike Qiu wrote:
On 06/18/2014 03:54 PM, Michael Ellerman wrote:
On Wed, 2014-06-18 at 11:27 +0800, Mike Qiu wrote:

Does anyone have an idea about this issue?

Did it ever work? If so, which kernel version?

It works with 3.15, but fails with Linux version 3.16.0-rc1-next-20140617.

What about 3.16-rc1?

3.16-rc1 still does not work; it shows the same issue.

The config file can simply be taken from /boot/configxxx: run "make menuconfig" and save without changing anything (default).

Sure, but I don't have access to your box so please ..

  * OS address 9.3.110.192 [credentials redacted]
  * Minicom server: 9.3.191.26 [credentials redacted]
      o minicom -D /dev/ttyS0

fsp: http://cena-fsp.austin.ibm.com

Can you attach your actual .config.

You could try building without CONFIG_PPC_POWERNV.

Trying, but it should work with the default config from /boot/.

Well yes that would be nice, but I'm trying to help you narrow down what the problem is.

Great, thanks
Mike

cheers
Re: Boot failure in Power7 pSeries
On 06/19/2014 09:32 AM, Michael Ellerman wrote:
On Wed, 2014-06-18 at 17:02 +0800, Mike Qiu wrote:
On 06/18/2014 03:54 PM, Michael Ellerman wrote:
On Wed, 2014-06-18 at 11:27 +0800, Mike Qiu wrote:

Does anyone have an idea about this issue?

Did it ever work? If so, which kernel version?

It works with 3.15, but fails with Linux version 3.16.0-rc1-next-20140617.

What about 3.16-rc1?

The config file can simply be taken from /boot/configxxx: run "make menuconfig" and save without changing anything (default).

Sure, but I don't have access to your box so please ..

Can you attach your actual .config.

You could try building without CONFIG_PPC_POWERNV.

Trying, but it should work with the default config from /boot/.

Well yes that would be nice, but I'm trying to help you narrow down what the problem is.

When booting, please select "P3-D1" in the menu.

Thanks
Mike

cheers
Re: Boot failure in Power7 pSeries
On 06/19/2014 09:32 AM, Michael Ellerman wrote:
On Wed, 2014-06-18 at 17:02 +0800, Mike Qiu wrote:
On 06/18/2014 03:54 PM, Michael Ellerman wrote:
On Wed, 2014-06-18 at 11:27 +0800, Mike Qiu wrote:

Does anyone have an idea about this issue?

Did it ever work? If so, which kernel version?

It works with 3.15, but fails with Linux version 3.16.0-rc1-next-20140617.

What about 3.16-rc1?

The config file can simply be taken from /boot/configxxx: run "make menuconfig" and save without changing anything (default).

Sure, but I don't have access to your box so please ..

Can you attach your actual .config.

The linux-next tree is in "/home/mike/linux-next".

You could try building without CONFIG_PPC_POWERNV.

Trying, but it should work with the default config from /boot/.

Well yes that would be nice, but I'm trying to help you narrow down what the problem is.

cheers
Re: Boot failure in Power7 pSeries
On 06/19/2014 11:55 AM, Michael Ellerman wrote: On Thu, 2014-06-19 at 10:18 +0800, Mike Qiu wrote: On 06/19/2014 09:32 AM, Michael Ellerman wrote: On Wed, 2014-06-18 at 17:02 +0800, Mike Qiu wrote: On 06/18/2014 03:54 PM, Michael Ellerman wrote: On Wed, 2014-06-18 at 11:27 +0800, Mike Qiu wrote: Anyone has a idea on this issue? Did it ever work? If so which kernel version? It works for 3.15, but failed with linux version 3.16.0-rc1-next-20140617 What about 3.16-rc1 ? 3.16-rc1 still not work. the same issue. The config file can be simply get from /boot/configxxx. and "make menuconfig" and save without change anything(default). Sure, but I don't have access to your box so please .. * OS address 9.3.110.192 root/linux123 * Minicom server: 9.3.191.26 root/linux123 o minicom -D /dev/ttyS0 fsp: http://cena-fsp.austin.ibm.com Please change that password immediately, and don't EVER post the login details for a machine on a public list again. Also I'm not offering to logon to your machine and debug it, I'm giving you advice on how you can debug it. ...OK, got it, my mistake... Thanks Mike cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection
On 06/23/2014 10:14 AM, Gavin Shan wrote:

The patch implements one OPAL firmware sysfs file to support PCI error injection: "/sys/firmware/opal/errinjct", which is used as described below.

According to the PAPR spec, there are 3 RTAS calls related to error injection:

"ibm,open-errinjct": allocate a token prior to doing error injection.
"ibm,close-errinjct": release the token allocated by "ibm,open-errinjct".
"ibm,errinjct": do error injection.

The sysfs file /sys/firmware/opal/errinjct accepts strings that have the fixed format "ei_token ...". For now, we only support 32-bit and 64-bit PCI error injection, and the strings listed below should be written to /sys/firmware/opal/errinjct. We don't have corresponding sysfs files for "ibm,open-errinjct" and "ibm,close-errinjct", which means that we rely on userland to maintain the token by itself.

32-bit PCI error: "7:addr:mask:iommu_group_id:function".
64-bit PCI error: "8:addr:mask:iommu_group_id:function".

The above "7" and "8" represent 32-bit and 64-bit PCI errors respectively, and "function" is one of the specific PCI errors (e.g. MMIO access address parity error), which are defined by the PAPR spec.
Signed-off-by: Gavin Shan --- arch/powerpc/include/asm/opal.h| 1 + arch/powerpc/platforms/powernv/Makefile| 2 +- arch/powerpc/platforms/powernv/opal-errinjct.c | 184 + arch/powerpc/platforms/powernv/opal.c | 2 + 4 files changed, 188 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/platforms/powernv/opal-errinjct.c diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index d982bb8..bf280d9 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -985,6 +985,7 @@ extern int opal_elog_init(void); extern void opal_platform_dump_init(void); extern void opal_sys_param_init(void); extern void opal_msglog_init(void); +extern void opal_errinjct_init(void); extern int opal_machine_check(struct pt_regs *regs); extern bool opal_mce_check_early_recovery(struct pt_regs *regs); diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index 63cebb9..4711de8 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -1,7 +1,7 @@ obj-y += setup.o opal-takeover.o opal-wrappers.o opal.o opal-async.o obj-y += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o -obj-y += opal-msglog.o +obj-y += opal-msglog.o opal-errinjct.o obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_PCI) += pci.o pci-p5ioc2.o pci-ioda.o diff --git a/arch/powerpc/platforms/powernv/opal-errinjct.c b/arch/powerpc/platforms/powernv/opal-errinjct.c new file mode 100644 index 000..29c9e83 --- /dev/null +++ b/arch/powerpc/platforms/powernv/opal-errinjct.c @@ -0,0 +1,184 @@ +/* + * The file supports error injection, which works based on OPAL API. + * For now, we only support PCI error injection. We need support + * injecting other types of errors in future. + * + * Copyright Gavin Shan, IBM Corporation 2014. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include "powernv.h" +#include "pci.h" + +static DEFINE_MUTEX(errinjct_mutex); + +static int errinjct_iommu_group_to_phb_and_pe(uint32_t iommu_grp_id, + uint64_t *phb_id, + uint32_t *pe_num) +{ +#ifdef CONFIG_IOMMU_API Is it reasonable to do error injection with "CONFIG_IOMMU_API" ? That means if use default config(CONFIG_IOMMU_API = n), we can not do error injection to pci devices? Thanks Mike + struct iommu_group *iommu_grp; + struct iommu_table *tbl; + struct pnv_ioda_pe *pe; + + iommu_grp = iommu_group_get_by_id(iommu_grp_id); + if (!iommu_grp) + return -ENODEV; + + tbl = iommu_group_get_iommudata(iommu_grp); + if (!tbl) + return -ENODEV; + + pe = container_of(tbl, struct pnv_ioda_pe, tce32_table); + if (!pe->phb) + return -ENODEV; + + *phb_id = pe->phb->opal_id; + *pe_num = pe->pe_number; + + return 0; +#endif + + return -ENXIO; +} + +static int errinjct_ioa_bus_error(const char *buf, struct OpalErrinjct *ei) +{ + uint32_t iommu_grp_id; + int ret; + + /* Extract parameters */ + ret = sscanf(buf, "%x:%x:%x:%x:%x", +
Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection
On 06/24/2014 02:36 PM, Benjamin Herrenschmidt wrote:

Is it reasonable to do error injection with "CONFIG_IOMMU_API"? That means if we use the default config (CONFIG_IOMMU_API = n), we cannot do error injection to PCI devices?

Well we can't pass them through either so ... In any case, this is not a priority. First we need to implement a solid error injection facility for the *host*. The guest one is really really low on the list.

OK. Does that mean *host* side error injection should be based on "CONFIG_IOMMU_API"? If it is just the host side (no guest, no pass-through), can't we do error injection? Maybe I misunderstand :)

Thanks
Mike

Cheers, Ben.
Re: [PATCH] powerpc/powernv: Remove OPAL v1 takeover
Reported-and-tested-by: Mike Qiu Thanks Mike On 06/24/2014 03:17 PM, Michael Ellerman wrote: In commit 27f4488872d9 "Add OPAL takeover from PowerVM" we added support for "takeover" on OPAL v1 machines. This was a mode of operation where we would boot under pHyp, and query for the presence of OPAL. If detected we would then do a special sequence to take over the machine, and the kernel would end up running in hypervisor mode. OPAL v1 was never a supported product, and was never shipped outside IBM. As far as we know no one is still using it. Newer versions of OPAL do not use the takeover mechanism. Although the query for OPAL should be harmless on machines with newer OPAL, we have seen a machine where it causes a crash in Open Firmware. The code in early_init_devtree() to copy boot_command_line into cmd_line was added in commit 817c21ad9a1f "Get kernel command line accross OPAL takeover", and AFAIK is only used by takeover, so should also be removed. Signed-off-by: Michael Ellerman --- arch/powerpc/Kconfig.debug | 1 - arch/powerpc/include/asm/opal.h| 29 arch/powerpc/kernel/prom.c | 7 - arch/powerpc/kernel/prom_init.c| 211 - arch/powerpc/kernel/prom_init_check.sh | 4 +- arch/powerpc/platforms/powernv/Makefile| 2 +- arch/powerpc/platforms/powernv/opal-takeover.S | 140 7 files changed, 2 insertions(+), 392 deletions(-) delete mode 100644 arch/powerpc/platforms/powernv/opal-takeover.S diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug index 790352f..35d16bd 100644 --- a/arch/powerpc/Kconfig.debug +++ b/arch/powerpc/Kconfig.debug @@ -303,7 +303,6 @@ config PPC_EARLY_DEBUG_OPAL_VTERMNO This correspond to which /dev/hvcN you want to use for early debug. - On OPAL v1 (takeover) this should always be 0 On OPAL v2, this will be 0 for network console and 1 or 2 for the machine built-in serial ports. 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 4600188..0da1dbd 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -12,27 +12,7 @@ #ifndef __OPAL_H #define __OPAL_H -/** Takeover interface / - -/* PAPR H-Call used to querty the HAL existence and/or instanciate - * it from within pHyp (tech preview only). - * - * This is exclusively used in prom_init.c - */ - #ifndef __ASSEMBLY__ - -struct opal_takeover_args { - u64 k_image;/* r4 */ - u64 k_size; /* r5 */ - u64 k_entry;/* r6 */ - u64 k_entry2; /* r7 */ - u64 hal_addr; /* r8 */ - u64 rd_image; /* r9 */ - u64 rd_size;/* r10 */ - u64 rd_loc; /* r11 */ -}; - /* * SG entry * @@ -55,15 +35,6 @@ struct opal_sg_list { /* We calculate number of sg entries based on PAGE_SIZE */ #define SG_ENTRIES_PER_NODE ((PAGE_SIZE - 16) / sizeof(struct opal_sg_entry)) -extern long opal_query_takeover(u64 *hal_size, u64 *hal_align); - -extern long opal_do_takeover(struct opal_takeover_args *args); - -struct rtas_args; -extern int opal_enter_rtas(struct rtas_args *args, - unsigned long data, - unsigned long entry); - #endif /* __ASSEMBLY__ */ /** OPAL APIs **/ diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index 613a860..b694b07 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -662,13 +662,6 @@ void __init early_init_devtree(void *params) of_scan_flat_dt(early_init_dt_scan_fw_dump, NULL); #endif - /* Pre-initialize the cmd_line with the content of boot_commmand_line, -* which will be empty except when the content of the variable has -* been overriden by a bootloading mechanism. This happens typically -* with HAL takeover -*/ - strlcpy(cmd_line, boot_command_line, COMMAND_LINE_SIZE); - /* Retrieve various informations from the /chosen node of the * device-tree, including the platform type, initrd location and * size, TCE reserve, and more ... 
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index 078145a..1a85d8f 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -1268,201 +1268,6 @@ static u64 __initdata prom_opal_base; static u64 __initdata prom_opal_entry; #endif -#ifdef __BIG_ENDIAN__ -/* XXX Don't change this structure without updating opal-takeover.S */ -static struct opal_secondary_data { - s64 ack;/* 0 */ - u64 go; /* 8 */ - struct opal_takeover_args args; /* 16 *
Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection
On 06/25/2014 08:03 AM, Gavin Shan wrote:
On Tue, Jun 24, 2014 at 05:00:52PM +1000, Benjamin Herrenschmidt wrote:
On Tue, 2014-06-24 at 14:57 +0800, Mike Qiu wrote:

Does that mean *host* side error injection should be based on "CONFIG_IOMMU_API"? If it is just the host side (no guest, no pass-through), can't we do error injection? Maybe I misunderstand :)

Ah no, make different patches, we don't want to use IOMMU group ID, just PE numbers. Maybe we should expose in sysfs the PEs from the platform code with the error injection files underneath ...

Yeah, "errinjct" needs to grab PCI_domain_nr+PE number from sysfs. We already have a PE number sysfs file:

[root@ltcfbl8eb :01:00.1]# pwd
/sys/bus/pci/devices/:01:00.1
[root@ltcfbl8eb :01:00.1]# cat eeh_pe_config_addr
0x1

For guest support, we will rely on the VFIO group ioctl command, which naturally depends on pass-through.

---

We will probably implement it like this. If there is anything wrong, please correct me:

- Introduce EEH callback struct eeh_ops::err_inject(), which will be implemented for PowerNV (NULL for pSeries) by calling the PCI error injection dedicated OPAL API (opal_pci_err_inject()).
- Introduce global function eeh.c::eeh_err_inject(), which calls eeh_ops::err_inject(); the newly introduced VFIO EEH operation will be implemented based on this function.
- Introduce debugfs /sys/kernel/debug/powerpc/PCI/errinjct, which receives PCI error injection parameters from "errinjct". It could have format: "ei_token:addr:mask:PCI_domain_nr:PE_num:function". Eventually, eeh_err_inject() is invoked to call the corresponding OPAL API.

Here maybe "/sys/kernel/debug/powerpc/errinjct" is better, because "PCI_domain_nr" is already supplied in the parameters, so there is no need to provide errinjct for each PCI domain. Another reason is that error injection will not be only for PCI (in future), so it is better not to place it under the PCI domain entry. A fixed path is also simpler for userland tools.

Thanks
Mike

Thanks, Gavin
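For reference, the proposed "ei_token:addr:mask:PCI_domain_nr:PE_num:function" string could be parsed roughly as follows. This is only a userspace sketch, not the kernel implementation: the struct errinjct_req type and the errinjct_parse() helper are hypothetical names, and reading every field as hex is an assumption carried over from the "%x" conversions in the v1 patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the six colon-separated fields. */
struct errinjct_req {
	uint32_t token;    /* 7 = 32-bit PCI error, 8 = 64-bit PCI error */
	uint64_t addr;
	uint64_t mask;
	uint32_t domain;   /* PCI domain number */
	uint32_t pe_num;   /* PE number */
	uint32_t function; /* PAPR-defined error function */
};

/*
 * Parse one request string. All fields are read as hex (assumption).
 * Returns 0 on success, -1 if the string is malformed.
 */
static int errinjct_parse(const char *buf, struct errinjct_req *req)
{
	char tail;
	int n;

	n = sscanf(buf,
		   "%" SCNx32 ":%" SCNx64 ":%" SCNx64 ":%" SCNx32 ":%" SCNx32 ":%" SCNx32 "%c",
		   &req->token, &req->addr, &req->mask,
		   &req->domain, &req->pe_num, &req->function, &tail);

	/* Exactly six fields, optionally followed by the newline echo appends. */
	if (n != 6 && !(n == 7 && tail == '\n'))
		return -1;

	/* Only the two PCI error tokens are described in this thread. */
	if (req->token != 7 && req->token != 8)
		return -1;

	return 0;
}
```

With a single fixed path, a userland tool would only need to write one such string, with the PCI domain carried in the fourth field.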
[PATCH] Bugfix: powerpc/eeh: Create eeh sysfs entry in post_init()
The EEH sysfs entries must be created after EEH_ENABLED has been set in eeh_subsystem_flags. On the PowerNV platform, the kernel tries to create the sysfs entries during boot before EEH_ENABLED has been set, so nothing is created for EEH in sysfs.

Signed-off-by: Mike Qiu
---
 arch/powerpc/platforms/powernv/eeh-ioda.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 8ad0c5b..5f95581 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -136,6 +136,9 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
 	struct pnv_phb *phb = hose->private_data;
 	int ret;
 
+	/* Create sysfs files after EEH_ENABLED has been set */
+	eeh_add_sysfs_files(hose->bus);
+
 	/* Register OPAL event notifier */
 	if (!ioda_eeh_nb_init) {
 		ret = opal_notifier_register(&ioda_eeh_nb);
-- 
1.8.1.4
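The ordering problem described in this patch can be illustrated with a small userspace mock. This is a sketch under stated assumptions, not kernel code: the flag value, the counter, and every mock_* helper are made up for illustration; only the names EEH_ENABLED and eeh_subsystem_flags come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

#define EEH_ENABLED 0x1 /* illustrative bit value, not taken from the kernel headers */

static int eeh_subsystem_flags; /* stands in for the kernel variable of the same name */
static int sysfs_files_created; /* mock counter standing in for real sysfs files */

static bool mock_eeh_enabled(void)
{
	return (eeh_subsystem_flags & EEH_ENABLED) != 0;
}

/* Mock of a gated helper: it silently does nothing while EEH is disabled. */
static void mock_add_sysfs_files(void)
{
	if (!mock_eeh_enabled())
		return;
	sysfs_files_created++;
}

static void mock_enable_eeh(void)
{
	eeh_subsystem_flags |= EEH_ENABLED;
}

/* Run the two boot steps in the given order; return how many files were created. */
static int mock_boot_sequence(bool sysfs_first)
{
	eeh_subsystem_flags = 0;
	sysfs_files_created = 0;

	if (sysfs_first) {
		mock_add_sysfs_files(); /* too early: the flag is not set yet */
		mock_enable_eeh();
	} else {
		mock_enable_eeh();
		mock_add_sysfs_files(); /* flag already set, so the files appear */
	}
	return sysfs_files_created;
}
```

Calling mock_boot_sequence(true) models the failing boot order and creates nothing, while mock_boot_sequence(false) models the order this patch aims for.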
Re: [PATCH powerpc] Fix parameter restoring issue in commit 752a6422f
Hi Zhong, I really need this patch :) BTW, it seems that this bug just happens in 3.16-rc1 as I reported. Thanks Mike On 06/25/2014 12:00 PM, Li Zhong wrote: In commit 752a6422f, new stack frame is created for parameters. However, the r1 is added back a little earlier, so r3 and r4 are restored from a wrong place, which could cause following error during boot: Querying for OPAL presence... there ! DEFAULT CATCH!, exception-handler=fff00700 at %SRR0: 04223058 %SRR1: 80081002 Open Firmware exception handler entered from non-OF code Client's Fix Pt Regs: 00 04223054 04223020 04fbe838 0002 04 28002024 04fbe838 0427e838 04222f20 08 04222f20 1002 0c a001 01a3fd20 040eb170 10 040eb628 040eb368 fffd 01a3fd20 14 01b37f00 0f34 00cc 0f34 18 040ebb08 0358 040eb128 04285920 1c 01a3fd60 040eb100 7c0802a6f8010010 f821ff914b91ebd1 Special Regs: %IV: 0700 %CR: 28002022%XER: %DSISR: 4200 %SRR0: 04223058 %SRR1: 80081002 %LR: 04223054%CTR: %DAR: 01a3fcf00020b4ac Virtual PID = 0 ok Signed-off-by: Li Zhong --- diff --git a/arch/powerpc/platforms/powernv/opal-takeover.S b/arch/powerpc/platforms/powernv/opal-takeover.S index 11a3169..9093540 100644 --- a/arch/powerpc/platforms/powernv/opal-takeover.S +++ b/arch/powerpc/platforms/powernv/opal-takeover.S @@ -5,6 +5,7 @@ * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License +B * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. 
 */
@@ -27,11 +28,11 @@ _GLOBAL(opal_query_takeover)
 	li	r3,H_HAL_TAKEOVER
 	li	r4,H_HAL_TAKEOVER_QUERY_MAGIC
 	HVSC
-	addi	r1,r1,STACKFRAMESIZE
 	ld	r10,STK_PARAM(R3)(r1)
 	std	r4,0(r10)
 	ld	r10,STK_PARAM(R4)(r1)
 	std	r5,0(r10)
+	addi	r1,r1,STACKFRAMESIZE
 	lwz	r0,8(r1)
 	mtcrf	0xff,r0
 	blr
Re: [PATCH] Bugfix: powerpc/eeh: Create eeh sysfs entry in post_init()
On 06/25/2014 01:33 PM, Gavin Shan wrote: On Tue, Jun 24, 2014 at 11:32:07PM -0400, Mike Qiu wrote: [ cc Richard ] Eeh sysfs entry created must be after EEH_ENABLED been set in eeh_subsystem_flags. In PowerNV platform, it try to create sysfs entry before EEH_ENABLED been set, when boot up. So nothing will be created for eeh in sysfs. Could you please make the commit log more clear? :-) I guess the issue is introduced by commit 2213fb1 (" powerpc/eeh: Skip eeh sysfs when eeh is disabled"). The commit checks EEH is enabled while creating PCI device EEH sysfs files. If not, the sysfs files won't be created. That's to avoid warning reported during PCI hotplug. The problem you're reporting (if I understand completely): You don't see the sysfs files after the system boots up. If it's the case, you probably need following changes in arch/powerpc/platforms/powernv/pci.c::pnv_pci_ioda_fixup(). Could you have a try with it? #ifdef CONFIG_EEH eeh_probe_mode_set(EEH_PROBE_MODE_DEV); - eeh_addr_cache_build(); eeh_init(); + eeh_addr_cache_build(); #endif But this was not work, as I test, see boot log below: [0.233993] Unable to handle kernel paging request for data at address 0x0010 [0.234086] Faulting instruction address: 0xc0036c84 [0.234144] Oops: Kernel access of bad area, sig: 11 [#1] [0.234188] SMP NR_CPUS=1024 NUMA PowerNV [0.234235] Modules linked in: [0.234282] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 3.16.0-rc1+ #61 [0.234339] task: c003bfcc ti: c003bfd0 task.ti: c003bfd0 [0.234405] NIP: c0036c84 LR: c0036c4c CTR: [0.234472] REGS: c003bfd03430 TRAP: 0300 Not tainted (3.16.0-rc1+) [0.234528] MSR: 90009032 CR: 44008088 XER: [0.234686] CFAR: c0009358 DAR: 0010 DSISR: 4000 SOFTE: 1 GPR00: c0036c4c c003bfd036b0 c1448d58 c003bce30080 GPR04: 0001 c003bce300c8 GPR08: c003bce300e8 3030f000 GPR12: 22008042 cfee1200 c0b0e1f0 GPR16: f0019600 0008 003f c3022280 GPR20: c0b0e058 0040 0008 0007 GPR24: c3120f80 c0b0e2d0 c13bc6f0 c003bca18400 GPR28: c301 c003bce30080 c003bb2c3b40 
[0.235582] NIP [c0036c84] .eeh_add_to_parent_pe+0x164/0x340 [0.235639] LR [c0036c4c] .eeh_add_to_parent_pe+0x12c/0x340 [0.235695] Call Trace: [0.235719] [c003bfd036b0] [c0036c4c] .eeh_add_to_parent_pe+0x12c/0x340 (unreliable) [0.235810] [c003bfd03730] [c0070ee8] .powernv_eeh_dev_probe+0x158/0x1d0 [0.235890] [c003bfd037c0] [c048768c] .pci_walk_bus+0x8c/0x120 [0.235957] [c003bfd03860] [c00341c4] .eeh_init+0xf4/0x310 [0.236025] [c003bfd03900] [c006e7a8] .pnv_pci_ioda_fixup+0x688/0xb30 [0.236105] [c003bfd03a60] [c0c2ee90] .pcibios_resource_survey+0x334/0x3f4 [0.236183] [c003bfd03b50] [c0c2e65c] .pcibios_init+0xa0/0xd4 [0.236251] [c003bfd03be0] [c000bc94] .do_one_initcall+0x124/0x280 [0.236329] [c003bfd03cd0] [c0c24acc] .kernel_init_freeable+0x250/0x348 [0.236408] [c003bfd03db0] [c000c4c4] .kernel_init+0x24/0x140 [0.236475] [c003bfd03e30] [c000a45c] .ret_from_kernel_thread+0x58/0x7c [0.236553] Instruction dump: [0.236586] 815f000c 6000 e9228890 915e000c 8129 7926f7e3 813f0008 913e0008 [0.236698] 41820018 2fbf 419e0154 e93f0088 f93e0018 e93f0080 4834 [0.236819] ---[ end trace e78b31e354e84859 ]--- [0.236864] [2.236933] Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b This may because edev->pdev is set in eeh_addr_cache_build(), while eeh_init() use that entry. After changed the code, the call patch: eeh_init() > pci_walk_bus()> powernv_eeh_dev_probe() -> eeh_add_to_parent_pe() eeh_addr_cache_build() We can see in eeh_add_to_parent_pe() { .. pe->bus = eeh_dev_to_pci_dev(edev)->bus; .. } That is sure eeh_dev_to_pci_dev(edev) will be *NULL*, because this is set in eeh_addr_cache_build() Thanks Mike Eventually PowerNV/pSeries have same function call sequence: - Set EEH probe mode - Doing probe (with device node or PCI device) - Build address cache. 
Signed-off-by: Mike Qiu --- arch/powerpc/platforms/powernv/eeh-ioda.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c index 8ad0c5b..5f95581 100644 --- a/arch/powerpc/platforms/po
Re: [PATCH] Bugfix: powerpc/eeh: Create eeh sysfs entry in post_init()
On 06/26/2014 08:12 AM, Gavin Shan wrote: On Wed, Jun 25, 2014 at 03:27:55PM +0800, Mike Qiu wrote: On 06/25/2014 01:33 PM, Gavin Shan wrote: On Tue, Jun 24, 2014 at 11:32:07PM -0400, Mike Qiu wrote: [ cc Richard ] Eeh sysfs entry created must be after EEH_ENABLED been set in eeh_subsystem_flags. In PowerNV platform, it try to create sysfs entry before EEH_ENABLED been set, when boot up. So nothing will be created for eeh in sysfs. Could you please make the commit log more clear? :-) I guess the issue is introduced by commit 2213fb1 (" powerpc/eeh: Skip eeh sysfs when eeh is disabled"). The commit checks EEH is enabled while creating PCI device EEH sysfs files. If not, the sysfs files won't be created. That's to avoid warning reported during PCI hotplug. The problem you're reporting (if I understand completely): You don't see the sysfs files after the system boots up. If it's the case, you probably need following changes in arch/powerpc/platforms/powernv/pci.c::pnv_pci_ioda_fixup(). Could you have a try with it? #ifdef CONFIG_EEH eeh_probe_mode_set(EEH_PROBE_MODE_DEV); - eeh_addr_cache_build(); eeh_init(); + eeh_addr_cache_build(); #endif But this was not work, as I test, see boot log below: Yeah, we can't convert eeh_dev to pci_dev that time. The association is populated by eeh_addr_cache_build(). The attached patch should fix your issue. I tried on P7 machine and sysfs entries created. Could you help having a test on your machine? :-) I have tested, works good. Thanks Mike Thanks, Gavin ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc/eeh: sysfs entries lost
The sysfs entries are lost because of commit 2213fb1 ("powerpc/eeh: Skip eeh sysfs when eeh is disabled"). That commit made creation of the sysfs entries conditional on EEH_ENABLED, which isn't yet set when the sysfs entries are created on the PowerNV platform during system boot. The patch fixes the issue by:

* Reordering the EEH initialization functions so that they're the same on PowerNV/pSeries.
* Caching the PE's primary bus in the PowerNV platform code instead of the EEH core, to avoid the kernel crash caused by the function reorder. Another benefit of this is avoiding one eeh_probe_mode_dev() in the EEH core.

Signed-off-by: Mike Qiu
---
 arch/powerpc/kernel/eeh_pe.c                 | 11 ---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 17 -
 arch/powerpc/platforms/powernv/pci-ioda.c    |  2 +-
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index fbd01eb..1dce071a 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -351,17 +351,6 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
 	pe->config_addr	= edev->config_addr;
 
 	/*
-	 * While doing PE reset, we probably hot-reset the
-	 * upstream bridge. However, the PCI devices including
-	 * the associated EEH devices might be removed when EEH
-	 * core is doing recovery. So that won't safe to retrieve
-	 * the bridge through downstream EEH device. We have to
-	 * trace the parent PCI bus, then the upstream bridge.
-	 */
-	if (eeh_probe_mode_dev())
-		pe->bus = eeh_dev_to_pci_dev(edev)->bus;
-
-	/*
 	 * Put the new EEH PE into hierarchy tree. If the parent
 	 * can't be found, the newly created PE will be attached
 	 * to PHB directly. Otherwise, we have to associate the
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 56a206f..48eb223 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -107,6 +107,7 @@ static int powernv_eeh_dev_probe(struct pci_dev *dev, void *flag)
 	struct pnv_phb *phb = hose->private_data;
 	struct device_node *dn = pci_device_to_OF_node(dev);
 	struct eeh_dev *edev = of_node_to_eeh_dev(dn);
+	int ret;
 
 	/*
 	 * When probing the root bridge, which doesn't have any
@@ -143,7 +144,21 @@ static int powernv_eeh_dev_probe(struct pci_dev *dev, void *flag)
 	edev->pe_config_addr	= phb->bdfn_to_pe(phb, dev->bus, dev->devfn & 0xff);
 
 	/* Create PE */
-	eeh_add_to_parent_pe(edev);
+	ret = eeh_add_to_parent_pe(edev);
+	if (ret) {
+		pr_warn("%s: Can't add PCI dev %s to parent PE (%d)\n",
+			__func__, pci_name(dev), ret);
+		return ret;
+	}
+
+	/*
+	 * Cache the PE primary bus, which can't be fetched when
+	 * full hotplug is in progress. In that case, all child
+	 * PCI devices of the PE are expected to be removed prior
+	 * to PE reset.
+	 */
+	if (!edev->pe->bus)
+		edev->pe->bus = dev->bus;
 
 	/*
 	 * Enable EEH explicitly so that we will do EEH check
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index de19ede..81f2d3a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1142,8 +1142,8 @@ static void pnv_pci_ioda_fixup(void)
 
 #ifdef CONFIG_EEH
 	eeh_probe_mode_set(EEH_PROBE_MODE_DEV);
-	eeh_addr_cache_build();
 	eeh_init();
+	eeh_addr_cache_build();
 #endif
 }
 
-- 
1.8.1.4
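The second bullet of the commit message (cache the PE's primary bus at probe time rather than deriving it through the eeh_dev) can be sketched in userspace as below. All mock_* types and both helpers are hypothetical stand-ins and only loosely follow the kernel structures named in the patch; the sketch assumes, as discussed earlier in the thread, that the eeh_dev to pci_dev link is only populated later by the address cache build.

```c
#include <assert.h>
#include <stddef.h>

/* Mock types, loosely modeled on the structures named in the patch. */
struct mock_bus { int number; };
struct mock_pci_dev { struct mock_bus *bus; };
struct mock_pe { struct mock_bus *bus; };
struct mock_edev {
	struct mock_pci_dev *pdev; /* only filled in by the later address cache build */
	struct mock_pe *pe;
};

/*
 * Old approach: reach the bus through edev->pdev. During probe that link
 * is still NULL. The mock reports an error here; the kernel code had no
 * such check and dereferenced the NULL pointer.
 */
static int mock_pe_bus_via_edev(struct mock_edev *edev)
{
	if (!edev->pdev)
		return -1;
	edev->pe->bus = edev->pdev->bus;
	return 0;
}

/*
 * New approach: the probe callback is handed the pci_dev directly,
 * so its bus can be cached without going through edev->pdev.
 */
static void mock_pe_bus_via_probe(struct mock_edev *edev, struct mock_pci_dev *dev)
{
	if (!edev->pe->bus)
		edev->pe->bus = dev->bus;
}
```

The point of the design is that the probe callback no longer depends on state that a later initialization step fills in, so the two steps can run in the same order on PowerNV and pSeries.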
[PATCH] Bugfix: powerpc/eeh: Wrong place to call pci_get_slot()
[ 121.133381] WARNING: at drivers/pci/search.c:223 [ 121.133422] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc3+ #72 [ 121.133424] task: c1367af0 ti: c1444000 task.ti: c1444000 [ 121.133425] NIP: c0497b70 LR: c0037530 CTR: 3003d114 [ 121.133427] REGS: c1446fa0 TRAP: 0700 Not tainted (3.16.0-rc3+) [ 121.133428] MSR: 90029032 CR: 48002422 XER: 2000 [ 121.133433] CFAR: c003752c SOFTE: 0 GPR00: c0037530 c1447220 c1448c30 c003bca1dc00 GPR04: c0066064 90009032 0008 GPR08: 0007 0001 0100 3003d200 GPR12: 44002482 cfee c15e8830 GPR16: c15e8c30 c15e8430 c15e8030 GPR20: c1348c30 c1482180 GPR24: 00200200 c003bc243500 c003feff4070 c003bcec3000 GPR28: c14cac00 c003bca1dc00 [ 121.133454] NIP [c0497b70] .pci_get_slot+0x40/0x110 [ 121.133457] LR [c0037530] .eeh_pe_loc_get+0x150/0x190 [ 121.133458] Call Trace: [ 121.133461] [c1447220] [c0721730] .of_get_property+0x30/0x60 (unreliable) [ 121.133464] [c14472b0] [c0037530] .eeh_pe_loc_get+0x150/0x190 [ 121.133466] [c1447340] [c0034684] .eeh_dev_check_failure+0x1b4/0x550 [ 121.133468] [c14473f0] [c0034ab0] .eeh_check_failure+0x90/0xf0 [ 121.133493] [c1447490] [d2c03e84] .lpfc_sli_check_eratt+0x504/0x7c0 [lpfc] [ 121.133501] [c1447520] [d2c041a4] .lpfc_poll_eratt+0x64/0x100 [lpfc] [ 121.133504] [c14475a0] [c00b45b4] .call_timer_fn+0x64/0x190 [ 121.133506] [c1447650] [c00b4d1c] .run_timer_softirq+0x2cc/0x3e0 [ 121.133508] [c1447760] [c00a90c8] .__do_softirq+0x198/0x3c0 [ 121.133510] [c1447880] [c00a9658] .irq_exit+0xc8/0x110 [ 121.133513] [c1447900] [c001e010] .timer_interrupt+0xa0/0xe0 [ 121.133515] [c1447980] [c00026d8] decrementer_common+0x158/0x180 [ 121.133518] --- Exception: 901 at .arch_local_irq_restore+0x74/0x90 pci_get_slot() should not be used in interrupt. But eeh subsystem do the error checking in interrupt in this situation. This patch is to solve this issue. 
Signed-off-by: Mike Qiu
---
 arch/powerpc/kernel/eeh_pe.c | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index fbd01eb..6f4bfee 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -792,6 +792,28 @@ void eeh_pe_restore_bars(struct eeh_pe *pe)
 }
 
 /**
+ * __dn_get_pdev - Retrieve the pci_dev from device_node by bus/devfn
+ * @dn: device_node of the pci_dev
+ * @data: the pci device's bus/devfn
+ *
+ * Retrieve the pci_dev using the given device_node and bus/devfn.
+ */
+void *__dn_get_pdev(struct device_node *dn, void *data)
+{
+	struct pci_dn *pdn = PCI_DN(dn);
+	int busno = *((int *)data) >> 8;
+	int devfn = *((int *)data) & 0xff;
+
+	if (!pdn)
+		return NULL;
+
+	if (pdn->busno == busno && pdn->devfn == devfn)
+		return pdn->pcidev;
+
+	return NULL;
+}
+
+/**
  * eeh_pe_loc_get - Retrieve location code binding to the given PE
  * @pe: EEH PE
  *
@@ -807,6 +829,7 @@ const char *eeh_pe_loc_get(struct eeh_pe *pe)
 	struct pci_dev *pdev;
 	struct device_node *dn;
 	const char *loc;
+	int bdevfn;
 
 	if (!bus)
 		return "N/A";
@@ -823,7 +846,9 @@ const char *eeh_pe_loc_get(struct eeh_pe *pe)
 		if (loc)
 			return loc;
 
-		pdev = pci_get_slot(bus, 0x0);
+		/* Get the root port */
+		bdevfn = (bus->number << 8) | 0x0;
+		pdev = traverse_pci_devices(hose->dn, __dn_get_pdev, &bdevfn);
 	} else {
 		pdev = bus->self;
 	}
@@ -846,8 +871,6 @@ const char *eeh_pe_loc_get(struct eeh_pe *pe)
 		loc = "N/A";
 
 out:
-	if (pci_is_root_bus(bus) && pdev)
-		pci_dev_put(pdev);
 	return loc;
 }
-- 
1.8.1.4
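The lookup key handed to __dn_get_pdev() in this patch packs the bus number and devfn into one int, which the callback then unpacks with ">> 8" and "& 0xff". A minimal userspace sketch of that packing follows; the helper names are hypothetical. Note that the packing needs the bitwise "|" operator: a logical "||" would collapse the key to 0 or 1.

```c
#include <assert.h>

/* Pack a bus number and an 8-bit devfn into one lookup key. */
static int pack_bdevfn(int busno, int devfn)
{
	return (busno << 8) | (devfn & 0xff);
}

/* Recover the bus number, mirroring the ">> 8" in __dn_get_pdev(). */
static int bdevfn_busno(int key)
{
	return key >> 8;
}

/* Recover the devfn, mirroring the "& 0xff" in __dn_get_pdev(). */
static int bdevfn_devfn(int key)
{
	return key & 0xff;
}

/* What a logical OR would produce instead: only 0 or 1, never a usable key. */
static int pack_bdevfn_logical_or(int busno, int devfn)
{
	return (busno << 8) || devfn;
}
```

For the root port lookup in the patch the devfn is 0x0, so the key is simply the bus number shifted left by eight bits.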
Re: [PATCH] Bugfix: powerpc/eeh: Wrong place to call pci_get_slot()
On 07/14/2014 09:01 PM, Gavin Shan wrote:
> On Mon, Jul 14, 2014 at 04:19:23AM -0400, Mike Qiu wrote:
>> [ 121.133381] WARNING: at drivers/pci/search.c:223
>> [ 121.133422] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc3+ #72
>> [ 121.133424] task: c1367af0 ti: c1444000 task.ti: c1444000
>> [ 121.133425] NIP: c0497b70 LR: c0037530 CTR: 3003d114
>> [ 121.133427] REGS: c1446fa0 TRAP: 0700 Not tainted (3.16.0-rc3+)
>> [ 121.133428] MSR: 90029032 CR: 48002422 XER: 2000
>> [ 121.133433] CFAR: c003752c SOFTE: 0
>> GPR00: c0037530 c1447220 c1448c30 c003bca1dc00
>> GPR04: c0066064 90009032 0008
>> GPR08: 0007 0001 0100 3003d200
>> GPR12: 44002482 cfee c15e8830
>> GPR16: c15e8c30 c15e8430 c15e8030
>> GPR20: c1348c30 c1482180
>> GPR24: 00200200 c003bc243500 c003feff4070 c003bcec3000
>> GPR28: c14cac00 c003bca1dc00
>> [ 121.133454] NIP [c0497b70] .pci_get_slot+0x40/0x110
>> [ 121.133457] LR [c0037530] .eeh_pe_loc_get+0x150/0x190
>> [ 121.133458] Call Trace:
>> [ 121.133461] [c1447220] [c0721730] .of_get_property+0x30/0x60 (unreliable)
>> [ 121.133464] [c14472b0] [c0037530] .eeh_pe_loc_get+0x150/0x190
>> [ 121.133466] [c1447340] [c0034684] .eeh_dev_check_failure+0x1b4/0x550
>> [ 121.133468] [c14473f0] [c0034ab0] .eeh_check_failure+0x90/0xf0
>> [ 121.133493] [c1447490] [d2c03e84] .lpfc_sli_check_eratt+0x504/0x7c0 [lpfc]
>> [ 121.133501] [c1447520] [d2c041a4] .lpfc_poll_eratt+0x64/0x100 [lpfc]
>> [ 121.133504] [c14475a0] [c00b45b4] .call_timer_fn+0x64/0x190
>> [ 121.133506] [c1447650] [c00b4d1c] .run_timer_softirq+0x2cc/0x3e0
>> [ 121.133508] [c1447760] [c00a90c8] .__do_softirq+0x198/0x3c0
>> [ 121.133510] [c1447880] [c00a9658] .irq_exit+0xc8/0x110
>> [ 121.133513] [c1447900] [c001e010] .timer_interrupt+0xa0/0xe0
>> [ 121.133515] [c1447980] [c00026d8] decrementer_common+0x158/0x180
>> [ 121.133518] --- Exception: 901 at .arch_local_irq_restore+0x74/0x90
>>
>> pci_get_slot() should not be used in interrupt context, but the EEH
>> subsystem does its error checking in interrupt context in this
>> situation. This patch solves the issue.
>
> The commit log is clear enough, but the following message might be
> better. I'm not good at writing commit logs either:
>
> ---
> pci_get_slot() takes the PCI bus semaphore, so it is not safe to call
> in interrupt context. However, we may check for EEH errors, and thus
> call the function, in interrupt context. To avoid using pci_get_slot(),
> we turn to the device tree to fetch the location code. Otherwise, we
> might hit the WARN_ON() shown in the following messages:
>
> WARNING: at drivers/pci/search.c:223
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc3+ #72
> task: c1367af0 ti: c1444000 task.ti: c1444000
> NIP: c0497b70 LR: c0037530 CTR: 3003d114
> REGS: c1446fa0 TRAP: 0700 Not tainted (3.16.0-rc3+)
> MSR: 90029032 CR: 48002422 XER: 2000
> CFAR: c003752c SOFTE: 0
> :
> NIP [c0497b70] .pci_get_slot+0x40/0x110
> LR [c0037530] .eeh_pe_loc_get+0x150/0x190
> Call Trace:
> .of_get_property+0x30/0x60 (unreliable)
> .eeh_pe_loc_get+0x150/0x190
> .eeh_dev_check_failure+0x1b4/0x550
> .eeh_check_failure+0x90/0xf0
> .lpfc_sli_check_eratt+0x504/0x7c0 [lpfc]
> .lpfc_poll_eratt+0x64/0x100 [lpfc]
> .call_timer_fn+0x64/0x190
> .run_timer_softirq+0x2cc/0x3e0

Yes, that's better.

>> Signed-off-by: Mike Qiu
>> ---
>>  arch/powerpc/kernel/eeh_pe.c | 29 ++---
>>  1 file changed, 26 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
>> index fbd01eb..6f4bfee 100644
>> --- a/arch/powerpc/kernel/eeh_pe.c
>> +++ b/arch/powerpc/kernel/eeh_pe.c
>> @@ -792,6 +792,28 @@ void eeh_pe_restore_bars(struct eeh_pe *pe)
>>  }
>>
>>  /**
>> + * __dn_get_pdev - Retrieve the pci_dev from device_node by bus/devfn
>> + * @dn: device_node of the pci_dev
>> + * @data: the pci device's bus/devfn
>> + *
>> + * Retrieve the pci_dev using the given device_node and bus/devfn.
>> + */
>> +void *__dn_get_pdev(struct device_node *dn, void *data)
>> +{
>
> The function doesn't need to be public. "static" is enough, I think.
> Actually, I don't think we need this at all. Please refer to the
> comments below.
>
>> +	struct pci_dn *pdn = PCI_DN(dn);
>> +	int busno = *((int *)data) >> 8;
>> +	int devfn = *((int *)data) & 0xff;
>> +
>> +	if (!pdn)
>> +		return NULL;
>> +
>> +	if (pdn->busno == busno &
[PATCH v2] Bugfix: powerpc/eeh: Wrong place to call pci_get_slot()
pci_get_slot() takes the PCI bus semaphore, so it is not safe to call
in interrupt context. However, we may check for EEH errors, and thus
call the function, in interrupt context. To avoid using pci_get_slot(),
we turn to the device tree to fetch the location code. Otherwise, we
might hit the WARN_ON() shown in the following messages:

WARNING: at drivers/pci/search.c:223
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc3+ #72
task: c1367af0 ti: c1444000 task.ti: c1444000
NIP: c0497b70 LR: c0037530 CTR: 3003d114
REGS: c1446fa0 TRAP: 0700 Not tainted (3.16.0-rc3+)
MSR: 90029032 CR: 48002422 XER: 2000
CFAR: c003752c SOFTE: 0
:
NIP [c0497b70] .pci_get_slot+0x40/0x110
LR [c0037530] .eeh_pe_loc_get+0x150/0x190
Call Trace:
.of_get_property+0x30/0x60 (unreliable)
.eeh_pe_loc_get+0x150/0x190
.eeh_dev_check_failure+0x1b4/0x550
.eeh_check_failure+0x90/0xf0
.lpfc_sli_check_eratt+0x504/0x7c0 [lpfc]
.lpfc_poll_eratt+0x64/0x100 [lpfc]
.call_timer_fn+0x64/0x190
.run_timer_softirq+0x2cc/0x3e0

Signed-off-by: Mike Qiu
---
Changelog[v2]:
	Check the child device_node of the root bus for the root port
	directly, instead of searching for the pdev from the device tree
	and then translating it to a device_node

 arch/powerpc/kernel/eeh_pe.c | 24 ++--
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index fbd01eb..f96c10f 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -802,7 +802,6 @@ void eeh_pe_restore_bars(struct eeh_pe *pe)
  */
 const char *eeh_pe_loc_get(struct eeh_pe *pe)
 {
-	struct pci_controller *hose;
 	struct pci_bus *bus = eeh_pe_bus_get(pe);
 	struct pci_dev *pdev;
 	struct device_node *dn;
@@ -811,29 +810,20 @@ const char *eeh_pe_loc_get(struct eeh_pe *pe)
 	if (!bus)
 		return "N/A";

+	dn = pci_bus_to_OF_node(bus);
 	/* PHB PE or root PE ? */
-	if (pci_is_root_bus(bus)) {
-		hose = pci_bus_to_host(bus);
-		loc = of_get_property(hose->dn,
-				"ibm,loc-code", NULL);
+	if (dn && pci_is_root_bus(bus)) {
+		loc = of_get_property(dn, "ibm,loc-code", NULL);
 		if (loc)
 			return loc;
-		loc = of_get_property(hose->dn,
-				"ibm,io-base-loc-code", NULL);
+		loc = of_get_property(dn, "ibm,io-base-loc-code", NULL);
 		if (loc)
 			return loc;

-		pdev = pci_get_slot(bus, 0x0);
-	} else {
-		pdev = bus->self;
-	}
-
-	if (!pdev) {
-		loc = "N/A";
-		goto out;
+		/* Check the root port */
+		dn = dn->child;
 	}

-	dn = pci_device_to_OF_node(pdev);
 	if (!dn) {
 		loc = "N/A";
 		goto out;
@@ -846,8 +836,6 @@ const char *eeh_pe_loc_get(struct eeh_pe *pe)
 		loc = "N/A";

 out:
-	if (pci_is_root_bus(bus) && pdev)
-		pci_dev_put(pdev);
 	return loc;
 }

--
1.8.1.4
Re: [PATCH v2] Bugfix: powerpc/eeh: Wrong place to call pci_get_slot()
On 07/15/2014 01:07 PM, Gavin Shan wrote:
> On Mon, Jul 14, 2014 at 10:33:48PM -0400, Mike Qiu wrote:
>> pci_get_slot() takes the PCI bus semaphore, so it is not safe to call
>> in interrupt context. However, we may check for EEH errors, and thus
>> call the function, in interrupt context. To avoid using pci_get_slot(),
>> we turn to the device tree to fetch the location code. Otherwise, we
>> might hit the WARN_ON() shown in the following messages:
>>
>> WARNING: at drivers/pci/search.c:223
>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc3+ #72
>> task: c1367af0 ti: c1444000 task.ti: c1444000
>> NIP: c0497b70 LR: c0037530 CTR: 3003d114
>> REGS: c1446fa0 TRAP: 0700 Not tainted (3.16.0-rc3+)
>> MSR: 90029032 CR: 48002422 XER: 2000
>> CFAR: c003752c SOFTE: 0
>> :
>> NIP [c0497b70] .pci_get_slot+0x40/0x110
>> LR [c0037530] .eeh_pe_loc_get+0x150/0x190
>> Call Trace:
>> .of_get_property+0x30/0x60 (unreliable)
>> .eeh_pe_loc_get+0x150/0x190
>> .eeh_dev_check_failure+0x1b4/0x550
>> .eeh_check_failure+0x90/0xf0
>> .lpfc_sli_check_eratt+0x504/0x7c0 [lpfc]
>> .lpfc_poll_eratt+0x64/0x100 [lpfc]
>> .call_timer_fn+0x64/0x190
>> .run_timer_softirq+0x2cc/0x3e0
>>
>> Signed-off-by: Mike Qiu
>> ---
>> Changelog[v2]:
>> 	Check the child device_node of the root bus for the root port
>> 	directly, instead of searching for the pdev from the device tree
>> 	and then translating it to a device_node
>
> I ran into the following warning with your patch. Please test the
> attached one. If no problem is found, please send that one.
>
> arch/powerpc/kernel/eeh_pe.c: In function 'eeh_pe_loc_get':
> arch/powerpc/kernel/eeh_pe.c:806:18: warning: unused variable 'pdev'

OK, I will remove the unused variable.

Thanks,
Mike

> Thanks,
> Gavin
>
>>  arch/powerpc/kernel/eeh_pe.c | 24 ++--
>>  1 file changed, 6 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
>> index fbd01eb..f96c10f 100644
>> --- a/arch/powerpc/kernel/eeh_pe.c
>> +++ b/arch/powerpc/kernel/eeh_pe.c
>> @@ -802,7 +802,6 @@ void eeh_pe_restore_bars(struct eeh_pe *pe)
>>   */
>>  const char *eeh_pe_loc_get(struct eeh_pe *pe)
>>  {
>> -	struct pci_controller *hose;
>>  	struct pci_bus *bus = eeh_pe_bus_get(pe);
>>  	struct pci_dev *pdev;
>>  	struct device_node *dn;
>> @@ -811,29 +810,20 @@ const char *eeh_pe_loc_get(struct eeh_pe *pe)
>>  	if (!bus)
>>  		return "N/A";
>>
>> +	dn = pci_bus_to_OF_node(bus);
>>  	/* PHB PE or root PE ? */
>> -	if (pci_is_root_bus(bus)) {
>> -		hose = pci_bus_to_host(bus);
>> -		loc = of_get_property(hose->dn,
>> -				"ibm,loc-code", NULL);
>> +	if (dn && pci_is_root_bus(bus)) {
>> +		loc = of_get_property(dn, "ibm,loc-code", NULL);
>>  		if (loc)
>>  			return loc;
>> -		loc = of_get_property(hose->dn,
>> -				"ibm,io-base-loc-code", NULL);
>> +		loc = of_get_property(dn, "ibm,io-base-loc-code", NULL);
>>  		if (loc)
>>  			return loc;
>>
>> -		pdev = pci_get_slot(bus, 0x0);
>> -	} else {
>> -		pdev = bus->self;
>> -	}
>> -
>> -	if (!pdev) {
>> -		loc = "N/A";
>> -		goto out;
>> +		/* Check the root port */
>> +		dn = dn->child;
>>  	}
>>
>> -	dn = pci_device_to_OF_node(pdev);
>>  	if (!dn) {
>>  		loc = "N/A";
>>  		goto out;
>> @@ -846,8 +836,6 @@ const char *eeh_pe_loc_get(struct eeh_pe *pe)
>>  		loc = "N/A";
>>
>>  out:
>> -	if (pci_is_root_bus(bus) && pdev)
>> -		pci_dev_put(pdev);
>>  	return loc;
>>  }
>>
>> --
>> 1.8.1.4
[PATCH v3] powerpc/eeh: Wrong place to call pci_get_slot()
pci_get_slot() takes the PCI bus semaphore, so it is not safe to call
in interrupt context. However, we may check for EEH errors, and thus
call the function, in interrupt context. To avoid using pci_get_slot(),
we turn to the device tree to fetch the location code. Otherwise, we
might hit the WARN_ON() shown in the following messages:

WARNING: at drivers/pci/search.c:223
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc3+ #72
task: c1367af0 ti: c1444000 task.ti: c1444000
NIP: c0497b70 LR: c0037530 CTR: 3003d114
REGS: c1446fa0 TRAP: 0700 Not tainted (3.16.0-rc3+)
MSR: 90029032 CR: 48002422 XER: 2000
CFAR: c003752c SOFTE: 0
:
NIP [c0497b70] .pci_get_slot+0x40/0x110
LR [c0037530] .eeh_pe_loc_get+0x150/0x190
Call Trace:
.of_get_property+0x30/0x60 (unreliable)
.eeh_pe_loc_get+0x150/0x190
.eeh_dev_check_failure+0x1b4/0x550
.eeh_check_failure+0x90/0xf0
.lpfc_sli_check_eratt+0x504/0x7c0 [lpfc]
.lpfc_poll_eratt+0x64/0x100 [lpfc]
.call_timer_fn+0x64/0x190
.run_timer_softirq+0x2cc/0x3e0

Cc: sta...@vger.kernel.org
Signed-off-by: Mike Qiu
Acked-by: Gavin Shan
---
Changelog[v3]:
	Remove unused variables
	Code refactoring for eeh_pe_loc_get()
Changelog[v2]:
	Check the child device_node of the root bus for the root port
	directly, instead of searching for the pdev from the device tree
	and then translating it to a device_node

 arch/powerpc/kernel/eeh_pe.c | 46 +---
 1 file changed, 13 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index fbd01eb..94802d2 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -802,53 +802,33 @@ void eeh_pe_restore_bars(struct eeh_pe *pe)
  */
 const char *eeh_pe_loc_get(struct eeh_pe *pe)
 {
-	struct pci_controller *hose;
 	struct pci_bus *bus = eeh_pe_bus_get(pe);
-	struct pci_dev *pdev;
-	struct device_node *dn;
-	const char *loc;
+	struct device_node *dn = pci_bus_to_OF_node(bus);
+	const char *loc = NULL;

-	if (!bus)
-		return "N/A";
+	if (!dn)
+		goto out;

 	/* PHB PE or root PE ? */
 	if (pci_is_root_bus(bus)) {
-		hose = pci_bus_to_host(bus);
-		loc = of_get_property(hose->dn,
-				"ibm,loc-code", NULL);
-		if (loc)
-			return loc;
-		loc = of_get_property(hose->dn,
-				"ibm,io-base-loc-code", NULL);
+		loc = of_get_property(dn, "ibm,loc-code", NULL);
+		if (!loc)
+			loc = of_get_property(dn, "ibm,io-base-loc-code", NULL);
 		if (loc)
-			return loc;
-
-		pdev = pci_get_slot(bus, 0x0);
-	} else {
-		pdev = bus->self;
-	}
-
-	if (!pdev) {
-		loc = "N/A";
-		goto out;
-	}
+			goto out;

-	dn = pci_device_to_OF_node(pdev);
-	if (!dn) {
-		loc = "N/A";
-		goto out;
+		/* Check the root port */
+		dn = dn->child;
+		if (!dn)
+			goto out;
 	}

 	loc = of_get_property(dn, "ibm,loc-code", NULL);
 	if (!loc)
 		loc = of_get_property(dn, "ibm,slot-location-code", NULL);
-	if (!loc)
-		loc = "N/A";

 out:
-	if (pci_is_root_bus(bus) && pdev)
-		pci_dev_put(pdev);
-	return loc;
+	return loc ? loc : "N/A";
 }

 /**
--
1.8.1.4
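For readers following the thread, the lookup order that the v3 eeh_pe_loc_get() ends up with can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct prop, struct node, get_prop() and loc_get() are hypothetical stand-ins for the kernel's device_node and of_get_property(), and the is_root_bus flag replaces pci_is_root_bus(bus). Only the property names and the control flow are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a device-tree node and its properties. */
struct prop {
	const char *name;
	const char *value;
};

struct node {
	const struct prop *props;
	size_t nprops;
	const struct node *child;	/* first child, e.g. the root port */
};

/* Rough analogue of of_get_property(): NULL when the property is absent. */
static const char *get_prop(const struct node *n, const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < n->nprops; i++)
		if (strcmp(n->props[i].name, name) == 0)
			return n->props[i].value;
	return NULL;
}

/* Same control flow as the v3 patch: try each property in turn. */
const char *loc_get(const struct node *dn, int is_root_bus)
{
	const char *loc = NULL;

	if (!dn)
		goto out;

	if (is_root_bus) {
		loc = get_prop(dn, "ibm,loc-code");
		if (!loc)
			loc = get_prop(dn, "ibm,io-base-loc-code");
		if (loc)
			goto out;

		/* Nothing on the PHB node: check the root port (first child). */
		dn = dn->child;
		if (!dn)
			goto out;
	}

	loc = get_prop(dn, "ibm,loc-code");
	if (!loc)
		loc = get_prop(dn, "ibm,slot-location-code");
out:
	/* Single exit point supplies the "N/A" default. */
	return loc ? loc : "N/A";
}
```

Because nothing here takes a reference on a pci_dev, there is nothing to release at the out label, which is why the pci_dev_put() call disappears from the patch along with pci_get_slot().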
Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection
On 06/25/2014 11:19 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2014-06-25 at 11:05 +0800, Mike Qiu wrote:
>> Maybe "/sys/kernel/debug/powerpc/errinjct" is better here, because
>> "PCI_domain_nr" will be supplied in the parameters, so there is no
>> need to provide an errinjct file for each PCI domain.
>>
>> Another reason is that error injection will not be limited to PCI in
>> the future, so it is better not to put it under a PCI domain entry.
>>
>> A fixed path is also simpler for userland tools.
>
> I don't like this. I much prefer to have dedicated error injection
> files in their respective locations, something for PCI under the
> corresponding PCI bridge etc...

So PowerNV error injection will be designed to rely on debugfs being
configured, right?

Thanks,
Mike

> Cheers,
> Ben.
Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection
On 07/22/2014 06:49 AM, Benjamin Herrenschmidt wrote:
> On Mon, 2014-07-21 at 16:06 +0800, Mike Qiu wrote:
>>> I don't like this. I much prefer to have dedicated error injection
>>> files in their respective locations, something for PCI under the
>>> corresponding PCI bridge etc...
>>
>> So PowerNV error injection will be designed to rely on debugfs being
>> configured, right?
>
> Not necessarily. If we create a better debugfs layout for our PHBs,
> then yes. It might be useful to provide more info in there, for
> example access to some of the counters ...
>
> But on the other hand, for error injection in general, I wonder if we
> should be under sysfs instead... something to study a bit.

In pHyp, general error injection uses a syscall:

#define __NR_rtas 255

I don't know if it is a good idea to reuse this syscall for PowerNV.
At least it is another option that does not rely on sysfs.

Thanks,
Mike

> Cheers,
> Ben.
Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection
On 07/22/2014 11:26 AM, Gavin Shan wrote:
> On Tue, Jul 22, 2014 at 11:10:42AM +0800, Mike Qiu wrote:
>> On 07/22/2014 06:49 AM, Benjamin Herrenschmidt wrote:
>>> On Mon, 2014-07-21 at 16:06 +0800, Mike Qiu wrote:
>>>>> I don't like this. I much prefer to have dedicated error injection
>>>>> files in their respective locations, something for PCI under the
>>>>> corresponding PCI bridge etc...
>>>>
>>>> So PowerNV error injection will be designed to rely on debugfs being
>>>> configured, right?
>>>
>>> Not necessarily. If we create a better debugfs layout for our PHBs,
>>> then yes. It might be useful to provide more info in there, for
>>> example access to some of the counters ...
>>>
>>> But on the other hand, for error injection in general, I wonder if we
>>> should be under sysfs instead... something to study a bit.
>>
>> In pHyp, general error injection uses a syscall:
>>
>> #define __NR_rtas 255
>>
>> I don't know if it is a good idea to reuse this syscall for PowerNV.
>> At least it is another option that does not rely on sysfs.
>
> We won't use a syscall for routing error injection on PowerNV any
> more. Generally speaking, we will use ioctl commands, or a subcode of
> the EEH ioctl command that was invented for EEH support on VFIO
> devices to support QEMU. For the utility (errinjct) running on
> PowerNV, we will use debugfs entries. I have premature code for that,
> but haven't had a chance to polish it yet. Let me send it to you so
> that you can start working from there.

OK, thanks

> Thanks,
> Gavin
[PATCH] powerpc/powernv: Avoid to set EEH_PE_ISOLATED for passed PE
When a PE is passed to a guest and a guest EEH error occurs on that PE,
EEH_PE_ISOLATED may be set in the host. This is a big issue when the PE
is later reused by the host: host EEH will not work on the PE because
it was unexpectedly marked EEH_PE_ISOLATED.

Signed-off-by: Mike Qiu
---
 arch/powerpc/platforms/powernv/eeh-ioda.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index c945bed..e88eaf6 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -371,7 +371,8 @@ static int ioda_eeh_get_pe_state(struct eeh_pe *pe)
 	    !(result & EEH_STATE_UNAVAILABLE) &&
 	    !(result & EEH_STATE_MMIO_ACTIVE) &&
 	    !(result & EEH_STATE_DMA_ACTIVE) &&
-	    !(pe->state & EEH_PE_ISOLATED)) {
+	    !(pe->state & EEH_PE_ISOLATED)&&
+	    !eeh_pe_passed(pe)) {
 		if (phb->freeze_pe)
 			phb->freeze_pe(phb, pe->addr);

--
1.8.1.4
Re: [PATCH] powerpc/powernv: Avoid to set EEH_PE_ISOLATED for passed PE
Hi all,

After discussing with Gavin offline, we concluded that it is
inappropriate to drop the ISOLATED state. Please ignore this patch.
Otherwise, somebody might merge it to mainline, which would be a
problem.

Thanks,
Mike

On 08/13/2014 07:14 PM, Mike Qiu wrote:
> When a PE is passed to a guest and a guest EEH error occurs on that PE,
> EEH_PE_ISOLATED may be set in the host. This is a big issue when the PE
> is later reused by the host: host EEH will not work on the PE because
> it was unexpectedly marked EEH_PE_ISOLATED.
>
> Signed-off-by: Mike Qiu
> ---
>  arch/powerpc/platforms/powernv/eeh-ioda.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
> index c945bed..e88eaf6 100644
> --- a/arch/powerpc/platforms/powernv/eeh-ioda.c
> +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
> @@ -371,7 +371,8 @@ static int ioda_eeh_get_pe_state(struct eeh_pe *pe)
>  	    !(result & EEH_STATE_UNAVAILABLE) &&
>  	    !(result & EEH_STATE_MMIO_ACTIVE) &&
>  	    !(result & EEH_STATE_DMA_ACTIVE) &&
> -	    !(pe->state & EEH_PE_ISOLATED)) {
> +	    !(pe->state & EEH_PE_ISOLATED)&&
> +	    !eeh_pe_passed(pe)) {
>  		if (phb->freeze_pe)
>  			phb->freeze_pe(phb, pe->addr);
[PATCH] Fix 3bc95598 'powerpc/PCI: Use list_for_each_entry() for bus traversal'
Unable to handle kernel paging request for data at address 0x
Faulting instruction address: 0xc0041d78
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c0041d78] .sys_pciconfig_iobase+0x68/0x1f0
LR [c0041e0c] .sys_pciconfig_iobase+0xfc/0x1f0
Call Trace:
[c003b4787db0] [c0041e0c] .sys_pciconfig_iobase+0xfc/0x1f0 (unreliable)
[c003b4787e30] [c0009ed8] syscall_exit+0x0/0x98

This bug was introduced by commit
3bc955987fb377f3c95bc29deb498e96819b8451.

The root cause is that 'bus' has been set to NULL by the time the loop
tries to access bus->next.

Signed-off-by: Mike Qiu
---
 arch/powerpc/kernel/pci_64.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 2a47790..7b6c1ae 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -209,6 +209,7 @@ long sys_pciconfig_iobase(long which, unsigned long in_bus,
 {
 	struct pci_controller* hose;
 	struct pci_bus *bus = NULL;
+	struct pci_bus *tmp_bus = NULL;
 	struct device_node *hose_node;

 	/* Argh ! Please forgive me for that hack, but that's the
@@ -229,10 +230,12 @@ long sys_pciconfig_iobase(long which, unsigned long in_bus,
 	 * used on pre-domains setup. We return the first match
 	 */

-	list_for_each_entry(bus, &pci_root_buses, node) {
-		if (in_bus >= bus->number && in_bus <= bus->busn_res.end)
+	list_for_each_entry(tmp_bus, &pci_root_buses, node) {
+		if (in_bus >= tmp_bus->number &&
+		    in_bus <= tmp_bus->busn_res.end) {
+			bus = tmp_bus;
 			break;
-		bus = NULL;
+		}
 	}
 	if (bus == NULL || bus->dev.of_node == NULL)
 		return -ENODEV;
--
1.8.0.1
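The pattern behind this fix can be shown outside the kernel. The sketch below is a minimal userspace illustration, assuming a simplified singly linked circular list with a sentinel head; struct bus and find_bus() are hypothetical names, not the kernel's struct pci_bus or list.h API. The point it tries to make is the one from the commit message: the loop cursor is what the iterator dereferences to advance, so it must never be overwritten with NULL inside the loop body. The result goes into a separate variable instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_bus: only what the lookup needs. */
struct bus {
	int number;		/* first bus number covered */
	int end;		/* last bus number covered (busn_res.end in the patch) */
	struct bus *next;	/* circular list; the head is a sentinel */
};

/*
 * Return the first bus whose [number, end] range contains in_bus,
 * or NULL when nothing matches.
 */
struct bus *find_bus(struct bus *head, int in_bus)
{
	struct bus *found = NULL;	/* result, like 'bus' after the fix */
	struct bus *cur;		/* cursor, like 'tmp_bus' after the fix */

	assert(head != NULL);
	for (cur = head->next; cur != head; cur = cur->next) {
		if (in_bus >= cur->number && in_bus <= cur->end) {
			found = cur;
			break;
		}
		/*
		 * The buggy version did the equivalent of "cur = NULL" here,
		 * so the "cur = cur->next" step dereferenced a NULL pointer.
		 */
	}
	return found;
}
```

With a separate result variable, the "no match" case falls out naturally: the loop ends with the result still NULL, and the existing bus == NULL check after the loop keeps working unchanged.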
Re: [PATCH] Fix 3bc95598 'powerpc/PCI: Use list_for_each_entry() for bus traversal'
On 04/10/2014 03:54 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2014-04-10 at 02:51 -0400, Mike Qiu wrote:
>> Unable to handle kernel paging request for data at address 0x
>> Faulting instruction address: 0xc0041d78
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> ...
>> NIP [c0041d78] .sys_pciconfig_iobase+0x68/0x1f0
>> LR [c0041e0c] .sys_pciconfig_iobase+0xfc/0x1f0
>> Call Trace:
>> [c003b4787db0] [c0041e0c] .sys_pciconfig_iobase+0xfc/0x1f0 (unreliable)
>> [c003b4787e30] [c0009ed8] syscall_exit+0x0/0x98
>>
>> This bug was introduced by commit
>> 3bc955987fb377f3c95bc29deb498e96819b8451.
>>
>> The root cause is that 'bus' has been set to NULL by the time the loop
>> tries to access bus->next.
>
> Good catch. Out of curiosity, what is using that syscall nowadays ?
> It's been long buggy in all sort of ways and is pretty much
> deprecated...

I just booted my Power7 machine with the newest mainline kernel; the
oops happens and blocks the system.

I really do not know which software uses this syscall; I need to do
some research on it.

Thanks,
Mike

> Cheers,
> Ben.
>
>> Signed-off-by: Mike Qiu
>> ---
>>  arch/powerpc/kernel/pci_64.c | 9 ++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
>> index 2a47790..7b6c1ae 100644
>> --- a/arch/powerpc/kernel/pci_64.c
>> +++ b/arch/powerpc/kernel/pci_64.c
>> @@ -209,6 +209,7 @@ long sys_pciconfig_iobase(long which, unsigned long in_bus,
>>  {
>>  	struct pci_controller* hose;
>>  	struct pci_bus *bus = NULL;
>> +	struct pci_bus *tmp_bus = NULL;
>>  	struct device_node *hose_node;
>>
>>  	/* Argh ! Please forgive me for that hack, but that's the
>> @@ -229,10 +230,12 @@ long sys_pciconfig_iobase(long which, unsigned long in_bus,
>>  	 * used on pre-domains setup. We return the first match
>>  	 */
>>
>> -	list_for_each_entry(bus, &pci_root_buses, node) {
>> -		if (in_bus >= bus->number && in_bus <= bus->busn_res.end)
>> +	list_for_each_entry(tmp_bus, &pci_root_buses, node) {
>> +		if (in_bus >= tmp_bus->number &&
>> +		    in_bus <= tmp_bus->busn_res.end) {
>> +			bus = tmp_bus;
>>  			break;
>> -		bus = NULL;
>> +		}
>>  	}
>>  	if (bus == NULL || bus->dev.of_node == NULL)
>>  		return -ENODEV;
[PATCH] powerpc/mm: Fix ".__node_distance" undefined
CHK include/config/kernel.release
CHK include/generated/uapi/linux/version.h
CHK include/generated/utsrelease.h
...
Building modules, stage 2.
WARNING: 1 bad relocations
c13d6a30 R_PPC64_ADDR64 uprobes_fetch_type_table
WRAP arch/powerpc/boot/zImage.pseries
WRAP arch/powerpc/boot/zImage.epapr
MODPOST 1849 modules
ERROR: ".__node_distance" [drivers/block/nvme.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
make: *** Waiting for unfinished jobs

The reason is that the symbol "__node_distance" is not exported on
powerpc.

Signed-off-by: Mike Qiu
---
 arch/powerpc/mm/numa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 4ebbb9e..3b181b2 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -232,6 +232,7 @@ int __node_distance(int a, int b)

 	return distance;
 }
+EXPORT_SYMBOL(__node_distance);

 static void initialize_distance_lookup_table(int nid,
 		const __be32 *associativity)
--
1.8.3.1
Re: [PATCH] powerpc/mm: Fix ".__node_distance" undefined
Any update on this patch?

Thanks,
Mike

On 04/15/2014 10:00 PM, Mike Qiu wrote:
> CHK include/config/kernel.release
> CHK include/generated/uapi/linux/version.h
> CHK include/generated/utsrelease.h
> ...
> Building modules, stage 2.
> WARNING: 1 bad relocations
> c13d6a30 R_PPC64_ADDR64 uprobes_fetch_type_table
> WRAP arch/powerpc/boot/zImage.pseries
> WRAP arch/powerpc/boot/zImage.epapr
> MODPOST 1849 modules
> ERROR: ".__node_distance" [drivers/block/nvme.ko] undefined!
> make[1]: *** [__modpost] Error 1
> make: *** [modules] Error 2
> make: *** Waiting for unfinished jobs
>
> The reason is that the symbol "__node_distance" is not exported on
> powerpc.
>
> Signed-off-by: Mike Qiu
> ---
>  arch/powerpc/mm/numa.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 4ebbb9e..3b181b2 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -232,6 +232,7 @@ int __node_distance(int a, int b)
>
>  	return distance;
>  }
> +EXPORT_SYMBOL(__node_distance);
>
>  static void initialize_distance_lookup_table(int nid,
>  		const __be32 *associativity)