Re: [PATCH v8 1/3] of/pci: Unify pci_process_bridge_OF_ranges from Microblaze and PowerPC
On Sun, May 05, 2013 at 03:41:49AM +0100, Benjamin Herrenschmidt wrote: > On Mon, 2013-04-22 at 11:41 +0100, Andrew Murray wrote: > > The pci_process_bridge_OF_ranges function, used to parse the "ranges" > > property of a PCI host device, is found in both Microblaze and PowerPC > > architectures. These implementations are nearly identical. This patch > > moves this common code to a common place. > > What's happening with this ? I'd like to avoid that patch for now > as I'm doing some changes to pci_process_bridge_OF_ranges > which are fairly urgent (I might even stick them in the current > merge window) to deal with memory windows having separate offsets. There were no objections to this latest revision until now and it is currently sitting with Jason Cooper (mvebu-next/pcie). [1] > > There's also a few hacks in there that are really ppc specific... > > I think the right long term approach is to change the way powerpc > (and microblaze ?) initializes PCI host bridges. Move it away from > setup_arch() (which is a PITA anyway since it's way too early) to > an early init call of some sort, and encapsulate the new struct > pci_host_bridge. > > We can then directly configure the host bridge windows rather > than having this "intermediary" set of resources in our pci_controller > and in fact move most of the fields from pci_controller to > pci_host_bridge to the point where the former can remain as a > simple platform specific wrapper if needed. This is a view that was also shared by Bjorn [2] when I attempted to submit a patchset which moves struct pci_controller to asm-generic. > > So for new stuff (hint: DT based ARM PCI) or stuff that has to deal with > a lot less archaic platforms (hint: Microblaze), I'd recommend going > straight for that approach rather than perpetuating the PowerPC code > which I'll try to deal with in the next few monthes. The motativation for my patchsets were to give a way for ARM PCI host bridge drivers to parse the DT ranges property, but this snow-balled into unifying pci_process_bridge_OF_ranges. My v8 patchset provides a of_pci_range_parser which is used directly by a few ARM PCI DT host bridge drivers, this has been generally accepted and tested. I don't see why this can't remain and so I'd really like to keep this around. Grant, Benjamin would you be happy for me to resubmit this series which provides the of_pci_range_parser which will be used by the separate implementations of pci_process_bridge_OF_ranges in PowerPC/Microblaze? Benjamin are you able to still use of_pci_range_parser in your 'Support per-aperture memory offset' patch? Thanks, Andrew Murray [1] https://lkml.org/lkml/2013/4/22/505 [2] https://patchwork.kernel.org/patch/2487671 > > Cheers, > Ben. > > > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v9 1/3] of/pci: Provide support for parsing PCI DT ranges property
This patch factors out common implementation patterns to reduce overall kernel code and provide a means for host bridge drivers to directly obtain struct resources from the DT's ranges property without relying on architecture specific DT handling. This will make it easier to write archiecture independent host bridge drivers and mitigate against further duplication of DT parsing code. This patch can be used in the following way: struct of_pci_range_parser parser; struct of_pci_range range; if (of_pci_range_parser_init(&parser, np)) ; //no ranges property for_each_of_pci_range(&parser, &range) { /* directly access properties of the address range, e.g.: range.pci_space, range.pci_addr, range.cpu_addr, range.size, range.flags alternatively obtain a struct resource, e.g.: struct resource res; of_pci_range_to_resource(&range, np, &res); */ } Additionally the implementation takes care of adjacent ranges and merges them into a single range (as was the case with powerpc and microblaze). Signed-off-by: Andrew Murray Signed-off-by: Liviu Dudau Signed-off-by: Thomas Petazzoni Reviewed-by: Rob Herring Tested-by: Thomas Petazzoni Tested-by: Linus Walleij Tested-by: Jingoo Han Acked-by: Grant Likely --- drivers/of/address.c | 67 include/linux/of_address.h | 48 +++ 2 files changed, 115 insertions(+), 0 deletions(-) diff --git a/drivers/of/address.c b/drivers/of/address.c index 04da786..fdd0636 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -227,6 +227,73 @@ int of_pci_address_to_resource(struct device_node *dev, int bar, return __of_address_to_resource(dev, addrp, size, flags, NULL, r); } EXPORT_SYMBOL_GPL(of_pci_address_to_resource); + +int of_pci_range_parser_init(struct of_pci_range_parser *parser, + struct device_node *node) +{ + const int na = 3, ns = 2; + int rlen; + + parser->node = node; + parser->pna = of_n_addr_cells(node); + parser->np = parser->pna + na + ns; + + parser->range = of_get_property(node, "ranges", &rlen); + if (parser->range == NULL) + return -ENOENT; + + parser->end = parser->range + rlen / sizeof(__be32); + + return 0; +} +EXPORT_SYMBOL_GPL(of_pci_range_parser_init); + +struct of_pci_range *of_pci_range_parser_one(struct of_pci_range_parser *parser, + struct of_pci_range *range) +{ + const int na = 3, ns = 2; + + if (!range) + return NULL; + + if (!parser->range || parser->range + parser->np > parser->end) + return NULL; + + range->pci_space = parser->range[0]; + range->flags = of_bus_pci_get_flags(parser->range); + range->pci_addr = of_read_number(parser->range + 1, ns); + range->cpu_addr = of_translate_address(parser->node, + parser->range + na); + range->size = of_read_number(parser->range + parser->pna + na, ns); + + parser->range += parser->np; + + /* Now consume following elements while they are contiguous */ + while (parser->range + parser->np <= parser->end) { + u32 flags, pci_space; + u64 pci_addr, cpu_addr, size; + + pci_space = be32_to_cpup(parser->range); + flags = of_bus_pci_get_flags(parser->range); + pci_addr = of_read_number(parser->range + 1, ns); + cpu_addr = of_translate_address(parser->node, + parser->range + na); + size = of_read_number(parser->range + parser->pna + na, ns); + + if (flags != range->flags) + break; + if (pci_addr != range->pci_addr + range->size || + cpu_addr != range->cpu_addr + range->size) + break; + + range->size += size; + parser->range += parser->np; + } + + return range; +} +EXPORT_SYMBOL_GPL(of_pci_range_parser_one); + #endif /* CONFIG_PCI */ /* diff --git a/include/linux/of_address.h b/include/linux/of_address.h index 0506eb5..4c2e6f2 100644 --- a/include/linux/of_address.h +++ b/include/linux/of_address.h @@ -4,6 +4,36 @@ #include #include +struct of_pci_range_parser { + struct device_node *node; + const __be32 *range; + const __be32 *end; + int np; + int pna; +}; + +struct of_pci_range { + u32 pci_space; + u64 pci_addr; + u64 c
[PATCH v9 0/3] of/pci: Provide common support for PCI DT parsing
This patchset factors out duplicated code associated with parsing PCI DT "ranges" properties across the architectures and introduces a "ranges" parser. This parser "of_pci_range_parser" can be used directly by ARM host bridge drivers enabling them to obtain ranges from device trees. I've included the Reviewed-by, Tested-by and Acked-by's received from v5/v6/v7/v8 in this patchset, earlier versions of this patchset (v3) have been tested-by: Thierry Reding Jingoo Han I've tested that this patchset builds and runs on ARM and that it builds on PowerPC, x86_64, MIPS and Microblaze. Compared to the v8 sent by Andrew Murray, the following changes have been made (please note that the MIPS patch is unchanged from v8): * Remove the unification of pci_process_bridge_OF_ranges between PowerPC and Microblaze. Feedback from Bjorn and Benjamin (along with a NAK) suggested that this goes against their future direction (using more of struct pci_host_bridge and less of arch specific struct pci_controller). Compared to the v7 sent by Andrew Murray, the following changes have been made (please note that the first patch is unchanged from v7): * Rename of_pci_range_parser to of_pci_range_parser_init and of_pci_process_ranges to of_pci_range_parser_one as suggested by Grant Likely. * Reverted back to using a switch statement instead of if/else in pci_process_bridge_OF_ranges. Grant Likely highlighted this change from the original code which was unnecessary. * Squashed in a patch provided by Gabor Juhos which fixes build errors on MIPS found in the last patchset. Compared to the v6 sent by Andrew Murray, the following changes have been made in response to build errors/warnings: * Inclusion of linux/of_address.h in of_pci.c as suggested by Michal Simek to prevent compilation failures on Microblaze (and others) and his ack. * Use of externs, static inlines and a typo in linux/of_address.h in response to linker errors (multiple defination) on x86_64 as spotted by a kbuild test robot on (jcooper/linux.git mvebu/drivers) * Add EXPORT_SYMBOL_GPL to of_pci_range_parser function to be consistent with of_pci_process_ranges function Compared to the v5 sent by Andrew Murray, the following changes have been made: * Use of CONFIG_64BIT instead of CONFIG_[a32bitarch] as suggested by Rob Herring in drivers/of/of_pci.c * Added forward declaration of struct pci_controller in linux/of_pci.h to prevent compiler warning as suggested by Thomas Petazzoni * Improved error checking (!range check), removal of unnecessary be32_to_cpup call, improved formatting of struct of_pci_range_parser layout and replacement of macro with a static inline. All suggested by Rob Herring. Compared to the v4 (incorrectly labelled v3) sent by Andrew Murray, the following changes have been made: * Split the patch as suggested by Rob Herring Compared to the v3 sent by Andrew Murray, the following changes have been made: * Unify and move duplicate pci_process_bridge_OF_ranges functions to drivers/of/of_pci.c as suggested by Rob Herring * Fix potential build errors with Microblaze/MIPS Compared to "[PATCH v5 01/17] of/pci: Provide support for parsing PCI DT ranges property", the following changes have been made: * Correct use of IORESOURCE_* as suggested by Russell King * Improved interface and naming as suggested by Thierry Reding Compared to the v2 sent by Andrew Murray, Thomas Petazzoni did: * Add a memset() on the struct of_pci_range_iter when starting the for loop in for_each_pci_range(). Otherwise, with an uninitialized of_pci_range_iter, of_pci_process_ranges() may crash. * Add parenthesis around 'res', 'np' and 'iter' in the for_each_of_pci_range macro definitions. Otherwise, passing something like &foobar as 'res' didn't work. * Rebased on top of 3.9-rc2, which required fixing a few conflicts in the Microblaze code. v2: This follows on from suggestions made by Grant Likely (marc.info/?l=linux-kernel&m=136079602806328) Andrew Murray (3): of/pci: Provide support for parsing PCI DT ranges property of/pci: mips: convert to common of_pci_range_parser of/pci: microblaze: convert to common of_pci_range_parser arch/microblaze/pci/pci-common.c | 106 ++ arch/mips/pci/pci.c | 50 ++--- drivers/of/address.c | 67 include/linux/of_address.h | 48 + 4 files changed, 171 insertions(+), 100 deletions(-) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v9 2/3] of/pci: mips: convert to common of_pci_range_parser
This patch converts the pci_load_of_ranges function to use the new common of_pci_range_parser. Signed-off-by: Andrew Murray Signed-off-by: Liviu Dudau Signed-off-by: Gabor Juhos Reviewed-by: Rob Herring Reviewed-by: Grant Likely Tested-by: Linus Walleij --- arch/mips/pci/pci.c | 50 ++ 1 files changed, 18 insertions(+), 32 deletions(-) diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c index 0872f12..0d291e9 100644 --- a/arch/mips/pci/pci.c +++ b/arch/mips/pci/pci.c @@ -122,51 +122,37 @@ static void pcibios_scanbus(struct pci_controller *hose) #ifdef CONFIG_OF void pci_load_of_ranges(struct pci_controller *hose, struct device_node *node) { - const __be32 *ranges; - int rlen; - int pna = of_n_addr_cells(node); - int np = pna + 5; + struct of_pci_range range; + struct of_pci_range_parser parser; pr_info("PCI host bridge %s ranges:\n", node->full_name); - ranges = of_get_property(node, "ranges", &rlen); - if (ranges == NULL) - return; hose->of_node = node; - while ((rlen -= np * 4) >= 0) { - u32 pci_space; + if (of_pci_range_parser_init(&parser, node)) + return; + + for_each_of_pci_range(&parser, &range) { struct resource *res = NULL; - u64 addr, size; - - pci_space = be32_to_cpup(&ranges[0]); - addr = of_translate_address(node, ranges + 3); - size = of_read_number(ranges + pna + 3, 2); - ranges += np; - switch ((pci_space >> 24) & 0x3) { - case 1: /* PCI IO space */ + + switch (range.flags & IORESOURCE_TYPE_BITS) { + case IORESOURCE_IO: pr_info(" IO 0x%016llx..0x%016llx\n", - addr, addr + size - 1); + range.cpu_addr, + range.cpu_addr + range.size - 1); hose->io_map_base = - (unsigned long)ioremap(addr, size); + (unsigned long)ioremap(range.cpu_addr, + range.size); res = hose->io_resource; - res->flags = IORESOURCE_IO; break; - case 2: /* PCI Memory space */ - case 3: /* PCI 64 bits Memory space */ + case IORESOURCE_MEM: pr_info(" MEM 0x%016llx..0x%016llx\n", - addr, addr + size - 1); + range.cpu_addr, + range.cpu_addr + range.size - 1); res = hose->mem_resource; - res->flags = IORESOURCE_MEM; break; } - if (res != NULL) { - res->start = addr; - res->name = node->full_name; - res->end = res->start + size - 1; - res->parent = NULL; - res->sibling = NULL; - res->child = NULL; - } + if (res != NULL) + of_pci_range_to_resource(&range, node, res); } } #endif -- 1.7.0.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v9 3/3] of/pci: microblaze: convert to common of_pci_range_parser
This patch converts the pci_load_of_ranges function to use the new common of_pci_range_parser. Signed-off-by: Andrew Murray Signed-off-by: Liviu Dudau --- arch/microblaze/pci/pci-common.c | 106 ++ 1 files changed, 38 insertions(+), 68 deletions(-) diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c index 9ea521e..ba9e4a1 100644 --- a/arch/microblaze/pci/pci-common.c +++ b/arch/microblaze/pci/pci-common.c @@ -658,67 +658,42 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar, void pci_process_bridge_OF_ranges(struct pci_controller *hose, struct device_node *dev, int primary) { - const u32 *ranges; - int rlen; - int pna = of_n_addr_cells(dev); - int np = pna + 5; int memno = 0, isa_hole = -1; - u32 pci_space; - unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size; unsigned long long isa_mb = 0; struct resource *res; + struct of_pci_range range; + struct of_pci_range_parser parser; pr_info("PCI host bridge %s %s ranges:\n", dev->full_name, primary ? "(primary)" : ""); - /* Get ranges property */ - ranges = of_get_property(dev, "ranges", &rlen); - if (ranges == NULL) + /* Check for ranges property */ + if (of_pci_range_parser_init(&parser, dev)) return; - /* Parse it */ pr_debug("Parsing ranges property...\n"); - while ((rlen -= np * 4) >= 0) { + for_each_of_pci_range(&parser, &range) { /* Read next ranges element */ - pci_space = ranges[0]; - pci_addr = of_read_number(ranges + 1, 2); - cpu_addr = of_translate_address(dev, ranges + 3); - size = of_read_number(ranges + pna + 3, 2); - pr_debug("pci_space: 0x%08x pci_addr:0x%016llx ", - pci_space, pci_addr); + range.pci_space, range.pci_addr); pr_debug("cpu_addr:0x%016llx size:0x%016llx\n", - cpu_addr, size); - - ranges += np; + range.cpu_addr, range.size); /* If we failed translation or got a zero-sized region * (some FW try to feed us with non sensical zero sized regions * such as power3 which look like some kind of attempt * at exposing the VGA memory hole) */ - if (cpu_addr == OF_BAD_ADDR || size == 0) + if (range.cpu_addr == OF_BAD_ADDR || range.size == 0) continue; - /* Now consume following elements while they are contiguous */ - for (; rlen >= np * sizeof(u32); -ranges += np, rlen -= np * 4) { - if (ranges[0] != pci_space) - break; - pci_next = of_read_number(ranges + 1, 2); - cpu_next = of_translate_address(dev, ranges + 3); - if (pci_next != pci_addr + size || - cpu_next != cpu_addr + size) - break; - size += of_read_number(ranges + pna + 3, 2); - } - /* Act based on address space type */ res = NULL; - switch ((pci_space >> 24) & 0x3) { - case 1: /* PCI IO space */ + switch (range.flags & IORESOURCE_TYPE_BITS) { + case IORESOURCE_IO: pr_info(" IO 0x%016llx..0x%016llx -> 0x%016llx\n", - cpu_addr, cpu_addr + size - 1, pci_addr); + range.cpu_addr, range.cpu_addr + range.size - 1, + range.pci_addr); /* We support only one IO range */ if (hose->pci_io_size) { @@ -726,11 +701,12 @@ void pci_process_bridge_OF_ranges(struct pci_controller *hose, continue; } /* On 32 bits, limit I/O space to 16MB */ - if (size > 0x0100) - size = 0x0100; + if (range.size > 0x0100) + range.size = 0x0100; /* 32 bits needs to map IOs here */ - hose->io_base_virt = ioremap(cpu_addr, size); + hose->io_base_virt = ioremap(range.cpu_addr, + range.size); /* Expect trouble if pci_addr is not 0 */
Re: pci and pcie device-tree binding - range No cells
On Wed, Dec 12, 2012 at 10:49 AM, Grant Likely wrote: > On Wed, Dec 12, 2012 at 10:37 AM, Michal Simek > mailto:mon...@monstr.eu>> wrote: > > On 12/10/2012 10:41 PM, Grant Likely wrote: > >> drivers/pci/pci-of.c would be good. I'd also accept drivers/of/pci.c > >> which might actually be a good idea in the short term so that it gets > >> appropriate supervision while being generalized before being moved into > >> the pci directory. > > > > Ben: Are you willing to move that ppc code to this location? > > It is probably not good idea that I should do it when I even don't have > > hardware available for testing (Asking someone else). > > You're a clever guy, you are more than capable of crafting the patch, > even if you can't test on hardware. :-) > > I refactored most of the OF support code without having access to most > of the affected hardware. Once I got the changes out there for review > I also asked for spot testing before getting it into linux-next for > even more testing. I've been working on a relatively architecture agnostic PCI host bridge driver and also wanted to avoid duplicating more generic DT parsing code for PCI bindings. I've ended up with a patch which provides an iterator for returning resources based on the the typical 'ranges' binding. This has ended up living in drivers/of/address.c. I originally started out in drivers/of/pci.c and drivers/pci/pci-of.c but found there were good (and static) implementations in drivers/of/address.c which can be reused (e.g. of_bus_pci_get_flags, bus->count_cells). I'm not just ready to post it - but can do before early next week if you can wait. Andrew Murray ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] pci: Provide support for parsing PCI DT ranges property
DT bindings for PCI host bridges often use the ranges property to describe memory and IO ranges - this binding tends to be the same across architectures yet several parsing implementations exist, e.g. arch/mips/pci/pci.c, arch/powerpc/kernel/pci-common.c, arch/sparc/kernel/pci.c and arch/microblaze/pci/pci-common.c (clone of PPC). Some of these duplicate functionality provided by drivers/of/address.c. This patch provides a common iterator-based parser for the ranges property, it is hoped this will reduce DT representation differences between architectures and that architectures will migrate in part to this new parser. It is also hoped (and the motativation for the patch) that this patch will reduce duplication of code when writing host bridge drivers that are supported by multiple architectures. This patch provides struct resources from a device tree node, e.g.: u32 *last = NULL; struct resource res; while ((last = of_pci_process_ranges(np, res, last))) { //do something with res } Platforms with quirks can then do what they like with the resource or migrate common quirk handling to the parser. In an ideal world drivers can just request the obtained resources and pass them on (e.g. pci_add_resource_offset). Signed-off-by: Andrew Murray Signed-off-by: Liviu Dudau --- drivers/of/address.c | 53 +++- include/linux/of_address.h |7 + 2 files changed, 59 insertions(+), 1 deletions(-) diff --git a/drivers/of/address.c b/drivers/of/address.c index 7e262a6..03bfe61 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -219,6 +219,57 @@ int of_pci_address_to_resource(struct device_node *dev, int bar, return __of_address_to_resource(dev, addrp, size, flags, NULL, r); } EXPORT_SYMBOL_GPL(of_pci_address_to_resource); + +const __be32 *of_pci_process_ranges(struct device_node *node, + struct resource *res, const __be32 *from) +{ + const __be32 *start, *end; + int na, ns, np, pna; + int rlen; + struct of_bus *bus; + WARN_ON(!res); + + bus = of_match_bus(node); + bus->count_cells(node, &na, &ns); + if (!OF_CHECK_COUNTS(na, ns)) { + pr_err("Bad cell count for %s\n", node->full_name); + return NULL; + } + + pna = of_n_addr_cells(node); + np = pna + na + ns; + + start = of_get_property(node, "ranges", &rlen); + if (start == NULL) + return NULL; + + end = start + rlen; + + if (!from) + from = start; + + while (from + np <= end) { + u64 cpu_addr, size; + + cpu_addr = of_translate_address(node, from + na); + size = of_read_number(from + na + pna, ns); + res->flags = bus->get_flags(from); + from += np; + + if (cpu_addr == OF_BAD_ADDR || size == 0) + continue; + + res->name = node->full_name; + res->start = cpu_addr; + res->end = res->start + size - 1; + res->parent = res->child = res->sibling = NULL; + return from; + } + + return NULL; +} +EXPORT_SYMBOL_GPL(of_pci_process_ranges); + #endif /* CONFIG_PCI */ /* @@ -421,7 +472,7 @@ u64 __of_translate_address(struct device_node *dev, const __be32 *in_addr, goto bail; bus = of_match_bus(parent); - /* Cound address cells & copy address locally */ + /* Count address cells & copy address locally */ bus->count_cells(dev, &na, &ns); if (!OF_CHECK_COUNTS(na, ns)) { printk(KERN_ERR "prom_parse: Bad cell count for %s\n", diff --git a/include/linux/of_address.h b/include/linux/of_address.h index 01b925a..4582b20 100644 --- a/include/linux/of_address.h +++ b/include/linux/of_address.h @@ -26,6 +26,8 @@ static inline unsigned long pci_address_to_pio(phys_addr_t addr) { return -1; } #define pci_address_to_pio pci_address_to_pio #endif +const __be32 *of_pci_process_ranges(struct device_node *node, + struct resource *res, const __be32 *from); #else /* CONFIG_OF_ADDRESS */ static inline int of_address_to_resource(struct device_node *dev, int index, struct resource *r) @@ -48,6 +50,11 @@ static inline const u32 *of_get_address(struct device_node *dev, int index, { return NULL; } +const __be32 *of_pci_process_ranges(struct device_node *node, + struct resource *res, const __be32 *from) +{ + return NULL; +} #endif /* CONFIG_OF_ADDRESS */ -- 1.7.0.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: pci and pcie device-tree binding - range No cells
On Wed, Dec 12, 2012 at 01:34:24PM +, Thierry Reding wrote: > On Wed, Dec 12, 2012 at 12:19:12PM +0000, Andrew Murray wrote: > > I've been working on a relatively architecture agnostic PCI host bridge > > driver > > and also wanted to avoid duplicating more generic DT parsing code for PCI > > bindings. > > > > I've ended up with a patch which provides an iterator for returning > > resources > > based on the the typical 'ranges' binding. This has ended up living in > > drivers/of/address.c. I originally started out in drivers/of/pci.c and > > drivers/pci/pci-of.c but found there were good (and static) implementations > > in > > drivers/of/address.c which can be reused (e.g. of_bus_pci_get_flags, > > bus->count_cells). > > > > I'm not just ready to post it - but can do before early next week if you can > > wait. > > I already posted a similar patch[0] as part of a larger series to bring > DT support to Tegra PCIe back in July. I suppose what you have must be > something pretty close to that. Most of the stuff that had me occupied > since then should be done soon and I was planning on resurrecting the > series one of these days. Thanks for the reference. I've submitted my patch, it's along the lines of your existing patch. I'm happy to take the best bits from both, drop mine, etc. Andrew Murray ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] pci: Provide support for parsing PCI DT ranges property
On Thu, Dec 13, 2012 at 09:13:33AM +, Thierry Reding wrote: > Hi Andrew, > > I don't like iterator interfaces too much, but I can live with that. > Other than that the patch looks good to me and I'll try to work it into > my Tegra PCIe patch series. > > Just two minor comments below. > > > diff --git a/drivers/of/address.c b/drivers/of/address.c > [...] > > @@ -421,7 +472,7 @@ u64 __of_translate_address(struct device_node *dev, > > const __be32 *in_addr, > > goto bail; > > bus = of_match_bus(parent); > > > > - /* Cound address cells & copy address locally */ > > + /* Count address cells & copy address locally */ > > bus->count_cells(dev, &na, &ns); > > if (!OF_CHECK_COUNTS(na, ns)) { > > printk(KERN_ERR "prom_parse: Bad cell count for %s\n", > > This is really minor, but it should still go into a separate patch. > > > diff --git a/include/linux/of_address.h b/include/linux/of_address.h > > index 01b925a..4582b20 100644 > > --- a/include/linux/of_address.h > > +++ b/include/linux/of_address.h > > @@ -26,6 +26,8 @@ static inline unsigned long > > pci_address_to_pio(phys_addr_t addr) { return -1; } > > #define pci_address_to_pio pci_address_to_pio > > #endif > > > > +const __be32 *of_pci_process_ranges(struct device_node *node, > > + struct resource *res, const __be32 *from); > > #else /* CONFIG_OF_ADDRESS */ > > static inline int of_address_to_resource(struct device_node *dev, int > > index, > > struct resource *r) > > @@ -48,6 +50,11 @@ static inline const u32 *of_get_address(struct > > device_node *dev, int index, > > { > > return NULL; > > } > > +const __be32 *of_pci_process_ranges(struct device_node *node, > > There should be a blank line to separate the above two lines. > Thanks for the feedback. I will send another patch for the typo and leave this patch with you for working into your existing series. Andrew Murray ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] pci: Provide support for parsing PCI DT ranges property
On Thu, Dec 13, 2012 at 10:03:18AM +, Thierry Reding wrote: > On Thu, Dec 13, 2012 at 09:45:43AM +0000, Andrew Murray wrote: > > On Thu, Dec 13, 2012 at 09:13:33AM +, Thierry Reding wrote: > > > Hi Andrew, > > > > > > I don't like iterator interfaces too much, but I can live with that. > > > Other than that the patch looks good to me and I'll try to work it into > > > my Tegra PCIe patch series. > > > > > > Just two minor comments below. > > > > > > > diff --git a/drivers/of/address.c b/drivers/of/address.c > > > [...] > > > > @@ -421,7 +472,7 @@ u64 __of_translate_address(struct device_node *dev, > > > > const __be32 *in_addr, > > > > goto bail; > > > > bus = of_match_bus(parent); > > > > > > > > - /* Cound address cells & copy address locally */ > > > > + /* Count address cells & copy address locally */ > > > > bus->count_cells(dev, &na, &ns); > > > > if (!OF_CHECK_COUNTS(na, ns)) { > > > > printk(KERN_ERR "prom_parse: Bad cell count for %s\n", > > > > > > This is really minor, but it should still go into a separate patch. > > > > > > > diff --git a/include/linux/of_address.h b/include/linux/of_address.h > > > > index 01b925a..4582b20 100644 > > > > --- a/include/linux/of_address.h > > > > +++ b/include/linux/of_address.h > > > > @@ -26,6 +26,8 @@ static inline unsigned long > > > > pci_address_to_pio(phys_addr_t addr) { return -1; } > > > > #define pci_address_to_pio pci_address_to_pio > > > > #endif > > > > > > > > +const __be32 *of_pci_process_ranges(struct device_node *node, > > > > + struct resource *res, const __be32 > > > > *from); > > > > #else /* CONFIG_OF_ADDRESS */ > > > > static inline int of_address_to_resource(struct device_node *dev, int > > > > index, > > > > struct resource *r) > > > > @@ -48,6 +50,11 @@ static inline const u32 *of_get_address(struct > > > > device_node *dev, int index, > > > > { > > > > return NULL; > > > > } > > > > +const __be32 *of_pci_process_ranges(struct device_node *node, > > > > > > There should be a blank line to separate the above two lines. > > > > > > > Thanks for the feedback. > > > > I will send another patch for the typo and leave this patch with you for > > working into your existing series. > > I suppose you have your own series that uses this patch? Not yet, it may be some time before I submit my PCI host bridge driver. Though I am making changes else where (e.g. this patch) which I'm hoping to submit as early as possible. I can rebase my work for these upstream dependencies. I can re-spin this patch with your suggested changes if you prefer? Andrew Murray ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] pci: Provide support for parsing PCI DT ranges property
On Thu, Dec 20, 2012 at 08:25:00AM +, Thierry Reding wrote: > On Wed, Dec 12, 2012 at 04:37:50PM +0000, Andrew Murray wrote: > [...] > > diff --git a/drivers/of/address.c b/drivers/of/address.c > [...] > > + start = of_get_property(node, "ranges", &rlen); > > + if (start == NULL) > > + return NULL; > > + > > + end = start + rlen; > > I'm currently rewriting large parts of the Tegra PCIe controller driver > and I'm trying to use this new API. This seems to work fine, except that > I think this line needs to be: > > end = start + rlen / sizeof(__be32); > > Otherwise we'll try to process 4 times as many ranges as there are. > > Thierry Good catch. Thanks for taking this on. Andrew Murray ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] pci: Provide support for parsing PCI DT ranges property
On Sat, Dec 15, 2012 at 01:06:41AM +, Grant Likely wrote: > On Wed, Dec 12, 2012 at 4:37 PM, Andrew Murray wrote: > > DT bindings for PCI host bridges often use the ranges property to describe > > memory and IO ranges - this binding tends to be the same across > > architectures > > yet several parsing implementations exist, e.g. arch/mips/pci/pci.c, > > arch/powerpc/kernel/pci-common.c, arch/sparc/kernel/pci.c and > > arch/microblaze/pci/pci-common.c (clone of PPC). Some of these duplicate > > functionality provided by drivers/of/address.c. > > Hi Andrew, > > Thanks for looking into this. This definitely needs to be done. > > However, I cannot merge this patch as-is because it actually makes > things worse by adding yet another implementation of the parsing code. > Plus it doesn't actually have any users. :-) I understand. Though I see Thierry has included this in his patch set - I guess that means there is potentially one user now :). > > Instead, move the existing code that you need out of > arch/powerpc/kernel/pci-common.c into a shared place and add in the > features you need. Bonus points if you fixup microblaze or others at > the same time. In most part the patch I submitted was the common code from powerpc but without quirks and tie-ins to powerpc structures. I'd like to convert powerpc to using my patch and others but won't get time to do this at present :( > > g. > Andrew Murray ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v7 0/3] of/pci: Provide common support for PCI DT parsing
This patchset factors out duplicated code associated with parsing PCI DT "ranges" properties across the architectures and introduces a "ranges" parser. This parser "of_pci_range_parser" can be used directly by ARM host bridge drivers enabling them to obtain ranges from device trees. I've included the Reviewed-by and Tested-by's received from v5/v6 in this patchset, earlier versions of this patchset (v3) have been tested-by: Thierry Reding Jingoo Han I've tested that this patchset builds and runs on ARM and that it builds on PowerPC and x86_64. Compared to the v6 sent by Andrew Murray, the following changes have been made in response to build errors/warnings: * Inclusion of linux/of_address.h in of_pci.c as suggested by Michal Simek to prevent compilation failures on Microblaze (and others) and his ack. * Use of externs, static inlines and a typo in linux/of_address.h in response to linker errors (multiple defination) on x86_64 as spotted by a kbuild test robot on (jcooper/linux.git mvebu/drivers) * Add EXPORT_SYMBOL_GPL to of_pci_range_parser function to be consistent with of_pci_process_ranges function Compared to the v5 sent by Andrew Murray, the following changes have been made: * Use of CONFIG_64BIT instead of CONFIG_[a32bitarch] as suggested by Rob Herring in drivers/of/of_pci.c * Added forward declaration of struct pci_controller in linux/of_pci.h to prevent compiler warning as suggested by Thomas Petazzoni * Improved error checking (!range check), removal of unnecessary be32_to_cpup call, improved formatting of struct of_pci_range_parser layout and replacement of macro with a static inline. All suggested by Rob Herring. Compared to the v4 (incorrectly labelled v3) sent by Andrew Murray, the following changes have been made: * Split the patch as suggested by Rob Herring Compared to the v3 sent by Andrew Murray, the following changes have been made: * Unify and move duplicate pci_process_bridge_OF_ranges functions to drivers/of/of_pci.c as suggested by Rob Herring * Fix potential build errors with Microblaze/MIPS Compared to "[PATCH v5 01/17] of/pci: Provide support for parsing PCI DT ranges property", the following changes have been made: * Correct use of IORESOURCE_* as suggested by Russell King * Improved interface and naming as suggested by Thierry Reding Compared to the v2 sent by Andrew Murray, Thomas Petazzoni did: * Add a memset() on the struct of_pci_range_iter when starting the for loop in for_each_pci_range(). Otherwise, with an uninitialized of_pci_range_iter, of_pci_process_ranges() may crash. * Add parenthesis around 'res', 'np' and 'iter' in the for_each_of_pci_range macro definitions. Otherwise, passing something like &foobar as 'res' didn't work. * Rebased on top of 3.9-rc2, which required fixing a few conflicts in the Microblaze code. v2: This follows on from suggestions made by Grant Likely (marc.info/?l=linux-kernel&m=136079602806328) Andrew Murray (3): of/pci: Unify pci_process_bridge_OF_ranges from Microblaze and PowerPC of/pci: Provide support for parsing PCI DT ranges property of/pci: mips: convert to common of_pci_range_parser arch/microblaze/include/asm/pci-bridge.h |5 +- arch/microblaze/pci/pci-common.c | 192 -- arch/mips/pci/pci.c | 50 +++-- arch/powerpc/include/asm/pci-bridge.h|5 +- arch/powerpc/kernel/pci-common.c | 192 -- drivers/of/address.c | 67 +++ drivers/of/of_pci.c | 169 ++ include/linux/of_address.h | 46 +++ include/linux/of_pci.h |4 + 9 files changed, 304 insertions(+), 426 deletions(-) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v7 1/3] of/pci: Unify pci_process_bridge_OF_ranges from Microblaze and PowerPC
The pci_process_bridge_OF_ranges function, used to parse the "ranges" property of a PCI host device, is found in both Microblaze and PowerPC architectures. These implementations are nearly identical. This patch moves this common code to a common place. Signed-off-by: Andrew Murray Signed-off-by: Liviu Dudau Reviewed-by: Rob Herring Tested-by: Thomas Petazzoni Tested-by: Linus Walleij Acked-by: Michal Simek --- arch/microblaze/include/asm/pci-bridge.h |5 +- arch/microblaze/pci/pci-common.c | 192 arch/powerpc/include/asm/pci-bridge.h|5 +- arch/powerpc/kernel/pci-common.c | 192 drivers/of/of_pci.c | 200 ++ include/linux/of_pci.h |4 + 6 files changed, 206 insertions(+), 392 deletions(-) diff --git a/arch/microblaze/include/asm/pci-bridge.h b/arch/microblaze/include/asm/pci-bridge.h index cb5d397..5783cd6 100644 --- a/arch/microblaze/include/asm/pci-bridge.h +++ b/arch/microblaze/include/asm/pci-bridge.h @@ -10,6 +10,7 @@ #include #include #include +#include struct device_node; @@ -132,10 +133,6 @@ extern void setup_indirect_pci(struct pci_controller *hose, extern struct pci_controller *pci_find_hose_for_OF_device( struct device_node *node); -/* Fill up host controller resources from the OF node */ -extern void pci_process_bridge_OF_ranges(struct pci_controller *hose, - struct device_node *dev, int primary); - /* Allocate & free a PCI host bridge structure */ extern struct pci_controller *pcibios_alloc_controller(struct device_node *dev); extern void pcibios_free_controller(struct pci_controller *phb); diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c index 9ea521e..2735ad9 100644 --- a/arch/microblaze/pci/pci-common.c +++ b/arch/microblaze/pci/pci-common.c @@ -622,198 +622,6 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar, *end = rsrc->end - offset; } -/** - * pci_process_bridge_OF_ranges - Parse PCI bridge resources from device tree - * @hose: newly allocated pci_controller to be setup - * @dev: device node of the host bridge - * @primary: set if primary bus (32 bits only, soon to be deprecated) - * - * This function will parse the "ranges" property of a PCI host bridge device - * node and setup the resource mapping of a pci controller based on its - * content. - * - * Life would be boring if it wasn't for a few issues that we have to deal - * with here: - * - * - We can only cope with one IO space range and up to 3 Memory space - * ranges. However, some machines (thanks Apple !) tend to split their - * space into lots of small contiguous ranges. So we have to coalesce. - * - * - We can only cope with all memory ranges having the same offset - * between CPU addresses and PCI addresses. Unfortunately, some bridges - * are setup for a large 1:1 mapping along with a small "window" which - * maps PCI address 0 to some arbitrary high address of the CPU space in - * order to give access to the ISA memory hole. - * The way out of here that I've chosen for now is to always set the - * offset based on the first resource found, then override it if we - * have a different offset and the previous was set by an ISA hole. - * - * - Some busses have IO space not starting at 0, which causes trouble with - * the way we do our IO resource renumbering. The code somewhat deals with - * it for 64 bits but I would expect problems on 32 bits. - * - * - Some 32 bits platforms such as 4xx can have physical space larger than - * 32 bits so we need to use 64 bits values for the parsing - */ -void pci_process_bridge_OF_ranges(struct pci_controller *hose, - struct device_node *dev, int primary) -{ - const u32 *ranges; - int rlen; - int pna = of_n_addr_cells(dev); - int np = pna + 5; - int memno = 0, isa_hole = -1; - u32 pci_space; - unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size; - unsigned long long isa_mb = 0; - struct resource *res; - - pr_info("PCI host bridge %s %s ranges:\n", - dev->full_name, primary ? "(primary)" : ""); - - /* Get ranges property */ - ranges = of_get_property(dev, "ranges", &rlen); - if (ranges == NULL) - return; - - /* Parse it */ - pr_debug("Parsing ranges property...\n"); - while ((rlen -= np * 4) >= 0) { - /* Read next ranges element */ - pci_space = ranges[0]; - pci_addr = of_read_number(ranges + 1, 2); - cpu_addr = of_translate_address(dev, ranges + 3); - size = of_read_number(ranges + pna + 3, 2); - -
[PATCH v7 2/3] of/pci: Provide support for parsing PCI DT ranges property
This patch factors out common implementation patterns to reduce overall kernel code and provide a means for host bridge drivers to directly obtain struct resources from the DT's ranges property without relying on architecture specific DT handling. This will make it easier to write archiecture independent host bridge drivers and mitigate against further duplication of DT parsing code. This patch can be used in the following way: struct of_pci_range_parser parser; struct of_pci_range range; if (of_pci_range_parser(&parser, np)) ; //no ranges property for_each_of_pci_range(&parser, &range) { /* directly access properties of the address range, e.g.: range.pci_space, range.pci_addr, range.cpu_addr, range.size, range.flags alternatively obtain a struct resource, e.g.: struct resource res; of_pci_range_to_resource(&range, np, &res); */ } Additionally the implementation takes care of adjacent ranges and merges them into a single range (as was the case with powerpc and microblaze). Signed-off-by: Andrew Murray Signed-off-by: Liviu Dudau Signed-off-by: Thomas Petazzoni Reviewed-by: Rob Herring Tested-by: Thomas Petazzoni Tested-by: Linus Walleij --- drivers/of/address.c | 67 ++ drivers/of/of_pci.c| 113 include/linux/of_address.h | 46 ++ 3 files changed, 154 insertions(+), 72 deletions(-) diff --git a/drivers/of/address.c b/drivers/of/address.c index 04da786..6eec70c 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -227,6 +227,73 @@ int of_pci_address_to_resource(struct device_node *dev, int bar, return __of_address_to_resource(dev, addrp, size, flags, NULL, r); } EXPORT_SYMBOL_GPL(of_pci_address_to_resource); + +int of_pci_range_parser(struct of_pci_range_parser *parser, + struct device_node *node) +{ + const int na = 3, ns = 2; + int rlen; + + parser->node = node; + parser->pna = of_n_addr_cells(node); + parser->np = parser->pna + na + ns; + + parser->range = of_get_property(node, "ranges", &rlen); + if (parser->range == NULL) + return -ENOENT; + + parser->end = parser->range + rlen / sizeof(__be32); + + return 0; +} +EXPORT_SYMBOL_GPL(of_pci_range_parser); + +struct of_pci_range *of_pci_process_ranges(struct of_pci_range_parser *parser, + struct of_pci_range *range) +{ + const int na = 3, ns = 2; + + if (!range) + return NULL; + + if (!parser->range || parser->range + parser->np > parser->end) + return NULL; + + range->pci_space = parser->range[0]; + range->flags = of_bus_pci_get_flags(parser->range); + range->pci_addr = of_read_number(parser->range + 1, ns); + range->cpu_addr = of_translate_address(parser->node, + parser->range + na); + range->size = of_read_number(parser->range + parser->pna + na, ns); + + parser->range += parser->np; + + /* Now consume following elements while they are contiguous */ + while (parser->range + parser->np <= parser->end) { + u32 flags, pci_space; + u64 pci_addr, cpu_addr, size; + + pci_space = be32_to_cpup(parser->range); + flags = of_bus_pci_get_flags(parser->range); + pci_addr = of_read_number(parser->range + 1, ns); + cpu_addr = of_translate_address(parser->node, + parser->range + na); + size = of_read_number(parser->range + parser->pna + na, ns); + + if (flags != range->flags) + break; + if (pci_addr != range->pci_addr + range->size || + cpu_addr != range->cpu_addr + range->size) + break; + + range->size += size; + parser->range += parser->np; + } + + return range; +} +EXPORT_SYMBOL_GPL(of_pci_process_ranges); + #endif /* CONFIG_PCI */ /* diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c index 1626172..e5ab604 100644 --- a/drivers/of/of_pci.c +++ b/drivers/of/of_pci.c @@ -2,6 +2,7 @@ #include #include #include +#include #include #if defined(CONFIG_PPC32) || defined(CONFIG_PPC64) || defined(CONFIG_MICROBLAZE) @@ -82,67 +83,43 @@ EXPORT_SYMBOL_GPL(of_pci_find_child_device); void pci_process_bridge_OF_ranges(struct pci_controller *hose, struct device_node *dev,
[PATCH v7 3/3] of/pci: mips: convert to common of_pci_range_parser
This patch converts the pci_load_of_ranges function to use the new common of_pci_range_parser. Signed-off-by: Andrew Murray Signed-off-by: Liviu Dudau Reviewed-by: Rob Herring --- arch/mips/pci/pci.c | 50 -- 1 files changed, 16 insertions(+), 34 deletions(-) diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c index 0872f12..bee49a4 100644 --- a/arch/mips/pci/pci.c +++ b/arch/mips/pci/pci.c @@ -122,51 +122,33 @@ static void pcibios_scanbus(struct pci_controller *hose) #ifdef CONFIG_OF void pci_load_of_ranges(struct pci_controller *hose, struct device_node *node) { - const __be32 *ranges; - int rlen; - int pna = of_n_addr_cells(node); - int np = pna + 5; + struct of_pci_range_range range; + struct of_pci_range_parser parser; + u32 res_type; pr_info("PCI host bridge %s ranges:\n", node->full_name); - ranges = of_get_property(node, "ranges", &rlen); - if (ranges == NULL) - return; hose->of_node = node; - while ((rlen -= np * 4) >= 0) { - u32 pci_space; + if (of_pci_range_parser(&parser, node)) + return; + + for_each_of_pci_range(&parser, &range) { struct resource *res = NULL; - u64 addr, size; - - pci_space = be32_to_cpup(&ranges[0]); - addr = of_translate_address(node, ranges + 3); - size = of_read_number(ranges + pna + 3, 2); - ranges += np; - switch ((pci_space >> 24) & 0x3) { - case 1: /* PCI IO space */ + + res_type = range.flags & IORESOURCE_TYPE_BITS; + if (res_type == IORESOURCE_IO) { pr_info(" IO 0x%016llx..0x%016llx\n", - addr, addr + size - 1); + range.addr, range.addr + range.size - 1); hose->io_map_base = - (unsigned long)ioremap(addr, size); + (unsigned long)ioremap(range.addr, range.size); res = hose->io_resource; - res->flags = IORESOURCE_IO; - break; - case 2: /* PCI Memory space */ - case 3: /* PCI 64 bits Memory space */ + } else if (res_type == IORESOURCE_MEM) { pr_info(" MEM 0x%016llx..0x%016llx\n", - addr, addr + size - 1); + range.addr, range.addr + range.size - 1); res = hose->mem_resource; - res->flags = IORESOURCE_MEM; - break; - } - if (res != NULL) { - res->start = addr; - res->name = node->full_name; - res->end = res->start + size - 1; - res->parent = NULL; - res->sibling = NULL; - res->child = NULL; } + if (res != NULL) + of_pci_range_to_resource(&range, node, res); } } #endif -- 1.7.0.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v7 1/3] of/pci: Unify pci_process_bridge_OF_ranges from Microblaze and PowerPC
On Tue, Apr 16, 2013 at 11:18:26AM +0100, Andrew Murray wrote: > The pci_process_bridge_OF_ranges function, used to parse the "ranges" > property of a PCI host device, is found in both Microblaze and PowerPC > architectures. These implementations are nearly identical. This patch > moves this common code to a common place. > > Signed-off-by: Andrew Murray > Signed-off-by: Liviu Dudau > Reviewed-by: Rob Herring > Tested-by: Thomas Petazzoni > Tested-by: Linus Walleij > Acked-by: Michal Simek > --- > arch/microblaze/include/asm/pci-bridge.h |5 +- > arch/microblaze/pci/pci-common.c | 192 > arch/powerpc/include/asm/pci-bridge.h|5 +- > arch/powerpc/kernel/pci-common.c | 192 Is there anyone on linuxppc-dev/linux-mips that can help test this patchset? I've tested that it builds on powerpc with a variety of configs (some which include fsl_pci.c implementation). Though I don't have hardware to verify that it works. I haven't tested this builds or runs on MIPS. You shouldn't see any difference in behaviour or new warnings and PCI devices should continue to operate as before. Andrew Murray ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v7 3/3] of/pci: mips: convert to common of_pci_range_parser
On Wed, Apr 17, 2013 at 04:42:48PM +0100, Linus Walleij wrote: > On Tue, Apr 16, 2013 at 12:18 PM, Andrew Murray wrote: > > > This patch converts the pci_load_of_ranges function to use the new common > > of_pci_range_parser. > > > > Signed-off-by: Andrew Murray > > Signed-off-by: Liviu Dudau > > Reviewed-by: Rob Herring > > Tested-by: Linus Walleij Jason - you may not have seen this, but here (Linus Walleij) is another Tested-by to add to this patch in your tree (if you can). Andrew Murray ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v7 2/3] of/pci: Provide support for parsing PCI DT ranges property
On Thu, Apr 18, 2013 at 02:44:01PM +0100, Grant Likely wrote: > On Tue, 16 Apr 2013 11:18:27 +0100, Andrew Murray > wrote: > > This patch factors out common implementation patterns to reduce overall > > kernel > > code and provide a means for host bridge drivers to directly obtain struct > > resources from the DT's ranges property without relying on architecture > > specific > > DT handling. This will make it easier to write archiecture independent host > > bridge > > drivers and mitigate against further duplication of DT parsing code. > > > > This patch can be used in the following way: > > > > struct of_pci_range_parser parser; > > struct of_pci_range range; > > > > if (of_pci_range_parser(&parser, np)) > > ; //no ranges property > > > > for_each_of_pci_range(&parser, &range) { > > > > /* > > directly access properties of the address range, e.g.: > > range.pci_space, range.pci_addr, range.cpu_addr, > > range.size, range.flags > > > > alternatively obtain a struct resource, e.g.: > > struct resource res; > > of_pci_range_to_resource(&range, np, &res); > > */ > > } > > > > Additionally the implementation takes care of adjacent ranges and merges > > them > > into a single range (as was the case with powerpc and microblaze). > > > > Signed-off-by: Andrew Murray > > Signed-off-by: Liviu Dudau > > Signed-off-by: Thomas Petazzoni > > Reviewed-by: Rob Herring > > Tested-by: Thomas Petazzoni > > Tested-by: Linus Walleij > > Acked-by: Grant Likely > > But comments below... > > > --- > > drivers/of/address.c | 67 ++ > > drivers/of/of_pci.c| 113 > > > > include/linux/of_address.h | 46 ++ > > 3 files changed, 154 insertions(+), 72 deletions(-) > > > > diff --git a/drivers/of/address.c b/drivers/of/address.c > > index 04da786..6eec70c 100644 > > --- a/drivers/of/address.c > > +++ b/drivers/of/address.c > > @@ -227,6 +227,73 @@ int of_pci_address_to_resource(struct device_node > > *dev, int bar, > > return __of_address_to_resource(dev, addrp, size, flags, NULL, r); > > } > > EXPORT_SYMBOL_GPL(of_pci_address_to_resource); > > + > > +int of_pci_range_parser(struct of_pci_range_parser *parser, > > + struct device_node *node) > > +{ > > + const int na = 3, ns = 2; > > + int rlen; > > + > > + parser->node = node; > > + parser->pna = of_n_addr_cells(node); > > + parser->np = parser->pna + na + ns; > > + > > + parser->range = of_get_property(node, "ranges", &rlen); > > + if (parser->range == NULL) > > + return -ENOENT; > > + > > + parser->end = parser->range + rlen / sizeof(__be32); > > + > > + return 0; > > +} > > +EXPORT_SYMBOL_GPL(of_pci_range_parser); > > "of_pci_range_parser_init" would be a clearer name > > > +struct of_pci_range *of_pci_process_ranges(struct of_pci_range_parser > > *parser, > > + struct of_pci_range *range) > > Similarly, "of_pci_range_parser_one" would be more consistent. > > > +{ > > + const int na = 3, ns = 2; > > + > > + if (!range) > > + return NULL; > > + > > + if (!parser->range || parser->range + parser->np > parser->end) > > + return NULL; > > + > > + range->pci_space = parser->range[0]; > > + range->flags = of_bus_pci_get_flags(parser->range); > > + range->pci_addr = of_read_number(parser->range + 1, ns); > > + range->cpu_addr = of_translate_address(parser->node, > > + parser->range + na); > > + range->size = of_read_number(parser->range + parser->pna + na, ns); > > + > > + parser->range += parser->np; > > + > > + /* Now consume following elements while they are contiguous */ > > + while (parser->range + parser->np <= parser->end) { > > + u32 flags, pci_space; > > + u64 pci_addr, cpu_addr, size; > > + > > + pci_space = be32_to_cpup(parser->range); > > + flags = of_bus_pci_ge
Re: [PATCH v7 3/3] of/pci: mips: convert to common of_pci_range_parser
On Thu, Apr 18, 2013 at 02:45:35PM +0100, Grant Likely wrote: > On Tue, 16 Apr 2013 11:18:28 +0100, Andrew Murray > wrote: > > This patch converts the pci_load_of_ranges function to use the new common > > of_pci_range_parser. > > > > Signed-off-by: Andrew Murray > > Signed-off-by: Liviu Dudau > > Reviewed-by: Rob Herring > > Looks okay to me, and thank you very much for doing the legwork to merge > the common code. No problem, though I think there is more than can be done in this area - there is a lot of similar PCI code floating about. > > Reviewed-by: Grant Likely Thanks for the review. Andrew Murray ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v7 2/3] of/pci: Provide support for parsing PCI DT ranges property
On Thu, Apr 18, 2013 at 04:29:54PM +0100, Grant Likely wrote: > On Thu, Apr 18, 2013 at 3:24 PM, Andrew Murray wrote: > > On Thu, Apr 18, 2013 at 02:44:01PM +0100, Grant Likely wrote: > >> On Tue, 16 Apr 2013 11:18:27 +0100, Andrew Murray > >> wrote: > >> > /* Act based on address space type */ > >> > res = NULL; > >> > - switch ((pci_space >> 24) & 0x3) { > >> > - case 1: /* PCI IO space */ > >> > + res_type = range.flags & IORESOURCE_TYPE_BITS; > >> > + if (res_type == IORESOURCE_IO) { > >> > >> Why the change from switch() to an if/else if sequence? > > > > Russell King suggested this change - see > > https://patchwork.kernel.org/patch/2323941/ > > Umm, that isn't what that link shows. That link shows the patch > already changing to an if/else if construct, and rmk is commenting on > that. Arh yes sorry about that. I can't find or remember any good reason why I changed it from a switch to an if/else :\ Andrew Murray ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v8 3/3] of/pci: mips: convert to common of_pci_range_parser
This patch converts the pci_load_of_ranges function to use the new common of_pci_range_parser. Signed-off-by: Andrew Murray Signed-off-by: Liviu Dudau Signed-off-by: Gabor Juhos Reviewed-by: Rob Herring Reviewed-by: Grant Likely Tested-by: Linus Walleij --- arch/mips/pci/pci.c | 51 +++ 1 files changed, 19 insertions(+), 32 deletions(-) diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c index 0872f12..4b09ca8 100644 --- a/arch/mips/pci/pci.c +++ b/arch/mips/pci/pci.c @@ -122,51 +122,38 @@ static void pcibios_scanbus(struct pci_controller *hose) #ifdef CONFIG_OF void pci_load_of_ranges(struct pci_controller *hose, struct device_node *node) { - const __be32 *ranges; - int rlen; - int pna = of_n_addr_cells(node); - int np = pna + 5; + struct of_pci_range range; + struct of_pci_range_parser parser; + u32 res_type; pr_info("PCI host bridge %s ranges:\n", node->full_name); - ranges = of_get_property(node, "ranges", &rlen); - if (ranges == NULL) - return; hose->of_node = node; - while ((rlen -= np * 4) >= 0) { - u32 pci_space; + if (of_pci_range_parser_init(&parser, node)) + return; + + for_each_of_pci_range(&parser, &range) { struct resource *res = NULL; - u64 addr, size; - - pci_space = be32_to_cpup(&ranges[0]); - addr = of_translate_address(node, ranges + 3); - size = of_read_number(ranges + pna + 3, 2); - ranges += np; - switch ((pci_space >> 24) & 0x3) { - case 1: /* PCI IO space */ + + switch (range.flags & IORESOURCE_TYPE_BITS) { + case IORESOURCE_IO: pr_info(" IO 0x%016llx..0x%016llx\n", - addr, addr + size - 1); + range.cpu_addr, + range.cpu_addr + range.size - 1); hose->io_map_base = - (unsigned long)ioremap(addr, size); + (unsigned long)ioremap(range.cpu_addr, + range.size); res = hose->io_resource; - res->flags = IORESOURCE_IO; break; - case 2: /* PCI Memory space */ - case 3: /* PCI 64 bits Memory space */ + case IORESOURCE_MEM: pr_info(" MEM 0x%016llx..0x%016llx\n", - addr, addr + size - 1); + range.cpu_addr, + range.cpu_addr + range.size - 1); res = hose->mem_resource; - res->flags = IORESOURCE_MEM; break; } - if (res != NULL) { - res->start = addr; - res->name = node->full_name; - res->end = res->start + size - 1; - res->parent = NULL; - res->sibling = NULL; - res->child = NULL; - } + if (res != NULL) + of_pci_range_to_resource(&range, node, res); } } #endif -- 1.7.0.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v8 0/3] of/pci: Provide common support for PCI DT parsing
This patchset factors out duplicated code associated with parsing PCI DT "ranges" properties across the architectures and introduces a "ranges" parser. This parser "of_pci_range_parser" can be used directly by ARM host bridge drivers enabling them to obtain ranges from device trees. I've included the Reviewed-by, Tested-by and Acked-by's received from v5/v6/v7 in this patchset, earlier versions of this patchset (v3) have been tested-by: Thierry Reding Jingoo Han I've tested that this patchset builds and runs on ARM and that it builds on PowerPC, x86_64 and MIPS. Compared to the v7 sent by Andrew Murray, the following changes have been made (please note that the first patch is unchanged from v7): * Rename of_pci_range_parser to of_pci_range_parser_init and of_pci_process_ranges to of_pci_range_parser_one as suggested by Grant Likely. * Reverted back to using a switch statement instead of if/else in pci_process_bridge_OF_ranges. Grant Likely highlighted this change from the original code which was unnecessary. * Squashed in a patch provided by Gabor Juhos which fixes build errors on MIPS found in the last patchset. Compared to the v6 sent by Andrew Murray, the following changes have been made in response to build errors/warnings: * Inclusion of linux/of_address.h in of_pci.c as suggested by Michal Simek to prevent compilation failures on Microblaze (and others) and his ack. * Use of externs, static inlines and a typo in linux/of_address.h in response to linker errors (multiple defination) on x86_64 as spotted by a kbuild test robot on (jcooper/linux.git mvebu/drivers) * Add EXPORT_SYMBOL_GPL to of_pci_range_parser function to be consistent with of_pci_process_ranges function Compared to the v5 sent by Andrew Murray, the following changes have been made: * Use of CONFIG_64BIT instead of CONFIG_[a32bitarch] as suggested by Rob Herring in drivers/of/of_pci.c * Added forward declaration of struct pci_controller in linux/of_pci.h to prevent compiler warning as suggested by Thomas Petazzoni * Improved error checking (!range check), removal of unnecessary be32_to_cpup call, improved formatting of struct of_pci_range_parser layout and replacement of macro with a static inline. All suggested by Rob Herring. Compared to the v4 (incorrectly labelled v3) sent by Andrew Murray, the following changes have been made: * Split the patch as suggested by Rob Herring Compared to the v3 sent by Andrew Murray, the following changes have been made: * Unify and move duplicate pci_process_bridge_OF_ranges functions to drivers/of/of_pci.c as suggested by Rob Herring * Fix potential build errors with Microblaze/MIPS Compared to "[PATCH v5 01/17] of/pci: Provide support for parsing PCI DT ranges property", the following changes have been made: * Correct use of IORESOURCE_* as suggested by Russell King * Improved interface and naming as suggested by Thierry Reding Compared to the v2 sent by Andrew Murray, Thomas Petazzoni did: * Add a memset() on the struct of_pci_range_iter when starting the for loop in for_each_pci_range(). Otherwise, with an uninitialized of_pci_range_iter, of_pci_process_ranges() may crash. * Add parenthesis around 'res', 'np' and 'iter' in the for_each_of_pci_range macro definitions. Otherwise, passing something like &foobar as 'res' didn't work. * Rebased on top of 3.9-rc2, which required fixing a few conflicts in the Microblaze code. v2: This follows on from suggestions made by Grant Likely (marc.info/?l=linux-kernel&m=136079602806328) Andrew Murray (3): of/pci: Unify pci_process_bridge_OF_ranges from Microblaze and PowerPC of/pci: Provide support for parsing PCI DT ranges property of/pci: mips: convert to common of_pci_range_parser arch/microblaze/include/asm/pci-bridge.h |5 +- arch/microblaze/pci/pci-common.c | 192 -- arch/mips/pci/pci.c | 51 +++- arch/powerpc/include/asm/pci-bridge.h|5 +- arch/powerpc/kernel/pci-common.c | 192 -- drivers/of/address.c | 67 +++ drivers/of/of_pci.c | 173 +++ include/linux/of_address.h | 48 include/linux/of_pci.h |4 + 9 files changed, 313 insertions(+), 424 deletions(-) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v8 2/3] of/pci: Provide support for parsing PCI DT ranges property
This patch factors out common implementation patterns to reduce overall kernel code and provide a means for host bridge drivers to directly obtain struct resources from the DT's ranges property without relying on architecture specific DT handling. This will make it easier to write archiecture independent host bridge drivers and mitigate against further duplication of DT parsing code. This patch can be used in the following way: struct of_pci_range_parser parser; struct of_pci_range range; if (of_pci_range_parser_init(&parser, np)) ; //no ranges property for_each_of_pci_range(&parser, &range) { /* directly access properties of the address range, e.g.: range.pci_space, range.pci_addr, range.cpu_addr, range.size, range.flags alternatively obtain a struct resource, e.g.: struct resource res; of_pci_range_to_resource(&range, np, &res); */ } Additionally the implementation takes care of adjacent ranges and merges them into a single range (as was the case with powerpc and microblaze). Signed-off-by: Andrew Murray Signed-off-by: Liviu Dudau Signed-off-by: Thomas Petazzoni Reviewed-by: Rob Herring Tested-by: Thomas Petazzoni Tested-by: Linus Walleij Acked-by: Grant Likely --- drivers/of/address.c | 67 ++ drivers/of/of_pci.c| 113 +--- include/linux/of_address.h | 48 +++ 3 files changed, 158 insertions(+), 70 deletions(-) diff --git a/drivers/of/address.c b/drivers/of/address.c index 04da786..fdd0636 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -227,6 +227,73 @@ int of_pci_address_to_resource(struct device_node *dev, int bar, return __of_address_to_resource(dev, addrp, size, flags, NULL, r); } EXPORT_SYMBOL_GPL(of_pci_address_to_resource); + +int of_pci_range_parser_init(struct of_pci_range_parser *parser, + struct device_node *node) +{ + const int na = 3, ns = 2; + int rlen; + + parser->node = node; + parser->pna = of_n_addr_cells(node); + parser->np = parser->pna + na + ns; + + parser->range = of_get_property(node, "ranges", &rlen); + if (parser->range == NULL) + return -ENOENT; + + parser->end = parser->range + rlen / sizeof(__be32); + + return 0; +} +EXPORT_SYMBOL_GPL(of_pci_range_parser_init); + +struct of_pci_range *of_pci_range_parser_one(struct of_pci_range_parser *parser, + struct of_pci_range *range) +{ + const int na = 3, ns = 2; + + if (!range) + return NULL; + + if (!parser->range || parser->range + parser->np > parser->end) + return NULL; + + range->pci_space = parser->range[0]; + range->flags = of_bus_pci_get_flags(parser->range); + range->pci_addr = of_read_number(parser->range + 1, ns); + range->cpu_addr = of_translate_address(parser->node, + parser->range + na); + range->size = of_read_number(parser->range + parser->pna + na, ns); + + parser->range += parser->np; + + /* Now consume following elements while they are contiguous */ + while (parser->range + parser->np <= parser->end) { + u32 flags, pci_space; + u64 pci_addr, cpu_addr, size; + + pci_space = be32_to_cpup(parser->range); + flags = of_bus_pci_get_flags(parser->range); + pci_addr = of_read_number(parser->range + 1, ns); + cpu_addr = of_translate_address(parser->node, + parser->range + na); + size = of_read_number(parser->range + parser->pna + na, ns); + + if (flags != range->flags) + break; + if (pci_addr != range->pci_addr + range->size || + cpu_addr != range->cpu_addr + range->size) + break; + + range->size += size; + parser->range += parser->np; + } + + return range; +} +EXPORT_SYMBOL_GPL(of_pci_range_parser_one); + #endif /* CONFIG_PCI */ /* diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c index 1626172..3c49ab2 100644 --- a/drivers/of/of_pci.c +++ b/drivers/of/of_pci.c @@ -2,6 +2,7 @@ #include #include #include +#include #include #if defined(CONFIG_PPC32) || defined(CONFIG_PPC64) || defined(CONFIG_MICROBLAZE) @@ -82,67 +83,42 @@ EXPORT_SYMBOL_GPL(of_pci_find_child_device); void pci_process_bridge_OF_ranges(struct pci_controller *hose,
[PATCH v8 1/3] of/pci: Unify pci_process_bridge_OF_ranges from Microblaze and PowerPC
The pci_process_bridge_OF_ranges function, used to parse the "ranges" property of a PCI host device, is found in both Microblaze and PowerPC architectures. These implementations are nearly identical. This patch moves this common code to a common place. Signed-off-by: Andrew Murray Signed-off-by: Liviu Dudau Reviewed-by: Rob Herring Tested-by: Thomas Petazzoni Tested-by: Linus Walleij Acked-by: Michal Simek Acked-by: Grant Likely --- arch/microblaze/include/asm/pci-bridge.h |5 +- arch/microblaze/pci/pci-common.c | 192 arch/powerpc/include/asm/pci-bridge.h|5 +- arch/powerpc/kernel/pci-common.c | 192 drivers/of/of_pci.c | 200 ++ include/linux/of_pci.h |4 + 6 files changed, 206 insertions(+), 392 deletions(-) diff --git a/arch/microblaze/include/asm/pci-bridge.h b/arch/microblaze/include/asm/pci-bridge.h index cb5d397..5783cd6 100644 --- a/arch/microblaze/include/asm/pci-bridge.h +++ b/arch/microblaze/include/asm/pci-bridge.h @@ -10,6 +10,7 @@ #include #include #include +#include struct device_node; @@ -132,10 +133,6 @@ extern void setup_indirect_pci(struct pci_controller *hose, extern struct pci_controller *pci_find_hose_for_OF_device( struct device_node *node); -/* Fill up host controller resources from the OF node */ -extern void pci_process_bridge_OF_ranges(struct pci_controller *hose, - struct device_node *dev, int primary); - /* Allocate & free a PCI host bridge structure */ extern struct pci_controller *pcibios_alloc_controller(struct device_node *dev); extern void pcibios_free_controller(struct pci_controller *phb); diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c index 9ea521e..2735ad9 100644 --- a/arch/microblaze/pci/pci-common.c +++ b/arch/microblaze/pci/pci-common.c @@ -622,198 +622,6 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar, *end = rsrc->end - offset; } -/** - * pci_process_bridge_OF_ranges - Parse PCI bridge resources from device tree - * @hose: newly allocated pci_controller to be setup - * @dev: device node of the host bridge - * @primary: set if primary bus (32 bits only, soon to be deprecated) - * - * This function will parse the "ranges" property of a PCI host bridge device - * node and setup the resource mapping of a pci controller based on its - * content. - * - * Life would be boring if it wasn't for a few issues that we have to deal - * with here: - * - * - We can only cope with one IO space range and up to 3 Memory space - * ranges. However, some machines (thanks Apple !) tend to split their - * space into lots of small contiguous ranges. So we have to coalesce. - * - * - We can only cope with all memory ranges having the same offset - * between CPU addresses and PCI addresses. Unfortunately, some bridges - * are setup for a large 1:1 mapping along with a small "window" which - * maps PCI address 0 to some arbitrary high address of the CPU space in - * order to give access to the ISA memory hole. - * The way out of here that I've chosen for now is to always set the - * offset based on the first resource found, then override it if we - * have a different offset and the previous was set by an ISA hole. - * - * - Some busses have IO space not starting at 0, which causes trouble with - * the way we do our IO resource renumbering. The code somewhat deals with - * it for 64 bits but I would expect problems on 32 bits. - * - * - Some 32 bits platforms such as 4xx can have physical space larger than - * 32 bits so we need to use 64 bits values for the parsing - */ -void pci_process_bridge_OF_ranges(struct pci_controller *hose, - struct device_node *dev, int primary) -{ - const u32 *ranges; - int rlen; - int pna = of_n_addr_cells(dev); - int np = pna + 5; - int memno = 0, isa_hole = -1; - u32 pci_space; - unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size; - unsigned long long isa_mb = 0; - struct resource *res; - - pr_info("PCI host bridge %s %s ranges:\n", - dev->full_name, primary ? "(primary)" : ""); - - /* Get ranges property */ - ranges = of_get_property(dev, "ranges", &rlen); - if (ranges == NULL) - return; - - /* Parse it */ - pr_debug("Parsing ranges property...\n"); - while ((rlen -= np * 4) >= 0) { - /* Read next ranges element */ - pci_space = ranges[0]; - pci_addr = of_read_number(ranges + 1, 2); - cpu_addr = of_translate_address(dev, ranges + 3); - s
Re: [PATCH v7 3/3] of/pci: mips: convert to common of_pci_range_parser
On Sun, Apr 21, 2013 at 08:27:02AM +0100, Gabor Juhos wrote: > Hi Jason, > > >> Sorry I had no time earlier, but I have tested this now on MIPS. The patch > >> causes build errors unfortunately. Given the fact that this has been merged > >> already, I will send a fixup patch. > > > > Olof has dropped this branch from arm-soc, plase post the build error > > and fix here so that it can be included in this series. > > I have posted the patch to Olof two days ago. It has been CC'd to you as well > but In case that it does not exists in your mailbox the patch can be found > here: > > https://patchwork.linux-mips.org/patch/5196/ > > However I can re-post the patch as a reply to this thread if you prefer that. As this branch was dropped I have updated my patchset to include Grant's recent feedback - I've also included Gabor's fixes to this patchset (and his sign-off). If you include this new patchset in your branch the drivers that depend on it will need to be updated to reflect the new naming of functions as suggested by Grant. Thanks, Andrew Murray ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v7 2/3] of/pci: Provide support for parsing PCI DT ranges property
On Thu, Apr 18, 2013 at 02:44:01PM +0100, Grant Likely wrote: > On Tue, 16 Apr 2013 11:18:27 +0100, Andrew Murray > wrote: > > Acked-by: Grant Likely > > But comments below... > I've updated the patchset (now v8) to reflect your feedback, after a closer look... > > - > > - pr_debug("pci_space: 0x%08x pci_addr:0x%016llx ", > > - pci_space, pci_addr); > > - pr_debug("cpu_addr:0x%016llx size:0x%016llx\n", > > - cpu_addr, size); > > - > > - ranges += np; > > + pr_debug("pci_space: 0x%08x pci_addr: 0x%016llx ", > > + range.pci_space, range.pci_addr); > > + pr_debug("cpu_addr: 0x%016llx size: 0x%016llx\n", > > + range.cpu_addr, range.size); > > Nit: the patch changed whitespace on the pr_debug() statements, so even > though the first line of each is identical, they look different in the > patch. > Actually the first line isn't identical, the original file was inconsistent with its use of spaces between ':' and '0x%0' - my patch ensured that there was always a space. I guess this could have been done as a separate patch. > > > > /* If we failed translation or got a zero-sized region > > * (some FW try to feed us with non sensical zero sized regions > > * such as power3 which look like some kind of attempt > > * at exposing the VGA memory hole) > > */ > > - if (cpu_addr == OF_BAD_ADDR || size == 0) > > + if (range.cpu_addr == OF_BAD_ADDR || range.size == 0) > > continue; > > Can this also be rolled into the parsing iterator? > I decided not to do this. Mainly because ARM drivers use the parser directly (instead of pci_process_bridge_OF_ranges function) and it seemed perfectly valid for the parser to return a range of size 0 if that is what was present in the DT. > > > > - /* Now consume following elements while they are contiguous */ > > - for (; rlen >= np * sizeof(u32); > > -ranges += np, rlen -= np * 4) { > > - if (ranges[0] != pci_space) > > - break; > > - pci_next = of_read_number(ranges + 1, 2); > > - cpu_next = of_translate_address(dev, ranges + 3); > > - if (pci_next != pci_addr + size || > > - cpu_next != cpu_addr + size) > > - break; > > - size += of_read_number(ranges + pna + 3, 2); > > - } > > - > > /* Act based on address space type */ > > res = NULL; > > - switch ((pci_space >> 24) & 0x3) { > > - case 1: /* PCI IO space */ > > + res_type = range.flags & IORESOURCE_TYPE_BITS; > > + if (res_type == IORESOURCE_IO) { > > Why the change from switch() to an if/else if sequence? I think this was an artifact of the patches evolution, I've reverted back to the switch. > > But those are mostly nitpicks. If this is deferred to v3.10 then I would > suggest fixing them up and posting for another round of review. Andrew Murray ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH 0/3] Unify definations of struct pci_controller
PowerPC and Microblaze have nearly identical definations of struct pci_controller - this patch unifies them in asm-generic to reduce code duplication and to allow new architectures to reuse. This patchset follows and depends on "of/pci: Provide common support for PCI DT parsing" which provided common 'ranges' parsing code which uses an architecture defined struct pci_controller. This patch is currently in Jason Coopers mvebu-next/pcie branch. It is hoped this will pave the way for providing common implementations of commonly duplicated functions found across the architectures such as pcibios_alloc|free_controller and pcibios_setup_phb_resources type functions. Andrew Murray (3): powerpc: Move struct pci_controller to asm-generic microblaze: Use asm-generic version of pci_controller pci: Use common definations of INDIRECT_TYPE_* arch/microblaze/include/asm/pci-bridge.h | 70 +--- arch/powerpc/include/asm/pci-bridge.h| 82 --- arch/powerpc/sysdev/fsl_pci.c| 16 +++--- arch/powerpc/sysdev/indirect_pci.c | 20 +++--- arch/powerpc/sysdev/ppc4xx_pci.c |4 +- arch/powerpc/sysdev/xilinx_pci.c |2 +- include/asm-generic/pci-bridge.h | 90 ++ 7 files changed, 112 insertions(+), 172 deletions(-) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH 1/3] powerpc: Move struct pci_controller to asm-generic
This patch moves struct pci_controller into asm-generic to allow for use by other architectures thus reducing code duplication in the kernel. Signed-off-by: Andrew Murray --- arch/powerpc/include/asm/pci-bridge.h | 87 +--- include/asm-generic/pci-bridge.h | 68 + 2 files changed, 82 insertions(+), 73 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 205bfba..163bd40 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -8,7 +8,6 @@ * 2 of the License, or (at your option) any later version. */ #include -#include #include #include #include @@ -16,85 +15,27 @@ struct device_node; /* - * Structure of a PCI controller (host bridge) + * Used for variants of PCI indirect handling and possible quirks: + * SET_CFG_TYPE - used on 4xx or any PHB that does explicit type0/1 + * EXT_REG - provides access to PCI-e extended registers + * SURPRESS_PRIMARY_BUS - we suppress the setting of PCI_PRIMARY_BUS + * on Freescale PCI-e controllers since they used the PCI_PRIMARY_BUS + * to determine which bus number to match on when generating type0 + * config cycles + * NO_PCIE_LINK - the Freescale PCI-e controllers have issues with + * hanging if we don't have link and try to do config cycles to + * anything but the PHB. Only allow talking to the PHB if this is + * set. + * BIG_ENDIAN - cfg_addr is a big endian register + * BROKEN_MRM - the 440EPx/GRx chips have an errata that causes hangs on + * the PLB4. Effectively disable MRM commands by setting this. */ -struct pci_controller { - struct pci_bus *bus; - char is_dynamic; -#ifdef CONFIG_PPC64 - int node; -#endif - struct device_node *dn; - struct list_head list_node; - struct device *parent; - - int first_busno; - int last_busno; - int self_busno; - struct resource busn; - - void __iomem *io_base_virt; -#ifdef CONFIG_PPC64 - void *io_base_alloc; -#endif - resource_size_t io_base_phys; - resource_size_t pci_io_size; - - /* Some machines (PReP) have a non 1:1 mapping of -* the PCI memory space in the CPU bus space -*/ - resource_size_t pci_mem_offset; - - /* Some machines have a special region to forward the ISA -* "memory" cycles such as VGA memory regions. Left to 0 -* if unsupported -*/ - resource_size_t isa_mem_phys; - resource_size_t isa_mem_size; - - struct pci_ops *ops; - unsigned int __iomem *cfg_addr; - void __iomem *cfg_data; - - /* -* Used for variants of PCI indirect handling and possible quirks: -* SET_CFG_TYPE - used on 4xx or any PHB that does explicit type0/1 -* EXT_REG - provides access to PCI-e extended registers -* SURPRESS_PRIMARY_BUS - we suppress the setting of PCI_PRIMARY_BUS -* on Freescale PCI-e controllers since they used the PCI_PRIMARY_BUS -* to determine which bus number to match on when generating type0 -* config cycles -* NO_PCIE_LINK - the Freescale PCI-e controllers have issues with -* hanging if we don't have link and try to do config cycles to -* anything but the PHB. Only allow talking to the PHB if this is -* set. -* BIG_ENDIAN - cfg_addr is a big endian register -* BROKEN_MRM - the 440EPx/GRx chips have an errata that causes hangs on -* the PLB4. Effectively disable MRM commands by setting this. -*/ #define PPC_INDIRECT_TYPE_SET_CFG_TYPE 0x0001 #define PPC_INDIRECT_TYPE_EXT_REG 0x0002 #define PPC_INDIRECT_TYPE_SURPRESS_PRIMARY_BUS 0x0004 #define PPC_INDIRECT_TYPE_NO_PCIE_LINK 0x0008 #define PPC_INDIRECT_TYPE_BIG_ENDIAN 0x0010 #define PPC_INDIRECT_TYPE_BROKEN_MRM 0x0020 - u32 indirect_type; - /* Currently, we limit ourselves to 1 IO range and 3 mem -* ranges since the common pci_bus structure can't handle more -*/ - struct resource io_resource; - struct resource mem_resources[3]; - int global_number; /* PCI domain number */ - - resource_size_t dma_window_base_cur; - resource_size_t dma_window_size; - -#ifdef CONFIG_PPC64 - unsigned long buid; - - void *private_data; -#endif /* CONFIG_PPC64 */ -}; /* These are used for config access before all the PCI probing has been done. */ diff --git a/include/asm-generic/pci-bridge.h b/include/asm-generic/pci-bridge.h index 20db2e5..e58830e 100644 --- a/include/asm-generic/pci-bridge.h +++ b/include/asm-generic/pci-bridge.h @@ -9,6 +9,9 @@ #ifdef __KERNEL__ +#include +#include + enum { /* Force re-assigning all resources (ignore firmware * setup completely
[RFC PATCH 2/3] microblaze: Use asm-generic version of pci_controller
This patch removes struct pci_controller from Microblaze and instead uses struct pci_controller from asm-generic. Signed-off-by: Andrew Murray --- arch/microblaze/include/asm/pci-bridge.h | 75 ++ include/asm-generic/pci-bridge.h |2 +- 2 files changed, 16 insertions(+), 61 deletions(-) diff --git a/arch/microblaze/include/asm/pci-bridge.h b/arch/microblaze/include/asm/pci-bridge.h index 5783cd6..0ee75dc 100644 --- a/arch/microblaze/include/asm/pci-bridge.h +++ b/arch/microblaze/include/asm/pci-bridge.h @@ -8,9 +8,9 @@ * 2 of the License, or (at your option) any later version. */ #include -#include #include #include +#include struct device_node; @@ -25,72 +25,27 @@ static inline int pcibios_vaddr_is_ioport(void __iomem *address) #endif /* - * Structure of a PCI controller (host bridge) + * Used for variants of PCI indirect handling and possible quirks: + * SET_CFG_TYPE - used on 4xx or any PHB that does explicit type0/1 + * EXT_REG - provides access to PCI-e extended registers + * SURPRESS_PRIMARY_BUS - we suppress the setting of PCI_PRIMARY_BUS + * on Freescale PCI-e controllers since they used the PCI_PRIMARY_BUS + * to determine which bus number to match on when generating type0 + * config cycles + * NO_PCIE_LINK - the Freescale PCI-e controllers have issues with + * hanging if we don't have link and try to do config cycles to + * anything but the PHB. Only allow talking to the PHB if this is + * set. + * BIG_ENDIAN - cfg_addr is a big endian register + * BROKEN_MRM - the 440EPx/GRx chips have an errata that causes hangs + * on the PLB4. Effectively disable MRM commands by setting this. */ -struct pci_controller { - struct pci_bus *bus; - char is_dynamic; - struct device_node *dn; - struct list_head list_node; - struct device *parent; - - int first_busno; - int last_busno; - - int self_busno; - - void __iomem *io_base_virt; - resource_size_t io_base_phys; - - resource_size_t pci_io_size; - - /* Some machines (PReP) have a non 1:1 mapping of -* the PCI memory space in the CPU bus space -*/ - resource_size_t pci_mem_offset; - - /* Some machines have a special region to forward the ISA -* "memory" cycles such as VGA memory regions. Left to 0 -* if unsupported -*/ - resource_size_t isa_mem_phys; - resource_size_t isa_mem_size; - - struct pci_ops *ops; - unsigned int __iomem *cfg_addr; - void __iomem *cfg_data; - - /* -* Used for variants of PCI indirect handling and possible quirks: -* SET_CFG_TYPE - used on 4xx or any PHB that does explicit type0/1 -* EXT_REG - provides access to PCI-e extended registers -* SURPRESS_PRIMARY_BUS - we suppress the setting of PCI_PRIMARY_BUS -* on Freescale PCI-e controllers since they used the PCI_PRIMARY_BUS -* to determine which bus number to match on when generating type0 -* config cycles -* NO_PCIE_LINK - the Freescale PCI-e controllers have issues with -* hanging if we don't have link and try to do config cycles to -* anything but the PHB. Only allow talking to the PHB if this is -* set. -* BIG_ENDIAN - cfg_addr is a big endian register -* BROKEN_MRM - the 440EPx/GRx chips have an errata that causes hangs -* on the PLB4. Effectively disable MRM commands by setting this. -*/ #define INDIRECT_TYPE_SET_CFG_TYPE 0x0001 #define INDIRECT_TYPE_EXT_REG 0x0002 #define INDIRECT_TYPE_SURPRESS_PRIMARY_BUS 0x0004 #define INDIRECT_TYPE_NO_PCIE_LINK 0x0008 #define INDIRECT_TYPE_BIG_ENDIAN 0x0010 #define INDIRECT_TYPE_BROKEN_MRM 0x0020 - u32 indirect_type; - - /* Currently, we limit ourselves to 1 IO range and 3 mem -* ranges since the common pci_bus structure can't handle more -*/ - struct resource io_resource; - struct resource mem_resources[3]; - int global_number; /* PCI domain number */ -}; #ifdef CONFIG_PCI static inline struct pci_controller *pci_bus_to_host(const struct pci_bus *bus) diff --git a/include/asm-generic/pci-bridge.h b/include/asm-generic/pci-bridge.h index e58830e..1a7f96d 100644 --- a/include/asm-generic/pci-bridge.h +++ b/include/asm-generic/pci-bridge.h @@ -46,7 +46,7 @@ struct device_node; /* * Structure of a PCI controller (host bridge) */ -#if defined(CONFIG_PPC32) || defined(CONFIG_PPC64) +#if defined(CONFIG_PPC32) || defined(CONFIG_PPC64) || defined(CONFIG_MICROBLAZE) struct pci_controller { struct pci_bus *bus; char is_dynamic; -- 1.7.0.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH 3/3] pci: Use common definations of INDIRECT_TYPE_*
This patch unifies similar definations of INDIRECT_TYPE_* between PowerPC and Microblaze. Signed-off-by: Andrew Murray --- arch/microblaze/include/asm/pci-bridge.h | 23 --- arch/powerpc/include/asm/pci-bridge.h| 23 --- arch/powerpc/sysdev/fsl_pci.c| 16 arch/powerpc/sysdev/indirect_pci.c | 20 ++-- arch/powerpc/sysdev/ppc4xx_pci.c |4 ++-- arch/powerpc/sysdev/xilinx_pci.c |2 +- include/asm-generic/pci-bridge.h | 22 ++ 7 files changed, 43 insertions(+), 67 deletions(-) diff --git a/arch/microblaze/include/asm/pci-bridge.h b/arch/microblaze/include/asm/pci-bridge.h index 0ee75dc..acf8252 100644 --- a/arch/microblaze/include/asm/pci-bridge.h +++ b/arch/microblaze/include/asm/pci-bridge.h @@ -24,29 +24,6 @@ static inline int pcibios_vaddr_is_ioport(void __iomem *address) } #endif -/* - * Used for variants of PCI indirect handling and possible quirks: - * SET_CFG_TYPE - used on 4xx or any PHB that does explicit type0/1 - * EXT_REG - provides access to PCI-e extended registers - * SURPRESS_PRIMARY_BUS - we suppress the setting of PCI_PRIMARY_BUS - * on Freescale PCI-e controllers since they used the PCI_PRIMARY_BUS - * to determine which bus number to match on when generating type0 - * config cycles - * NO_PCIE_LINK - the Freescale PCI-e controllers have issues with - * hanging if we don't have link and try to do config cycles to - * anything but the PHB. Only allow talking to the PHB if this is - * set. - * BIG_ENDIAN - cfg_addr is a big endian register - * BROKEN_MRM - the 440EPx/GRx chips have an errata that causes hangs - * on the PLB4. Effectively disable MRM commands by setting this. - */ -#define INDIRECT_TYPE_SET_CFG_TYPE 0x0001 -#define INDIRECT_TYPE_EXT_REG 0x0002 -#define INDIRECT_TYPE_SURPRESS_PRIMARY_BUS 0x0004 -#define INDIRECT_TYPE_NO_PCIE_LINK 0x0008 -#define INDIRECT_TYPE_BIG_ENDIAN 0x0010 -#define INDIRECT_TYPE_BROKEN_MRM 0x0020 - #ifdef CONFIG_PCI static inline struct pci_controller *pci_bus_to_host(const struct pci_bus *bus) { diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 163bd40..b2bbf05 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -14,29 +14,6 @@ struct device_node; -/* - * Used for variants of PCI indirect handling and possible quirks: - * SET_CFG_TYPE - used on 4xx or any PHB that does explicit type0/1 - * EXT_REG - provides access to PCI-e extended registers - * SURPRESS_PRIMARY_BUS - we suppress the setting of PCI_PRIMARY_BUS - * on Freescale PCI-e controllers since they used the PCI_PRIMARY_BUS - * to determine which bus number to match on when generating type0 - * config cycles - * NO_PCIE_LINK - the Freescale PCI-e controllers have issues with - * hanging if we don't have link and try to do config cycles to - * anything but the PHB. Only allow talking to the PHB if this is - * set. - * BIG_ENDIAN - cfg_addr is a big endian register - * BROKEN_MRM - the 440EPx/GRx chips have an errata that causes hangs on - * the PLB4. Effectively disable MRM commands by setting this. - */ -#define PPC_INDIRECT_TYPE_SET_CFG_TYPE 0x0001 -#define PPC_INDIRECT_TYPE_EXT_REG 0x0002 -#define PPC_INDIRECT_TYPE_SURPRESS_PRIMARY_BUS 0x0004 -#define PPC_INDIRECT_TYPE_NO_PCIE_LINK 0x0008 -#define PPC_INDIRECT_TYPE_BIG_ENDIAN 0x0010 -#define PPC_INDIRECT_TYPE_BROKEN_MRM 0x0020 - /* These are used for config access before all the PCI probing has been done. */ extern int early_read_config_byte(struct pci_controller *hose, int bus, diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c index 682084d..d73f94a 100644 --- a/arch/powerpc/sysdev/fsl_pci.c +++ b/arch/powerpc/sysdev/fsl_pci.c @@ -399,7 +399,7 @@ void fsl_pcibios_fixup_bus(struct pci_bus *bus) if (fsl_pcie_bus_fixup) is_pcie = early_find_capability(hose, 0, 0, PCI_CAP_ID_EXP); - no_link = !!(hose->indirect_type & PPC_INDIRECT_TYPE_NO_PCIE_LINK); + no_link = !!(hose->indirect_type & INDIRECT_TYPE_NO_PCIE_LINK); if (bus->parent == hose->bus && (is_pcie || no_link)) { for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; ++i) { @@ -462,7 +462,7 @@ int __init fsl_add_bridge(struct platform_device *pdev, int is_primary) hose->last_busno = bus_range ? bus_range[1] : 0xff; setup_indirect_pci(hose, rsrc.start, rsrc.start + 0x4, - PPC_INDIRECT_TYPE_BIG_ENDIAN); + INDIRECT_TYPE_BIG_ENDIAN); if (early_find_capability(hose, 0, 0, PCI_CAP_ID_EXP)) { /* For PCIE read HEADER_TYPE to identify contro
[PATCH] powerpc/pci: Use of_pci_range_parser helper in pci_process_bridge_OF_ranges
This patch updates the implementation of pci_process_bridge_OF_ranges to use the of_pci_range_parser helpers. Signed-off-by: Andrew Murray --- I've verified that this builds, however I have no hardware to test this. --- arch/powerpc/kernel/pci-common.c | 88 +--- 1 file changed, 29 insertions(+), 59 deletions(-) diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index d9476c1..a05fe18 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -666,60 +666,36 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar, void pci_process_bridge_OF_ranges(struct pci_controller *hose, struct device_node *dev, int primary) { - const __be32 *ranges; - int rlen; - int pna = of_n_addr_cells(dev); - int np = pna + 5; int memno = 0; - u32 pci_space; - unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size; struct resource *res; + struct of_pci_range range; + struct of_pci_range_parser parser; printk(KERN_INFO "PCI host bridge %s %s ranges:\n", dev->full_name, primary ? "(primary)" : ""); - /* Get ranges property */ - ranges = of_get_property(dev, "ranges", &rlen); - if (ranges == NULL) + /* Check for ranges property */ + if (of_pci_range_parser_init(&parser, dev)) return; /* Parse it */ - while ((rlen -= np * 4) >= 0) { - /* Read next ranges element */ - pci_space = of_read_number(ranges, 1); - pci_addr = of_read_number(ranges + 1, 2); - cpu_addr = of_translate_address(dev, ranges + 3); - size = of_read_number(ranges + pna + 3, 2); - ranges += np; - + for_each_of_pci_range(&parser, &range) { /* If we failed translation or got a zero-sized region * (some FW try to feed us with non sensical zero sized regions * such as power3 which look like some kind of attempt at exposing * the VGA memory hole) */ - if (cpu_addr == OF_BAD_ADDR || size == 0) + if (range.cpu_addr == OF_BAD_ADDR || range.size == 0) continue; - /* Now consume following elements while they are contiguous */ - for (; rlen >= np * sizeof(u32); -ranges += np, rlen -= np * 4) { - if (of_read_number(ranges, 1) != pci_space) - break; - pci_next = of_read_number(ranges + 1, 2); - cpu_next = of_translate_address(dev, ranges + 3); - if (pci_next != pci_addr + size || - cpu_next != cpu_addr + size) - break; - size += of_read_number(ranges + pna + 3, 2); - } - /* Act based on address space type */ res = NULL; - switch ((pci_space >> 24) & 0x3) { - case 1: /* PCI IO space */ + switch (range.flags & IORESOURCE_TYPE_BITS) { + case IORESOURCE_IO: printk(KERN_INFO " IO 0x%016llx..0x%016llx -> 0x%016llx\n", - cpu_addr, cpu_addr + size - 1, pci_addr); + range.cpu_addr, range.cpu_addr + range.size - 1, + range.pci_addr); /* We support only one IO range */ if (hose->pci_io_size) { @@ -729,11 +705,12 @@ void pci_process_bridge_OF_ranges(struct pci_controller *hose, } #ifdef CONFIG_PPC32 /* On 32 bits, limit I/O space to 16MB */ - if (size > 0x0100) - size = 0x0100; + if (range.size > 0x0100) + range.size = 0x0100; /* 32 bits needs to map IOs here */ - hose->io_base_virt = ioremap(cpu_addr, size); + hose->io_base_virt = ioremap(range.cpu_addr, + range.size); /* Expect trouble if pci_addr is not 0 */ if (primary) @@ -743,20 +720,20 @@ void pci_process_bridge_OF_ranges(struct pci_controller *hose, /* pci_io_size and io_base_phys always represent IO * space starting at 0 so we factor in pci_addr */ - hose->pci_io_size = pci_addr + size; -
Re: [PATCH] powerpc/pci: Use of_pci_range_parser helper in pci_process_bridge_OF_ranges
On 25 February 2014 13:25, Benjamin Herrenschmidt wrote: > On Tue, 2014-02-25 at 06:32 +0000, Andrew Murray wrote: >> This patch updates the implementation of pci_process_bridge_OF_ranges to use >> the of_pci_range_parser helpers. >> >> Signed-off-by: Andrew Murray >> --- >> I've verified that this builds, however I have no hardware to test this. >> --- > > Thanks. A cursory review looks good but I need to spend a bit more time > making sure our various special cases are handled properly. > > It's tracked on patchwork so unless you have an update to the patch, > it won't be lost, but it might take a little while before I get to > actually merge it. Thanks for the response - Yes it's easy to screw this stuff up. Please note that some of the special cases are handled by the parser helper e.g. consumption of contiguous ranges, assignment to 'struct resource', etc. It should also be pointed out that this is now once again very similar to the Microblaze implementation. Thanks, Andrew Murray > > Cheers, > Ben. > >> arch/powerpc/kernel/pci-common.c | 88 >> +--- >> 1 file changed, 29 insertions(+), 59 deletions(-) >> >> diff --git a/arch/powerpc/kernel/pci-common.c >> b/arch/powerpc/kernel/pci-common.c >> index d9476c1..a05fe18 100644 >> --- a/arch/powerpc/kernel/pci-common.c >> +++ b/arch/powerpc/kernel/pci-common.c >> @@ -666,60 +666,36 @@ void pci_resource_to_user(const struct pci_dev *dev, >> int bar, >> void pci_process_bridge_OF_ranges(struct pci_controller *hose, >> struct device_node *dev, int primary) >> { >> - const __be32 *ranges; >> - int rlen; >> - int pna = of_n_addr_cells(dev); >> - int np = pna + 5; >> int memno = 0; >> - u32 pci_space; >> - unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size; >> struct resource *res; >> + struct of_pci_range range; >> + struct of_pci_range_parser parser; >> >> printk(KERN_INFO "PCI host bridge %s %s ranges:\n", >> dev->full_name, primary ? "(primary)" : ""); >> >> - /* Get ranges property */ >> - ranges = of_get_property(dev, "ranges", &rlen); >> - if (ranges == NULL) >> + /* Check for ranges property */ >> + if (of_pci_range_parser_init(&parser, dev)) >> return; >> >> /* Parse it */ >> - while ((rlen -= np * 4) >= 0) { >> - /* Read next ranges element */ >> - pci_space = of_read_number(ranges, 1); >> - pci_addr = of_read_number(ranges + 1, 2); >> - cpu_addr = of_translate_address(dev, ranges + 3); >> - size = of_read_number(ranges + pna + 3, 2); >> - ranges += np; >> - >> + for_each_of_pci_range(&parser, &range) { >> /* If we failed translation or got a zero-sized region >>* (some FW try to feed us with non sensical zero sized regions >>* such as power3 which look like some kind of attempt at >> exposing >>* the VGA memory hole) >>*/ >> - if (cpu_addr == OF_BAD_ADDR || size == 0) >> + if (range.cpu_addr == OF_BAD_ADDR || range.size == 0) >> continue; >> >> - /* Now consume following elements while they are contiguous */ >> - for (; rlen >= np * sizeof(u32); >> - ranges += np, rlen -= np * 4) { >> - if (of_read_number(ranges, 1) != pci_space) >> - break; >> - pci_next = of_read_number(ranges + 1, 2); >> - cpu_next = of_translate_address(dev, ranges + 3); >> - if (pci_next != pci_addr + size || >> - cpu_next != cpu_addr + size) >> - break; >> - size += of_read_number(ranges + pna + 3, 2); >> - } >> - >> /* Act based on address space type */ >> res = NULL; >> - switch ((pci_space >> 24) & 0x3) { >> - case 1: /* PCI IO space */ >> + switch (range.flags & IORESOURCE_TYPE_BITS) { >> + case IORESOURCE_IO: >> printk(KERN_INFO >> " IO 0x%016llx..0x%016llx -> 0x%016llx\n"
[PATCH v1 0/7] PCI: dt: Remove magic numbers for legacy PCI IRQ interrupts
PCI devices can trigger interrupts via 4 physical/virtual lines known as INTA, INTB, INTC or INTD. Due to interrupt swizzling it is often required to describe the interrupt mapping in the device tree. Let's avoid the existing magic numbers and replace them with a #define to improve clarity. Based on v5.4-rc5, compile tested Signed-off-by: Andrew Murray Andrew Murray (7): PCI: dt: Add legacy PCI IRQ defines arm64: dts: Use IRQ flags for legacy PCI IRQ interrupts arm: dts: Use IRQ flags for legacy PCI IRQ interrupts xtensa: dts: Use IRQ flags for legacy PCI IRQ interrupts powerpc: dts: fsl: Use IRQ flags for legacy PCI IRQ interrupts powerpc: dts: Use IRQ flags for legacy PCI IRQ interrupts dt-bindings: PCI: Use IRQ flags for legacy PCI IRQ interrupts .../devicetree/bindings/pci/83xx-512x-pci.txt | 18 +-- .../devicetree/bindings/pci/aardvark-pci.txt | 10 +- .../devicetree/bindings/pci/altera-pcie.txt | 10 +- .../bindings/pci/axis,artpec6-pcie.txt| 10 +- .../bindings/pci/cdns,cdns-pcie-host.txt | 10 +- .../bindings/pci/faraday,ftpci100.txt | 68 .../bindings/pci/fsl,imx6q-pcie.txt | 10 +- .../bindings/pci/hisilicon-pcie.txt | 20 +-- .../bindings/pci/host-generic-pci.txt | 10 +- .../devicetree/bindings/pci/kirin-pcie.txt| 10 +- .../bindings/pci/layerscape-pci.txt | 10 +- .../devicetree/bindings/pci/mediatek-pcie.txt | 40 ++--- .../devicetree/bindings/pci/mobiveil-pcie.txt | 8 +- .../devicetree/bindings/pci/pci-rcar-gen2.txt | 8 +- .../bindings/pci/pci-thunder-pem.txt | 10 +- .../devicetree/bindings/pci/pcie-al.txt | 4 +- .../devicetree/bindings/pci/qcom,pcie.txt | 20 +-- .../bindings/pci/ralink,rt3883-pci.txt| 18 +-- .../bindings/pci/rockchip-pcie-host.txt | 10 +- .../devicetree/bindings/pci/ti-pci.txt| 10 +- .../devicetree/bindings/pci/uniphier-pcie.txt | 10 +- .../bindings/pci/v3-v360epc-pci.txt | 34 ++-- .../devicetree/bindings/pci/versatile.txt | 40 ++--- .../devicetree/bindings/pci/xgene-pci-msi.txt | 10 +- .../devicetree/bindings/pci/xgene-pci.txt | 10 +- .../bindings/pci/xilinx-nwl-pcie.txt | 10 +- .../devicetree/bindings/pci/xilinx-pcie.txt | 20 +-- arch/arm/boot/dts/alpine.dtsi | 6 +- arch/arm/boot/dts/artpec6.dtsi| 10 +- arch/arm/boot/dts/gemini-dlink-dir-685.dts| 34 ++-- arch/arm/boot/dts/gemini-sl93512r.dts | 34 ++-- arch/arm/boot/dts/gemini-sq201.dts| 34 ++-- arch/arm/boot/dts/gemini-wbd111.dts | 34 ++-- arch/arm/boot/dts/gemini-wbd222.dts | 34 ++-- arch/arm/boot/dts/imx6qdl.dtsi| 10 +- arch/arm/boot/dts/imx6sx.dtsi | 10 +- arch/arm/boot/dts/integratorap.dts| 36 +++-- arch/arm/boot/dts/keystone-k2e.dtsi | 11 +- arch/arm/boot/dts/keystone.dtsi | 10 +- arch/arm/boot/dts/qcom-apq8064.dtsi | 10 +- arch/arm/boot/dts/qcom-ipq4019.dtsi | 10 +- arch/arm/boot/dts/versatile-pb.dts| 36 +++-- arch/arm64/boot/dts/al/alpine-v2.dtsi | 6 +- .../boot/dts/amd/amd-overdrive-rev-b0.dts | 2 +- .../boot/dts/amd/amd-overdrive-rev-b1.dts | 2 +- arch/arm64/boot/dts/amd/amd-overdrive.dts | 2 +- arch/arm64/boot/dts/amd/amd-seattle-soc.dtsi | 12 +- arch/arm64/boot/dts/amd/husky.dts | 2 +- arch/arm64/boot/dts/arm/fvp-base-revc.dts | 10 +- arch/arm64/boot/dts/arm/juno-base.dtsi| 12 +- arch/arm64/boot/dts/cavium/thunder2-99xx.dtsi | 10 +- .../arm64/boot/dts/freescale/fsl-ls1012a.dtsi | 10 +- arch/arm64/boot/dts/hisilicon/hi3660.dtsi | 10 +- arch/arm64/boot/dts/hisilicon/hip06.dtsi | 10 +- arch/arm64/boot/dts/qcom/msm8998.dtsi | 10 +- arch/arm64/boot/dts/qcom/qcs404.dtsi | 10 +- arch/arm64/boot/dts/rockchip/rk3399.dtsi | 10 +- .../boot/dts/socionext/uniphier-ld20.dtsi | 11 +- .../boot/dts/socionext/uniphier-pxs3.dtsi | 11 +- arch/arm64/boot/dts/xilinx/zynqmp.dtsi| 12 +- arch/powerpc/boot/dts/bluestone.dts | 12 +- arch/powerpc/boot/dts/charon.dts | 12 +- arch/powerpc/boot/dts/digsy_mtc.dts | 12 +- arch/powerpc/boot/dts/fsl/b4420qds.dts| 4 +- arch/powerpc/boot/dts/fsl/b4420si-post.dtsi | 2 +- arch/powerpc/boot/dts/fsl/b4860qds.dts| 4 +- arch/powerpc/boot/dts/fsl/b4860si-post.dtsi | 2 +- arch/powerpc/boot/dts/fsl/b4qds.dtsi | 2 +- arch/powerpc/boot/dts/fsl/b4si-post.dtsi | 12 +- arch/powerpc/boot/dts/fsl/bsc9132qds.dts | 2 +- arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi | 12 +- arch/powerpc/boot/dts/fsl/c293pcie.dts| 2 +- arch/powerpc/boot/dts/fsl/c293si-post.dtsi| 12 +- arch/powerpc/boot/dts/fsl/gef_sbc310.dts | 12 +- arch/powerpc/boot/dt
[PATCH v1 5/7] powerpc: dts: fsl: Use IRQ flags for legacy PCI IRQ interrupts
Replace magic numbers used to describe legacy PCI IRQ interrupts with #define. Signed-off-by: Andrew Murray --- arch/powerpc/boot/dts/fsl/b4420qds.dts| 4 +- arch/powerpc/boot/dts/fsl/b4420si-post.dtsi | 2 +- arch/powerpc/boot/dts/fsl/b4860qds.dts| 4 +- arch/powerpc/boot/dts/fsl/b4860si-post.dtsi | 2 +- arch/powerpc/boot/dts/fsl/b4qds.dtsi | 2 +- arch/powerpc/boot/dts/fsl/b4si-post.dtsi | 12 +- arch/powerpc/boot/dts/fsl/bsc9132qds.dts | 2 +- arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi | 12 +- arch/powerpc/boot/dts/fsl/c293pcie.dts| 2 +- arch/powerpc/boot/dts/fsl/c293si-post.dtsi| 12 +- arch/powerpc/boot/dts/fsl/gef_sbc310.dts | 12 +- arch/powerpc/boot/dts/fsl/mpc8536ds.dts | 12 +- arch/powerpc/boot/dts/fsl/mpc8536ds_36b.dts | 12 +- arch/powerpc/boot/dts/fsl/mpc8540ads.dts | 100 ++-- arch/powerpc/boot/dts/fsl/mpc8544ds.dts | 22 +-- arch/powerpc/boot/dts/fsl/mpc8544ds.dtsi | 22 +-- arch/powerpc/boot/dts/fsl/mpc8548cds_32b.dts | 14 +- arch/powerpc/boot/dts/fsl/mpc8548cds_36b.dts | 14 +- arch/powerpc/boot/dts/fsl/mpc8548si-post.dtsi | 12 +- arch/powerpc/boot/dts/fsl/mpc8560ads.dts | 100 ++-- arch/powerpc/boot/dts/fsl/mpc8568mds.dts | 22 +-- arch/powerpc/boot/dts/fsl/mpc8568si-post.dtsi | 12 +- arch/powerpc/boot/dts/fsl/mpc8569mds.dts | 2 +- arch/powerpc/boot/dts/fsl/mpc8569si-post.dtsi | 12 +- arch/powerpc/boot/dts/fsl/mpc8641_hpcn.dts| 150 +- .../powerpc/boot/dts/fsl/mpc8641_hpcn_36b.dts | 150 +- arch/powerpc/boot/dts/fsl/p2020ds.dts | 2 +- arch/powerpc/boot/dts/fsl/p2020ds.dtsi| 46 +++--- arch/powerpc/boot/dts/fsl/ppa8548.dts | 2 +- arch/powerpc/boot/dts/fsl/sbc8641d.dts| 4 +- 30 files changed, 408 insertions(+), 368 deletions(-) diff --git a/arch/powerpc/boot/dts/fsl/b4420qds.dts b/arch/powerpc/boot/dts/fsl/b4420qds.dts index cd9203ceedc0..11b6a5147538 100644 --- a/arch/powerpc/boot/dts/fsl/b4420qds.dts +++ b/arch/powerpc/boot/dts/fsl/b4420qds.dts @@ -33,7 +33,7 @@ */ /include/ "b4420si-pre.dtsi" -/include/ "b4qds.dtsi" +#include "b4qds.dtsi" / { model = "fsl,B4420QDS"; @@ -47,4 +47,4 @@ }; -/include/ "b4420si-post.dtsi" +#include "b4420si-post.dtsi" diff --git a/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi index f996cced45e0..981585dc9026 100644 --- a/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi +++ b/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi @@ -32,7 +32,7 @@ * this software, even if advised of the possibility of such damage. */ -/include/ "b4si-post.dtsi" +#include "b4si-post.dtsi" /* controller at 0x20 */ &pci0 { diff --git a/arch/powerpc/boot/dts/fsl/b4860qds.dts b/arch/powerpc/boot/dts/fsl/b4860qds.dts index a8bc419959ca..9cad8d3f3165 100644 --- a/arch/powerpc/boot/dts/fsl/b4860qds.dts +++ b/arch/powerpc/boot/dts/fsl/b4860qds.dts @@ -33,7 +33,7 @@ */ /include/ "b4860si-pre.dtsi" -/include/ "b4qds.dtsi" +#include "b4qds.dtsi" / { model = "fsl,B4860QDS"; @@ -114,4 +114,4 @@ }; }; -/include/ "b4860si-post.dtsi" +#include "b4860si-post.dtsi" diff --git a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi index 868719821106..8d99df9a0259 100644 --- a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi +++ b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi @@ -32,7 +32,7 @@ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ -/include/ "b4si-post.dtsi" +#include "b4si-post.dtsi" /* controller at 0x20 */ &pci0 { diff --git a/arch/powerpc/boot/dts/fsl/b4qds.dtsi b/arch/powerpc/boot/dts/fsl/b4qds.dtsi index 05be919f3545..0fd5a51942a3 100644 --- a/arch/powerpc/boot/dts/fsl/b4qds.dtsi +++ b/arch/powerpc/boot/dts/fsl/b4qds.dtsi @@ -277,4 +277,4 @@ }; }; -/include/ "b4si-post.dtsi" +#include "b4si-post.dtsi" diff --git a/arch/powerpc/boot/dts/fsl/b4si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4si-post.dtsi index 4f044b41a776..b8c0cfe342ff 100644 --- a/arch/powerpc/boot/dts/fsl/b4si-post.dtsi +++ b/arch/powerpc/boot/dts/fsl/b4si-post.dtsi @@ -32,6 +32,8 @@ * this software, even if advised of the possibility of such damage. */ +#include + &bman_fbpr { compatible = "fsl,bman-fbpr"; alloc-ranges = <0 0 0x1 0>; @@ -70,13 +72,13 @@ device_type = "pci"; reg = <0 0 0 0 0>; interrupts = <20 2 0 0>; - interrupt-map-mask = <0xf800 0 0 7>; + interrupt-map-mask = <0xf800 0 0 IRQ_INT_ALL>; interrupt-map = < /* IDSEL 0x0 */
[PATCH v1 6/7] powerpc: dts: Use IRQ flags for legacy PCI IRQ interrupts
Replace magic numbers used to describe legacy PCI IRQ interrupts with #define. Signed-off-by: Andrew Murray --- arch/powerpc/boot/dts/bluestone.dts | 12 +++-- arch/powerpc/boot/dts/charon.dts | 12 +++-- arch/powerpc/boot/dts/digsy_mtc.dts | 12 +++-- arch/powerpc/boot/dts/haleakala.dts | 12 +++-- arch/powerpc/boot/dts/holly.dts | 42 arch/powerpc/boot/dts/hotfoot.dts | 12 +++-- arch/powerpc/boot/dts/kuroboxHD.dts | 28 ++- arch/powerpc/boot/dts/kuroboxHG.dts | 28 ++- arch/powerpc/boot/dts/lite5200.dts| 12 +++-- arch/powerpc/boot/dts/lite5200b.dts | 22 + arch/powerpc/boot/dts/media5200.dts | 26 +- arch/powerpc/boot/dts/mpc5121ads.dts | 20 arch/powerpc/boot/dts/mpc8308rdb.dts | 12 +++-- arch/powerpc/boot/dts/mpc8313erdb.dts | 20 arch/powerpc/boot/dts/mpc832x_mds.dts | 60 --- arch/powerpc/boot/dts/mpc832x_rdb.dts | 22 + arch/powerpc/boot/dts/mpc8349emitxgp.dts | 8 +-- arch/powerpc/boot/dts/mpc836x_mds.dts | 60 --- arch/powerpc/boot/dts/mpc836x_rdk.dts | 16 +++--- arch/powerpc/boot/dts/mucmc52.dts | 12 +++-- arch/powerpc/boot/dts/mvme5100.dts| 48 +- arch/powerpc/boot/dts/pcm030.dts | 22 + arch/powerpc/boot/dts/pcm032.dts | 22 + arch/powerpc/boot/dts/pq2fads.dts | 28 ++- arch/powerpc/boot/dts/socrates.dts| 8 +-- arch/powerpc/boot/dts/storcenter.dts | 28 ++- arch/powerpc/boot/dts/stx_gp3_8560.dts| 36 +++--- arch/powerpc/boot/dts/taishan.dts | 20 arch/powerpc/boot/dts/tqm5200.dts | 12 +++-- arch/powerpc/boot/dts/tqm8540.dts | 16 +++--- arch/powerpc/boot/dts/tqm8541.dts | 16 +++--- arch/powerpc/boot/dts/tqm8555.dts | 16 +++--- arch/powerpc/boot/dts/tqm8560.dts | 16 +++--- arch/powerpc/boot/dts/virtex440-ml510.dts | 43 arch/powerpc/boot/dts/xcalibur1501.dts| 13 +++-- arch/powerpc/boot/dts/xpedite5200.dts | 8 +-- 36 files changed, 437 insertions(+), 363 deletions(-) diff --git a/arch/powerpc/boot/dts/bluestone.dts b/arch/powerpc/boot/dts/bluestone.dts index cc965a1816b6..851b6f764ec3 100644 --- a/arch/powerpc/boot/dts/bluestone.dts +++ b/arch/powerpc/boot/dts/bluestone.dts @@ -8,6 +8,8 @@ /dts-v1/; +#include + / { #address-cells = <2>; #size-cells = <1>; @@ -359,12 +361,12 @@ * below are basically de-swizzled numbers. * The real slot is on idsel 0, so the swizzling is 1:1 */ - interrupt-map-mask = <0x0 0x0 0x0 0x7>; + interrupt-map-mask = <0x0 0x0 0x0 IRQ_INT_ALL>; interrupt-map = < - 0x0 0x0 0x0 0x1 &UIC3 0xc 0x4 /* swizzled int A */ - 0x0 0x0 0x0 0x2 &UIC3 0xd 0x4 /* swizzled int B */ - 0x0 0x0 0x0 0x3 &UIC3 0xe 0x4 /* swizzled int C */ - 0x0 0x0 0x0 0x4 &UIC3 0xf 0x4 /* swizzled int D */>; + 0x0 0x0 0x0 IRQ_INTA &UIC3 0xc 0x4 /* swizzled int A */ + 0x0 0x0 0x0 IRQ_INTB &UIC3 0xd 0x4 /* swizzled int B */ + 0x0 0x0 0x0 IRQ_INTC &UIC3 0xe 0x4 /* swizzled int C */ + 0x0 0x0 0x0 IRQ_INTD &UIC3 0xf 0x4 /* swizzled int D */>; }; MSI: ppc4xx-msi@C1000 { diff --git a/arch/powerpc/boot/dts/charon.dts b/arch/powerpc/boot/dts/charon.dts index 408b486b13df..0e2fe2511c46 100644 --- a/arch/powerpc/boot/dts/charon.dts +++ b/arch/powerpc/boot/dts/charon.dts @@ -11,6 +11,8 @@ /dts-v1/; +#include + / { model = "anon,charon"; compatible = "anon,charon"; @@ -217,11 +219,11 @@ device_type = "pci"; compatible = "fsl,mpc5200-pci"; reg = <0xfd00 0x100>; - interrupt-map-mask = <0xf800 0 0 7>; - interrupt-map = <0xc000 0 0 1 &mpc5200_pic 0 0 3 -0xc000 0 0 2 &mpc5200_pic 0 0 3 -0xc000 0 0 3 &mpc5200_pic 0 0 3 -0xc000 0 0 4 &mpc5200_pic 0 0 3>; + interrupt-map-mask = <0xf800 0 0 IRQ_INT_ALL>; + interrupt-map = <0xc000 0 0 IRQ_INTA &mpc5200_pic 0 0 3 +0xc000 0 0 IRQ_INTB &mpc5200_pic 0 0 3 +0xc000 0 0 IRQ_INTC &mpc5200_pic 0 0 3 +0xc000 0 0 IRQ_INTD &mpc5200_pic 0 0 3&
Re: [PATCH v6 1/3] dt-bindings: pci: layerscape-pci: add compatible strings "fsl,ls1028a-pcie"
On Mon, Sep 02, 2019 at 11:43:17AM +0800, Xiaowei Bao wrote: > Add the PCIe compatible string for LS1028A > > Signed-off-by: Xiaowei Bao > Signed-off-by: Hou Zhiqiang > Reviewed-by: Rob Herring > --- Reviewed-by: Andrew Murray > v2: > - No change. > v3: > - No change. > v4: > - No change. > v5: > - No change. > v6: > - No change. > > Documentation/devicetree/bindings/pci/layerscape-pci.txt | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/Documentation/devicetree/bindings/pci/layerscape-pci.txt > b/Documentation/devicetree/bindings/pci/layerscape-pci.txt > index e20ceaa..99a386e 100644 > --- a/Documentation/devicetree/bindings/pci/layerscape-pci.txt > +++ b/Documentation/devicetree/bindings/pci/layerscape-pci.txt > @@ -21,6 +21,7 @@ Required properties: > "fsl,ls1046a-pcie" > "fsl,ls1043a-pcie" > "fsl,ls1012a-pcie" > +"fsl,ls1028a-pcie" >EP mode: > "fsl,ls1046a-pcie-ep", "fsl,ls-pcie-ep" > - reg: base addresses and lengths of the PCIe controller register blocks. > -- > 2.9.5 >
Re: [PATCH v6 3/3] PCI: layerscape: Add LS1028a support
On Mon, Sep 02, 2019 at 11:43:19AM +0800, Xiaowei Bao wrote: > Add support for the LS1028a PCIe controller. > > Signed-off-by: Xiaowei Bao > Signed-off-by: Hou Zhiqiang > --- > v2: > - No change. > v3: > - Reuse the ls2088 driver data structurt. > v4: > - No change. > v5: > - No change. > v6: > - No change. > > drivers/pci/controller/dwc/pci-layerscape.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/pci/controller/dwc/pci-layerscape.c > b/drivers/pci/controller/dwc/pci-layerscape.c > index 3a5fa26..f24f79a 100644 > --- a/drivers/pci/controller/dwc/pci-layerscape.c > +++ b/drivers/pci/controller/dwc/pci-layerscape.c > @@ -263,6 +263,7 @@ static const struct ls_pcie_drvdata ls2088_drvdata = { > static const struct of_device_id ls_pcie_of_match[] = { > { .compatible = "fsl,ls1012a-pcie", .data = &ls1046_drvdata }, > { .compatible = "fsl,ls1021a-pcie", .data = &ls1021_drvdata }, > + { .compatible = "fsl,ls1028a-pcie", .data = &ls2088_drvdata }, > { .compatible = "fsl,ls1043a-pcie", .data = &ls1043_drvdata }, > { .compatible = "fsl,ls1046a-pcie", .data = &ls1046_drvdata }, > { .compatible = "fsl,ls2080a-pcie", .data = &ls2080_drvdata }, Reviewed-by: Andrew Murray > -- > 2.9.5 >
[PATCH 00/10] perf/core: Generalise event exclusion checking
Many PMU drivers do not have the capability to exclude counting events that occur in specific contexts such as idle, kernel, guest, etc. These drivers indicate this by returning an error in their event_init upon testing the events attribute flags. However this approach requires that each time a new event modifier is added to perf, all the perf drivers need to be modified to indicate that they don't support the attribute. This results in additional boiler-plate code common to many drivers that needs to be maintained. An example of this is the addition of exclude_host and exclude_guest in 2011 yet many PMU drivers do not support this or indicate an error on events that make use of it. This patch generalises the test for exclusion and updates PMU drivers to use it. This is a functional change as some PMU drivers will now correctly report that they don't support certain events whereas they previously did. An alternative approach might have been to provide a static perf_event_attr with exclusion events set such that each PMU driver could compare against (see ref [1]) - however this is less readable. A longer term approach may instead be for PMU's to advertise their capabilities on registration. All drivers touched by this patchset have been compile tested. [1] https://lore.kernel.org/patchwork/patch/325116/ ~ Andrew Murray (10): perf/core: Add macro to test for event exclusion flags arm: perf/core: generalise event exclusion checking with perf macro arm: perf: add additional validation to set_event_filter powerpc: perf/core: generalise event exclusion checking with perf macro powerpc/pmu/fsl: add additional validation to event_init alpha: perf/core: generalise event exclusion checking with perf macro x86: perf/core: generalise event exclusion checking with perf macro perf/core: Remove unused perf_flags drivers/perf: perf/core: generalise event exclusion checking with perf macro perf/doc: update design.txt for exclude_{host|guest} flags arch/alpha/kernel/perf_event.c | 4 +--- arch/arm/kernel/perf_event_v7.c | 2 ++ arch/arm/mach-imx/mmdc.c | 8 +--- arch/arm/mm/cache-l2x0-pmu.c | 7 +-- arch/powerpc/perf/core-fsl-emb.c | 2 ++ arch/powerpc/perf/hv-24x7.c | 7 +-- arch/powerpc/perf/hv-gpci.c | 7 +-- arch/powerpc/perf/imc-pmu.c | 14 ++ arch/x86/events/amd/ibs.c| 11 +-- arch/x86/events/amd/iommu.c | 3 +-- arch/x86/events/amd/power.c | 8 +--- arch/x86/events/amd/uncore.c | 3 +-- arch/x86/events/intel/cstate.c | 7 +-- arch/x86/events/intel/rapl.c | 7 +-- arch/x86/events/intel/uncore.c | 3 +-- arch/x86/events/intel/uncore_snb.c | 7 +-- arch/x86/events/msr.c| 7 +-- drivers/perf/arm-cci.c | 7 +-- drivers/perf/arm-ccn.c | 5 + drivers/perf/arm_dsu_pmu.c | 7 +-- drivers/perf/arm_pmu.c | 9 + drivers/perf/hisilicon/hisi_uncore_pmu.c | 7 +-- drivers/perf/qcom_l2_pmu.c | 3 +-- drivers/perf/qcom_l3_pmu.c | 3 +-- drivers/perf/xgene_pmu.c | 3 +-- include/linux/perf_event.h | 9 + include/uapi/linux/perf_event.h | 2 -- tools/include/uapi/linux/perf_event.h| 2 -- tools/perf/design.txt| 4 29 files changed, 41 insertions(+), 127 deletions(-) -- 2.7.4
[PATCH 02/10] arm: perf/core: generalise event exclusion checking with perf macro
Replace checking of perf event exclusion flags with perf macro. Signed-off-by: Andrew Murray --- arch/arm/mach-imx/mmdc.c | 8 +--- arch/arm/mm/cache-l2x0-pmu.c | 7 +-- 2 files changed, 2 insertions(+), 13 deletions(-) diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c index 04b3bf7..d9d468f 100644 --- a/arch/arm/mach-imx/mmdc.c +++ b/arch/arm/mach-imx/mmdc.c @@ -293,13 +293,7 @@ static int mmdc_pmu_event_init(struct perf_event *event) return -EOPNOTSUPP; } - if (event->attr.exclude_user|| - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle|| - event->attr.exclude_host|| - event->attr.exclude_guest || - event->attr.sample_period) + if (event_has_exclude_flags(event) || event->attr.sample_period) return -EINVAL; if (cfg < 0 || cfg >= MMDC_NUM_COUNTERS) diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c index afe5b4c..968fdf8 100644 --- a/arch/arm/mm/cache-l2x0-pmu.c +++ b/arch/arm/mm/cache-l2x0-pmu.c @@ -314,12 +314,7 @@ static int l2x0_pmu_event_init(struct perf_event *event) event->attach_state & PERF_ATTACH_TASK) return -EINVAL; - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) + if (event_has_exclude_flags(event)) return -EINVAL; if (event->cpu < 0) -- 2.7.4
[PATCH 01/10] perf/core: Add macro to test for event exclusion flags
Add a macro that tests if any of the perf event exclusion flags are set on a given event. Signed-off-by: Andrew Murray --- include/linux/perf_event.h | 9 + 1 file changed, 9 insertions(+) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 53c500f..89ee7fa 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event, extern void perf_log_lost_samples(struct perf_event *event, u64 lost); +static inline bool event_has_exclude_flags(struct perf_event *event) +{ + struct perf_event_attr *attr = &event->attr; + + return attr->exclude_idle || attr->exclude_user || + attr->exclude_kernel || attr->exclude_hv || + attr->exclude_guest || attr->exclude_host; +} + static inline bool is_sampling_event(struct perf_event *event) { return event->attr.sample_period != 0; -- 2.7.4
[PATCH 03/10] arm: perf: add additional validation to set_event_filter
The armv7pmu driver doesn't support host/guest mode exclusion so let's report this when set_event_filter is called with these exclusion flags set. Signed-off-by: Andrew Murray --- arch/arm/kernel/perf_event_v7.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c index a4fb0f8..c4c9fbb 100644 --- a/arch/arm/kernel/perf_event_v7.c +++ b/arch/arm/kernel/perf_event_v7.c @@ -1074,6 +1074,8 @@ static int armv7pmu_set_event_filter(struct hw_perf_event *event, if (attr->exclude_idle) return -EPERM; + if (attr->exclude_host || attr->exclude_guest) + return -EPERM; if (attr->exclude_user) config_base |= ARMV7_EXCLUDE_USER; if (attr->exclude_kernel) -- 2.7.4
[PATCH 04/10] powerpc: perf/core: generalise event exclusion checking with perf macro
Replace checking of perf event exclusion flags with perf macro. Signed-off-by: Andrew Murray --- arch/powerpc/perf/hv-24x7.c | 7 +-- arch/powerpc/perf/hv-gpci.c | 7 +-- arch/powerpc/perf/imc-pmu.c | 14 ++ 3 files changed, 4 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c index 72238ee..60db22d 100644 --- a/arch/powerpc/perf/hv-24x7.c +++ b/arch/powerpc/perf/hv-24x7.c @@ -1307,12 +1307,7 @@ static int h_24x7_event_init(struct perf_event *event) } /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) + if (event_has_exclude_flags(event)) return -EINVAL; /* no branch sampling */ diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c index 43fabb3..2d2b5c0 100644 --- a/arch/powerpc/perf/hv-gpci.c +++ b/arch/powerpc/perf/hv-gpci.c @@ -233,12 +233,7 @@ static int h_gpci_event_init(struct perf_event *event) } /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) + if (event_has_exclude_flags(event)) return -EINVAL; /* no branch sampling */ diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c index 1fafc32b..1ae1d3f 100644 --- a/arch/powerpc/perf/imc-pmu.c +++ b/arch/powerpc/perf/imc-pmu.c @@ -474,12 +474,7 @@ static int nest_imc_event_init(struct perf_event *event) return -EINVAL; /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) + if (event_has_exclude_flags(event)) return -EINVAL; if (event->cpu < 0) @@ -749,12 +744,7 @@ static int core_imc_event_init(struct perf_event *event) return -EINVAL; /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) + if (event_has_exclude_flags(event)) return -EINVAL; if (event->cpu < 0) -- 2.7.4
[PATCH 05/10] powerpc/pmu/fsl: add additional validation to event_init
The fsl PMU driver doesn't support host/guest mode exclusion so let's report this when event_init is called with these exclusion flags set. Signed-off-by: Andrew Murray --- arch/powerpc/perf/core-fsl-emb.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c index ba48584..b1cc9d1 100644 --- a/arch/powerpc/perf/core-fsl-emb.c +++ b/arch/powerpc/perf/core-fsl-emb.c @@ -559,6 +559,8 @@ static int fsl_emb_pmu_event_init(struct perf_event *event) event->hw.config_base |= PMLCA_FCS; if (event->attr.exclude_idle) return -ENOTSUPP; + if (event->attr.exclude_host || event->attr.exclude_guest) + return -ENOTSUPP; event->hw.last_period = event->hw.sample_period; local64_set(&event->hw.period_left, event->hw.last_period); -- 2.7.4
[PATCH 06/10] alpha: perf/core: generalise event exclusion checking with perf macro
Replace checking of perf event exclusion flags with perf macro. This is a functional change as __hw_perf_event_init will now indicate that it doesn't support exclude_host and exclude_guest event flags. Signed-off-by: Andrew Murray --- arch/alpha/kernel/perf_event.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/alpha/kernel/perf_event.c b/arch/alpha/kernel/perf_event.c index 5613aa37..36f98ee 100644 --- a/arch/alpha/kernel/perf_event.c +++ b/arch/alpha/kernel/perf_event.c @@ -631,10 +631,8 @@ static int __hw_perf_event_init(struct perf_event *event) } /* The EV67 does not support mode exclusion */ - if (attr->exclude_kernel || attr->exclude_user - || attr->exclude_hv || attr->exclude_idle) { + if (event_has_exclude_flags(event)) return -EPERM; - } /* * We place the event type in event_base here and leave calculation -- 2.7.4
[PATCH 07/10] x86: perf/core: generalise event exclusion checking with perf macro
Replace checking of perf event exclusion flags with perf macro. This is a functional change as exclude_host and exclude_guest are added in these files: arch/x86/events/intel/uncore.c and exclude_idle and exclude_hv are added in these files: arch/x86/events/amd/iommu.c arch/x86/events/amd/uncore.c Signed-off-by: Andrew Murray --- arch/x86/events/amd/ibs.c | 11 +-- arch/x86/events/amd/iommu.c| 3 +-- arch/x86/events/amd/power.c| 8 +--- arch/x86/events/amd/uncore.c | 3 +-- arch/x86/events/intel/cstate.c | 7 +-- arch/x86/events/intel/rapl.c | 7 +-- arch/x86/events/intel/uncore.c | 3 +-- arch/x86/events/intel/uncore_snb.c | 7 +-- arch/x86/events/msr.c | 7 +-- 9 files changed, 9 insertions(+), 47 deletions(-) diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c index d50bb4d..a51981c 100644 --- a/arch/x86/events/amd/ibs.c +++ b/arch/x86/events/amd/ibs.c @@ -253,15 +253,6 @@ static int perf_ibs_precise_event(struct perf_event *event, u64 *config) return -EOPNOTSUPP; } -static const struct perf_event_attr ibs_notsupp = { - .exclude_user = 1, - .exclude_kernel = 1, - .exclude_hv = 1, - .exclude_idle = 1, - .exclude_host = 1, - .exclude_guest = 1, -}; - static int perf_ibs_init(struct perf_event *event) { struct hw_perf_event *hwc = &event->hw; @@ -282,7 +273,7 @@ static int perf_ibs_init(struct perf_event *event) if (event->pmu != &perf_ibs->pmu) return -ENOENT; - if (perf_flags(&event->attr) & perf_flags(&ibs_notsupp)) + if (event_has_exclude_flags(event)) return -EINVAL; if (config & ~perf_ibs->config_mask) diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c index 3210fee..fa7541b 100644 --- a/arch/x86/events/amd/iommu.c +++ b/arch/x86/events/amd/iommu.c @@ -224,8 +224,7 @@ static int perf_iommu_event_init(struct perf_event *event) return -EINVAL; /* IOMMU counters do not have usr/os/guest/host bits */ - if (event->attr.exclude_user || event->attr.exclude_kernel || - event->attr.exclude_host || event->attr.exclude_guest) + if (event_has_exclude_flags(event)) return -EINVAL; if (event->cpu < 0) diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c index 2aefacf..4129fbe 100644 --- a/arch/x86/events/amd/power.c +++ b/arch/x86/events/amd/power.c @@ -136,13 +136,7 @@ static int pmu_event_init(struct perf_event *event) return -ENOENT; /* Unsupported modes and filters. */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest || - /* no sampling */ + if (event_has_exclude_flags(event) || event->attr.sample_period) return -EINVAL; diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c index 8671de1..c2015c7 100644 --- a/arch/x86/events/amd/uncore.c +++ b/arch/x86/events/amd/uncore.c @@ -202,8 +202,7 @@ static int amd_uncore_event_init(struct perf_event *event) return -EINVAL; /* NB and Last level cache counters do not have usr/os/guest/host bits */ - if (event->attr.exclude_user || event->attr.exclude_kernel || - event->attr.exclude_host || event->attr.exclude_guest) + if (event_has_exclude_flags(event)) return -EINVAL; /* and we do not enable counter overflow interrupts */ diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c index 9f8084f..9366833 100644 --- a/arch/x86/events/intel/cstate.c +++ b/arch/x86/events/intel/cstate.c @@ -280,12 +280,7 @@ static int cstate_pmu_event_init(struct perf_event *event) return -ENOENT; /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest || + if (event_has_exclude_flags(event) || event->attr.sample_period) /* no sampling */ return -EINVAL; diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c index 32f3e94..428d40c 100644 --- a/arch/x86/events/intel/rapl.c +++ b/arch/x86/events/intel/rapl.c @@ -397,12 +397,7 @@ static int rapl_pmu_event_init(struct perf_event *event) return -EINVAL; /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.ex
[PATCH 08/10] perf/core: Remove unused perf_flags
Now that perf_flags is not used we remove it. Signed-off-by: Andrew Murray --- include/uapi/linux/perf_event.h | 2 -- tools/include/uapi/linux/perf_event.h | 2 -- 2 files changed, 4 deletions(-) diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index f35eb72..ba89bd3 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -445,8 +445,6 @@ struct perf_event_query_bpf { __u32 ids[0]; }; -#define perf_flags(attr) (*(&(attr)->read_format + 1)) - /* * Ioctls that can be done on a perf event fd: */ diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h index f35eb72..ba89bd3 100644 --- a/tools/include/uapi/linux/perf_event.h +++ b/tools/include/uapi/linux/perf_event.h @@ -445,8 +445,6 @@ struct perf_event_query_bpf { __u32 ids[0]; }; -#define perf_flags(attr) (*(&(attr)->read_format + 1)) - /* * Ioctls that can be done on a perf event fd: */ -- 2.7.4
[PATCH 09/10] drivers/perf: perf/core: generalise event exclusion checking with perf macro
Replace checking of perf event exclusion flags with perf macro. This is a functional change as exclude_host and exclude_guest are added in the following files: - drivers/perf/qcom_l2_pmu.c - drivers/perf/qcom_l3_pmu.c - drivers/perf/arm_pmu.c And exclude_idle and exclude_hv are added in these files: - drivers/perf/xgene_pmu.c Signed-off-by: Andrew Murray --- drivers/perf/arm-cci.c | 7 +-- drivers/perf/arm-ccn.c | 5 + drivers/perf/arm_dsu_pmu.c | 7 +-- drivers/perf/arm_pmu.c | 9 + drivers/perf/hisilicon/hisi_uncore_pmu.c | 7 +-- drivers/perf/qcom_l2_pmu.c | 3 +-- drivers/perf/qcom_l3_pmu.c | 3 +-- drivers/perf/xgene_pmu.c | 3 +-- 8 files changed, 8 insertions(+), 36 deletions(-) diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c index 1bfeb16..d749f19 100644 --- a/drivers/perf/arm-cci.c +++ b/drivers/perf/arm-cci.c @@ -1328,12 +1328,7 @@ static int cci_pmu_event_init(struct perf_event *event) return -EOPNOTSUPP; /* We have no filtering of any kind */ - if (event->attr.exclude_user|| - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle|| - event->attr.exclude_host|| - event->attr.exclude_guest) + if (event_has_exclude_flags(event)) return -EINVAL; /* diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c index 7dd850e..9a22a95 100644 --- a/drivers/perf/arm-ccn.c +++ b/drivers/perf/arm-ccn.c @@ -741,10 +741,7 @@ static int arm_ccn_pmu_event_init(struct perf_event *event) return -EOPNOTSUPP; } - if (has_branch_stack(event) || event->attr.exclude_user || - event->attr.exclude_kernel || event->attr.exclude_hv || - event->attr.exclude_idle || event->attr.exclude_host || - event->attr.exclude_guest) { + if (has_branch_stack(event) || event_has_exclude_flags(event)) { dev_dbg(ccn->dev, "Can't exclude execution levels!\n"); return -EINVAL; } diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c index 660cb8a..300ff3d 100644 --- a/drivers/perf/arm_dsu_pmu.c +++ b/drivers/perf/arm_dsu_pmu.c @@ -563,12 +563,7 @@ static int dsu_pmu_event_init(struct perf_event *event) } if (has_branch_stack(event) || - event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) { + event_has_exclude_flags(event)) { dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n"); return -EINVAL; } diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index 7f01f6f..a03634f 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -357,13 +357,6 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev) } static int -event_requires_mode_exclusion(struct perf_event_attr *attr) -{ - return attr->exclude_idle || attr->exclude_user || - attr->exclude_kernel || attr->exclude_hv; -} - -static int __hw_perf_event_init(struct perf_event *event) { struct arm_pmu *armpmu = to_arm_pmu(event->pmu); @@ -395,7 +388,7 @@ __hw_perf_event_init(struct perf_event *event) */ if ((!armpmu->set_event_filter || armpmu->set_event_filter(hwc, &event->attr)) && -event_requires_mode_exclusion(&event->attr)) { +event_has_exclude_flags(event)) { pr_debug("ARM performance counters do not support " "mode exclusion\n"); return -EOPNOTSUPP; diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c index 9efd241..d3edff9 100644 --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c @@ -143,12 +143,7 @@ int hisi_uncore_pmu_event_init(struct perf_event *event) return -EOPNOTSUPP; /* counters do not have these bits */ - if (event->attr.exclude_user|| - event->attr.exclude_kernel || - event->attr.exclude_host|| - event->attr.exclude_guest || - event->attr.exclude_hv || - event->attr.exclude_idle) + if (event_has_exclude_flags(event)) return -EINVAL; /* diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c index 842135c..d7d85a2 100644 --- a/drivers/perf/qcom_l2_pmu.c +++ b/drivers/perf/qcom_l2_pmu.c @@ -
[PATCH 10/10] perf/doc: update design.txt for exclude_{host|guest} flags
Update design.txt to reflect the presence of the exclude_host and exclude_guest perf flags. Signed-off-by: Andrew Murray --- tools/perf/design.txt | 4 1 file changed, 4 insertions(+) diff --git a/tools/perf/design.txt b/tools/perf/design.txt index a28dca2..7de7d83 100644 --- a/tools/perf/design.txt +++ b/tools/perf/design.txt @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a way to request that counting of events be restricted to times when the CPU is in user, kernel and/or hypervisor mode. +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way +to request counting of events restricted to guest and host contexts when +using virtualisation. + The 'mmap' and 'munmap' bits allow recording of PROT_EXEC mmap/munmap operations, these can be used to relate userspace IP addresses to actual code, even after the mapping (or even the whole process) is gone, -- 2.7.4
Re: [PATCH 10/10] perf/doc: update design.txt for exclude_{host|guest} flags
On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote: > Andrew Murray writes: > > > Update design.txt to reflect the presence of the exclude_host > > and exclude_guest perf flags. > > > > Signed-off-by: Andrew Murray > > --- > > tools/perf/design.txt | 4 > > 1 file changed, 4 insertions(+) > > > > diff --git a/tools/perf/design.txt b/tools/perf/design.txt > > index a28dca2..7de7d83 100644 > > --- a/tools/perf/design.txt > > +++ b/tools/perf/design.txt > > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' > > bits provide a > > way to request that counting of events be restricted to times when the > > CPU is in user, kernel and/or hypervisor mode. > > > > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way > > +to request counting of events restricted to guest and host contexts when > > +using virtualisation. > > How does exclude_host differ from exclude_hv ? I believe exclude_host / exclude_guest are intented to distinguish between host and guest in the hosted hypervisor context (KVM). Whereas exclude_hv allows to distinguish between guest and hypervisor in the bare-metal type hypervisors. In the case of arm64 - if VHE extensions are present then the host kernel will run at a higher privilege to the guest kernel, in which case there is no distinction between hypervisor and host so we ignore exclude_hv. But where VHE extensions are not present then the host kernel runs at the same privilege level as the guest and we use a higher privilege level to switch between them - in this case we can use exclude_hv to discount that hypervisor role of switching between guests. Thanks, Andrew Murray > > cheers
Re: [PATCH 00/10] perf/core: Generalise event exclusion checking
On Mon, Nov 19, 2018 at 02:08:00PM +0100, Peter Zijlstra wrote: > On Fri, Nov 16, 2018 at 10:24:03AM +0000, Andrew Murray wrote: > > Many PMU drivers do not have the capability to exclude counting events > > that occur in specific contexts such as idle, kernel, guest, etc. These > > drivers indicate this by returning an error in their event_init upon > > testing the events attribute flags. > > > > However this approach requires that each time a new event modifier is > > added to perf, all the perf drivers need to be modified to indicate that > > they don't support the attribute. This results in additional boiler-plate > > code common to many drivers that needs to be maintained. An example of > > this is the addition of exclude_host and exclude_guest in 2011 yet many > > PMU drivers do not support this or indicate an error on events that make > > use of it. > > > > This patch generalises the test for exclusion and updates PMU drivers to > > use it. This is a functional change as some PMU drivers will now correctly > > report that they don't support certain events whereas they previously did. > > Right, I like that idea, and yes, there's a lot of fail around there :/ > > > A longer term approach may instead be for PMU's to advertise their > > capabilities on registration. > > This I think is the better approach. We already have the > PERF_PMU_CAP_flags that can be used to advertise various PMU > capabilities. OK I'll respin my series to take this approach. > > Something along these lines I suppose; then every PMU that actually > checks the flags, needs to set the flag, otherwise it'll fail. > > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h > index 53c500f0ca79..de15723ea52a 100644 > --- a/include/linux/perf_event.h > +++ b/include/linux/perf_event.h > @@ -244,6 +244,7 @@ struct perf_event; > #define PERF_PMU_CAP_EXCLUSIVE 0x10 > #define PERF_PMU_CAP_ITRACE 0x20 > #define PERF_PMU_CAP_HETEROGENEOUS_CPUS 0x40 > +#define PERF_PMU_CAP_EXCLUDE 0x80 > > /** > * struct pmu - generic performance monitoring unit > diff --git a/kernel/events/core.c b/kernel/events/core.c > index 84530ab358c3..d76b724177b9 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -9772,6 +9772,14 @@ static int perf_try_init_event(struct pmu *pmu, struct > perf_event *event) > if (ctx) > perf_event_ctx_unlock(event->group_leader, ctx); > > + if (!ret) { > + if ((pmu->capabilities & PERF_PMU_CAP_EXCLUDE) || > + event_has_exclude_flags(event)) { > + event->destroy(event); > + ret = -EINVAL; > + } > + } > + I don't quite follow this logic. Should that not have been: if (!(pmu->capabilities & PERF_PMU_CAP_EXCLUDE) && event_has_exclude_flags(event)) { Meaning that if an event has any exclude flags but the pmu doesn't have the capability to handle them then error. If you're happy with my proposed logic, then would it also make sense to move this before the call to the pmu->event_init ? Thanks, Andrew Murray > if (ret) > module_put(pmu->module); > >
Re: [PATCH 00/10] perf/core: Generalise event exclusion checking
On Thu, Nov 22, 2018 at 01:26:37PM +0100, Peter Zijlstra wrote: > On Thu, Nov 22, 2018 at 12:21:43PM +0000, Andrew Murray wrote: > > On Mon, Nov 19, 2018 at 02:08:00PM +0100, Peter Zijlstra wrote: > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c > > > index 84530ab358c3..d76b724177b9 100644 > > > --- a/kernel/events/core.c > > > +++ b/kernel/events/core.c > > > @@ -9772,6 +9772,14 @@ static int perf_try_init_event(struct pmu *pmu, > > > struct perf_event *event) > > > if (ctx) > > > perf_event_ctx_unlock(event->group_leader, ctx); > > > > > > + if (!ret) { > > > + if ((pmu->capabilities & PERF_PMU_CAP_EXCLUDE) || > > > + event_has_exclude_flags(event)) { > > > + event->destroy(event); > > > + ret = -EINVAL; > > > + } > > > + } > > > + > > > > I don't quite follow this logic. Should that not have been: > > > > if (!(pmu->capabilities & PERF_PMU_CAP_EXCLUDE) && > > event_has_exclude_flags(event)) { > > > > Meaning that if an event has any exclude flags but the pmu doesn't > > have the capability to handle them then error. > > Uhm, yes. Brainfart on my side that. > > > If you're happy with my proposed logic, then would it also make > > sense to move this before the call to the pmu->event_init ? > > I'm not sure that can work; I think we need ->event_init() first such > that it can -ENOENT. Only after ->event_init() returns success can we be > certain of @pmu. Ah yes I see now. Until event_init doesn't return -ENOENT we can't be sure that this will be the PMU we use (as per the other calls to perf_try_init_event in perf_init_event). Thanks, Andrew Murray
Re: [PATCH 09/10] drivers/perf: perf/core: generalise event exclusion checking with perf macro
On Mon, Nov 19, 2018 at 04:03:52PM +, Mark Rutland wrote: > On Fri, Nov 16, 2018 at 10:24:12AM +0000, Andrew Murray wrote: > > Replace checking of perf event exclusion flags with perf macro. > > > > This is a functional change as exclude_host and exclude_guest are added > > in the following files: > > > > - drivers/perf/qcom_l2_pmu.c > > - drivers/perf/qcom_l3_pmu.c > > - drivers/perf/arm_pmu.c > > > > And exclude_idle and exclude_hv are added in these files: > > > > - drivers/perf/xgene_pmu.c > > FWIW, that all sounds fine to me. > > Assuming you fix up the 'macro' nit: > > Acked-by: Mark Rutland > > ... unless we go for Peter's core cap for this. Yeah I'm going to re-spin with the core cap approach - but thanks anyway. Thanks, Andrew Murray > > Thanks, > Mark. > > > Signed-off-by: Andrew Murray > > --- > > drivers/perf/arm-cci.c | 7 +-- > > drivers/perf/arm-ccn.c | 5 + > > drivers/perf/arm_dsu_pmu.c | 7 +-- > > drivers/perf/arm_pmu.c | 9 + > > drivers/perf/hisilicon/hisi_uncore_pmu.c | 7 +-- > > drivers/perf/qcom_l2_pmu.c | 3 +-- > > drivers/perf/qcom_l3_pmu.c | 3 +-- > > drivers/perf/xgene_pmu.c | 3 +-- > > 8 files changed, 8 insertions(+), 36 deletions(-) > > > > diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c > > index 1bfeb16..d749f19 100644 > > --- a/drivers/perf/arm-cci.c > > +++ b/drivers/perf/arm-cci.c > > @@ -1328,12 +1328,7 @@ static int cci_pmu_event_init(struct perf_event > > *event) > > return -EOPNOTSUPP; > > > > /* We have no filtering of any kind */ > > - if (event->attr.exclude_user|| > > - event->attr.exclude_kernel || > > - event->attr.exclude_hv || > > - event->attr.exclude_idle|| > > - event->attr.exclude_host|| > > - event->attr.exclude_guest) > > + if (event_has_exclude_flags(event)) > > return -EINVAL; > > > > /* > > diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c > > index 7dd850e..9a22a95 100644 > > --- a/drivers/perf/arm-ccn.c > > +++ b/drivers/perf/arm-ccn.c > > @@ -741,10 +741,7 @@ static int arm_ccn_pmu_event_init(struct perf_event > > *event) > > return -EOPNOTSUPP; > > } > > > > - if (has_branch_stack(event) || event->attr.exclude_user || > > - event->attr.exclude_kernel || event->attr.exclude_hv || > > - event->attr.exclude_idle || event->attr.exclude_host || > > - event->attr.exclude_guest) { > > + if (has_branch_stack(event) || event_has_exclude_flags(event)) { > > dev_dbg(ccn->dev, "Can't exclude execution levels!\n"); > > return -EINVAL; > > } > > diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c > > index 660cb8a..300ff3d 100644 > > --- a/drivers/perf/arm_dsu_pmu.c > > +++ b/drivers/perf/arm_dsu_pmu.c > > @@ -563,12 +563,7 @@ static int dsu_pmu_event_init(struct perf_event *event) > > } > > > > if (has_branch_stack(event) || > > - event->attr.exclude_user || > > - event->attr.exclude_kernel || > > - event->attr.exclude_hv || > > - event->attr.exclude_idle || > > - event->attr.exclude_host || > > - event->attr.exclude_guest) { > > + event_has_exclude_flags(event)) { > > dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n"); > > return -EINVAL; > > } > > diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c > > index 7f01f6f..a03634f 100644 > > --- a/drivers/perf/arm_pmu.c > > +++ b/drivers/perf/arm_pmu.c > > @@ -357,13 +357,6 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void > > *dev) > > } > > > > static int > > -event_requires_mode_exclusion(struct perf_event_attr *attr) > > -{ > > - return attr->exclude_idle || attr->exclude_user || > > - attr->exclude_kernel || attr->exclude_hv; > > -} > > - > > -static int > > __hw_perf_event_init(struct perf_event *event) > > { > > struct arm_pmu *armpmu = to_arm_pmu(event->pmu); > > @@ -395,7 +388,7 @@ __hw_perf_event_init(struct perf_event *event) > > */ >
Re: [PATCH 01/10] perf/core: Add macro to test for event exclusion flags
On Tue, Nov 20, 2018 at 10:28:34PM +1100, Michael Ellerman wrote: > Andrew Murray writes: > > > Add a macro that tests if any of the perf event exclusion flags > > are set on a given event. > > > > Signed-off-by: Andrew Murray > > --- > > include/linux/perf_event.h | 9 + > > 1 file changed, 9 insertions(+) > > > > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h > > index 53c500f..89ee7fa 100644 > > --- a/include/linux/perf_event.h > > +++ b/include/linux/perf_event.h > > @@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event > > *event, > > extern void > > perf_log_lost_samples(struct perf_event *event, u64 lost); > > > > +static inline bool event_has_exclude_flags(struct perf_event *event) > > +{ > > + struct perf_event_attr *attr = &event->attr; > > + > > + return attr->exclude_idle || attr->exclude_user || > > + attr->exclude_kernel || attr->exclude_hv || > > + attr->exclude_guest || attr->exclude_host; > > +} > > Sorry to be a total PITA, but using "flags" plural suggests that it only > returns true if there is more than one exclude flag set. > > A better name would be event_has_exclude_flag() or maybe > event_has_any_exclude_flag(). > > If you're doing a respin anyway it'd be nice to fix the name, but > obviously it's not high priority. No problem - I'll go with event_has_any_exclude_flag. Thanks, Andrew Murray > > cheers
[PATCH v2 00/20] perf/core: Generalise event exclusion checking
Many PMU drivers do not have the capability to exclude counting events that occur in specific contexts such as idle, kernel, guest, etc. These drivers indicate this by returning an error in their event_init upon testing the events attribute flags. However this approach requires that each time a new event modifier is added to perf, all the perf drivers need to be modified to indicate that they don't support the attribute. This results in additional boiler-plate code common to many drivers that needs to be maintained. Furthermore the drivers are not consistent with regards to the error value they return when reporting unsupported attributes. This patchset allow PMU drivers to advertise their ability to exclude based on context via a new capability: PERF_PMU_CAP_EXCLUDE. This allows the perf core to reject requests for exclusion events where there is no support in the PMU. This is a functional change, in particular: - Some drivers will now additionally (but correctly) report unsupported exclusion flags. It's typical for existing userspace tools such as perf to handle such errors by retrying the system call without the unsupported flags. - Drivers that do not support any exclusion that previously reported -EPERM or -EOPNOTSUPP will now report -EINVAL - this is consistent with the majority and results in userspace perf retrying without exclusion. All drivers touched by this patchset have been compile tested. Changes from v1: - Changed approach from explicitly rejecting events in unsupporting PMU drivers to explicitly advertising a capability in PMU drivers that do support exclusion events - Added additional information to tools/perf/design.txt - Rename event_has_exclude_flags to event_has_any_exclude_flag and update commit log to reflect it's a function Andrew Murray (20): perf/doc: update design.txt for exclude_{host|guest} flags perf/core: add function to test for event exclusion flags perf/core: add PERF_PMU_CAP_EXCLUDE for exclusion capable PMUs perf/hw_breakpoint: perf/core: advertise PMU exclusion capability alpha: perf/core: remove unnecessary checks for exclusion arc: perf/core: advertise PMU exclusion capability arm: perf: conditionally advertise PMU exclusion capability arm: perf/core: remove unnecessary checks for exclusion drivers/perf: perf/core: remove unnecessary checks for exclusion drivers/perf: perf/core: remove unnecessary checks for exclusion drivers/perf: perf/core: advertise PMU exclusion capability mips: perf/core: advertise PMU exclusion capability powerpc: perf/core: advertise PMU exclusion capability powerpc: perf/core: remove unnecessary checks for exclusion s390: perf/events: advertise PMU exclusion capability sparc: perf/core: advertise PMU exclusion capability x86: perf/core: remove unnecessary checks for exclusion x86: perf/core remove unnecessary checks for exclusion x86: perf/core: advertise PMU exclusion capability perf/core: remove unused perf_flags arch/alpha/kernel/perf_event.c | 6 -- arch/arc/kernel/perf_event.c | 1 + arch/arm/mach-imx/mmdc.c | 8 +--- arch/arm/mm/cache-l2x0-pmu.c | 8 arch/mips/kernel/perf_event_mipsxx.c | 1 + arch/powerpc/perf/core-book3s.c | 1 + arch/powerpc/perf/core-fsl-emb.c | 1 + arch/powerpc/perf/hv-24x7.c | 9 - arch/powerpc/perf/hv-gpci.c | 9 - arch/powerpc/perf/imc-pmu.c | 18 -- arch/s390/kernel/perf_cpum_cf.c | 1 + arch/s390/kernel/perf_cpum_sf.c | 2 ++ arch/sparc/kernel/perf_event.c | 1 + arch/x86/events/amd/ibs.c| 12 arch/x86/events/amd/iommu.c | 5 - arch/x86/events/amd/power.c | 9 + arch/x86/events/amd/uncore.c | 5 - arch/x86/events/core.c | 2 ++ arch/x86/events/intel/bts.c | 2 +- arch/x86/events/intel/cstate.c | 8 +--- arch/x86/events/intel/pt.c | 4 +++- arch/x86/events/intel/rapl.c | 8 +--- arch/x86/events/intel/uncore.c | 8 arch/x86/events/intel/uncore_snb.c | 8 +--- arch/x86/events/msr.c| 8 +--- drivers/perf/arm-cci.c | 9 - drivers/perf/arm-ccn.c | 5 + drivers/perf/arm_dsu_pmu.c | 8 +--- drivers/perf/arm_pmu.c | 15 +-- drivers/perf/arm_spe_pmu.c | 3 ++- drivers/perf/hisilicon/hisi_uncore_pmu.c | 9 - drivers/perf/qcom_l2_pmu.c | 8 drivers/perf/qcom_l3_pmu.c | 7 --- drivers/perf/xgene_pmu.c | 5 - include/linux/perf_event.h | 10 ++ include/uapi/linux/perf_event.h | 2 -- kerne
[PATCH v2 01/20] perf/doc: update design.txt for exclude_{host|guest} flags
Update design.txt to reflect the presence of the exclude_host and exclude_guest perf flags. Signed-off-by: Andrew Murray --- tools/perf/design.txt | 4 1 file changed, 4 insertions(+) diff --git a/tools/perf/design.txt b/tools/perf/design.txt index a28dca2..5b2b23b 100644 --- a/tools/perf/design.txt +++ b/tools/perf/design.txt @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a way to request that counting of events be restricted to times when the CPU is in user, kernel and/or hypervisor mode. +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way +to request counting of events restricted to guest and host contexts when +using KVM virtualisation. + The 'mmap' and 'munmap' bits allow recording of PROT_EXEC mmap/munmap operations, these can be used to relate userspace IP addresses to actual code, even after the mapping (or even the whole process) is gone, -- 2.7.4
[PATCH v2 02/20] perf/core: add function to test for event exclusion flags
Add a function that tests if any of the perf event exclusion flags are set on a given event. Signed-off-by: Andrew Murray --- include/linux/perf_event.h | 9 + 1 file changed, 9 insertions(+) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 53c500f..b2e806f 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event, extern void perf_log_lost_samples(struct perf_event *event, u64 lost); +static inline bool event_has_any_exclude_flag(struct perf_event *event) +{ + struct perf_event_attr *attr = &event->attr; + + return attr->exclude_idle || attr->exclude_user || + attr->exclude_kernel || attr->exclude_hv || + attr->exclude_guest || attr->exclude_host; +} + static inline bool is_sampling_event(struct perf_event *event) { return event->attr.sample_period != 0; -- 2.7.4
[PATCH v2 03/20] perf/core: add PERF_PMU_CAP_EXCLUDE for exclusion capable PMUs
Many PMU drivers do not have the capability to exclude counting events that occur in specific contexts such as idle, kernel, guest, etc. These drivers indicate this by returning an error in their event_init upon testing the events attribute flags. This approach is error prone and often inconsistent. Let's instead allow PMU drivers to advertise their ability to exclude based on context via a new capability: PERF_PMU_CAP_EXCLUDE. This allows the perf core to reject requests for exclusion events where there is no support in the PMU. Signed-off-by: Andrew Murray --- include/linux/perf_event.h | 1 + kernel/events/core.c | 9 + 2 files changed, 10 insertions(+) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index b2e806f..69b3d65 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -244,6 +244,7 @@ struct perf_event; #define PERF_PMU_CAP_EXCLUSIVE 0x10 #define PERF_PMU_CAP_ITRACE0x20 #define PERF_PMU_CAP_HETEROGENEOUS_CPUS0x40 +#define PERF_PMU_CAP_EXCLUDE 0x80 /** * struct pmu - generic performance monitoring unit diff --git a/kernel/events/core.c b/kernel/events/core.c index 5a97f34..9afb33c 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9743,6 +9743,15 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event) if (ctx) perf_event_ctx_unlock(event->group_leader, ctx); + if (!ret) { + if (!(pmu->capabilities & PERF_PMU_CAP_EXCLUDE) && + event_has_any_exclude_flag(event)) { + if (event->destroy) + event->destroy(event); + ret = -EINVAL; + } + } + if (ret) module_put(pmu->module); -- 2.7.4
[PATCH v2 04/20] perf/hw_breakpoint: perf/core: advertise PMU exclusion capability
The breakpoint PMU has the capability to exclude kernel contexts, let's advertise that we support the PERF_PMU_CAP_EXCLUDE capability so that perf doesn't prevent us from handling events with any exclusion flags set. Signed-off-by: Andrew Murray --- kernel/events/hw_breakpoint.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c index d6b5618..fefe3d5 100644 --- a/kernel/events/hw_breakpoint.c +++ b/kernel/events/hw_breakpoint.c @@ -669,6 +669,8 @@ static struct pmu perf_breakpoint = { .start = hw_breakpoint_start, .stop = hw_breakpoint_stop, .read = hw_breakpoint_pmu_read, + + .capabilities = PERF_PMU_CAP_EXCLUDE, }; int __init init_hw_breakpoint(void) -- 2.7.4
[PATCH v2 05/20] alpha: perf/core: remove unnecessary checks for exclusion
As the Alpha PMU doesn't support context exclusion we do not advertise the PERF_PMU_CAP_EXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's remove the now unnecessary check for exclusion flags. This change means that __hw_perf_event_init will now also indicate that it doesn't support exclude_host and exclude_guest and will now implicitly return -EINVAL instead of -EPERM. This is likely more desirable as -EPERM will result in a kernel.perf_event_paranoid related warning from the perf userspace utility. Signed-off-by: Andrew Murray --- arch/alpha/kernel/perf_event.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/arch/alpha/kernel/perf_event.c b/arch/alpha/kernel/perf_event.c index 5613aa37..5c17077 100644 --- a/arch/alpha/kernel/perf_event.c +++ b/arch/alpha/kernel/perf_event.c @@ -630,12 +630,6 @@ static int __hw_perf_event_init(struct perf_event *event) return ev; } - /* The EV67 does not support mode exclusion */ - if (attr->exclude_kernel || attr->exclude_user - || attr->exclude_hv || attr->exclude_idle) { - return -EPERM; - } - /* * We place the event type in event_base here and leave calculation * of the codes to programme the PMU for alpha_pmu_enable() because -- 2.7.4
[PATCH v2 06/20] arc: perf/core: advertise PMU exclusion capability
The ARC PMU has the capability to exclude events based on context. Let's advertise that we support the PERF_PMU_CAP_EXCLUDE capability to ensure that perf doesn't prevent us from handling events where any exclusion flags are set. Signed-off-by: Andrew Murray --- arch/arc/kernel/perf_event.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c index 8aec462..cf8ea6d 100644 --- a/arch/arc/kernel/perf_event.c +++ b/arch/arc/kernel/perf_event.c @@ -514,6 +514,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev) .start = arc_pmu_start, .stop = arc_pmu_stop, .read = arc_pmu_read, + .capabilities = PERF_PMU_CAP_EXCLUDE, }; if (has_interrupts) { -- 2.7.4
[PATCH v2 07/20] arm: perf: conditionally advertise PMU exclusion capability
The ARM PMU driver can be used to represent a variety of ARM based PMUs. Some of these PMUs provide support for context exclusion, where this is the case we advertise this via the PERF_PMU_CAP_EXCLUDE capability to ensure that perf doesn't prevent us from handling events where any exclusion flags are set. Signed-off-by: Andrew Murray --- drivers/perf/arm_pmu.c | 15 +-- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index 7f01f6f..4409035 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -357,13 +357,6 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev) } static int -event_requires_mode_exclusion(struct perf_event_attr *attr) -{ - return attr->exclude_idle || attr->exclude_user || - attr->exclude_kernel || attr->exclude_hv; -} - -static int __hw_perf_event_init(struct perf_event *event) { struct arm_pmu *armpmu = to_arm_pmu(event->pmu); @@ -393,9 +386,8 @@ __hw_perf_event_init(struct perf_event *event) /* * Check whether we need to exclude the counter from certain modes. */ - if ((!armpmu->set_event_filter || -armpmu->set_event_filter(hwc, &event->attr)) && -event_requires_mode_exclusion(&event->attr)) { + if (armpmu->set_event_filter && + armpmu->set_event_filter(hwc, &event->attr)) { pr_debug("ARM performance counters do not support " "mode exclusion\n"); return -EOPNOTSUPP; @@ -861,6 +853,9 @@ int armpmu_register(struct arm_pmu *pmu) if (ret) return ret; + if (pmu->set_event_filter) + pmu->pmu.capabilities |= PERF_PMU_CAP_EXCLUDE; + ret = perf_pmu_register(&pmu->pmu, pmu->name, -1); if (ret) goto out_destroy; -- 2.7.4
[PATCH v2 08/20] arm: perf/core: remove unnecessary checks for exclusion
For drivers that do not support context exclusion we do not advertise the PERF_PMU_CAP_EXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's remove the now unnecessary check for exclusion flags. Signed-off-by: Andrew Murray --- arch/arm/mach-imx/mmdc.c | 8 +--- arch/arm/mm/cache-l2x0-pmu.c | 8 2 files changed, 1 insertion(+), 15 deletions(-) diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c index 04b3bf7..b937a15 100644 --- a/arch/arm/mach-imx/mmdc.c +++ b/arch/arm/mach-imx/mmdc.c @@ -293,13 +293,7 @@ static int mmdc_pmu_event_init(struct perf_event *event) return -EOPNOTSUPP; } - if (event->attr.exclude_user|| - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle|| - event->attr.exclude_host|| - event->attr.exclude_guest || - event->attr.sample_period) + if (event->attr.sample_period) return -EINVAL; if (cfg < 0 || cfg >= MMDC_NUM_COUNTERS) diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c index afe5b4c..ba92f9e 100644 --- a/arch/arm/mm/cache-l2x0-pmu.c +++ b/arch/arm/mm/cache-l2x0-pmu.c @@ -314,14 +314,6 @@ static int l2x0_pmu_event_init(struct perf_event *event) event->attach_state & PERF_ATTACH_TASK) return -EINVAL; - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) - return -EINVAL; - if (event->cpu < 0) return -EINVAL; -- 2.7.4
[PATCH v2 09/20] drivers/perf: perf/core: remove unnecessary checks for exclusion
For drivers that do not support context exclusion we do not advertise the PERF_PMU_CAP_EXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's remove the now unnecessary check for exclusion flags. Signed-off-by: Andrew Murray --- drivers/perf/arm-cci.c | 9 - drivers/perf/arm-ccn.c | 5 + drivers/perf/arm_dsu_pmu.c | 8 +--- drivers/perf/hisilicon/hisi_uncore_pmu.c | 9 - 4 files changed, 2 insertions(+), 29 deletions(-) diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c index 1bfeb16..501b497 100644 --- a/drivers/perf/arm-cci.c +++ b/drivers/perf/arm-cci.c @@ -1327,15 +1327,6 @@ static int cci_pmu_event_init(struct perf_event *event) if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK) return -EOPNOTSUPP; - /* We have no filtering of any kind */ - if (event->attr.exclude_user|| - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle|| - event->attr.exclude_host|| - event->attr.exclude_guest) - return -EINVAL; - /* * Following the example set by other "uncore" PMUs, we accept any CPU * and rewrite its affinity dynamically rather than having perf core diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c index 7dd850e..decf881 100644 --- a/drivers/perf/arm-ccn.c +++ b/drivers/perf/arm-ccn.c @@ -741,10 +741,7 @@ static int arm_ccn_pmu_event_init(struct perf_event *event) return -EOPNOTSUPP; } - if (has_branch_stack(event) || event->attr.exclude_user || - event->attr.exclude_kernel || event->attr.exclude_hv || - event->attr.exclude_idle || event->attr.exclude_host || - event->attr.exclude_guest) { + if (has_branch_stack(event)) { dev_dbg(ccn->dev, "Can't exclude execution levels!\n"); return -EINVAL; } diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c index 660cb8a..036414c 100644 --- a/drivers/perf/arm_dsu_pmu.c +++ b/drivers/perf/arm_dsu_pmu.c @@ -562,13 +562,7 @@ static int dsu_pmu_event_init(struct perf_event *event) return -EINVAL; } - if (has_branch_stack(event) || - event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) { + if (has_branch_stack(event)) { dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n"); return -EINVAL; } diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c index 9efd241..f028cbc 100644 --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c @@ -142,15 +142,6 @@ int hisi_uncore_pmu_event_init(struct perf_event *event) if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK) return -EOPNOTSUPP; - /* counters do not have these bits */ - if (event->attr.exclude_user|| - event->attr.exclude_kernel || - event->attr.exclude_host|| - event->attr.exclude_guest || - event->attr.exclude_hv || - event->attr.exclude_idle) - return -EINVAL; - /* * The uncore counters not specific to any CPU, so cannot * support per-task -- 2.7.4
[PATCH v2 10/20] drivers/perf: perf/core: remove unnecessary checks for exclusion
For drivers that do not support context exclusion we do not advertise the PERF_PMU_CAP_EXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's remove the now unnecessary check for exclusion flags. This change means that qcom_{l2|l3}_pmu will now also indicate that they do not support exclude_{host|guest} and that xgene_pmu does not also support exclude_idle and exclude_hv. Note that for qcom_l2_pmu we now implictly return -EINVAL instead of -EOPNOTSUPP. This change will result in the perf userspace utility retrying the perf_event_open system call with fallback event attributes that do not fail. Signed-off-by: Andrew Murray --- drivers/perf/qcom_l2_pmu.c | 8 drivers/perf/qcom_l3_pmu.c | 7 --- drivers/perf/xgene_pmu.c | 5 - 3 files changed, 20 deletions(-) diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c index 842135c..518e18c 100644 --- a/drivers/perf/qcom_l2_pmu.c +++ b/drivers/perf/qcom_l2_pmu.c @@ -509,14 +509,6 @@ static int l2_cache_event_init(struct perf_event *event) return -EOPNOTSUPP; } - /* We cannot filter accurately so we just don't allow it. */ - if (event->attr.exclude_user || event->attr.exclude_kernel || - event->attr.exclude_hv || event->attr.exclude_idle) { - dev_dbg_ratelimited(&l2cache_pmu->pdev->dev, - "Can't exclude execution levels\n"); - return -EOPNOTSUPP; - } - if (((L2_EVT_GROUP(event->attr.config) > L2_EVT_GROUP_MAX) || ((event->attr.config & ~L2_EVT_MASK) != 0)) && (event->attr.config != L2CYCLE_CTR_RAW_CODE)) { diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c index 2dc63d6..e28bd2f 100644 --- a/drivers/perf/qcom_l3_pmu.c +++ b/drivers/perf/qcom_l3_pmu.c @@ -495,13 +495,6 @@ static int qcom_l3_cache__event_init(struct perf_event *event) return -ENOENT; /* -* There are no per-counter mode filters in the PMU. -*/ - if (event->attr.exclude_user || event->attr.exclude_kernel || - event->attr.exclude_hv || event->attr.exclude_idle) - return -EINVAL; - - /* * Sampling not supported since these events are not core-attributable. */ if (hwc->sample_period) diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c index 0e31f13..bdc55de 100644 --- a/drivers/perf/xgene_pmu.c +++ b/drivers/perf/xgene_pmu.c @@ -914,11 +914,6 @@ static int xgene_perf_event_init(struct perf_event *event) if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK) return -EINVAL; - /* SOC counters do not have usr/os/guest/host bits */ - if (event->attr.exclude_user || event->attr.exclude_kernel || - event->attr.exclude_host || event->attr.exclude_guest) - return -EINVAL; - if (event->cpu < 0) return -EINVAL; /* -- 2.7.4
[PATCH v2 11/20] drivers/perf: perf/core: advertise PMU exclusion capability
The arm_pse PMU has the capability to exclude events based on context. Let's advertise that we support the PERF_PMU_CAP_EXCLUDE capability to ensure that perf doesn't prevent us from handling events where any exclusion flags are set. Signed-off-by: Andrew Murray --- drivers/perf/arm_spe_pmu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c index 54ec278..a09c38a 100644 --- a/drivers/perf/arm_spe_pmu.c +++ b/drivers/perf/arm_spe_pmu.c @@ -902,7 +902,8 @@ static int arm_spe_pmu_perf_init(struct arm_spe_pmu *spe_pmu) spe_pmu->pmu = (struct pmu) { .module = THIS_MODULE, - .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE, + .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE | + PERF_PMU_CAP_EXCLUDE, .attr_groups= arm_spe_pmu_attr_groups, /* * We hitch a ride on the software context here, so that -- 2.7.4
[PATCH v2 12/20] mips: perf/core: advertise PMU exclusion capability
The MIPS PMU has the capability to exclude events based on context. Let's advertise that we support the PERF_PMU_CAP_EXCLUDE capability to ensure that perf doesn't prevent us from handling events where any exclusion flags are set. Signed-off-by: Andrew Murray --- arch/mips/kernel/perf_event_mipsxx.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c index 4138635..d7813d0 100644 --- a/arch/mips/kernel/perf_event_mipsxx.c +++ b/arch/mips/kernel/perf_event_mipsxx.c @@ -670,6 +670,7 @@ static struct pmu pmu = { .start = mipspmu_start, .stop = mipspmu_stop, .read = mipspmu_read, + .capabilities = PERF_PMU_CAP_EXCLUDE, }; static unsigned int mipspmu_perf_event_encode(const struct mips_perf_event *pev) -- 2.7.4
[PATCH v2 13/20] powerpc: perf/core: advertise PMU exclusion capability
For PowerPC PMUs that have the capability to exclude events based on context. Let's advertise that we support the PERF_PMU_CAP_EXCLUDE capability to ensure that perf doesn't prevent us from handling events where any exclusion flags are set. Signed-off-by: Andrew Murray --- arch/powerpc/perf/core-book3s.c | 1 + arch/powerpc/perf/core-fsl-emb.c | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 81f8a0c..2f44b09 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -2007,6 +2007,7 @@ static struct pmu power_pmu = { .commit_txn = power_pmu_commit_txn, .event_idx = power_pmu_event_idx, .sched_task = power_pmu_sched_task, + .capabilities = PERF_PMU_CAP_EXCLUDE, }; /* diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c index ba48584..cea5bcb 100644 --- a/arch/powerpc/perf/core-fsl-emb.c +++ b/arch/powerpc/perf/core-fsl-emb.c @@ -596,6 +596,7 @@ static struct pmu fsl_emb_pmu = { .start = fsl_emb_pmu_start, .stop = fsl_emb_pmu_stop, .read = fsl_emb_pmu_read, + .capabilities = PERF_PMU_CAP_EXCLUDE, }; /* -- 2.7.4
[PATCH v2 14/20] powerpc: perf/core: remove unnecessary checks for exclusion
For PowerPC PMUs that do not support context exclusion we do not advertise the PERF_PMU_CAP_EXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's remove the now unnecessary check for exclusion flags. Signed-off-by: Andrew Murray --- arch/powerpc/perf/hv-24x7.c | 9 - arch/powerpc/perf/hv-gpci.c | 9 - arch/powerpc/perf/imc-pmu.c | 18 -- 3 files changed, 36 deletions(-) diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c index 72238ee..d13d8a9 100644 --- a/arch/powerpc/perf/hv-24x7.c +++ b/arch/powerpc/perf/hv-24x7.c @@ -1306,15 +1306,6 @@ static int h_24x7_event_init(struct perf_event *event) return -EINVAL; } - /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) - return -EINVAL; - /* no branch sampling */ if (has_branch_stack(event)) return -EOPNOTSUPP; diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c index 43fabb3..e0ce0e0 100644 --- a/arch/powerpc/perf/hv-gpci.c +++ b/arch/powerpc/perf/hv-gpci.c @@ -232,15 +232,6 @@ static int h_gpci_event_init(struct perf_event *event) return -EINVAL; } - /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) - return -EINVAL; - /* no branch sampling */ if (has_branch_stack(event)) return -EOPNOTSUPP; diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c index 1fafc32b..49c0b1c 100644 --- a/arch/powerpc/perf/imc-pmu.c +++ b/arch/powerpc/perf/imc-pmu.c @@ -473,15 +473,6 @@ static int nest_imc_event_init(struct perf_event *event) if (event->hw.sample_period) return -EINVAL; - /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) - return -EINVAL; - if (event->cpu < 0) return -EINVAL; @@ -748,15 +739,6 @@ static int core_imc_event_init(struct perf_event *event) if (event->hw.sample_period) return -EINVAL; - /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) - return -EINVAL; - if (event->cpu < 0) return -EINVAL; -- 2.7.4
[PATCH v2 15/20] s390: perf/events: advertise PMU exclusion capability
The s390 cpum_cf and cpum_sf PMUs have the capability to exclude events based on context. Let's advertise that we support the PERF_PMU_CAP_EXCLUDE capability to ensure that perf doesn't prevent us from handling events where any exclusion flags are set. Signed-off-by: Andrew Murray --- arch/s390/kernel/perf_cpum_cf.c | 1 + arch/s390/kernel/perf_cpum_sf.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c index cc085e2..7b583ed 100644 --- a/arch/s390/kernel/perf_cpum_cf.c +++ b/arch/s390/kernel/perf_cpum_cf.c @@ -667,6 +667,7 @@ static struct pmu cpumf_pmu = { .start_txn= cpumf_pmu_start_txn, .commit_txn = cpumf_pmu_commit_txn, .cancel_txn = cpumf_pmu_cancel_txn, + .capabilities = PERF_PMU_CAP_EXCLUDE, }; static int cpumf_pmf_setup(unsigned int cpu, int flags) diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c index 5c53e97..25a64aa 100644 --- a/arch/s390/kernel/perf_cpum_sf.c +++ b/arch/s390/kernel/perf_cpum_sf.c @@ -1885,6 +1885,8 @@ static struct pmu cpumf_sampling = { .setup_aux= aux_buffer_setup, .free_aux = aux_buffer_free, + + .capabilities = PERF_PMU_CAP_EXCLUDE, }; static void cpumf_measurement_alert(struct ext_code ext_code, -- 2.7.4
[PATCH v2 17/20] x86: perf/core: remove unnecessary checks for exclusion
For drivers that do not support context exclusion we do not advertise the PERF_PMU_CAP_EXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's remove the now unnecessary check for exclusion flags. Signed-off-by: Andrew Murray --- arch/x86/events/amd/ibs.c | 12 arch/x86/events/amd/power.c| 9 + arch/x86/events/intel/cstate.c | 8 +--- arch/x86/events/intel/rapl.c | 8 +--- arch/x86/events/intel/uncore_snb.c | 8 +--- arch/x86/events/msr.c | 8 +--- 6 files changed, 5 insertions(+), 48 deletions(-) diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c index d50bb4d..9e43ef6 100644 --- a/arch/x86/events/amd/ibs.c +++ b/arch/x86/events/amd/ibs.c @@ -253,15 +253,6 @@ static int perf_ibs_precise_event(struct perf_event *event, u64 *config) return -EOPNOTSUPP; } -static const struct perf_event_attr ibs_notsupp = { - .exclude_user = 1, - .exclude_kernel = 1, - .exclude_hv = 1, - .exclude_idle = 1, - .exclude_host = 1, - .exclude_guest = 1, -}; - static int perf_ibs_init(struct perf_event *event) { struct hw_perf_event *hwc = &event->hw; @@ -282,9 +273,6 @@ static int perf_ibs_init(struct perf_event *event) if (event->pmu != &perf_ibs->pmu) return -ENOENT; - if (perf_flags(&event->attr) & perf_flags(&ibs_notsupp)) - return -EINVAL; - if (config & ~perf_ibs->config_mask) return -EINVAL; diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c index 2aefacf..ef80c60 100644 --- a/arch/x86/events/amd/power.c +++ b/arch/x86/events/amd/power.c @@ -136,14 +136,7 @@ static int pmu_event_init(struct perf_event *event) return -ENOENT; /* Unsupported modes and filters. */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest || - /* no sampling */ - event->attr.sample_period) + if (event->attr.sample_period) return -EINVAL; if (cfg != AMD_POWER_EVENTSEL_PKG) diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c index 9f8084f..af919c4 100644 --- a/arch/x86/events/intel/cstate.c +++ b/arch/x86/events/intel/cstate.c @@ -280,13 +280,7 @@ static int cstate_pmu_event_init(struct perf_event *event) return -ENOENT; /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest || - event->attr.sample_period) /* no sampling */ + if (event->attr.sample_period) /* no sampling */ return -EINVAL; if (event->cpu < 0) diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c index 32f3e94..9cb94e6 100644 --- a/arch/x86/events/intel/rapl.c +++ b/arch/x86/events/intel/rapl.c @@ -397,13 +397,7 @@ static int rapl_pmu_event_init(struct perf_event *event) return -EINVAL; /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest || - event->attr.sample_period) /* no sampling */ + if (event->attr.sample_period) /* no sampling */ return -EINVAL; /* must be done before validate_group */ diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c index 8527c3e..26441eb 100644 --- a/arch/x86/events/intel/uncore_snb.c +++ b/arch/x86/events/intel/uncore_snb.c @@ -374,13 +374,7 @@ static int snb_uncore_imc_event_init(struct perf_event *event) return -EINVAL; /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest || - event->attr.sample_period) /* no sampling */ + if (event->attr.sample_period) /* no sampling */ return -EINVAL; /* diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c index b4771a6..c4fa5d4 100644 --- a/arch/x86/events/msr.c +++ b/arch/x86/events/msr.c @@ -160,13 +160,7 @@ s
[PATCH v2 16/20] sparc: perf/core: advertise PMU exclusion capability
The SPARC PMU has the capability to exclude events based on context - let's advertise that we support the PERF_PMU_CAP_EXCLUDE capability to ensure that perf doesn't prevent us from handling events where any exclusion flags are set. Signed-off-by: Andrew Murray --- arch/sparc/kernel/perf_event.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c index d3149ba..38fac17 100644 --- a/arch/sparc/kernel/perf_event.c +++ b/arch/sparc/kernel/perf_event.c @@ -1571,6 +1571,7 @@ static struct pmu pmu = { .start_txn = sparc_pmu_start_txn, .cancel_txn = sparc_pmu_cancel_txn, .commit_txn = sparc_pmu_commit_txn, + .capabilities = PERF_PMU_CAP_EXCLUDE, }; void perf_event_print_debug(void) -- 2.7.4
[PATCH v2 18/20] x86: perf/core remove unnecessary checks for exclusion
For x86 PMUs that do not support context exclusion we do not advertise the PERF_PMU_CAP_EXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's remove the now unnecessary check for exclusion flags. This change means that amd/iommu and amd/uncore will now also indicate that they do not support exclude_{hv|idle} and intel/uncore that it does not support exclude_{guest|host}. Signed-off-by: Andrew Murray --- arch/x86/events/amd/iommu.c| 5 - arch/x86/events/amd/uncore.c | 5 - arch/x86/events/intel/uncore.c | 8 3 files changed, 18 deletions(-) diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c index 3210fee..eb35fe4 100644 --- a/arch/x86/events/amd/iommu.c +++ b/arch/x86/events/amd/iommu.c @@ -223,11 +223,6 @@ static int perf_iommu_event_init(struct perf_event *event) if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK) return -EINVAL; - /* IOMMU counters do not have usr/os/guest/host bits */ - if (event->attr.exclude_user || event->attr.exclude_kernel || - event->attr.exclude_host || event->attr.exclude_guest) - return -EINVAL; - if (event->cpu < 0) return -EINVAL; diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c index 8671de1..d6ade30 100644 --- a/arch/x86/events/amd/uncore.c +++ b/arch/x86/events/amd/uncore.c @@ -201,11 +201,6 @@ static int amd_uncore_event_init(struct perf_event *event) if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK) return -EINVAL; - /* NB and Last level cache counters do not have usr/os/guest/host bits */ - if (event->attr.exclude_user || event->attr.exclude_kernel || - event->attr.exclude_host || event->attr.exclude_guest) - return -EINVAL; - /* and we do not enable counter overflow interrupts */ hwc->config = event->attr.config & AMD64_RAW_EVENT_MASK_NB; hwc->idx = -1; diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c index 27a4614..f1d78d9 100644 --- a/arch/x86/events/intel/uncore.c +++ b/arch/x86/events/intel/uncore.c @@ -695,14 +695,6 @@ static int uncore_pmu_event_init(struct perf_event *event) if (pmu->func_id < 0) return -ENOENT; - /* -* Uncore PMU does measure at all privilege level all the time. -* So it doesn't make sense to specify any exclude bits. -*/ - if (event->attr.exclude_user || event->attr.exclude_kernel || - event->attr.exclude_hv || event->attr.exclude_idle) - return -EINVAL; - /* Sampling not supported yet */ if (hwc->sample_period) return -EINVAL; -- 2.7.4
[PATCH v2 19/20] x86: perf/core: advertise PMU exclusion capability
For PMUs that have the capability to exclude events based on context. Let's advertise that we support the PERF_PMU_CAP_EXCLUDE capability to ensure that perf doesn't prevent us from handling events where any exclusion flags are set. Signed-off-by: Andrew Murray --- arch/x86/events/core.c | 2 ++ arch/x86/events/intel/bts.c | 2 +- arch/x86/events/intel/pt.c | 4 +++- 3 files changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index dfb2f7c..3f51916 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -2297,6 +2297,8 @@ static struct pmu pmu = { .event_idx = x86_pmu_event_idx, .sched_task = x86_pmu_sched_task, .task_ctx_size = sizeof(struct x86_perf_task_context), + + .capabilities = PERF_PMU_CAP_EXCLUDE, }; void arch_perf_update_userpage(struct perf_event *event, diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c index 24ffa1e..4976695 100644 --- a/arch/x86/events/intel/bts.c +++ b/arch/x86/events/intel/bts.c @@ -601,7 +601,7 @@ static __init int bts_init(void) } bts_pmu.capabilities= PERF_PMU_CAP_AUX_NO_SG | PERF_PMU_CAP_ITRACE | - PERF_PMU_CAP_EXCLUSIVE; + PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_EXCLUDE; bts_pmu.task_ctx_nr = perf_sw_context; bts_pmu.event_init = bts_event_init; bts_pmu.add = bts_event_add; diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c index 8d016ce..2d811f8 100644 --- a/arch/x86/events/intel/pt.c +++ b/arch/x86/events/intel/pt.c @@ -1516,7 +1516,9 @@ static __init int pt_init(void) pt_pmu.pmu.capabilities = PERF_PMU_CAP_AUX_NO_SG | PERF_PMU_CAP_AUX_SW_DOUBLEBUF; - pt_pmu.pmu.capabilities |= PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE; + pt_pmu.pmu.capabilities = PERF_PMU_CAP_EXCLUSIVE | + PERF_PMU_CAP_ITRACE | + PERF_PMU_CAP_EXCLUDE; pt_pmu.pmu.attr_groups = pt_attr_groups; pt_pmu.pmu.task_ctx_nr = perf_sw_context; pt_pmu.pmu.event_init= pt_event_init; -- 2.7.4
[PATCH v2 20/20] perf/core: remove unused perf_flags
Now that perf_flags is not used we remove it. Signed-off-by: Andrew Murray --- include/uapi/linux/perf_event.h | 2 -- tools/include/uapi/linux/perf_event.h | 2 -- 2 files changed, 4 deletions(-) diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index f35eb72..ba89bd3 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -445,8 +445,6 @@ struct perf_event_query_bpf { __u32 ids[0]; }; -#define perf_flags(attr) (*(&(attr)->read_format + 1)) - /* * Ioctls that can be done on a perf event fd: */ diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h index f35eb72..ba89bd3 100644 --- a/tools/include/uapi/linux/perf_event.h +++ b/tools/include/uapi/linux/perf_event.h @@ -445,8 +445,6 @@ struct perf_event_query_bpf { __u32 ids[0]; }; -#define perf_flags(attr) (*(&(attr)->read_format + 1)) - /* * Ioctls that can be done on a perf event fd: */ -- 2.7.4
Re: [PATCH v2 03/20] perf/core: add PERF_PMU_CAP_EXCLUDE for exclusion capable PMUs
On Mon, Nov 26, 2018 at 02:10:24PM +, Robin Murphy wrote: > Hi Andrew, > > On 26/11/2018 11:12, Andrew Murray wrote: > > Many PMU drivers do not have the capability to exclude counting events > > that occur in specific contexts such as idle, kernel, guest, etc. These > > drivers indicate this by returning an error in their event_init upon > > testing the events attribute flags. This approach is error prone and > > often inconsistent. > > > > Let's instead allow PMU drivers to advertise their ability to exclude > > based on context via a new capability: PERF_PMU_CAP_EXCLUDE. This > > allows the perf core to reject requests for exclusion events where > > there is no support in the PMU. > > > > Signed-off-by: Andrew Murray > > --- > > include/linux/perf_event.h | 1 + > > kernel/events/core.c | 9 + > > 2 files changed, 10 insertions(+) > > > > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h > > index b2e806f..69b3d65 100644 > > --- a/include/linux/perf_event.h > > +++ b/include/linux/perf_event.h > > @@ -244,6 +244,7 @@ struct perf_event; > > #define PERF_PMU_CAP_EXCLUSIVE0x10 > > #define PERF_PMU_CAP_ITRACE 0x20 > > #define PERF_PMU_CAP_HETEROGENEOUS_CPUS 0x40 > > +#define PERF_PMU_CAP_EXCLUDE 0x80 > > /** > >* struct pmu - generic performance monitoring unit > > diff --git a/kernel/events/core.c b/kernel/events/core.c > > index 5a97f34..9afb33c 100644 > > --- a/kernel/events/core.c > > +++ b/kernel/events/core.c > > @@ -9743,6 +9743,15 @@ static int perf_try_init_event(struct pmu *pmu, > > struct perf_event *event) > > if (ctx) > > perf_event_ctx_unlock(event->group_leader, ctx); > > + if (!ret) { > > + if (!(pmu->capabilities & PERF_PMU_CAP_EXCLUDE) && > > + event_has_any_exclude_flag(event)) { > > Technically this is a bisection-breaker, since no driver has this capability > yet - ideally, this patch should come after all the ones introducing it to > the relevant drivers (with the removal of the now-redundant code from the > other drivers at the end). Indeed. Thought it is possible to first introduce the capability, update the relevant drivers to advertise it, then add the change to core.c and finally remove the unnecessary error checks as a result of using the new capability. This approach could be bisection-proof. > > Alternatively, since we already have several other negative capabilities, > unless there's a strong feeling against adding any more then it might work > out simpler to flip it to PERF_PMU_CAP_NO_EXCLUDE, such that we only need to > introduce the core check then directly replace the open-coded event checks > with the capability in the appropriate drivers, and need not touch the > exclusion-supporting ones at all. This would certaintly be less risky and invasive (e.g. compare the number of files touched between this v2 and the previous v1). I'm happy with either approach. Thanks, Andrew Murray > > Robin. > > > + if (event->destroy) > > + event->destroy(event); > > + ret = -EINVAL; > > + } > > + } > > + > > if (ret) > > module_put(pmu->module); > >
[PATCH v3 00/12] perf/core: Generalise event exclusion checking
Many PMU drivers do not have the capability to exclude counting events that occur in specific contexts such as idle, kernel, guest, etc. These drivers indicate this by returning an error in their event_init upon testing the events attribute flags. However this approach requires that each time a new event modifier is added to perf, all the perf drivers need to be modified to indicate that they don't support the attribute. This results in additional boiler-plate code common to many drivers that needs to be maintained. Furthermore the drivers are not consistent with regards to the error value they return when reporting unsupported attributes. This patchset allow PMU drivers to advertise their inability to exclude based on context via a new capability: PERF_PMU_CAP_NO_EXCLUDE. This allows the perf core to reject requests for exclusion events where there is no support in the PMU. This is a functional change, in particular: - Some drivers will now additionally (but correctly) report unsupported exclusion flags. It's typical for existing userspace tools such as perf to handle such errors by retrying the system call without the unsupported flags. - Drivers that do not support any exclusion that previously reported -EPERM or -EOPNOTSUPP will now report -EINVAL - this is consistent with the majority and results in userspace perf retrying without exclusion. All drivers touched by this patchset have been compile tested. Changes from v2: - Invert logic from CAP_EXCLUDE to CAP_NO_EXCLUDE Changes from v1: - Changed approach from explicitly rejecting events in unsupporting PMU drivers to explicitly advertising a capability in PMU drivers that do support exclusion events - Added additional information to tools/perf/design.txt - Rename event_has_exclude_flags to event_has_any_exclude_flag and update commit log to reflect it's a function Andrew Murray (12): perf/doc: update design.txt for exclude_{host|guest} flags perf/core: add function to test for event exclusion flags perf/core: add PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs alpha: perf/core: use PERF_PMU_CAP_NO_EXCLUDE arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE arm: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs powerpc: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs perf/core: remove unused perf_flags arch/alpha/kernel/perf_event.c| 7 +-- arch/arm/mach-imx/mmdc.c | 9 ++--- arch/arm/mm/cache-l2x0-pmu.c | 9 + arch/powerpc/perf/hv-24x7.c | 10 +- arch/powerpc/perf/hv-gpci.c | 10 +- arch/powerpc/perf/imc-pmu.c | 19 +-- arch/x86/events/amd/ibs.c | 13 + arch/x86/events/amd/iommu.c | 6 +- arch/x86/events/amd/power.c | 10 ++ arch/x86/events/amd/uncore.c | 7 ++- arch/x86/events/intel/cstate.c| 12 +++- arch/x86/events/intel/rapl.c | 9 ++--- arch/x86/events/intel/uncore.c| 9 + arch/x86/events/intel/uncore_snb.c| 9 ++--- arch/x86/events/msr.c | 10 ++ drivers/perf/arm-cci.c| 10 +- drivers/perf/arm-ccn.c| 6 ++ drivers/perf/arm_dsu_pmu.c| 9 ++--- drivers/perf/arm_pmu.c| 15 +-- drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 + drivers/perf/hisilicon/hisi_uncore_hha_pmu.c | 1 + drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c | 1 + drivers/perf/hisilicon/hisi_uncore_pmu.c | 9 - drivers/perf/qcom_l2_pmu.c| 9 + drivers/perf/qcom_l3_pmu.c| 8 +--- drivers/perf/xgene_pmu.c | 6 +- include/linux/perf_event.h| 10 ++ include/uapi/linux/perf_event.h | 2 -- kernel/events/core.c | 9 + tools/include/uapi/linux/perf_event.h | 2 -- tools/perf/design.txt | 4 31 files changed, 62 insertions(+), 189 deletions(-) -- 2.7.4
[PATCH v3 01/12] perf/doc: update design.txt for exclude_{host|guest} flags
Update design.txt to reflect the presence of the exclude_host and exclude_guest perf flags. Signed-off-by: Andrew Murray --- tools/perf/design.txt | 4 1 file changed, 4 insertions(+) diff --git a/tools/perf/design.txt b/tools/perf/design.txt index a28dca2..0453ba2 100644 --- a/tools/perf/design.txt +++ b/tools/perf/design.txt @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a way to request that counting of events be restricted to times when the CPU is in user, kernel and/or hypervisor mode. +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way +to request counting of events restricted to guest and host contexts when +using Linux as the hypervisor. + The 'mmap' and 'munmap' bits allow recording of PROT_EXEC mmap/munmap operations, these can be used to relate userspace IP addresses to actual code, even after the mapping (or even the whole process) is gone, -- 2.7.4
[PATCH v3 02/12] perf/core: add function to test for event exclusion flags
Add a function that tests if any of the perf event exclusion flags are set on a given event. Signed-off-by: Andrew Murray --- include/linux/perf_event.h | 9 + 1 file changed, 9 insertions(+) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 53c500f..b2e806f 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event, extern void perf_log_lost_samples(struct perf_event *event, u64 lost); +static inline bool event_has_any_exclude_flag(struct perf_event *event) +{ + struct perf_event_attr *attr = &event->attr; + + return attr->exclude_idle || attr->exclude_user || + attr->exclude_kernel || attr->exclude_hv || + attr->exclude_guest || attr->exclude_host; +} + static inline bool is_sampling_event(struct perf_event *event) { return event->attr.sample_period != 0; -- 2.7.4
[PATCH v3 03/12] perf/core: add PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs
Many PMU drivers do not have the capability to exclude counting events that occur in specific contexts such as idle, kernel, guest, etc. These drivers indicate this by returning an error in their event_init upon testing the events attribute flags. This approach is error prone and often inconsistent. Let's instead allow PMU drivers to advertise their inability to exclude based on context via a new capability: PERF_PMU_CAP_NO_EXCLUDE. This allows the perf core to reject requests for exclusion events where there is no support in the PMU. Signed-off-by: Andrew Murray --- include/linux/perf_event.h | 1 + kernel/events/core.c | 9 + 2 files changed, 10 insertions(+) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index b2e806f..fe92b89 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -244,6 +244,7 @@ struct perf_event; #define PERF_PMU_CAP_EXCLUSIVE 0x10 #define PERF_PMU_CAP_ITRACE0x20 #define PERF_PMU_CAP_HETEROGENEOUS_CPUS0x40 +#define PERF_PMU_CAP_NO_EXCLUDE0x80 /** * struct pmu - generic performance monitoring unit diff --git a/kernel/events/core.c b/kernel/events/core.c index 5a97f34..5113cfc 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9743,6 +9743,15 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event) if (ctx) perf_event_ctx_unlock(event->group_leader, ctx); + if (!ret) { + if (pmu->capabilities & PERF_PMU_CAP_NO_EXCLUDE && + event_has_any_exclude_flag(event)) { + if (event->destroy) + event->destroy(event); + ret = -EINVAL; + } + } + if (ret) module_put(pmu->module); -- 2.7.4
[PATCH v3 04/12] alpha: perf/core: use PERF_PMU_CAP_NO_EXCLUDE
As the Alpha PMU doesn't support context exclusion let's advertise the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's also remove the now unnecessary check for exclusion flags. This change means that __hw_perf_event_init will now also indicate that it doesn't support exclude_host and exclude_guest and will now implicitly return -EINVAL instead of -EPERM. This is likely more desirable as -EPERM will result in a kernel.perf_event_paranoid related warning from the perf userspace utility. Signed-off-by: Andrew Murray --- arch/alpha/kernel/perf_event.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/arch/alpha/kernel/perf_event.c b/arch/alpha/kernel/perf_event.c index 5613aa37..4341ccf 100644 --- a/arch/alpha/kernel/perf_event.c +++ b/arch/alpha/kernel/perf_event.c @@ -630,12 +630,6 @@ static int __hw_perf_event_init(struct perf_event *event) return ev; } - /* The EV67 does not support mode exclusion */ - if (attr->exclude_kernel || attr->exclude_user - || attr->exclude_hv || attr->exclude_idle) { - return -EPERM; - } - /* * We place the event type in event_base here and leave calculation * of the codes to programme the PMU for alpha_pmu_enable() because @@ -771,6 +765,7 @@ static struct pmu pmu = { .start = alpha_pmu_start, .stop = alpha_pmu_stop, .read = alpha_pmu_read, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; -- 2.7.4
[PATCH v3 05/12] arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE
The ARM PMU driver can be used to represent a variety of ARM based PMUs. Some of these PMUs do not provide support for context exclusion, where this is the case we advertise the PERF_PMU_CAP_NO_EXCLUDE capability to ensure that perf prevents us from handling events where any exclusion flags are set. Signed-off-by: Andrew Murray --- drivers/perf/arm_pmu.c | 15 +-- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index 7f01f6f..ea69067 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -357,13 +357,6 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev) } static int -event_requires_mode_exclusion(struct perf_event_attr *attr) -{ - return attr->exclude_idle || attr->exclude_user || - attr->exclude_kernel || attr->exclude_hv; -} - -static int __hw_perf_event_init(struct perf_event *event) { struct arm_pmu *armpmu = to_arm_pmu(event->pmu); @@ -393,9 +386,8 @@ __hw_perf_event_init(struct perf_event *event) /* * Check whether we need to exclude the counter from certain modes. */ - if ((!armpmu->set_event_filter || -armpmu->set_event_filter(hwc, &event->attr)) && -event_requires_mode_exclusion(&event->attr)) { + if (armpmu->set_event_filter && + armpmu->set_event_filter(hwc, &event->attr)) { pr_debug("ARM performance counters do not support " "mode exclusion\n"); return -EOPNOTSUPP; @@ -861,6 +853,9 @@ int armpmu_register(struct arm_pmu *pmu) if (ret) return ret; + if (!pmu->set_event_filter) + pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE; + ret = perf_pmu_register(&pmu->pmu, pmu->name, -1); if (ret) goto out_destroy; -- 2.7.4
[PATCH v3 06/12] arm: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
For drivers that do not support context exclusion let's advertise the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's also remove the now unnecessary check for exclusion flags. Signed-off-by: Andrew Murray --- arch/arm/mach-imx/mmdc.c | 9 ++--- arch/arm/mm/cache-l2x0-pmu.c | 9 + 2 files changed, 3 insertions(+), 15 deletions(-) diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c index 04b3bf7..3453838 100644 --- a/arch/arm/mach-imx/mmdc.c +++ b/arch/arm/mach-imx/mmdc.c @@ -293,13 +293,7 @@ static int mmdc_pmu_event_init(struct perf_event *event) return -EOPNOTSUPP; } - if (event->attr.exclude_user|| - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle|| - event->attr.exclude_host|| - event->attr.exclude_guest || - event->attr.sample_period) + if (event->attr.sample_period) return -EINVAL; if (cfg < 0 || cfg >= MMDC_NUM_COUNTERS) @@ -455,6 +449,7 @@ static int mmdc_pmu_init(struct mmdc_pmu *pmu_mmdc, .start = mmdc_pmu_event_start, .stop = mmdc_pmu_event_stop, .read = mmdc_pmu_event_update, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }, .mmdc_base = mmdc_base, .dev = dev, diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c index afe5b4c..99bcd07 100644 --- a/arch/arm/mm/cache-l2x0-pmu.c +++ b/arch/arm/mm/cache-l2x0-pmu.c @@ -314,14 +314,6 @@ static int l2x0_pmu_event_init(struct perf_event *event) event->attach_state & PERF_ATTACH_TASK) return -EINVAL; - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) - return -EINVAL; - if (event->cpu < 0) return -EINVAL; @@ -544,6 +536,7 @@ static __init int l2x0_pmu_init(void) .del = l2x0_pmu_event_del, .event_init = l2x0_pmu_event_init, .attr_groups = l2x0_pmu_attr_groups, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; l2x0_pmu_reset(); -- 2.7.4
[PATCH v3 07/12] drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
For drivers that do not support context exclusion let's advertise the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's also remove the now unnecessary check for exclusion flags. Signed-off-by: Andrew Murray --- drivers/perf/arm-cci.c| 10 +- drivers/perf/arm-ccn.c| 6 ++ drivers/perf/arm_dsu_pmu.c| 9 ++--- drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 + drivers/perf/hisilicon/hisi_uncore_hha_pmu.c | 1 + drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c | 1 + drivers/perf/hisilicon/hisi_uncore_pmu.c | 9 - 7 files changed, 8 insertions(+), 29 deletions(-) diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c index 1bfeb16..bfd03e0 100644 --- a/drivers/perf/arm-cci.c +++ b/drivers/perf/arm-cci.c @@ -1327,15 +1327,6 @@ static int cci_pmu_event_init(struct perf_event *event) if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK) return -EOPNOTSUPP; - /* We have no filtering of any kind */ - if (event->attr.exclude_user|| - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle|| - event->attr.exclude_host|| - event->attr.exclude_guest) - return -EINVAL; - /* * Following the example set by other "uncore" PMUs, we accept any CPU * and rewrite its affinity dynamically rather than having perf core @@ -1433,6 +1424,7 @@ static int cci_pmu_init(struct cci_pmu *cci_pmu, struct platform_device *pdev) .stop = cci_pmu_stop, .read = pmu_read, .attr_groups= pmu_attr_groups, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; cci_pmu->plat_device = pdev; diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c index 7dd850e..2ae7602 100644 --- a/drivers/perf/arm-ccn.c +++ b/drivers/perf/arm-ccn.c @@ -741,10 +741,7 @@ static int arm_ccn_pmu_event_init(struct perf_event *event) return -EOPNOTSUPP; } - if (has_branch_stack(event) || event->attr.exclude_user || - event->attr.exclude_kernel || event->attr.exclude_hv || - event->attr.exclude_idle || event->attr.exclude_host || - event->attr.exclude_guest) { + if (has_branch_stack(event)) { dev_dbg(ccn->dev, "Can't exclude execution levels!\n"); return -EINVAL; } @@ -1290,6 +1287,7 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn) .read = arm_ccn_pmu_event_read, .pmu_enable = arm_ccn_pmu_enable, .pmu_disable = arm_ccn_pmu_disable, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; /* No overflow interrupt? Have to use a timer instead. */ diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c index 660cb8a..5851de5 100644 --- a/drivers/perf/arm_dsu_pmu.c +++ b/drivers/perf/arm_dsu_pmu.c @@ -562,13 +562,7 @@ static int dsu_pmu_event_init(struct perf_event *event) return -EINVAL; } - if (has_branch_stack(event) || - event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) { + if (has_branch_stack(event)) { dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n"); return -EINVAL; } @@ -735,6 +729,7 @@ static int dsu_pmu_device_probe(struct platform_device *pdev) .read = dsu_pmu_read, .attr_groups= dsu_pmu_attr_groups, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; rc = perf_pmu_register(&dsu_pmu->pmu, name, -1); diff --git a/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c b/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c index 1b10ea0..296fef8 100644 --- a/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c +++ b/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c @@ -396,6 +396,7 @@ static int hisi_ddrc_pmu_probe(struct platform_device *pdev) .stop = hisi_uncore_pmu_stop, .read = hisi_uncore_pmu_read, .attr_groups= hisi_ddrc_pmu_attr_groups, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; ret = perf_pmu_register(&ddrc_pmu->pmu, name, -1); diff --git a/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c b/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c index 443906e..2553a84 100644 --- a/drivers/perf/hisil
[PATCH v3 08/12] drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
For drivers that do not support context exclusion let's advertise the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's also remove the now unnecessary check for exclusion flags. This change means that qcom_{l2|l3}_pmu will now also indicate that they do not support exclude_{host|guest} and that xgene_pmu does not also support exclude_idle and exclude_hv. Note that for qcom_l2_pmu we now implictly return -EINVAL instead of -EOPNOTSUPP. This change will result in the perf userspace utility retrying the perf_event_open system call with fallback event attributes that do not fail. Signed-off-by: Andrew Murray --- drivers/perf/qcom_l2_pmu.c | 9 + drivers/perf/qcom_l3_pmu.c | 8 +--- drivers/perf/xgene_pmu.c | 6 +- 3 files changed, 3 insertions(+), 20 deletions(-) diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c index 842135c..091b4d7 100644 --- a/drivers/perf/qcom_l2_pmu.c +++ b/drivers/perf/qcom_l2_pmu.c @@ -509,14 +509,6 @@ static int l2_cache_event_init(struct perf_event *event) return -EOPNOTSUPP; } - /* We cannot filter accurately so we just don't allow it. */ - if (event->attr.exclude_user || event->attr.exclude_kernel || - event->attr.exclude_hv || event->attr.exclude_idle) { - dev_dbg_ratelimited(&l2cache_pmu->pdev->dev, - "Can't exclude execution levels\n"); - return -EOPNOTSUPP; - } - if (((L2_EVT_GROUP(event->attr.config) > L2_EVT_GROUP_MAX) || ((event->attr.config & ~L2_EVT_MASK) != 0)) && (event->attr.config != L2CYCLE_CTR_RAW_CODE)) { @@ -982,6 +974,7 @@ static int l2_cache_pmu_probe(struct platform_device *pdev) .stop = l2_cache_event_stop, .read = l2_cache_event_read, .attr_groups= l2_cache_pmu_attr_grps, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; l2cache_pmu->num_counters = get_num_counters(); diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c index 2dc63d6..5d70646 100644 --- a/drivers/perf/qcom_l3_pmu.c +++ b/drivers/perf/qcom_l3_pmu.c @@ -495,13 +495,6 @@ static int qcom_l3_cache__event_init(struct perf_event *event) return -ENOENT; /* -* There are no per-counter mode filters in the PMU. -*/ - if (event->attr.exclude_user || event->attr.exclude_kernel || - event->attr.exclude_hv || event->attr.exclude_idle) - return -EINVAL; - - /* * Sampling not supported since these events are not core-attributable. */ if (hwc->sample_period) @@ -777,6 +770,7 @@ static int qcom_l3_cache_pmu_probe(struct platform_device *pdev) .read = qcom_l3_cache__event_read, .attr_groups= qcom_l3_cache_pmu_attr_grps, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; memrc = platform_get_resource(pdev, IORESOURCE_MEM, 0); diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c index 0e31f13..dad6169 100644 --- a/drivers/perf/xgene_pmu.c +++ b/drivers/perf/xgene_pmu.c @@ -914,11 +914,6 @@ static int xgene_perf_event_init(struct perf_event *event) if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK) return -EINVAL; - /* SOC counters do not have usr/os/guest/host bits */ - if (event->attr.exclude_user || event->attr.exclude_kernel || - event->attr.exclude_host || event->attr.exclude_guest) - return -EINVAL; - if (event->cpu < 0) return -EINVAL; /* @@ -1133,6 +1128,7 @@ static int xgene_init_perf(struct xgene_pmu_dev *pmu_dev, char *name) .start = xgene_perf_start, .stop = xgene_perf_stop, .read = xgene_perf_read, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; /* Hardware counter init */ -- 2.7.4
[PATCH v3 09/12] powerpc: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
For PowerPC PMUs that do not support context exclusion let's advertise the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's also remove the now unnecessary check for exclusion flags. Signed-off-by: Andrew Murray --- arch/powerpc/perf/hv-24x7.c | 10 +- arch/powerpc/perf/hv-gpci.c | 10 +- arch/powerpc/perf/imc-pmu.c | 19 +-- 3 files changed, 3 insertions(+), 36 deletions(-) diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c index 72238ee..d2b8e60 100644 --- a/arch/powerpc/perf/hv-24x7.c +++ b/arch/powerpc/perf/hv-24x7.c @@ -1306,15 +1306,6 @@ static int h_24x7_event_init(struct perf_event *event) return -EINVAL; } - /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) - return -EINVAL; - /* no branch sampling */ if (has_branch_stack(event)) return -EOPNOTSUPP; @@ -1577,6 +1568,7 @@ static struct pmu h_24x7_pmu = { .start_txn = h_24x7_event_start_txn, .commit_txn = h_24x7_event_commit_txn, .cancel_txn = h_24x7_event_cancel_txn, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; static int hv_24x7_init(void) diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c index 43fabb3..735e77b 100644 --- a/arch/powerpc/perf/hv-gpci.c +++ b/arch/powerpc/perf/hv-gpci.c @@ -232,15 +232,6 @@ static int h_gpci_event_init(struct perf_event *event) return -EINVAL; } - /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) - return -EINVAL; - /* no branch sampling */ if (has_branch_stack(event)) return -EOPNOTSUPP; @@ -285,6 +276,7 @@ static struct pmu h_gpci_pmu = { .start = h_gpci_event_start, .stop= h_gpci_event_stop, .read= h_gpci_event_update, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; static int hv_gpci_init(void) diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c index 1fafc32b..1dbb0ee 100644 --- a/arch/powerpc/perf/imc-pmu.c +++ b/arch/powerpc/perf/imc-pmu.c @@ -473,15 +473,6 @@ static int nest_imc_event_init(struct perf_event *event) if (event->hw.sample_period) return -EINVAL; - /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) - return -EINVAL; - if (event->cpu < 0) return -EINVAL; @@ -748,15 +739,6 @@ static int core_imc_event_init(struct perf_event *event) if (event->hw.sample_period) return -EINVAL; - /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest) - return -EINVAL; - if (event->cpu < 0) return -EINVAL; @@ -1069,6 +1051,7 @@ static int update_pmu_ops(struct imc_pmu *pmu) pmu->pmu.stop = imc_event_stop; pmu->pmu.read = imc_event_update; pmu->pmu.attr_groups = pmu->attr_groups; + pmu->pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE; pmu->attr_groups[IMC_FORMAT_ATTR] = &imc_format_group; switch (pmu->domain) { -- 2.7.4
[PATCH v3 10/12] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
For drivers that do not support context exclusion let's advertise the PERF_PMU_CAP_NOEXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's also remove the now unnecessary check for exclusion flags. Signed-off-by: Andrew Murray --- arch/x86/events/amd/ibs.c | 13 + arch/x86/events/amd/power.c| 10 ++ arch/x86/events/intel/cstate.c | 12 +++- arch/x86/events/intel/rapl.c | 9 ++--- arch/x86/events/intel/uncore_snb.c | 9 ++--- arch/x86/events/msr.c | 10 ++ 6 files changed, 12 insertions(+), 51 deletions(-) diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c index d50bb4d..62f317c 100644 --- a/arch/x86/events/amd/ibs.c +++ b/arch/x86/events/amd/ibs.c @@ -253,15 +253,6 @@ static int perf_ibs_precise_event(struct perf_event *event, u64 *config) return -EOPNOTSUPP; } -static const struct perf_event_attr ibs_notsupp = { - .exclude_user = 1, - .exclude_kernel = 1, - .exclude_hv = 1, - .exclude_idle = 1, - .exclude_host = 1, - .exclude_guest = 1, -}; - static int perf_ibs_init(struct perf_event *event) { struct hw_perf_event *hwc = &event->hw; @@ -282,9 +273,6 @@ static int perf_ibs_init(struct perf_event *event) if (event->pmu != &perf_ibs->pmu) return -ENOENT; - if (perf_flags(&event->attr) & perf_flags(&ibs_notsupp)) - return -EINVAL; - if (config & ~perf_ibs->config_mask) return -EINVAL; @@ -537,6 +525,7 @@ static struct perf_ibs perf_ibs_fetch = { .start = perf_ibs_start, .stop = perf_ibs_stop, .read = perf_ibs_read, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }, .msr= MSR_AMD64_IBSFETCHCTL, .config_mask= IBS_FETCH_CONFIG_MASK, diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c index 2aefacf..c5ff084 100644 --- a/arch/x86/events/amd/power.c +++ b/arch/x86/events/amd/power.c @@ -136,14 +136,7 @@ static int pmu_event_init(struct perf_event *event) return -ENOENT; /* Unsupported modes and filters. */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest || - /* no sampling */ - event->attr.sample_period) + if (event->attr.sample_period) return -EINVAL; if (cfg != AMD_POWER_EVENTSEL_PKG) @@ -226,6 +219,7 @@ static struct pmu pmu_class = { .start = pmu_event_start, .stop = pmu_event_stop, .read = pmu_event_read, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; static int power_cpu_exit(unsigned int cpu) diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c index 9f8084f..15a1981 100644 --- a/arch/x86/events/intel/cstate.c +++ b/arch/x86/events/intel/cstate.c @@ -280,13 +280,7 @@ static int cstate_pmu_event_init(struct perf_event *event) return -ENOENT; /* unsupported modes and filters */ - if (event->attr.exclude_user || - event->attr.exclude_kernel || - event->attr.exclude_hv || - event->attr.exclude_idle || - event->attr.exclude_host || - event->attr.exclude_guest || - event->attr.sample_period) /* no sampling */ + if (event->attr.sample_period) /* no sampling */ return -EINVAL; if (event->cpu < 0) @@ -437,7 +431,7 @@ static struct pmu cstate_core_pmu = { .start = cstate_pmu_event_start, .stop = cstate_pmu_event_stop, .read = cstate_pmu_event_update, - .capabilities = PERF_PMU_CAP_NO_INTERRUPT, + .capabilities = PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE, .module = THIS_MODULE, }; @@ -451,7 +445,7 @@ static struct pmu cstate_pkg_pmu = { .start = cstate_pmu_event_start, .stop = cstate_pmu_event_stop, .read = cstate_pmu_event_update, - .capabilities = PERF_PMU_CAP_NO_INTERRUPT, + .capabilities = PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE, .module = THIS_MODULE, }; diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c index 32f3e94..18a5628 100644 --- a/arch/x86/events/intel/rapl.c +++ b/arch/x86/events/intel/rapl.c @@ -397,13 +397,7 @@ static int rapl_pmu_event_init(struct perf_event *event) return -EINVAL; /* unsupporte
[PATCH v3 11/12] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
For x86 PMUs that do not support context exclusion let's advertise the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's also remove the now unnecessary check for exclusion flags. This change means that amd/iommu and amd/uncore will now also indicate that they do not support exclude_{hv|idle} and intel/uncore that it does not support exclude_{guest|host}. Signed-off-by: Andrew Murray --- arch/x86/events/amd/iommu.c| 6 +- arch/x86/events/amd/uncore.c | 7 ++- arch/x86/events/intel/uncore.c | 9 + 3 files changed, 4 insertions(+), 18 deletions(-) diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c index 3210fee..7635c23 100644 --- a/arch/x86/events/amd/iommu.c +++ b/arch/x86/events/amd/iommu.c @@ -223,11 +223,6 @@ static int perf_iommu_event_init(struct perf_event *event) if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK) return -EINVAL; - /* IOMMU counters do not have usr/os/guest/host bits */ - if (event->attr.exclude_user || event->attr.exclude_kernel || - event->attr.exclude_host || event->attr.exclude_guest) - return -EINVAL; - if (event->cpu < 0) return -EINVAL; @@ -414,6 +409,7 @@ static const struct pmu iommu_pmu __initconst = { .read = perf_iommu_read, .task_ctx_nr= perf_invalid_context, .attr_groups= amd_iommu_attr_groups, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; static __init int init_one_iommu(unsigned int idx) diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c index 8671de1..988cb9c 100644 --- a/arch/x86/events/amd/uncore.c +++ b/arch/x86/events/amd/uncore.c @@ -201,11 +201,6 @@ static int amd_uncore_event_init(struct perf_event *event) if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK) return -EINVAL; - /* NB and Last level cache counters do not have usr/os/guest/host bits */ - if (event->attr.exclude_user || event->attr.exclude_kernel || - event->attr.exclude_host || event->attr.exclude_guest) - return -EINVAL; - /* and we do not enable counter overflow interrupts */ hwc->config = event->attr.config & AMD64_RAW_EVENT_MASK_NB; hwc->idx = -1; @@ -307,6 +302,7 @@ static struct pmu amd_nb_pmu = { .start = amd_uncore_start, .stop = amd_uncore_stop, .read = amd_uncore_read, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; static struct pmu amd_llc_pmu = { @@ -317,6 +313,7 @@ static struct pmu amd_llc_pmu = { .start = amd_uncore_start, .stop = amd_uncore_stop, .read = amd_uncore_read, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; static struct amd_uncore *amd_uncore_alloc(unsigned int cpu) diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c index 27a4614..d516161 100644 --- a/arch/x86/events/intel/uncore.c +++ b/arch/x86/events/intel/uncore.c @@ -695,14 +695,6 @@ static int uncore_pmu_event_init(struct perf_event *event) if (pmu->func_id < 0) return -ENOENT; - /* -* Uncore PMU does measure at all privilege level all the time. -* So it doesn't make sense to specify any exclude bits. -*/ - if (event->attr.exclude_user || event->attr.exclude_kernel || - event->attr.exclude_hv || event->attr.exclude_idle) - return -EINVAL; - /* Sampling not supported yet */ if (hwc->sample_period) return -EINVAL; @@ -800,6 +792,7 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu) .stop = uncore_pmu_event_stop, .read = uncore_pmu_event_read, .module = THIS_MODULE, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; } else { pmu->pmu = *pmu->type->pmu; -- 2.7.4
[PATCH v3 12/12] perf/core: remove unused perf_flags
Now that perf_flags is not used we remove it. Signed-off-by: Andrew Murray --- include/uapi/linux/perf_event.h | 2 -- tools/include/uapi/linux/perf_event.h | 2 -- 2 files changed, 4 deletions(-) diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index f35eb72..ba89bd3 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -445,8 +445,6 @@ struct perf_event_query_bpf { __u32 ids[0]; }; -#define perf_flags(attr) (*(&(attr)->read_format + 1)) - /* * Ioctls that can be done on a perf event fd: */ diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h index f35eb72..ba89bd3 100644 --- a/tools/include/uapi/linux/perf_event.h +++ b/tools/include/uapi/linux/perf_event.h @@ -445,8 +445,6 @@ struct perf_event_query_bpf { __u32 ids[0]; }; -#define perf_flags(attr) (*(&(attr)->read_format + 1)) - /* * Ioctls that can be done on a perf event fd: */ -- 2.7.4
Re: [PATCH v3 00/12] perf/core: Generalise event exclusion checking
On Fri, Dec 07, 2018 at 05:25:17PM +, Will Deacon wrote: > On Thu, Dec 06, 2018 at 04:47:17PM +0000, Andrew Murray wrote: > > Many PMU drivers do not have the capability to exclude counting events > > that occur in specific contexts such as idle, kernel, guest, etc. These > > drivers indicate this by returning an error in their event_init upon > > testing the events attribute flags. > > > > However this approach requires that each time a new event modifier is > > added to perf, all the perf drivers need to be modified to indicate that > > they don't support the attribute. This results in additional boiler-plate > > code common to many drivers that needs to be maintained. Furthermore the > > drivers are not consistent with regards to the error value they return > > when reporting unsupported attributes. > > > > This patchset allow PMU drivers to advertise their inability to exclude > > based on context via a new capability: PERF_PMU_CAP_NO_EXCLUDE. This > > allows the perf core to reject requests for exclusion events where there > > is no support in the PMU. > > > > This is a functional change, in particular: > > > > - Some drivers will now additionally (but correctly) report unsupported > >exclusion flags. It's typical for existing userspace tools such as > >perf to handle such errors by retrying the system call without the > >unsupported flags. > > > > - Drivers that do not support any exclusion that previously reported > >-EPERM or -EOPNOTSUPP will now report -EINVAL - this is consistent > >with the majority and results in userspace perf retrying without > >exclusion. > > > > All drivers touched by this patchset have been compile tested. > > For the bits under arch/arm/ and drivers/perf: > > Acked-by: Will Deacon > > Note that I've queued the TX2 uncore PMU for 4.21 [1], which could also > benefit from your new flag. Ah thanks for pointing this out, I'll send a patch in due course. Thanks, Andrwe Murray > > Will > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/commit/?h=for-next/perf&id=69c32972d59388c041268e8206e8eb1acff29b9a
Re: [PATCH 10/10] perf/doc: update design.txt for exclude_{host|guest} flags
On Tue, Dec 11, 2018 at 10:06:53PM +1100, Michael Ellerman wrote: > [ Reviving old thread. ] > > Andrew Murray writes: > > On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote: > >> Andrew Murray writes: > >> > >> > Update design.txt to reflect the presence of the exclude_host > >> > and exclude_guest perf flags. > >> > > >> > Signed-off-by: Andrew Murray > >> > --- > >> > tools/perf/design.txt | 4 > >> > 1 file changed, 4 insertions(+) > >> > > >> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt > >> > index a28dca2..7de7d83 100644 > >> > --- a/tools/perf/design.txt > >> > +++ b/tools/perf/design.txt > >> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and > >> > 'exclude_hv' bits provide a > >> > way to request that counting of events be restricted to times when the > >> > CPU is in user, kernel and/or hypervisor mode. > >> > > >> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way > >> > +to request counting of events restricted to guest and host contexts when > >> > +using virtualisation. > >> > >> How does exclude_host differ from exclude_hv ? > > > > I believe exclude_host / exclude_guest are intented to distinguish > > between host and guest in the hosted hypervisor context (KVM). > > OK yeah, from the perf-list man page: > >u - user-space counting >k - kernel counting >h - hypervisor counting >I - non idle counting >G - guest counting (in KVM guests) >H - host counting (not in KVM guests) > > > Whereas exclude_hv allows to distinguish between guest and > > hypervisor in the bare-metal type hypervisors. > > Except that's exactly not how we use them on powerpc :) > > We use exclude_hv to exclude "the hypervisor", regardless of whether > it's KVM or PowerVM (which is a bare-metal hypervisor). > > We don't use exclude_host / exclude_guest at all, which I guess is a > bug, except I didn't know they existed until this thread. > > eg, in a KVM guest: > > $ perf record -e cycles:G /bin/bash -c "for i in {0..10}; do :;done" > $ perf report -D | grep -Fc "dso: [hypervisor]" > 16 > > > > In the case of arm64 - if VHE extensions are present then the host > > kernel will run at a higher privilege to the guest kernel, in which > > case there is no distinction between hypervisor and host so we ignore > > exclude_hv. But where VHE extensions are not present then the host > > kernel runs at the same privilege level as the guest and we use a > > higher privilege level to switch between them - in this case we can > > use exclude_hv to discount that hypervisor role of switching between > > guests. > > I couldn't find any arm64 perf code using exclude_host/guest at all? Correct - but this is in flight as I am currently adding support for this see [1]. > > And I don't see any x86 code using exclude_hv. I can't find any either. > > But maybe that's OK, I just worry this is confusing for users. There is some extra context regarding this where exclude_guest/exclude_host was added, see [2] and where exclude_hv was added, see [3] Generally it seems that exclude_guest/exclude_host relies upon switching counters off/on on guest/host switch code (which works well in the nested virt case). Whereas exclude_hv tends to rely solely on hardware capability based on privilege level (which works well in the bare metal case where the guest doesn't run at same privilege as the host). I think from the user perspective exclude_hv allows you to see your overhead if you are a guest (i.e. work done by bare metal hypervisor associated with you as the guest). Whereas exclude_guest/exclude_host doesn't allow you to see events above you (i.e. the kernel hypervisor) if you are the guest... At least that's how I read this, I've copied in others that may provide more authoritative feedback. [1] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-December/033698.html [2] https://www.spinics.net/lists/kvm/msg53996.html [3] https://lore.kernel.org/patchwork/patch/143918/ Thanks, Andrew Murray > > cheers
Re: [PATCH 10/10] perf/doc: update design.txt for exclude_{host|guest} flags
On Wed, Dec 12, 2018 at 09:07:42AM +0100, Christoffer Dall wrote: > On Tue, Dec 11, 2018 at 01:59:03PM +0000, Andrew Murray wrote: > > On Tue, Dec 11, 2018 at 10:06:53PM +1100, Michael Ellerman wrote: > > > [ Reviving old thread. ] > > > > > > Andrew Murray writes: > > > > On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote: > > > >> Andrew Murray writes: > > > >> > > > >> > Update design.txt to reflect the presence of the exclude_host > > > >> > and exclude_guest perf flags. > > > >> > > > > >> > Signed-off-by: Andrew Murray > > > >> > --- > > > >> > tools/perf/design.txt | 4 > > > >> > 1 file changed, 4 insertions(+) > > > >> > > > > >> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt > > > >> > index a28dca2..7de7d83 100644 > > > >> > --- a/tools/perf/design.txt > > > >> > +++ b/tools/perf/design.txt > > > >> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and > > > >> > 'exclude_hv' bits provide a > > > >> > way to request that counting of events be restricted to times when > > > >> > the > > > >> > CPU is in user, kernel and/or hypervisor mode. > > > >> > > > > >> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a > > > >> > way > > > >> > +to request counting of events restricted to guest and host contexts > > > >> > when > > > >> > +using virtualisation. > > > >> > > > >> How does exclude_host differ from exclude_hv ? > > > > > > > > I believe exclude_host / exclude_guest are intented to distinguish > > > > between host and guest in the hosted hypervisor context (KVM). > > > > > > OK yeah, from the perf-list man page: > > > > > >u - user-space counting > > >k - kernel counting > > >h - hypervisor counting > > >I - non idle counting > > >G - guest counting (in KVM guests) > > >H - host counting (not in KVM guests) > > > > > > > Whereas exclude_hv allows to distinguish between guest and > > > > hypervisor in the bare-metal type hypervisors. > > > > > > Except that's exactly not how we use them on powerpc :) > > > > > > We use exclude_hv to exclude "the hypervisor", regardless of whether > > > it's KVM or PowerVM (which is a bare-metal hypervisor). > > > > > > We don't use exclude_host / exclude_guest at all, which I guess is a > > > bug, except I didn't know they existed until this thread. > > > > > > eg, in a KVM guest: > > > > > > $ perf record -e cycles:G /bin/bash -c "for i in {0..10}; do :;done" > > > $ perf report -D | grep -Fc "dso: [hypervisor]" > > > 16 > > > > > > > > > > In the case of arm64 - if VHE extensions are present then the host > > > > kernel will run at a higher privilege to the guest kernel, in which > > > > case there is no distinction between hypervisor and host so we ignore > > > > exclude_hv. But where VHE extensions are not present then the host > > > > kernel runs at the same privilege level as the guest and we use a > > > > higher privilege level to switch between them - in this case we can > > > > use exclude_hv to discount that hypervisor role of switching between > > > > guests. > > > > > > I couldn't find any arm64 perf code using exclude_host/guest at all? > > > > Correct - but this is in flight as I am currently adding support for this > > see [1]. > > > > > > > > And I don't see any x86 code using exclude_hv. > > > > I can't find any either. > > > > > > > > But maybe that's OK, I just worry this is confusing for users. > > > > There is some extra context regarding this where exclude_guest/exclude_host > > was added, see [2] and where exclude_hv was added, see [3] > > > > Generally it seems that exclude_guest/exclude_host relies upon switching > > counters off/on on guest/host switch code (which works well in the nested > > virt case). Wherea
Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs support for DWC
-designware.c > b/drivers/pci/controller/dwc/pcie-designware.c > index 7d25102..c99cee4 100644 > --- a/drivers/pci/controller/dwc/pcie-designware.c > +++ b/drivers/pci/controller/dwc/pcie-designware.c > @@ -158,6 +158,43 @@ static void dw_pcie_writel_ob_unroll(struct dw_pcie > *pci, u32 index, u32 reg, > dw_pcie_writel_atu(pci, offset + reg, val); > } > > +static void dw_pcie_prog_ep_outbound_atu_unroll(struct dw_pcie *pci, u8 > func_no, > + int index, int type, > + u64 cpu_addr, u64 pci_addr, > + u32 size) > +{ > + u32 retries, val; > + > + dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_LOWER_BASE, > + lower_32_bits(cpu_addr)); > + dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_UPPER_BASE, > + upper_32_bits(cpu_addr)); > + dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_LIMIT, > + lower_32_bits(cpu_addr + size - 1)); > + dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_LOWER_TARGET, > + lower_32_bits(pci_addr)); > + dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_UPPER_TARGET, > + upper_32_bits(pci_addr)); > + dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_REGION_CTRL1, > + type | PCIE_ATU_FUNC_NUM(func_no)); With the exception of this line, the rest of this function is identical to dw_pcie_prog_outbound_atu_unroll. > + dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_REGION_CTRL2, > + PCIE_ATU_ENABLE); > + > + /* > + * Make sure ATU enable takes effect before any subsequent config > + * and I/O accesses. > + */ > + for (retries = 0; retries < LINK_WAIT_MAX_IATU_RETRIES; retries++) { > + val = dw_pcie_readl_ob_unroll(pci, index, > + PCIE_ATU_UNR_REGION_CTRL2); > + if (val & PCIE_ATU_ENABLE) > + return; > + > + mdelay(LINK_WAIT_IATU); > + } > + dev_err(pci->dev, "Outbound iATU is not being enabled\n"); > +} > + > static void dw_pcie_prog_outbound_atu_unroll(struct dw_pcie *pci, int index, >int type, u64 cpu_addr, >u64 pci_addr, u32 size) > @@ -194,6 +231,51 @@ static void dw_pcie_prog_outbound_atu_unroll(struct > dw_pcie *pci, int index, > dev_err(pci->dev, "Outbound iATU is not being enabled\n"); > } > > +void dw_pcie_prog_ep_outbound_atu(struct dw_pcie *pci, u8 func_no, int index, > + int type, u64 cpu_addr, u64 pci_addr, > + u32 size) > +{ > + u32 retries, val; > + > + if (pci->ops->cpu_addr_fixup) > + cpu_addr = pci->ops->cpu_addr_fixup(pci, cpu_addr); > + > + if (pci->iatu_unroll_enabled) { > + dw_pcie_prog_ep_outbound_atu_unroll(pci, func_no, index, type, > + cpu_addr, pci_addr, size); > + return; > + } > + > + dw_pcie_writel_dbi(pci, PCIE_ATU_VIEWPORT, > +PCIE_ATU_REGION_OUTBOUND | index); > + dw_pcie_writel_dbi(pci, PCIE_ATU_LOWER_BASE, > +lower_32_bits(cpu_addr)); > + dw_pcie_writel_dbi(pci, PCIE_ATU_UPPER_BASE, > +upper_32_bits(cpu_addr)); > + dw_pcie_writel_dbi(pci, PCIE_ATU_LIMIT, > +lower_32_bits(cpu_addr + size - 1)); > + dw_pcie_writel_dbi(pci, PCIE_ATU_LOWER_TARGET, > +lower_32_bits(pci_addr)); > + dw_pcie_writel_dbi(pci, PCIE_ATU_UPPER_TARGET, > +upper_32_bits(pci_addr)); > + dw_pcie_writel_dbi(pci, PCIE_ATU_CR1, type | > +PCIE_ATU_FUNC_NUM(func_no)); The same here, this is identical to dw_pcie_prog_outbound_atu with the exception of this line. Is there a way you can avoid all of this duplicated code? Thanks, Andrew Murray > + dw_pcie_writel_dbi(pci, PCIE_ATU_CR2, PCIE_ATU_ENABLE); > + > + /* > + * Make sure ATU enable takes effect before any subsequent config > + * and I/O accesses. > + */ > + for (retries = 0; retries < LINK_WAIT_MAX_IATU_RETRIES; retries++) { > + val = dw_pcie_readl_dbi(pci, PCIE_ATU_CR2); > + if (val & PCIE_ATU_ENABLE) > + return; > + > + mdelay(LINK_WAIT_IATU); > + } > + d
Re: [PATCH 02/10] PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode
On Thu, Aug 15, 2019 at 04:37:08PM +0800, Xiaowei Bao wrote: > Add the doorbell mode of MSI-X in EP mode. > > Signed-off-by: Xiaowei Bao > --- > drivers/pci/controller/dwc/pcie-designware-ep.c | 14 ++ > drivers/pci/controller/dwc/pcie-designware.h| 14 ++ > 2 files changed, 28 insertions(+) > > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c > b/drivers/pci/controller/dwc/pcie-designware-ep.c > index 75e2955..e3a7cdf 100644 > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c > @@ -454,6 +454,20 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 > func_no, > return 0; > } > > +int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8 func_no, > +u16 interrupt_num) > +{ > + struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > + u32 msg_data; > + > + msg_data = (func_no << PCIE_MSIX_DOORBELL_PF_SHIFT) | > +(interrupt_num - 1); > + > + dw_pcie_writel_dbi(pci, PCIE_MSIX_DOORBELL, msg_data); > + > + return 0; > +} > + > int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no, > u16 interrupt_num) Have I understood correctly that the hardware provides an alternative mechanism that allows for raising MSI-X interrupts without the bother of reading the capabilities registers? If so is there any good reason to keep dw_pcie_ep_raise_msix_irq? (And thus use it in dw_plat_pcie_ep_raise_irq also)? > { > diff --git a/drivers/pci/controller/dwc/pcie-designware.h > b/drivers/pci/controller/dwc/pcie-designware.h > index 2b291e8..cd903e9 100644 > --- a/drivers/pci/controller/dwc/pcie-designware.h > +++ b/drivers/pci/controller/dwc/pcie-designware.h > @@ -88,6 +88,11 @@ > #define PCIE_MISC_CONTROL_1_OFF 0x8BC > #define PCIE_DBI_RO_WR_ENBIT(0) > > +#define PCIE_MSIX_DOORBELL 0x948 > +#define PCIE_MSIX_DOORBELL_PF_SHIFT 24 > +#define PCIE_MSIX_DOORBELL_VF_SHIFT 16 > +#define PCIE_MSIX_DOORBELL_VF_ACTIVE BIT(15) The _VF defines are not used, I'd suggest removing them. Thanks, Andrew Murray > + > /* > * iATU Unroll-specific register definitions > * From 4.80 core version the address translation will be made by unroll > @@ -399,6 +404,8 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 > func_no, >u8 interrupt_num); > int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no, >u16 interrupt_num); > +int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8 func_no, > +u16 interrupt_num); > void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar); > #else > static inline void dw_pcie_ep_linkup(struct dw_pcie_ep *ep) > @@ -431,6 +438,13 @@ static inline int dw_pcie_ep_raise_msix_irq(struct > dw_pcie_ep *ep, u8 func_no, > return 0; > } > > +static inline int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, > + u8 func_no, > + u16 interrupt_num) > +{ > + return 0; > +} > + > static inline void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno > bar) > { > } > -- > 2.9.5 > > > ___ > linux-arm-kernel mailing list > linux-arm-ker...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Re: [PATCH 05/10] PCI: layerscape: Modify the way of getting capability with different PEX
On Thu, Aug 15, 2019 at 04:37:11PM +0800, Xiaowei Bao wrote: > The different PCIe controller in one board may be have different > capability of MSI or MSIX, so change the way of getting the MSI > capability, make it more flexible. > > Signed-off-by: Xiaowei Bao > --- > drivers/pci/controller/dwc/pci-layerscape-ep.c | 28 > +++--- > 1 file changed, 21 insertions(+), 7 deletions(-) > > diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c > b/drivers/pci/controller/dwc/pci-layerscape-ep.c > index be61d96..9404ca0 100644 > --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c > +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c > @@ -22,6 +22,7 @@ > > struct ls_pcie_ep { > struct dw_pcie *pci; > + struct pci_epc_features *ls_epc; > }; > > #define to_ls_pcie_ep(x) dev_get_drvdata((x)->dev) > @@ -40,25 +41,26 @@ static const struct of_device_id ls_pcie_ep_of_match[] = { > { }, > }; > > -static const struct pci_epc_features ls_pcie_epc_features = { > - .linkup_notifier = false, > - .msi_capable = true, > - .msix_capable = false, > -}; > - > static const struct pci_epc_features* > ls_pcie_ep_get_features(struct dw_pcie_ep *ep) > { > - return &ls_pcie_epc_features; > + struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > + struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci); > + > + return pcie->ls_epc; > } > > static void ls_pcie_ep_init(struct dw_pcie_ep *ep) > { > struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > + struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci); > enum pci_barno bar; > > for (bar = BAR_0; bar <= BAR_5; bar++) > dw_pcie_ep_reset_bar(pci, bar); > + > + pcie->ls_epc->msi_capable = ep->msi_cap ? true : false; > + pcie->ls_epc->msix_capable = ep->msix_cap ? true : false; > } > > static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no, > @@ -118,6 +120,7 @@ static int __init ls_pcie_ep_probe(struct platform_device > *pdev) > struct device *dev = &pdev->dev; > struct dw_pcie *pci; > struct ls_pcie_ep *pcie; > + struct pci_epc_features *ls_epc; > struct resource *dbi_base; > int ret; > > @@ -129,6 +132,10 @@ static int __init ls_pcie_ep_probe(struct > platform_device *pdev) > if (!pci) > return -ENOMEM; > > + ls_epc = devm_kzalloc(dev, sizeof(*ls_epc), GFP_KERNEL); > + if (!ls_epc) > + return -ENOMEM; > + > dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs"); > pci->dbi_base = devm_pci_remap_cfg_resource(dev, dbi_base); > if (IS_ERR(pci->dbi_base)) > @@ -139,6 +146,13 @@ static int __init ls_pcie_ep_probe(struct > platform_device *pdev) > pci->ops = &ls_pcie_ep_ops; > pcie->pci = pci; > > + ls_epc->linkup_notifier = false, > + ls_epc->msi_capable = true, > + ls_epc->msix_capable = true, As [msi,msix]_capable is shortly set from ls_pcie_ep_init - is there any reason to set them here (to potentially incorrect values)? Thanks, Andrew Murray > + ls_epc->bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4), > + > + pcie->ls_epc = ls_epc; > + > platform_set_drvdata(pdev, pcie); > > ret = ls_add_pcie_ep(pcie, pdev); > -- > 2.9.5 >
Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs support for DWC
On Fri, Aug 16, 2019 at 02:55:41AM +, Xiaowei Bao wrote: > > > > -Original Message- > > From: Andrew Murray > > Sent: 2019年8月15日 19:32 > > To: Xiaowei Bao > > Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com; > > bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com; > > shawn...@kernel.org; Leo Li ; kis...@ti.com; > > lorenzo.pieral...@arm.com; a...@arndb.de; gre...@linuxfoundation.org; > > M.h. Lian ; Mingkai Hu ; > > Roy Zang ; linux-...@vger.kernel.org; > > devicet...@vger.kernel.org; linux-ker...@vger.kernel.org; > > linux-arm-ker...@lists.infradead.org; linuxppc-dev@lists.ozlabs.org > > Subject: Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs support for > > DWC > > > > On Thu, Aug 15, 2019 at 04:37:07PM +0800, Xiaowei Bao wrote: > > > Add multiple PFs support for DWC, different PF have different config > > > space, we use pf-offset property which get from the DTS to access the > > > different pF config space. > > > > Thanks for the patch. I haven't seen a cover letter for this series, is > > there one > > missing? > Maybe I miss, I will add you to review next time, thanks a lot for your > comments. > > > > > > > > > > Signed-off-by: Xiaowei Bao > > > --- > > > drivers/pci/controller/dwc/pcie-designware-ep.c | 97 > > +- > > > drivers/pci/controller/dwc/pcie-designware.c| 105 > > ++-- > > > drivers/pci/controller/dwc/pcie-designware.h| 10 ++- > > > include/linux/pci-epc.h | 1 + > > > 4 files changed, 164 insertions(+), 49 deletions(-) > > > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c > > > b/drivers/pci/controller/dwc/pcie-designware-ep.c > > > index 2bf5a35..75e2955 100644 > > > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c > > > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c > > > @@ -19,12 +19,14 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep *ep) > > > pci_epc_linkup(epc); > > > } > > > > > > -static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno > > bar, > > > -int flags) > > > +static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, u8 func_no, > > > +enum pci_barno bar, int flags) > > > { > > > u32 reg; > > > + struct pci_epc *epc = pci->ep.epc; > > > + u32 pf_base = func_no * epc->pf_offset; > > > > > > - reg = PCI_BASE_ADDRESS_0 + (4 * bar); > > > + reg = pf_base + PCI_BASE_ADDRESS_0 + (4 * bar); > > > > I think I'd rather see this arithmetic (and the one for determining pf_base) > > inside a macro or inline header function. This would make this code more > > readable and reduce the chances of an error by avoiding duplication of code. > > > > For example look at cdns_pcie_ep_fn_writeb and > > ROCKCHIP_PCIE_EP_FUNC_BASE for examples of other EP drivers that do > > this. > Agree, this looks fine, thanks a lot for your comments, I will use this way > to access > the registers in next version patch. > > > > > > > dw_pcie_dbi_ro_wr_en(pci); > > > dw_pcie_writel_dbi2(pci, reg, 0x0); > > > dw_pcie_writel_dbi(pci, reg, 0x0); > > > @@ -37,7 +39,12 @@ static void __dw_pcie_ep_reset_bar(struct dw_pcie > > > *pci, enum pci_barno bar, > > > > > > void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar) { > > > - __dw_pcie_ep_reset_bar(pci, bar, 0); > > > + u8 func_no, funcs; > > > + > > > + funcs = pci->ep.epc->max_functions; > > > + > > > + for (func_no = 0; func_no < funcs; func_no++) > > > + __dw_pcie_ep_reset_bar(pci, func_no, bar, 0); > > > } > > > > > > static u8 __dw_pcie_ep_find_next_cap(struct dw_pcie *pci, u8 cap_ptr, > > > @@ -78,28 +85,29 @@ static int dw_pcie_ep_write_header(struct pci_epc > > > *epc, u8 func_no, { > > > struct dw_pcie_ep *ep = epc_get_drvdata(epc); > > > struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > > > + u32 pf_base = func_no * epc->pf_offset; > > > > > > dw_pcie_dbi_ro_wr_en(pci); > > > - dw_pcie_writew_dbi(pci, PCI_VENDOR_ID, hdr->vendorid); > > > - dw_pcie_writew_dbi(pci, PCI_DEVICE_ID, hdr->deviceid); > > > - dw_pcie_writeb_dbi(pci, PCI_REVISION_ID, hdr->revid); > > > - dw_pcie_writeb_dbi(
Re: [PATCH 02/10] PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode
On Fri, Aug 16, 2019 at 02:58:31AM +, Xiaowei Bao wrote: > > > > -Original Message- > > From: Andrew Murray > > Sent: 2019年8月15日 19:54 > > To: Xiaowei Bao > > Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com; > > bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com; > > shawn...@kernel.org; Leo Li ; kis...@ti.com; > > lorenzo.pieral...@arm.com; a...@arndb.de; gre...@linuxfoundation.org; > > M.h. Lian ; Mingkai Hu ; > > Roy Zang ; linux-...@vger.kernel.org; > > devicet...@vger.kernel.org; linux-ker...@vger.kernel.org; > > linux-arm-ker...@lists.infradead.org; linuxppc-dev@lists.ozlabs.org > > Subject: Re: [PATCH 02/10] PCI: designware-ep: Add the doorbell mode of > > MSI-X in EP mode > > > > On Thu, Aug 15, 2019 at 04:37:08PM +0800, Xiaowei Bao wrote: > > > Add the doorbell mode of MSI-X in EP mode. > > > > > > Signed-off-by: Xiaowei Bao > > > --- > > > drivers/pci/controller/dwc/pcie-designware-ep.c | 14 ++ > > > drivers/pci/controller/dwc/pcie-designware.h| 14 ++ > > > 2 files changed, 28 insertions(+) > > > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c > > > b/drivers/pci/controller/dwc/pcie-designware-ep.c > > > index 75e2955..e3a7cdf 100644 > > > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c > > > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c > > > @@ -454,6 +454,20 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep > > *ep, u8 func_no, > > > return 0; > > > } > > > > > > +int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8 > > func_no, > > > +u16 interrupt_num) > > > +{ > > > + struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > > > + u32 msg_data; > > > + > > > + msg_data = (func_no << PCIE_MSIX_DOORBELL_PF_SHIFT) | > > > +(interrupt_num - 1); > > > + > > > + dw_pcie_writel_dbi(pci, PCIE_MSIX_DOORBELL, msg_data); > > > + > > > + return 0; > > > +} > > > + > > > int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no, > > > u16 interrupt_num) > > > > Have I understood correctly that the hardware provides an alternative > > mechanism that allows for raising MSI-X interrupts without the bother of > > reading the capabilities registers? > Yes, the hardware provide two way to MSI-X, please check the page 492 of > DWC_pcie_dm_registers_4.30 Menu. > MSIX_DOORBELL_OFF on page 492 0x948 Description: MSI-X Doorbell Register> Thanks for the reference. > > > > If so is there any good reason to keep dw_pcie_ep_raise_msix_irq? (And thus > > use it in dw_plat_pcie_ep_raise_irq also)? > I am not sure, but I think the dw_pcie_ep_raise_msix_irq function is not > correct, > because I think we can't get the MSIX table from the address ep->phys_base + > tbl_addr, > but I also don't know where I can get the correct MSIX table. Well it looks like this function is used by snps,dw-pcie-ep and snps,dw-pcie, perhaps the doorbell mode isn't available on that hardware. > > > > > > > { > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.h > > > b/drivers/pci/controller/dwc/pcie-designware.h > > > index 2b291e8..cd903e9 100644 > > > --- a/drivers/pci/controller/dwc/pcie-designware.h > > > +++ b/drivers/pci/controller/dwc/pcie-designware.h > > > @@ -88,6 +88,11 @@ > > > #define PCIE_MISC_CONTROL_1_OFF 0x8BC > > > #define PCIE_DBI_RO_WR_ENBIT(0) > > > > > > +#define PCIE_MSIX_DOORBELL 0x948 > > > +#define PCIE_MSIX_DOORBELL_PF_SHIFT 24 > > > +#define PCIE_MSIX_DOORBELL_VF_SHIFT 16 > > > +#define PCIE_MSIX_DOORBELL_VF_ACTIVE BIT(15) > > > > The _VF defines are not used, I'd suggest removing them. > In fact, I will add the SRIOV support in this file, the SRIOV feature have > verified > In my board, but I need wait the EP framework SRIOV patch merge, > so I defined these two macros. I'd suggest adding the VF macros along with the SRIOV feature. Thanks, Andrew Murray > > > > Thanks, > > > > Andrew Murray > > > > > + > > > /* > > > * iATU Unroll-specific register definitions > > > * From 4.80 core version the address translation will be made by > > > unroll @@ -399,6 +404,8 @@ int dw
Re: [PATCH 05/10] PCI: layerscape: Modify the way of getting capability with different PEX
On Fri, Aug 16, 2019 at 03:00:00AM +, Xiaowei Bao wrote: > > > > -Original Message- > > From: Andrew Murray > > Sent: 2019年8月15日 20:51 > > To: Xiaowei Bao > > Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com; > > bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com; > > shawn...@kernel.org; Leo Li ; kis...@ti.com; > > lorenzo.pieral...@arm.com; a...@arndb.de; gre...@linuxfoundation.org; > > M.h. Lian ; Mingkai Hu ; > > Roy Zang ; linux-...@vger.kernel.org; > > devicet...@vger.kernel.org; linux-ker...@vger.kernel.org; > > linux-arm-ker...@lists.infradead.org; linuxppc-dev@lists.ozlabs.org > > Subject: Re: [PATCH 05/10] PCI: layerscape: Modify the way of getting > > capability with different PEX > > > > On Thu, Aug 15, 2019 at 04:37:11PM +0800, Xiaowei Bao wrote: > > > The different PCIe controller in one board may be have different > > > capability of MSI or MSIX, so change the way of getting the MSI > > > capability, make it more flexible. > > > > > > Signed-off-by: Xiaowei Bao > > > --- > > > drivers/pci/controller/dwc/pci-layerscape-ep.c | 28 > > > +++--- > > > 1 file changed, 21 insertions(+), 7 deletions(-) > > > > > > diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c > > > b/drivers/pci/controller/dwc/pci-layerscape-ep.c > > > index be61d96..9404ca0 100644 > > > --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c > > > +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c > > > @@ -22,6 +22,7 @@ > > > > > > struct ls_pcie_ep { > > > struct dw_pcie *pci; > > > + struct pci_epc_features *ls_epc; > > > }; > > > > > > #define to_ls_pcie_ep(x) dev_get_drvdata((x)->dev) > > > @@ -40,25 +41,26 @@ static const struct of_device_id > > ls_pcie_ep_of_match[] = { > > > { }, > > > }; > > > > > > -static const struct pci_epc_features ls_pcie_epc_features = { > > > - .linkup_notifier = false, > > > - .msi_capable = true, > > > - .msix_capable = false, > > > -}; > > > - > > > static const struct pci_epc_features* ls_pcie_ep_get_features(struct > > > dw_pcie_ep *ep) { > > > - return &ls_pcie_epc_features; > > > + struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > > > + struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci); > > > + > > > + return pcie->ls_epc; > > > } > > > > > > static void ls_pcie_ep_init(struct dw_pcie_ep *ep) { > > > struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > > > + struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci); > > > enum pci_barno bar; > > > > > > for (bar = BAR_0; bar <= BAR_5; bar++) > > > dw_pcie_ep_reset_bar(pci, bar); > > > + > > > + pcie->ls_epc->msi_capable = ep->msi_cap ? true : false; > > > + pcie->ls_epc->msix_capable = ep->msix_cap ? true : false; > > > } > > > > > > static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no, @@ > > > -118,6 +120,7 @@ static int __init ls_pcie_ep_probe(struct platform_device > > *pdev) > > > struct device *dev = &pdev->dev; > > > struct dw_pcie *pci; > > > struct ls_pcie_ep *pcie; > > > + struct pci_epc_features *ls_epc; > > > struct resource *dbi_base; > > > int ret; > > > > > > @@ -129,6 +132,10 @@ static int __init ls_pcie_ep_probe(struct > > platform_device *pdev) > > > if (!pci) > > > return -ENOMEM; > > > > > > + ls_epc = devm_kzalloc(dev, sizeof(*ls_epc), GFP_KERNEL); > > > + if (!ls_epc) > > > + return -ENOMEM; > > > + > > > dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM, > > "regs"); > > > pci->dbi_base = devm_pci_remap_cfg_resource(dev, dbi_base); > > > if (IS_ERR(pci->dbi_base)) > > > @@ -139,6 +146,13 @@ static int __init ls_pcie_ep_probe(struct > > platform_device *pdev) > > > pci->ops = &ls_pcie_ep_ops; > > > pcie->pci = pci; > > > > > > + ls_epc->linkup_notifier = false, > > > + ls_epc->msi_capable = true, > > > + ls_epc->msix_capable = true, > > > > As [msi,msix]_capable is shortly set from ls_pcie_ep_init - is there any > > reason > > to set them here (to potentially incorrect values)? > This is a INIT value, maybe false is better for msi_capable and msix_capable, > of course, we don't need to set it. ls_epc is kzalloc'd and so all zeros, so you get false for free. I think you can remove these two lines (or all three if you don't care that linkup_notifier isn't explicitly set). Thanks, Andrew Murray > > > > Thanks, > > > > Andrew Murray > > > > > + ls_epc->bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4), > > > + > > > + pcie->ls_epc = ls_epc; > > > + > > > platform_set_drvdata(pdev, pcie); > > > > > > ret = ls_add_pcie_ep(pcie, pdev); > > > -- > > > 2.9.5 > > >
Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs support for DWC
On Fri, Aug 16, 2019 at 11:00:01AM +, Xiaowei Bao wrote: > > > > -Original Message- > > From: Andrew Murray > > Sent: 2019年8月16日 17:45 > > To: Xiaowei Bao > > Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com; > > mark.rutl...@arm.com; shawn...@kernel.org; Leo Li > > ; kis...@ti.com; lorenzo.pieral...@arm.com; > > a...@arndb.de; gre...@linuxfoundation.org; M.h. Lian > > ; Roy Zang ; > > linux-...@vger.kernel.org; devicet...@vger.kernel.org; > > linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; > > linuxppc-dev@lists.ozlabs.org; Z.q. Hou > > Subject: Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs support for > > DWC > > > > On Fri, Aug 16, 2019 at 02:55:41AM +, Xiaowei Bao wrote: > > > > > > > > > > -Original Message- > > > > From: Andrew Murray > > > > Sent: 2019年8月15日 19:32 > > > > To: Xiaowei Bao > > > > Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com; > > > > bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com; > > > > shawn...@kernel.org; Leo Li ; kis...@ti.com; > > > > lorenzo.pieral...@arm.com; a...@arndb.de; > > > > gre...@linuxfoundation.org; M.h. Lian ; > > > > Mingkai Hu ; Roy Zang ; > > > > linux-...@vger.kernel.org; devicet...@vger.kernel.org; > > > > linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; > > > > linuxppc-dev@lists.ozlabs.org > > > > Subject: Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs > > > > support for DWC > > > > > > > > On Thu, Aug 15, 2019 at 04:37:07PM +0800, Xiaowei Bao wrote: > > > > > Add multiple PFs support for DWC, different PF have different > > > > > config space, we use pf-offset property which get from the DTS to > > > > > access the different pF config space. > > > > > > > > Thanks for the patch. I haven't seen a cover letter for this series, > > > > is there one missing? > > > Maybe I miss, I will add you to review next time, thanks a lot for your > > comments. > > > > > > > > > > > > > > > > > > Signed-off-by: Xiaowei Bao > > > > > --- > > > > > drivers/pci/controller/dwc/pcie-designware-ep.c | 97 > > > > +- > > > > > drivers/pci/controller/dwc/pcie-designware.c| 105 > > > > ++-- > > > > > drivers/pci/controller/dwc/pcie-designware.h| 10 ++- > > > > > include/linux/pci-epc.h | 1 + > > > > > 4 files changed, 164 insertions(+), 49 deletions(-) > > > > > > > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c > > > > > b/drivers/pci/controller/dwc/pcie-designware-ep.c > > > > > index 2bf5a35..75e2955 100644 > > > > > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c > > > > > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c > > > > > @@ -19,12 +19,14 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep > > *ep) > > > > > pci_epc_linkup(epc); > > > > > } > > > > > > > > > > -static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum > > > > > pci_barno > > > > bar, > > > > > -int flags) > > > > > +static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, u8 func_no, > > > > > +enum pci_barno bar, int flags) > > > > > { > > > > > u32 reg; > > > > > + struct pci_epc *epc = pci->ep.epc; > > > > > + u32 pf_base = func_no * epc->pf_offset; > > > > > > > > > > - reg = PCI_BASE_ADDRESS_0 + (4 * bar); > > > > > + reg = pf_base + PCI_BASE_ADDRESS_0 + (4 * bar); > > > > > > > > I think I'd rather see this arithmetic (and the one for determining > > > > pf_base) inside a macro or inline header function. This would make > > > > this code more readable and reduce the chances of an error by avoiding > > duplication of code. > > > > > > > > For example look at cdns_pcie_ep_fn_writeb and > > > > ROCKCHIP_PCIE_EP_FUNC_BASE for examples of other EP drivers that do > > > > this. > > > Agree, this looks fine, thanks a lot for your
[PATCH v4 01/13] perf/doc: update design.txt for exclude_{host|guest} flags
Update design.txt to reflect the presence of the exclude_host and exclude_guest perf flags. Signed-off-by: Andrew Murray --- tools/perf/design.txt | 4 1 file changed, 4 insertions(+) diff --git a/tools/perf/design.txt b/tools/perf/design.txt index a28dca2..0453ba2 100644 --- a/tools/perf/design.txt +++ b/tools/perf/design.txt @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a way to request that counting of events be restricted to times when the CPU is in user, kernel and/or hypervisor mode. +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way +to request counting of events restricted to guest and host contexts when +using Linux as the hypervisor. + The 'mmap' and 'munmap' bits allow recording of PROT_EXEC mmap/munmap operations, these can be used to relate userspace IP addresses to actual code, even after the mapping (or even the whole process) is gone, -- 2.7.4