Re: [PATCH v12 04/13] mfd: Add Ingenic TCU driver
Le jeu. 27 juin 2019 à 8:58, Lee Jones a écrit : On Wed, 26 Jun 2019, Paul Cercueil wrote: Le mer. 26 juin 2019 à 15:18, Lee Jones a écrit : > On Tue, 21 May 2019, Paul Cercueil wrote: > > > This driver will provide a regmap that can be retrieved very early > > in > > the boot process through the API function ingenic_tcu_get_regmap(). > > > > Additionally, it will call devm_of_platform_populate() so that all > > the > > children devices will be probed. > > > > Signed-off-by: Paul Cercueil > > --- > > > > Notes: > > v12: New patch > > > > drivers/mfd/Kconfig | 8 +++ > > drivers/mfd/Makefile| 1 + > > drivers/mfd/ingenic-tcu.c | 113 > > > > include/linux/mfd/ingenic-tcu.h | 8 +++ > > 4 files changed, 130 insertions(+) > > create mode 100644 drivers/mfd/ingenic-tcu.c [...] > > +static struct regmap * __init ingenic_tcu_create_regmap(struct > > device_node *np) > > +{ > > + struct resource res; > > + void __iomem *base; > > + struct regmap *map; > > + > > + if (!of_match_node(ingenic_tcu_of_match, np)) > > + return ERR_PTR(-EINVAL); Drop this check. > > + base = of_io_request_and_map(np, 0, "TCU"); > > + if (IS_ERR(base)) > > + return ERR_PTR(PTR_ERR(base)); > > + > > + map = regmap_init_mmio(NULL, base, &ingenic_tcu_regmap_config); > > + if (IS_ERR(map)) > > + goto err_iounmap; Place this inside probe(). > > + return map; > > + > > +err_iounmap: > > + iounmap(base); > > + of_address_to_resource(np, 0, &res); > > + release_mem_region(res.start, resource_size(&res)); > > + > > + return map; > > +} > > Why does this need to be set-up earlier than probe()? See the explanation below. I think the answer is, it doesn't. > > +static int __init ingenic_tcu_probe(struct platform_device *pdev) > > +{ > > + struct regmap *map = ingenic_tcu_get_regmap(pdev->dev.of_node); > > + > > + platform_set_drvdata(pdev, map); > > + > > + regmap_attach_dev(&pdev->dev, map, &ingenic_tcu_regmap_config); > > + > > + return devm_of_platform_populate(&pdev->dev); > > +} > > + > > +static struct platform_driver ingenic_tcu_driver = { > > + .driver = { > > + .name = "ingenic-tcu", > > + .of_match_table = ingenic_tcu_of_match, > > + }, > > +}; > > + > > +static int __init ingenic_tcu_platform_init(void) > > +{ > > + return platform_driver_probe(&ingenic_tcu_driver, > > +ingenic_tcu_probe); > > What? Why? The device driver probed here will populate the children devices, which will be able to retrieve the pointer to the regmap through device_get_regmap(dev->parent). I've never heard of this call. Where is it? dev_get_regmap, in . The children devices are normal platform drivers that can be probed the normal way. These are the PWM driver, the watchdog driver, and the OST (OS Timer) clocksource driver, all part of the same hardware block (the Timer/Counter Unit or TCU). If they are normal devices, then there is no need to roll your own regmap-getter implementation like this. > > +} > > +subsys_initcall(ingenic_tcu_platform_init); > > + > > +struct regmap * __init ingenic_tcu_get_regmap(struct device_node > > *np) > > +{ > > + if (!tcu_regmap) > > + tcu_regmap = ingenic_tcu_create_regmap(np); > > + > > + return tcu_regmap; > > +} > > This makes me pretty uncomfortable. > > What calls it? The TCU IRQ driver (patch [06/13]), clocks driver (patch [05/13]), and the non-OST clocksource driver (patch [07/13]) all probe very early in the boot process, and share the same devicetree node. They call this function to get a pointer to the regmap. Horrible! Instead, you should send it through platform_set_drvdata() and collect it in the child drivers with platform_get_drvdata(dev->parent). The IRQ, clocks and clocksource driver do NOT have a "struct device" to begin with. They are not platform drivers, and cannot be platform drivers, as they must register so early in the boot process, before "struct device" is even a thing. All they get is a pointer to the same devicetree node. Since all of these have to use the same registers, they need to use a shared regmap, which they obtain by calling ingenic_tcu_get_regmap() below. Then, when this driver's probe gets called, the regmap is retrieved and attached to the struct device, and then the children devices will be probed: the watchdog device, the PWM device, the OST device. These three will retrieve the regmap by calling dev_get_regmap(dev->parent, NULL). > > +bool ingenic_tcu_pwm_can_use_chn(struct device *dev, unsigned int > > channel) > > +{ > > + const struct ingenic_soc_info *soc = > > device_get_match_data(dev->parent); > > + > > + /* Enable al
Re: [PATCH v12 04/13] mfd: Add Ingenic TCU driver
On Thu, 27 Jun 2019, Paul Cercueil wrote: > Le jeu. 27 juin 2019 à 8:58, Lee Jones a écrit : > > On Wed, 26 Jun 2019, Paul Cercueil wrote: > > > Le mer. 26 juin 2019 à 15:18, Lee Jones a > > > écrit : > > > > On Tue, 21 May 2019, Paul Cercueil wrote: > > > > > > > > > This driver will provide a regmap that can be retrieved very > > > early > > > > > in > > > > > the boot process through the API function > > > ingenic_tcu_get_regmap(). > > > > > > > > > > Additionally, it will call devm_of_platform_populate() so that > > > all > > > > > the > > > > > children devices will be probed. > > > > > > > > > > Signed-off-by: Paul Cercueil > > > > > --- > > > > > > > > > > Notes: > > > > > v12: New patch > > > > > > > > > > drivers/mfd/Kconfig | 8 +++ > > > > > drivers/mfd/Makefile| 1 + > > > > > drivers/mfd/ingenic-tcu.c | 113 > > > > > > > > > > include/linux/mfd/ingenic-tcu.h | 8 +++ > > > > > 4 files changed, 130 insertions(+) > > > > > create mode 100644 drivers/mfd/ingenic-tcu.c > > > > [...] > > > > > > > +static struct regmap * __init ingenic_tcu_create_regmap(struct > > > > > device_node *np) > > > > > +{ > > > > > + struct resource res; > > > > > + void __iomem *base; > > > > > + struct regmap *map; > > > > > + > > > > > + if (!of_match_node(ingenic_tcu_of_match, np)) > > > > > + return ERR_PTR(-EINVAL); > > > > Drop this check. > > > > > > > + base = of_io_request_and_map(np, 0, "TCU"); > > > > > + if (IS_ERR(base)) > > > > > + return ERR_PTR(PTR_ERR(base)); > > > > > + > > > > > + map = regmap_init_mmio(NULL, base, > > > &ingenic_tcu_regmap_config); > > > > > + if (IS_ERR(map)) > > > > > + goto err_iounmap; > > > > Place this inside probe(). > > > > > > > + return map; > > > > > + > > > > > +err_iounmap: > > > > > + iounmap(base); > > > > > + of_address_to_resource(np, 0, &res); > > > > > + release_mem_region(res.start, resource_size(&res)); > > > > > + > > > > > + return map; > > > > > +} > > > > > > > > Why does this need to be set-up earlier than probe()? > > > > > > See the explanation below. > > > > I think the answer is, it doesn't. > > > > > > > +static int __init ingenic_tcu_probe(struct platform_device > > > *pdev) > > > > > +{ > > > > > + struct regmap *map = > > > ingenic_tcu_get_regmap(pdev->dev.of_node); > > > > > + > > > > > + platform_set_drvdata(pdev, map); > > > > > + > > > > > + regmap_attach_dev(&pdev->dev, map, > > > &ingenic_tcu_regmap_config); > > > > > + > > > > > + return devm_of_platform_populate(&pdev->dev); > > > > > +} > > > > > + > > > > > +static struct platform_driver ingenic_tcu_driver = { > > > > > + .driver = { > > > > > + .name = "ingenic-tcu", > > > > > + .of_match_table = ingenic_tcu_of_match, > > > > > + }, > > > > > +}; > > > > > + > > > > > +static int __init ingenic_tcu_platform_init(void) > > > > > +{ > > > > > + return platform_driver_probe(&ingenic_tcu_driver, > > > > > +ingenic_tcu_probe); > > > > > > > > What? Why? > > > > > > The device driver probed here will populate the children devices, > > > which will be able to retrieve the pointer to the regmap through > > > device_get_regmap(dev->parent). > > > > I've never heard of this call. Where is it? > > dev_get_regmap, in . > > > > The children devices are normal platform drivers that can be probed > > > the normal way. These are the PWM driver, the watchdog driver, and > > > the > > > OST (OS Timer) clocksource driver, all part of the same hardware > > > block > > > (the Timer/Counter Unit or TCU). > > > > If they are normal devices, then there is no need to roll your own > > regmap-getter implementation like this. > > > > > > > +} > > > > > +subsys_initcall(ingenic_tcu_platform_init); > > > > > + > > > > > +struct regmap * __init ingenic_tcu_get_regmap(struct > > > device_node > > > > > *np) > > > > > +{ > > > > > + if (!tcu_regmap) > > > > > + tcu_regmap = ingenic_tcu_create_regmap(np); > > > > > + > > > > > + return tcu_regmap; > > > > > +} > > > > > > > > This makes me pretty uncomfortable. > > > > > > > > What calls it? > > > > > > The TCU IRQ driver (patch [06/13]), clocks driver (patch [05/13]), > > > and the > > > non-OST clocksource driver (patch [07/13]) all probe very early in > > > the boot > > > process, and share the same devicetree node. They call this > > > function to get > > > a pointer to the regmap. > > > > Horrible! > > > > Instead, you should send it through platform_set_drvdata() and collect > > it in the child drivers with platform_get_drvdata(dev->parent). > > The IRQ, clocks and clocksource driver do NOT have a "struct device" to > begin with. They are not platform drivers, and c
Re: [PATCH v12 04/13] mfd: Add Ingenic TCU driver
Le jeu. 27 juin 2019 à 11:01, Lee Jones a écrit : On Thu, 27 Jun 2019, Paul Cercueil wrote: Le jeu. 27 juin 2019 à 8:58, Lee Jones a écrit : > On Wed, 26 Jun 2019, Paul Cercueil wrote: > > Le mer. 26 juin 2019 à 15:18, Lee Jones a > > écrit : > > > On Tue, 21 May 2019, Paul Cercueil wrote: > > > > > > > This driver will provide a regmap that can be retrieved very > > early > > > > in > > > > the boot process through the API function > > ingenic_tcu_get_regmap(). > > > > > > > > Additionally, it will call devm_of_platform_populate() so that > > all > > > > the > > > > children devices will be probed. > > > > > > > > Signed-off-by: Paul Cercueil > > > > --- > > > > > > > > Notes: > > > > v12: New patch > > > > > > > > drivers/mfd/Kconfig | 8 +++ > > > > drivers/mfd/Makefile| 1 + > > > > drivers/mfd/ingenic-tcu.c | 113 > > > > > > > > include/linux/mfd/ingenic-tcu.h | 8 +++ > > > > 4 files changed, 130 insertions(+) > > > > create mode 100644 drivers/mfd/ingenic-tcu.c > > [...] > > > > > +static struct regmap * __init ingenic_tcu_create_regmap(struct > > > > device_node *np) > > > > +{ > > > > +struct resource res; > > > > +void __iomem *base; > > > > +struct regmap *map; > > > > + > > > > +if (!of_match_node(ingenic_tcu_of_match, np)) > > > > +return ERR_PTR(-EINVAL); > > Drop this check. > > > > > +base = of_io_request_and_map(np, 0, "TCU"); > > > > +if (IS_ERR(base)) > > > > +return ERR_PTR(PTR_ERR(base)); > > > > + > > > > +map = regmap_init_mmio(NULL, base, > > &ingenic_tcu_regmap_config); > > > > +if (IS_ERR(map)) > > > > +goto err_iounmap; > > Place this inside probe(). > > > > > +return map; > > > > + > > > > +err_iounmap: > > > > +iounmap(base); > > > > +of_address_to_resource(np, 0, &res); > > > > +release_mem_region(res.start, resource_size(&res)); > > > > + > > > > +return map; > > > > +} > > > > > > Why does this need to be set-up earlier than probe()? > > > > See the explanation below. > > I think the answer is, it doesn't. > > > > > +static int __init ingenic_tcu_probe(struct platform_device > > *pdev) > > > > +{ > > > > +struct regmap *map = > > ingenic_tcu_get_regmap(pdev->dev.of_node); > > > > + > > > > +platform_set_drvdata(pdev, map); > > > > + > > > > +regmap_attach_dev(&pdev->dev, map, > > &ingenic_tcu_regmap_config); > > > > + > > > > +return devm_of_platform_populate(&pdev->dev); > > > > +} > > > > + > > > > +static struct platform_driver ingenic_tcu_driver = { > > > > +.driver = { > > > > +.name = "ingenic-tcu", > > > > +.of_match_table = ingenic_tcu_of_match, > > > > +}, > > > > +}; > > > > + > > > > +static int __init ingenic_tcu_platform_init(void) > > > > +{ > > > > +return platform_driver_probe(&ingenic_tcu_driver, > > > > + ingenic_tcu_probe); > > > > > > What? Why? > > > > The device driver probed here will populate the children devices, > > which will be able to retrieve the pointer to the regmap through > > device_get_regmap(dev->parent). > > I've never heard of this call. Where is it? dev_get_regmap, in . > > The children devices are normal platform drivers that can be probed > > the normal way. These are the PWM driver, the watchdog driver, and > > the > > OST (OS Timer) clocksource driver, all part of the same hardware > > block > > (the Timer/Counter Unit or TCU). > > If they are normal devices, then there is no need to roll your own > regmap-getter implementation like this. > > > > > +} > > > > +subsys_initcall(ingenic_tcu_platform_init); > > > > + > > > > +struct regmap * __init ingenic_tcu_get_regmap(struct > > device_node > > > > *np) > > > > +{ > > > > +if (!tcu_regmap) > > > > +tcu_regmap = ingenic_tcu_create_regmap(np); > > > > + > > > > +return tcu_regmap; > > > > +} > > > > > > This makes me pretty uncomfortable. > > > > > > What calls it? > > > > The TCU IRQ driver (patch [06/13]), clocks driver (patch [05/13]), > > and the > > non-OST clocksource driver (patch [07/13]) all probe very early in > > the boot > > process, and share the same devicetree node. They call this > > function to get > > a pointer to the regmap. > > Horrible! > > Instead, you should send it through platform_set_drvdata() and collect > it in the child drivers with platform_get_drvdata(dev->parent). The IRQ, clocks and clocksource driver do NOT have a "struct
Re: [PATCH] binfmt_elf: Extract .note.gnu.property from an ELF file
On Wed, Jun 26, 2019 at 10:14:07AM -0700, Andy Lutomirski wrote: > On Thu, May 2, 2019 at 4:10 AM Dave Martin wrote: [...] > > A couple of questions before I look in more detail: > > > > 1) Can we rely on PT_GNU_PROPERTY being present in the phdrs to describe > > the NT_GNU_PROPERTY_TYPE_0 note? If so, we can avoid trying to parse > > irrelevant PT_NOTE segments. > > > > > > 2) Are there standard types for things like the program property header? > > If not, can we add something in elf.h? We should try to coordinate with > > libc on that. Something like > > > > Where did PT_GNU_PROPERTY come from? Are there actual docs for it? > Can someone here tell us what the actual semantics of this new ELF > thingy are? From some searching, it seems like it's kind of an ELF > note but kind of not. An actual description would be fantastic. https://github.com/hjl-tools/linux-abi/wiki/linux-abi-draft.pdf I don't know _when_ it was added, and the description is minimal, but it's there. (I'd say it's fairly obvious how it should be used, but it could do with some clarification...) > Also, I don't think there's any actual requirement that the upstream > kernel recognize existing CET-enabled RHEL 8 binaries as being > CET-enabled. I tend to think that RHEL 8 jumped the gun here. While > the upstream kernel should make some reasonble effort to make sure > that RHEL 8 binaries will continue to run, I don't see why we need to > go out of our way to keep the full set of mitigations available for > binaries that were developed against a non-upstream kernel. If that's an accpetable approach, it should certainly make our life easier. > In fact, if we handle the legacy bitmap differently from RHEL 8, we > may *have* to make sure that we don't recognize existing RHEL 8 > binaries as CET-enabled. Can't comment on that. If the existing RHEL 8 binaries strictly don't have the PT_GNU_PROPERTY phdr, then this might serve a dual purpose ... otherwise, x86 might need some additional annotation for new binaries. I'll leave it for others to comment. Cheers ---Dave
Re: [PATCH] binfmt_elf: Extract .note.gnu.property from an ELF file
* Andy Lutomirski: > Also, I don't think there's any actual requirement that the upstream > kernel recognize existing CET-enabled RHEL 8 binaries as being > CET-enabled. I tend to think that RHEL 8 jumped the gun here. The ABI was supposed to be finalized and everyone involved thought it had been reviewed by the GNU gABI community and other interested parties. It had been included in binutils for several releases. >From my point of view, the kernel is just a consumer of the ABI. The kernel would not change an instruction encoding if it doesn't like it for some reason, either. > While the upstream kernel should make some reasonble effort to make > sure that RHEL 8 binaries will continue to run, I don't see why we > need to go out of our way to keep the full set of mitigations > available for binaries that were developed against a non-upstream > kernel. They were developed against the ABI specification. I do not have a strong opinion what the kernel should do going forward. I just want to make clear what happened. Thanks, Florian
Re: [PATCH 12/14] doc-rst: add ABI documentation to the admin-guide book
On Fri, 21 Jun 2019, Mauro Carvalho Chehab wrote: > Em Wed, 19 Jun 2019 13:37:39 -0300 > Mauro Carvalho Chehab escreveu: > >> Em Tue, 18 Jun 2019 11:47:32 +0300 >> Jani Nikula escreveu: >> >> > On Mon, 17 Jun 2019, Mauro Carvalho Chehab >> > wrote: >> > > Yeah, I guess it should be possible to do that. How a python script >> > > can identify if it was called by Sphinx, or if it was called directly? >> > > >> > >> > if __name__ == '__main__': >> ># run on the command-line, not imported >> >> Ok, when I have some spare time, I may try to convert one script >> to python and see how it behaves. > > Did a quick test here... > > Probably I'm doing something wrong (as I'm a rookie with Python), but, > in order to be able to use the same script as command line and as an Sphinx > extension, everything that it is currently there should be "escaped" > by an: > > if __name__ != '__main__': > > As event the class definition: > > class KernelCmd(Directive): > > depends on: > > from docutils.parsers.rst import directives, Directive > > With is only required when one needs to parse ReST - e. g. only > when the script runs as a Sphinx extension. > > If this is right, as we want a script that can run by command line > to parse and search inside ABI files, at the end of the day, it will > be a lot easier to maintain if the parser script is different from the > Sphinx extension. Split it into two files, one has the nuts and bolts of parsing and has the "if __name__ == '__main__':" bit to run on the command line, and the other interfaces with Sphinx and imports the parser. > If so, as the Sphinx extension script will need to call a parsing script > anyway, it doesn't matter the language of the script with will be > doing the ABI file parsing. Calling the parser using an API will be easier to use, maintain and extend than using pipes, with all the interleaved sideband information to adjust line numbers and whatnot. BR, Jani. > > See the enclosing file. I didn't add anything from the ABI parsing > script yet. It was just changed in order to not generate an error > when trying to run it from command line. > > > Thanks, > Mauro > #!/usr/bin/env python3 > # coding=utf-8 > # SPDX-License-Identifier: GPL-2.0 > # > u""" > kernel-abi > ~~ > > Implementation of the ``kernel-abi`` reST-directive. > > :copyright: Copyright (C) 2016 Markus Heiser > :copyright: Copyright (C) 2016-2019 Mauro Carvalho Chehab > :maintained-by: Mauro Carvalho Chehab > :license:GPL Version 2, June 1991 see Linux/COPYING for details. > > The ``kernel-abi`` (:py:class:`KernelCmd`) directive calls the > scripts/get_abi.pl script to parse the Kernel ABI files. > > Overview of directive's argument and options. > > .. code-block:: rst > > .. kernel-abi:: > :debug: > > The argument is required. It contains the > location of the ABI files to be parsed. > > ``debug`` > Inserts a code-block with the *raw* reST. Sometimes it is helpful to see > what reST is generated. > > """ > > import codecs > import os > import subprocess > import sys > > from os import path > > if __name__ != '__main__': > from docutils import nodes, statemachine > from docutils.statemachine import ViewList > from docutils.parsers.rst import directives, Directive > from docutils.utils.error_reporting import ErrorString > > # > # AutodocReporter is only good up to Sphinx 1.7 > # > import sphinx > > Use_SSI = sphinx.__version__[:3] >= '1.7' > if Use_SSI: > from sphinx.util.docutils import switch_source_input > else: > from sphinx.ext.autodoc import AutodocReporter > > __version__ = '1.0' > > if __name__ != '__main__': > def setup(app): > > app.add_directive("kernel-abi", KernelCmd) > return dict( > version = __version__ > , parallel_read_safe = True > , parallel_write_safe = True > ) > > class KernelCmd(Directive): > > u"""KernelABI (``kernel-abi``) directive""" > > required_arguments = 1 > optional_arguments = 2 > has_content = False > final_argument_whitespace = True > > option_spec = { > "debug" : directives.flag, > "rst" : directives.unchanged > } > > def warn(self, message, **replace): > replace["fname"] = self.state.document.current_source > replace["line_no"] = replace.get("line_no", self.lineno) > message = ("%(fname)s:%(line_no)s: [kernel-abi WARN] : " + > message) % replace > self.state.document.settings.env.app.warn(message, prefix="") > > def run(self): > > doc = self.state.document > if not doc.settings.file_insertion_enabled: > raise self.warning("docutils: file insertion disabled") > > env = doc.setting
Re: [PATCH 2/2] drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver.
Hi Ganapat, On Fri, Jun 14, 2019 at 05:42:46PM +, Ganapatrao Kulkarni wrote: > CCPI2 is a low-latency high-bandwidth serial interface for connecting > ThunderX2 processors. This patch adds support to capture CCPI2 perf events. > > Signed-off-by: Ganapatrao Kulkarni > --- > drivers/perf/thunderx2_pmu.c | 179 ++- > 1 file changed, 157 insertions(+), 22 deletions(-) > > diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c > index 43d76c85da56..3791ac9b897d 100644 > --- a/drivers/perf/thunderx2_pmu.c > +++ b/drivers/perf/thunderx2_pmu.c > @@ -16,7 +16,9 @@ > * they need to be sampled before overflow(i.e, at every 2 seconds). > */ > > -#define TX2_PMU_MAX_COUNTERS 4 > +#define TX2_PMU_DMC_L3C_MAX_COUNTERS 4 I find it odd that you rename this... > +#define TX2_PMU_CCPI2_MAX_COUNTERS 8 > + > #define TX2_PMU_DMC_CHANNELS 8 > #define TX2_PMU_L3_TILES 16 > > @@ -28,11 +30,22 @@ >*/ > #define DMC_EVENT_CFG(idx, val) ((val) << (((idx) * 8) + 1)) > > +#define GET_EVENTID_CCPI2(ev)((ev->hw.config) & 0x1ff) > +/* bits[3:0] to select counters, starts from 8, bit[3] set always. */ > +#define GET_COUNTERID_CCPI2(ev) ((ev->hw.idx) & 0x7) > +#define CCPI2_COUNTER_OFFSET 8 ... but leave GET_EVENTID alone, even though it only applies to DMC/L3C events. Saying that, if you have the event you can figure out its type, so could you avoid the need for additional macros entirely and just use the correct masks based on the corresponding PMU type? That might also avoid some of the conditional control flow you're introducing elsewhere. > #define L3C_COUNTER_CTL 0xA8 > #define L3C_COUNTER_DATA 0xAC > #define DMC_COUNTER_CTL 0x234 > #define DMC_COUNTER_DATA 0x240 > > +#define CCPI2_PERF_CTL 0x108 > +#define CCPI2_COUNTER_CTL0x10C > +#define CCPI2_COUNTER_SEL0x12c > +#define CCPI2_COUNTER_DATA_L 0x130 > +#define CCPI2_COUNTER_DATA_H 0x134 > + > /* L3C event IDs */ > #define L3_EVENT_READ_REQ0xD > #define L3_EVENT_WRITEBACK_REQ 0xE > @@ -51,9 +64,16 @@ > #define DMC_EVENT_READ_TXNS 0xF > #define DMC_EVENT_MAX0x10 > > +#define CCPI2_EVENT_REQ_PKT_SENT 0x3D > +#define CCPI2_EVENT_SNOOP_PKT_SENT 0x65 > +#define CCPI2_EVENT_DATA_PKT_SENT0x105 > +#define CCPI2_EVENT_GIC_PKT_SENT 0x12D > +#define CCPI2_EVENT_MAX 0x200 > + > enum tx2_uncore_type { > PMU_TYPE_L3C, > PMU_TYPE_DMC, > + PMU_TYPE_CCPI2, > PMU_TYPE_INVALID, > }; > > @@ -73,8 +93,8 @@ struct tx2_uncore_pmu { > u32 max_events; > u64 hrtimer_interval; > void __iomem *base; > - DECLARE_BITMAP(active_counters, TX2_PMU_MAX_COUNTERS); > - struct perf_event *events[TX2_PMU_MAX_COUNTERS]; > + DECLARE_BITMAP(active_counters, TX2_PMU_CCPI2_MAX_COUNTERS); > + struct perf_event *events[TX2_PMU_DMC_L3C_MAX_COUNTERS]; Hmm, that looks very odd. Why are they now different sizes? > struct device *dev; > struct hrtimer hrtimer; > const struct attribute_group **attr_groups; > @@ -92,7 +112,21 @@ static inline struct tx2_uncore_pmu > *pmu_to_tx2_pmu(struct pmu *pmu) > return container_of(pmu, struct tx2_uncore_pmu, pmu); > } > > -PMU_FORMAT_ATTR(event, "config:0-4"); > +#define TX2_PMU_FORMAT_ATTR(_var, _name, _format)\ > +static ssize_t > \ > +__tx2_pmu_##_var##_show(struct device *dev, \ > +struct device_attribute *attr, \ > +char *page) \ > +{\ > + BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \ > + return sprintf(page, _format "\n"); \ > +}\ > + \ > +static struct device_attribute format_attr_##_var = \ > + __ATTR(_name, 0444, __tx2_pmu_##_var##_show, NULL) > + > +TX2_PMU_FORMAT_ATTR(event, event, "config:0-4"); > +TX2_PMU_FORMAT_ATTR(event_ccpi2, event, "config:0-9"); This doesn't look right. Can perf deal with overlapping fields? It seems wrong that we'd allow the user to specify both, for example. > > static struct attribute *l3c_pmu_format_attrs[] = { > &format_attr_event.attr, > @@ -104,6 +138,11 @@ static struct attribute *dmc_pmu_format_attrs[] = { > NULL, > }; > > +static struct attribute *ccpi2_pmu_format_attrs[] = { > + &format_attr_event_ccpi2.attr, > + NULL, > +}; > + > static const struct attribute_group
Re: [PATCH 1/2] Documentation: perf: Update documentation for ThunderX2 PMU uncore driver
On Fri, Jun 14, 2019 at 05:42:45PM +, Ganapatrao Kulkarni wrote: > From: Ganapatrao Kulkarni > > Add documentation for Cavium Coherent Processor Interconnect (CCPI2) PMU. > > Signed-off-by: Ganapatrao Kulkarni > --- > Documentation/perf/thunderx2-pmu.txt | 20 +++- > 1 file changed, 11 insertions(+), 9 deletions(-) > > diff --git a/Documentation/perf/thunderx2-pmu.txt > b/Documentation/perf/thunderx2-pmu.txt > index dffc57143736..62243230abc3 100644 > --- a/Documentation/perf/thunderx2-pmu.txt > +++ b/Documentation/perf/thunderx2-pmu.txt > @@ -2,24 +2,26 @@ Cavium ThunderX2 SoC Performance Monitoring Unit (PMU > UNCORE) > = > > The ThunderX2 SoC PMU consists of independent, system-wide, per-socket > -PMUs such as the Level 3 Cache (L3C) and DDR4 Memory Controller (DMC). > +PMUs such as the Level 3 Cache (L3C), DDR4 Memory Controller (DMC) and > +Cavium Coherent Processor Interconnect (CCPI2). > > The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles. > Events are counted for the default channel (i.e. channel 0) and prorated > to the total number of channels/tiles. > > -The DMC and L3C support up to 4 counters. Counters are independently > -programmable and can be started and stopped individually. Each counter > -can be set to a different event. Counters are 32-bit and do not support > -an overflow interrupt; they are read every 2 seconds. > +The DMC, L3C support up to 4 counters and CCPI2 support up to 8 counters. The DMC and L3C support up to 4 counters, while the CCPI2 supports up to 8 counters. > +Counters are independently programmable and can be started and stopped > +individually. Each counter can be set to a different event. DMC and L3C > +Counters are 32-bit and do not support an overflow interrupt; they are read Counters -> counters > +every 2 seconds. CCPI2 counters are 64-bit. Assuming CCPI2 also doesn't support an overflow interrupt, I'd reword these two sentences as: None of the counters support an overflow interrupt and therefore sampling events are unsupported. The DMC and L3C counters are 32-bit and read every 2 seconds. The CCPI2 counters are 64-bit and assumed not to overflow in normal operation. > PMU UNCORE (perf) driver: > > The thunderx2_pmu driver registers per-socket perf PMUs for the DMC and > -L3C devices. Each PMU can be used to count up to 4 events > -simultaneously. The PMUs provide a description of their available events > -and configuration options under sysfs, see > -/sys/devices/uncore_; S is the socket id. > +L3C devices. Each PMU can be used to count up to 4(DMC/L3C) or up to 8 Space between 4 and ( Will
Re: [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush instruction within the same inner shareable domain
On Mon, Jun 24, 2019 at 10:34:02AM +, qi.f...@fujitsu.com wrote: > On 6/18/19 2:03 AM, Will Deacon wrote: > > On Mon, Jun 17, 2019 at 11:32:53PM +0900, Takao Indoh wrote: > >> From: Takao Indoh > >> > >> I found a performance issue related on the implementation of Linux's TLB > >> flush for arm64. > >> > >> When I run a single-threaded test program on moderate environment, it > >> usually takes 39ms to finish its work. However, when I put a small > >> apprication, which just calls mprotest() continuously, on one of sibling > >> cores and run it simultaneously, the test program slows down significantly. > >> It becomes 49ms(125%) on ThunderX2. I also detected the same problem on > >> ThunderX1 and Fujitsu A64FX. > > This is a problem for any applications that share hardware resources with > > each other, so I don't think it's something we should be too concerned about > > addressing unless there is a practical DoS scenario, which there doesn't > > appear to be in this case. It may be that the real answer is "don't call > > mprotect() in a loop". > I think there has been a misunderstanding, please let me explain. > This application is just an example using for reproducing the > performance issue we found. > Our original purpose is reducing OS jitter by this series. > The OS jitter on massively parallel processing systems have been known > and studied for many years. > The 2.5% OS jitter can result in over a factor of 20 slowdown for the > same application [1]. I think it's worth pointing out that the system in question was neither ARM-based nor running Linux, so I'd be cautious in applying the conclusions of that paper directly to our TLB invalidation code. Furthermore, the noise being generated in their experiments uses a timer interrupt, which has a /vastly/ different profile to a DVM message in terms of both system impact and frequency. > Though it may be an extreme example, reducing the OS jitter has been an > issue in HPC environment. > > [1] Ferreira, Kurt B., Patrick Bridges, and Ron Brightwell. > "Characterizing application sensitivity to OS interference using > kernel-level noise injection." Proceedings of the 2008 ACM/IEEE > conference on Supercomputing. IEEE Press, 2008. > > >> I suppose the root cause of this issue is the implementation of Linux's TLB > >> flush for arm64, especially use of TLBI-is instruction which is a broadcast > >> to all processor core on the system. In case of the above situation, > >> TLBI-is is called by mprotect(). > > On the flip side, Linux is providing the hardware with enough information > > not to broadcast to cores for which the remote TLBs don't have entries > > allocated for the ASID being invalidated. I would say that the root cause > > of the issue is that this filtering is not taking place. > > Do you mean that the filter should be implemented in hardware? Yes. If you're building a large system and you care about "jitter", then you either need to partition it in such a way that sources of noise are contained, or you need to introduce filters to limit their scope. Rewriting the low-level memory-management parts of the operating system is a red herring and imposes a needless burden on everybody else without solving the real problem, which is that contended use of shared resources doesn't scale. Will
Re: [PATCH 12/14] doc-rst: add ABI documentation to the admin-guide book
Em Thu, 27 Jun 2019 12:48:12 +0300 Jani Nikula escreveu: > On Fri, 21 Jun 2019, Mauro Carvalho Chehab wrote: > > Em Wed, 19 Jun 2019 13:37:39 -0300 > > Mauro Carvalho Chehab escreveu: > > > >> Em Tue, 18 Jun 2019 11:47:32 +0300 > >> Jani Nikula escreveu: > >> > >> > On Mon, 17 Jun 2019, Mauro Carvalho Chehab > >> > wrote: > >> > > Yeah, I guess it should be possible to do that. How a python script > >> > > can identify if it was called by Sphinx, or if it was called directly? > >> > > > >> > > >> > if __name__ == '__main__': > >> > # run on the command-line, not imported > >> > >> Ok, when I have some spare time, I may try to convert one script > >> to python and see how it behaves. > > > > Did a quick test here... > > > > Probably I'm doing something wrong (as I'm a rookie with Python), but, > > in order to be able to use the same script as command line and as an Sphinx > > extension, everything that it is currently there should be "escaped" > > by an: > > > > if __name__ != '__main__': > > > > As event the class definition: > > > > class KernelCmd(Directive): > > > > depends on: > > > > from docutils.parsers.rst import directives, Directive > > > > With is only required when one needs to parse ReST - e. g. only > > when the script runs as a Sphinx extension. > > > > If this is right, as we want a script that can run by command line > > to parse and search inside ABI files, at the end of the day, it will > > be a lot easier to maintain if the parser script is different from the > > Sphinx extension. > > Split it into two files, one has the nuts and bolts of parsing and has > the "if __name__ == '__main__':" bit to run on the command line, and the > other interfaces with Sphinx and imports the parser. It seems we have an agreement here: the best is indeed to have two files, one with the Documentation/ABI parser, and another one with the Sphinx extension... > > > If so, as the Sphinx extension script will need to call a parsing script > > anyway, it doesn't matter the language of the script with will be > > doing the ABI file parsing. > > Calling the parser using an API will be easier to use, maintain and > extend than using pipes, with all the interleaved sideband information > to adjust line numbers and whatnot. ... and here is where we have two different views. From debug PoV, the Documentation/ABI parser script should be able to print the results by a command line call. This is also a feature that it is useful for the users: to be able to seek for a symbol and output its ABI description. So, the "stdout" output will be there anyway. The only extra data for the extension side is the file name where the information came and the line number. From maintainership PoV, adding the sideband API for file+line is one line at the parser script (a print) and two lines at the Sphinx extension (a regex expression and a match line). That's simple to maintain. It is also simple to verify both sides independently, as what you see when running the parser script is what you get at the extension. If we add a new ABI between the parser script and the extension script, this would require to also maintain the ABI, and would make harder to identify problems on ABI problems. - Another advantage of having those independent is that the language of the parsing script can be different. Not being python is a big advantage for me, as perl is a lot more intuitive and easier to write parser scripts for my eyes. I can write a perl parsing script in a matter of minutes. It takes me a lot more time to do the same with python, and then ensure that it will work with two similar but different languages (python2 and python3) [1]. [1] btw, with that regards, I still don't know how to teach a python script that it should "prefer" to run with python3 but would fall back to python2. Adding this shebang: # /usr/bin/env python just do the opposite - at least on Fedora Thanks, Mauro
Re: [PATCH v2 2/4] kbuild: do not create wrappers for header-test-y
On Thu, 27 Jun 2019, Masahiro Yamada wrote: > header-test-y does not work with headers in sub-directories. > > For example, you can write a Makefile, like this: > > include/linux/Kbuild: > > header-test-y += mtd/nand.h > > This entry creates a wrapper include/linux/mtd/nand.hdrtest.c with > the following content: > > #include "mtd/nand.h" > > To make this work, we need to add $(srctree)/include/linux to the > header search path. It would be tedious to add ccflags-y. > > We could change the *.hdrtest.c rule to wrap: > > #include "nand.h" > > This works for in-tree build since #include "..." searches in the > relative path from the header with this directive. For O=... build, > we need to add $(srctree)/include/linux/mtd to the header search path, > which will be even more tedious. > > After all, I thought it would be handier to compile headers directly > without creating wrappers. > > I added a new build rule to compile %.h into %.h.s > > I chose %.h.s instead of %.h.o because it was a little bit faster. > Also, for GCC, an empty assembly is smaller than an empty object. > > I wrote the build rule: > > $(CC) $(c_flags) -S -o $@ -x c /dev/null -include $< > > instead of: > > $(CC) $(c_flags) -S -o $@ -x c $< > > Both work fine with GCC, but the latter is not good for Clang. > > This comes down to the difference in the -Wunused-function policy. > GCC does not warn about unused 'static inline' functions at all. > Clang does not warn about the ones in included headers, but does > about the ones in the source. So, we should handle headers as > headers, not as source files. > > In fact, this has been hidden since commit abb2ea7dfd82 ("compiler, > clang: suppress warning for unused static inline functions"), but we > should not rely on that. > > Signed-off-by: Masahiro Yamada > --- > > Changes in v2: > - New patch > > .gitignore | 1 - > Documentation/dontdiff | 1 - > Documentation/kbuild/makefiles.txt | 3 +-- > Makefile | 1 - > scripts/Makefile.build | 10 +- > scripts/Makefile.lib | 2 +- > 6 files changed, 7 insertions(+), 11 deletions(-) > > diff --git a/.gitignore b/.gitignore > index 4bb60f0fa23b..7587ef56b92d 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -22,7 +22,6 @@ > *.elf > *.gcno > *.gz > -*.hdrtest.c > *.i > *.ko > *.lex.c > diff --git a/Documentation/dontdiff b/Documentation/dontdiff > index 554dfe4883d2..5eba889ea84d 100644 > --- a/Documentation/dontdiff > +++ b/Documentation/dontdiff > @@ -19,7 +19,6 @@ > *.grep > *.grp > *.gz > -*.hdrtest.c > *.html > *.i > *.jpeg > diff --git a/Documentation/kbuild/makefiles.txt > b/Documentation/kbuild/makefiles.txt > index ca4b24ec0399..5080fec34609 100644 > --- a/Documentation/kbuild/makefiles.txt > +++ b/Documentation/kbuild/makefiles.txt > @@ -1023,8 +1023,7 @@ When kbuild executes, the following steps are followed > (roughly): > header-test-y specifies headers (*.h) in the current directory that > should be compile tested to ensure they are self-contained, > i.e. compilable as standalone units. If CONFIG_HEADER_TEST is enabled, > - this autogenerates dummy sources to include the headers, and builds them > - as part of extra-y. > + this builds them as part of extra-y. > > --- 6.7 Commands useful for building a boot image > > diff --git a/Makefile b/Makefile > index f23516980796..7f293b0431fe 100644 > --- a/Makefile > +++ b/Makefile > @@ -1648,7 +1648,6 @@ clean: $(clean-dirs) > -o -name '*.dwo' -o -name '*.lst' \ > -o -name '*.su' \ > -o -name '.*.d' -o -name '.*.tmp' -o -name '*.mod.c' \ > - -o -name '*.hdrtest.c' \ > -o -name '*.lex.c' -o -name '*.tab.[ch]' \ > -o -name '*.asn1.[ch]' \ > -o -name '*.symtypes' -o -name 'modules.order' \ > diff --git a/scripts/Makefile.build b/scripts/Makefile.build > index ee0319560513..776842b7e6a3 100644 > --- a/scripts/Makefile.build > +++ b/scripts/Makefile.build > @@ -294,14 +294,14 @@ quiet_cmd_cc_lst_c = MKLST $@ > $(obj)/%.lst: $(src)/%.c FORCE > $(call if_changed_dep,cc_lst_c) > > -# Dummy C sources for header test (header-test-y target) > +# header test (header-test-y target) > # --- > > -quiet_cmd_header_test = HDRTEST $@ > - cmd_header_test = echo "\#include \"$*.h\"" > $@ > +quiet_cmd_cc_s_h = CC $@ > + cmd_cc_s_h = $(CC) $(c_flags) -S -o $@ -x c /dev/null -include $< I think I'd prefer HDRTEST or something more informative than just CC in the quiet cmd to distinguish this from the usual build lines. Especially now that the file name is also just .h.s. Other than that, good job getting rid of the intermediate file. I couldn't figure it out when I did the original. Acked-by: Jani Nikula Tested-by: Jani Nikula > > -$(obj)/%.hdrtest.c: > -
Re: [PATCH v2 3/4] kbuild: support header-test-pattern-y
On Thu, 27 Jun 2019, Masahiro Yamada wrote: > In my view, most of headers can be self-contained. So, it would be > tedious to add every header to header-test-y explicitly. We usually > end up with "all headers with some exceptions". > > There are two types in exceptions: > > [1] headers that are never compiled as standalone units > > For examples, include/linux/compiler-gcc.h is not intended to be > included directly. We should always exclude such ones. > > [2] headers that are conditionally compiled as standalone units > > Some headers can be compiled only for particular architectures. > For example, include/linux/arm-cci.h can be compiled only for > arm/arm64 because it requires to exist. > Clang can compile include/soc/nps/mtm.h only for arc because > it contains an arch-specific register in inline assembler. > > For [2], we can write Makefile like this: > > header-test-$(CONFIG_ARM) += linux/arm-cci.h > > The new syntax header-test-pattern-y will be useful to specify > "the rest". > > The typical usage is like this: > > header-test-pattern-y += */*.h > > This adds all the headers in sub-directories to the test coverage, > but headers added to header-test- are excluded. In this regards, > header-test-pattern-y behaves like a weaker variant of header-test-y. > > Caveat: > The patterns in header-test-pattern-y are prefixed with $(srctree)/$(src)/ > but not $(objtree)/$(obj)/. Stale generated patterns are often left over. > For example, you will have ones when you traverse the git history for > 'git bisect' without cleaning. If a wildcard is used for generated > headers, it may match to stale headers. > > If you really want to compile-test generated headers, I recommend to > add them to header-test-y explicitly. One pitfall is $(srctree)/$(src)/ > and $(objtree)/$(obj)/ point to the same directory for in-tree building. > So, header-test-pattern-y should be used with care. It can potentially > match to generated headers, which may be stale and fail to compile. > > Caveat2: > You could use wildcard for header-test-. For example, > > header-test- += asm-generic/% > > ... will exclude headers in asm-generic directory. Unfortunately, the > wildcard character is '%' instead of '*' because this is evaluated by > $(filter-out ...) whereas header-test-pattern-y is evaluated by > $(wildcard ...). This is a kludge, but seems useful in some places... > > Signed-off-by: Masahiro Yamada Awesome! This will let us get rid of our local $(wildcard) hacks. Tested-by: Jani Nikula > --- > > Changes in v2: > - New patch > > Documentation/kbuild/makefiles.txt | 10 ++ > scripts/Makefile.lib | 10 ++ > 2 files changed, 20 insertions(+) > > diff --git a/Documentation/kbuild/makefiles.txt > b/Documentation/kbuild/makefiles.txt > index 5080fec34609..b817e6cefb77 100644 > --- a/Documentation/kbuild/makefiles.txt > +++ b/Documentation/kbuild/makefiles.txt > @@ -1025,6 +1025,16 @@ When kbuild executes, the following steps are followed > (roughly): > i.e. compilable as standalone units. If CONFIG_HEADER_TEST is enabled, > this builds them as part of extra-y. > > +header-test-pattern-y > + > + This works as a weaker version of header-test-y, and accepts wildcard > + patterns. The typical usage is: > + > + header-test-pattern-y += *.h > + > + This specifies all the files that matches to '*.h' in the current > + directory, but the files in 'header-test-' are excluded. > + > --- 6.7 Commands useful for building a boot image > > Kbuild provides a few macros that are useful when building a > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib > index 55ae1ec65342..5933bbab 100644 > --- a/scripts/Makefile.lib > +++ b/scripts/Makefile.lib > @@ -67,6 +67,16 @@ extra-$(CONFIG_OF_ALL_DTBS) += $(patsubst %.dtb,%.dt.yaml, > $(dtb-)) > endif > > # Test self-contained headers > + > +# Wildcard searches in $(srctree)/$(src)/, but not in $(objtree)/$(obj)/. > +# Stale generated headers are often left over, so wildcard matching should > +# be avoided. Please notice $(srctree)/$(src)/ and $(objtree)/$(obj) point > +# to the same location for in-tree building. > +header-test-y+= $(filter-out $(header-test-), \ > + $(patsubst $(srctree)/$(src)/%, %, \ > + $(wildcard $(addprefix $(srctree)/$(src)/, \ > + $(header-test-pattern-y) > + > extra-$(CONFIG_HEADER_TEST) += $(addsuffix .s, $(header-test-y)) > > # Add subdir path -- Jani Nikula, Intel Open Source Graphics Center
Re: [PATCH v2 0/4] Compile-test UAPI and kernel headers
On Thu, 27 Jun 2019, Masahiro Yamada wrote: > 1/4: reworked v2. > > 2/4: fix a flaw I noticed when I was working on this series > > 3/4: maybe useful for 4/4 and in some other places > > 4/4: v2. compile as many headers as possible. > > > Changes in v2: > - Add CONFIG_CPU_{BIG,LITTLE}_ENDIAN guard to avoid build error > - Use 'header-test-' instead of 'no-header-test' > - Avoid weird 'find' warning when cleaning > - New patch > - New patch > - Add everything to test coverage, and exclude broken ones > - Rename 'Makefile' to 'Kbuild' > - Add CONFIG_KERNEL_HEADER_TEST option > > Masahiro Yamada (4): > kbuild: compile-test UAPI headers to ensure they are self-contained > kbuild: do not create wrappers for header-test-y > kbuild: support header-test-pattern-y > kbuild: compile-test kernel headers to ensure they are self-contained [responding here because I didn't receive the actual patch] This looks like it's doing what it's supposed to, but I ran into a bunch of build fails with CONFIG_OF=n. Sent a fix to one [1], but stopped at the next. Looks like you'll have to exclude more. And I'm pretty sure we'll uncover more configurations where this will fail. But I do applaud the goal, and I'm committed to making all include/drm headers self-contained. I wouldn't block this based on the issues, it's pretty much the only way to expose them and get them fixed/excluded, and it's behind a config knob after all. With the caveat that I didn't finish the build, but OTOH tested the rainy day scenario and had the patch find issues it's meant to find: Tested-by: Jani Nikula [1] http://patchwork.freedesktop.org/patch/msgid/20190627110103.7539-1-jani.nik...@intel.com > > .gitignore |1 - > Documentation/dontdiff |1 - > Documentation/kbuild/makefiles.txt | 13 +- > Makefile |4 +- > include/Kbuild | 1134 > init/Kconfig | 22 + > scripts/Makefile.build | 10 +- > scripts/Makefile.lib | 12 +- > scripts/cc-system-headers.sh |8 + > usr/.gitignore |1 - > usr/Makefile |2 + > usr/include/.gitignore |3 + > usr/include/Makefile | 133 > 13 files changed, 1331 insertions(+), 13 deletions(-) > create mode 100644 include/Kbuild > create mode 100755 scripts/cc-system-headers.sh > create mode 100644 usr/include/.gitignore > create mode 100644 usr/include/Makefile -- Jani Nikula, Intel Open Source Graphics Center
Re: [PATCH v7 1/2] fTPM: firmware TPM running in TEE
On Wed, 2019-06-26 at 19:56 -0400, Sasha Levin wrote: > > You've used so much on this so shouldn't this have that somewhat new > > co-developed-by tag? I'm also wondering can this work at all > > Honestly, I've just been massaging this patch more than "authoring" it. > If you feel strongly about it feel free to add a Co-authored patch with > my name, but in my mind this is just Thiru's work. This is just my subjective view but writing code is easier than making it work in the mainline in 99% of cases. If this patch was doing something revolutional, lets say a new outstanding scheduling algorithm, then I would think otherwise. It is not. You without question deserve both credit and also the blame (if this breaks everything) :-) > > process-wise if the original author of the patch is also the only tester > > of the patch? > > There's not much we can do about this... Linaro folks have tested this > without the fTPM firmware, so at the very least it won't explode for > everyone. If for some reason non-microsoft folks see issues then we can > submit patches on top to fix this, we're not just throwing this at you > and running away. So why any of those Linaro folks can't do it? I can add after tested-by tag parentheses something explaining that context of testing. It is reasonable given the circumstances. I can also give an explanation in my next PR along the lines what you are saying. This would definitely work for me. /Jarkko
Re: [PATCH v7 1/2] fTPM: firmware TPM running in TEE
On Thu, 2019-06-27 at 16:17 +0300, Jarkko Sakkinen wrote: > On Wed, 2019-06-26 at 19:56 -0400, Sasha Levin wrote: > > > You've used so much on this so shouldn't this have that somewhat new > > > co-developed-by tag? I'm also wondering can this work at all > > > > Honestly, I've just been massaging this patch more than "authoring" it. > > If you feel strongly about it feel free to add a Co-authored patch with > > my name, but in my mind this is just Thiru's work. > > This is just my subjective view but writing code is easier than making > it work in the mainline in 99% of cases. If this patch was doing > something revolutional, lets say a new outstanding scheduling algorithm, > then I would think otherwise. It is not. You without question deserve > both credit and also the blame (if this breaks everything) :-) Not like I'm putting the pressure on this. You make the call with the tag. Put it if you wwant. I'm cool with either. /Jarkko
Re: [PATCH v7 1/2] fTPM: firmware TPM running in TEE
Hi Jarkko, > On Wed, 2019-06-26 at 19:56 -0400, Sasha Levin wrote: > > > You've used so much on this so shouldn't this have that somewhat new > > > co-developed-by tag? I'm also wondering can this work at all > > > > Honestly, I've just been massaging this patch more than "authoring" it. > > If you feel strongly about it feel free to add a Co-authored patch with > > my name, but in my mind this is just Thiru's work. > > This is just my subjective view but writing code is easier than making > it work in the mainline in 99% of cases. If this patch was doing > something revolutional, lets say a new outstanding scheduling algorithm, > then I would think otherwise. It is not. You without question deserve > both credit and also the blame (if this breaks everything) :-) > > > > process-wise if the original author of the patch is also the only tester > > > of the patch? > > > > There's not much we can do about this... Linaro folks have tested this > > without the fTPM firmware, so at the very least it won't explode for > > everyone. If for some reason non-microsoft folks see issues then we can > > submit patches on top to fix this, we're not just throwing this at you > > and running away. > > So why any of those Linaro folks can't do it? I can add after tested-by > tag parentheses something explaining that context of testing. It is > reasonable given the circumstances. There's 2 teams from Microsoft trying to do this [1]. We tested the previous implementation (which problems on probing as built-in). We had to change some stuff in the OP-TEE fTPM implementation [2] and test it in QEMU. What i quickly did with this module was to replace the kernel of the previous build with the new one. Unfortunately i couldn't get it to work, but i don't know if it's the module or the changes in the fTPM OP-TEE part. Since you have tested it my guess is that it has something to do with the OP-TEE part. I don't have any objections in this going in. On the contrary i think the functionality is really useful. I don't have hardware to test this at the moment, but once i get it, i'll give it a spin. The part i tested is that the probing works as expected when no fTPM implementation is running on secure world. Since it has been tested and doesn't break anything we can always fix corner, cases afterwards with more extensive testing [1] https://github.com/ms-iot/linux/blob/ms-optee-ocalls-merge/drivers/char/tpm/tpm_ftpm_optee.c [2] https://github.com/jbech-linaro/manifest/blob/ftpm/README.md Thanks /Ilias > > I can also give an explanation in my next PR along the lines what you > are saying. This would definitely work for me. > > /Jarkko >
[PATCH] docs: format kernel-parameters -- as code
The current ReStructuredText formatting results in "--", used to indicate the end of the kernel command-line parameters, appearing as an en-dash instead of two hyphens; this patch formats them as code, "``--``", as done elsewhere in the documentation. Signed-off-by: Stephen Kitt --- Documentation/admin-guide/kernel-parameters.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst index 0124980dca2d..b8d479b76648 100644 --- a/Documentation/admin-guide/kernel-parameters.rst +++ b/Documentation/admin-guide/kernel-parameters.rst @@ -9,11 +9,11 @@ and sorted into English Dictionary order (defined as ignoring all punctuation and sorting digits before letters in a case insensitive manner), and with descriptions where known. -The kernel parses parameters from the kernel command line up to "--"; +The kernel parses parameters from the kernel command line up to "``--``"; if it doesn't recognize a parameter and it doesn't contain a '.', the parameter gets passed to init: parameters with '=' go into init's environment, others are passed as command line arguments to init. -Everything after "--" is passed as an argument to init. +Everything after "``--``" is passed as an argument to init. Module parameters can be specified in two ways: via the kernel command line with a module name prefix, or via modprobe, e.g.:: -- 2.11.0
Re: [Linux-kernel-mentees][PATCH v5 1/5] Documentation: RCU: Convert RCU basic concepts to reST
On Wed, 26 Jun 2019 15:07:01 -0500 Jiunn Chang wrote: > RCU basic concepts reST markup. > > Signed-off-by: Jiunn Chang > Reviewed-by: Joel Fernandes (Google) So this is a little detail but ... your signoff should be the last thing in the set of tags on the patch. This isn't worth making you do yet another revision, so I went ahead and applied the patches and fixed the tag ordering on the way in. I'll also append a patch adding the new RCU stuff into the core-api manual so people can actually get to it. Thanks, jon
Re: [PATCH 1/2] mm, memcontrol: Add memcg_iterate_all()
On Mon 24-06-19 13:42:18, Waiman Long wrote: > Add a memcg_iterate_all() function for iterating all the available > memory cgroups and call the given callback function for each of the > memory cgruops. Why is a trivial wrapper any better than open coded usage of the iterator? > Signed-off-by: Waiman Long > --- > include/linux/memcontrol.h | 3 +++ > mm/memcontrol.c| 13 + > 2 files changed, 16 insertions(+) > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 1dcb763bb610..0e31418e5a47 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -1268,6 +1268,9 @@ static inline bool > mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg) > struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep); > void memcg_kmem_put_cache(struct kmem_cache *cachep); > > +extern void memcg_iterate_all(void (*callback)(struct mem_cgroup *memcg, > +void *arg), void *arg); > + > #ifdef CONFIG_MEMCG_KMEM > int __memcg_kmem_charge(struct page *page, gfp_t gfp, int order); > void __memcg_kmem_uncharge(struct page *page, int order); > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index ba9138a4a1de..c1c4706f7696 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -443,6 +443,19 @@ static int memcg_alloc_shrinker_maps(struct mem_cgroup > *memcg) > static void memcg_free_shrinker_maps(struct mem_cgroup *memcg) { } > #endif /* CONFIG_MEMCG_KMEM */ > > +/* > + * Iterate all the memory cgroups and call the given callback function > + * for each of the memory cgroups. > + */ > +void memcg_iterate_all(void (*callback)(struct mem_cgroup *memcg, void *arg), > +void *arg) > +{ > + struct mem_cgroup *memcg; > + > + for_each_mem_cgroup(memcg) > + callback(memcg, arg); > +} > + > /** > * mem_cgroup_css_from_page - css of the memcg associated with a page > * @page: page of interest > -- > 2.18.1 -- Michal Hocko SUSE Labs
Re: [Linux-kernel-mentees][PATCH v5 1/5] Documentation: RCU: Convert RCU basic concepts to reST
On Thu, 27 Jun 2019 08:34:43 -0600 Jonathan Corbet wrote: > On Wed, 26 Jun 2019 15:07:01 -0500 > Jiunn Chang wrote: > > > RCU basic concepts reST markup. > > > > Signed-off-by: Jiunn Chang > > Reviewed-by: Joel Fernandes (Google) > > So this is a little detail but ... your signoff should be the last thing > in the set of tags on the patch. Note, I've been seeing this a lot lately, and then noticed, that when I downloaded a patch directly from patchwork, it placed all the Reviewed-by and Acked-by tags after the original Signed-off-by. I checked the original patch on the mailing list, and it had no other tags but the Signed-off-by. I then pulled one of my own patches, and it did it to that patch as well. I too prefer the Signed-off-by be last, but our tooling needs to do this as well, otherwise it's a failure in our procedures. -- Steve
Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs
On Mon 24-06-19 13:42:19, Waiman Long wrote: > With the slub memory allocator, the numbers of active slab objects > reported in /proc/slabinfo are not real because they include objects > that are held by the per-cpu slab structures whether they are actually > used or not. The problem gets worse the more CPUs a system have. For > instance, looking at the reported number of active task_struct objects, > one will wonder where all the missing tasks gone. > > I know it is hard and costly to get a real count of active objects. What exactly is expensive? Why cannot slabinfo reduce the number of active objects by per-cpu cached objects? > So > I am not advocating for that. Instead, this patch extends the > /proc/sys/vm/drop_caches sysctl parameter by using a new bit (bit 3) > to shrink all the kmem slabs which will flush out all the slabs in the > per-cpu structures and give a more accurate view of how much memory are > really used up by the active slab objects. This is a costly operation, > of course, but it gives a way to have a clearer picture of the actual > number of slab objects used, if the need arises. drop_caches is a terrible interface. It destroys all the caching and people are just too easy in using it to solve any kind of problem they think they might have and cause others they might not see immediately. I am strongly discouraging anybody - except for some tests which really do want to see reproducible results without cache effects - from using this interface and therefore I am not really happy to paper over something that might be a real problem with yet another mode. If SLUB indeed caches too aggressively on large machines then this should be fixed. -- Michal Hocko SUSE Labs
Re: [PATCH 1/2] Documentation: perf: Update documentation for ThunderX2 PMU uncore driver
On Thu, Jun 27, 2019 at 11:01:18AM +0100, Will Deacon wrote: > On Fri, Jun 14, 2019 at 05:42:45PM +, Ganapatrao Kulkarni wrote: > > From: Ganapatrao Kulkarni > > > > Add documentation for Cavium Coherent Processor Interconnect (CCPI2) PMU. > > > > Signed-off-by: Ganapatrao Kulkarni > > --- > > Documentation/perf/thunderx2-pmu.txt | 20 +++- > > 1 file changed, 11 insertions(+), 9 deletions(-) > > > > diff --git a/Documentation/perf/thunderx2-pmu.txt > > b/Documentation/perf/thunderx2-pmu.txt > > index dffc57143736..62243230abc3 100644 > > --- a/Documentation/perf/thunderx2-pmu.txt > > +++ b/Documentation/perf/thunderx2-pmu.txt > > @@ -2,24 +2,26 @@ Cavium ThunderX2 SoC Performance Monitoring Unit (PMU > > UNCORE) > > = > > > > The ThunderX2 SoC PMU consists of independent, system-wide, per-socket > > -PMUs such as the Level 3 Cache (L3C) and DDR4 Memory Controller (DMC). > > +PMUs such as the Level 3 Cache (L3C), DDR4 Memory Controller (DMC) and > > +Cavium Coherent Processor Interconnect (CCPI2). > > > > The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles. > > Events are counted for the default channel (i.e. channel 0) and prorated > > to the total number of channels/tiles. > > > > -The DMC and L3C support up to 4 counters. Counters are independently > > -programmable and can be started and stopped individually. Each counter > > -can be set to a different event. Counters are 32-bit and do not support > > -an overflow interrupt; they are read every 2 seconds. > > +The DMC, L3C support up to 4 counters and CCPI2 support up to 8 counters. > > The DMC and L3C support up to 4 counters, while the CCPI2 supports up to 8 > counters. > > > +Counters are independently programmable and can be started and stopped > > +individually. Each counter can be set to a different event. DMC and L3C > > +Counters are 32-bit and do not support an overflow interrupt; they are read > > Counters -> counters > > > +every 2 seconds. CCPI2 counters are 64-bit. > > Assuming CCPI2 also doesn't support an overflow interrupt, I'd reword these > two sentences as: > > None of the counters support an overflow interrupt and therefore sampling > events are unsupported. The DMC and L3C counters are 32-bit and read every > 2 seconds. The CCPI2 counters are 64-bit and assumed not to overflow in > normal operation. Mark reminded me that these are system PMUs and therefore sampling is unsupported irrespective of the presence of an overflow interrupt, so you can drop that part from the text. Sorry for the confusion, Will
Re: [Linux-kernel-mentees][PATCH v5 1/5] Documentation: RCU: Convert RCU basic concepts to reST
On Thu, Jun 27, 2019 at 08:34:43AM -0600, Jonathan Corbet wrote: > On Wed, 26 Jun 2019 15:07:01 -0500 > Jiunn Chang wrote: > > > RCU basic concepts reST markup. > > > > Signed-off-by: Jiunn Chang > > Reviewed-by: Joel Fernandes (Google) > > So this is a little detail but ... your signoff should be the last thing > in the set of tags on the patch. > > This isn't worth making you do yet another revision, so I went ahead and > applied the patches and fixed the tag ordering on the way in. I'll also > append a patch adding the new RCU stuff into the core-api manual so people > can actually get to it. Please feel free to add my ack: Acked-by: Paul E. McKenney Thanx, Paul
Re: [PATCH v7 1/2] fTPM: firmware TPM running in TEE
On Thu, 2019-06-27 at 16:30 +0300, Ilias Apalodimas wrote: > is really useful. I don't have hardware to test this at the moment, but once i > get it, i'll give it a spin. Thank you for responding, really appreciate it. Please note, however, that I already did my v5.3 PR so there is a lot of time to give it a spin. In all cases, we will find a way to put this to my v5.4 PR. I don't see any reason why not. As soon as the cosmetic stuff is fixed that I remarked in v7 I'm ready to take this to my tree and after that soonish make it available on linux-next. /Jarkko
[PATCH v3 3/4] kbuild: support header-test-pattern-y
In my view, most of headers can be self-contained. So, it would be tedious to add every header to header-test-y explicitly. We usually end up with "all headers with some exceptions". There are two types in exceptions: [1] headers that are never compiled as standalone units For examples, include/linux/compiler-gcc.h is not intended for direct inclusion. We should always exclude such ones. [2] headers that are conditionally compiled as standalone units Some headers can be compiled only for particular architectures. For example, include/linux/arm-cci.h can be compiled only for arm/arm64 because it requires to exist. Clang can compile include/soc/nps/mtm.h only for arc because it contains an arch-specific register in inline assembler. So, you can write Makefile like this: header-test-+= linux/compiler-gcc.h header-test-$(CONFIG_ARM) += linux/arm-cci.h header-test-$(CONFIG_ARM64) += linux/arm-cci.h header-test-$(CONFIG_ARC) += soc/nps/mtm.h The new syntax header-test-pattern-y will be useful to specify "the rest". The typical usage is like this: header-test-pattern-y += */*.h This will add all the headers in sub-directories to the test coverage, excluding $(header-test-). In this regards, header-test-pattern-y behaves like a weaker variant of header-test-y. Caveat: The patterns in header-test-pattern-y are prefixed with $(srctree)/$(src)/ but not $(objtree)/$(obj)/. Stale generated headers are often left over when you traverse the git history without cleaning. Wildcard patterns for $(objtree) may match to stale headers, which could fail to compile. One pitfall is $(srctree)/$(src)/ and $(objtree)/$(obj)/ point to the same directory for in-tree building. So, header-test-pattern-y should be used with care since it can potentially match to stale headers. Caveat2: You could use wildcard for header-test-. For example, header-test- += asm-generic/% ... will exclude headers in asm-generic directory. Unfortunately, the wildcard character is '%' instead of '*' here because this is evaluated by $(filter-out ...) whereas header-test-pattern-y is evaluated by $(wildcard ...). This is a kludge, but seems useful in some places... Signed-off-by: Masahiro Yamada Tested-by: Jani Nikula --- Changes in v3: None Changes in v2: - New patch Documentation/kbuild/makefiles.txt | 10 ++ scripts/Makefile.lib | 11 +++ 2 files changed, 21 insertions(+) diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index 5080fec34609..b817e6cefb77 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt @@ -1025,6 +1025,16 @@ When kbuild executes, the following steps are followed (roughly): i.e. compilable as standalone units. If CONFIG_HEADER_TEST is enabled, this builds them as part of extra-y. +header-test-pattern-y + + This works as a weaker version of header-test-y, and accepts wildcard + patterns. The typical usage is: + + header-test-pattern-y += *.h + + This specifies all the files that matches to '*.h' in the current + directory, but the files in 'header-test-' are excluded. + --- 6.7 Commands useful for building a boot image Kbuild provides a few macros that are useful when building a diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib index 55ae1ec65342..281864fcf0fe 100644 --- a/scripts/Makefile.lib +++ b/scripts/Makefile.lib @@ -67,6 +67,17 @@ extra-$(CONFIG_OF_ALL_DTBS) += $(patsubst %.dtb,%.dt.yaml, $(dtb-)) endif # Test self-contained headers + +# Wildcard searches in $(srctree)/$(src)/, but not in $(objtree)/$(obj)/. +# Stale generated headers are often left over, so pattern matching should +# be avoided. Please notice $(srctree)/$(src)/ and $(objtree)/$(obj) point +# to the same location for in-tree building. So, header-test-pattern-y should +# be used with care. +header-test-y += $(filter-out $(header-test-), \ + $(patsubst $(srctree)/$(src)/%, %, \ + $(wildcard $(addprefix $(srctree)/$(src)/, \ + $(header-test-pattern-y) + extra-$(CONFIG_HEADER_TEST) += $(addsuffix .s, $(header-test-y)) # Add subdir path -- 2.17.1
[PATCH v3 2/4] kbuild: do not create wrappers for header-test-y
header-test-y does not work with headers in sub-directories. For example, you may want to write a Makefile, like this: include/linux/Kbuild: header-test-y += mtd/nand.h This entry will create a wrapper include/linux/mtd/nand.hdrtest.c with the following content: #include "mtd/nand.h" To make this work, we need to add $(srctree)/include/linux to the header search path. It would be tedious to add ccflags-y. Instead, we could change the *.hdrtest.c rule to wrap: #include "nand.h" This works for in-tree build since #include "..." searches in the relative path from the header with this directive. For O=... build, we need to add $(srctree)/include/linux/mtd to the header search path, which will be even more tedious. After all, I thought it would be handier to compile headers directly without creating wrappers. I added a new build rule to compile %.h into %.h.s The target is %.h.s instead of %.h.o because it is slightly faster. Also, as for GCC, an empty assembly is smaller than an empty object. I wrote the build rule: $(CC) $(c_flags) -S -o $@ -x c /dev/null -include $< instead of: $(CC) $(c_flags) -S -o $@ -x c $< Both work fine with GCC, but the latter is bad for Clang. This comes down to the difference in the -Wunused-function policy. GCC does not warn about unused 'static inline' functions at all. Clang does not warn about the ones in included headers, but does about the ones in the source. So, we should handle headers as headers, not as source files. In fact, this has been hidden since commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused static inline functions"), but we should not rely on that. Signed-off-by: Masahiro Yamada Acked-by: Jani Nikula Tested-by: Jani Nikula --- Changes in v3: None Changes in v2: - New patch .gitignore | 1 - Documentation/dontdiff | 1 - Documentation/kbuild/makefiles.txt | 3 +-- Makefile | 1 - scripts/Makefile.build | 10 +- scripts/Makefile.lib | 2 +- 6 files changed, 7 insertions(+), 11 deletions(-) diff --git a/.gitignore b/.gitignore index 4bb60f0fa23b..7587ef56b92d 100644 --- a/.gitignore +++ b/.gitignore @@ -22,7 +22,6 @@ *.elf *.gcno *.gz -*.hdrtest.c *.i *.ko *.lex.c diff --git a/Documentation/dontdiff b/Documentation/dontdiff index 554dfe4883d2..5eba889ea84d 100644 --- a/Documentation/dontdiff +++ b/Documentation/dontdiff @@ -19,7 +19,6 @@ *.grep *.grp *.gz -*.hdrtest.c *.html *.i *.jpeg diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index ca4b24ec0399..5080fec34609 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt @@ -1023,8 +1023,7 @@ When kbuild executes, the following steps are followed (roughly): header-test-y specifies headers (*.h) in the current directory that should be compile tested to ensure they are self-contained, i.e. compilable as standalone units. If CONFIG_HEADER_TEST is enabled, - this autogenerates dummy sources to include the headers, and builds them - as part of extra-y. + this builds them as part of extra-y. --- 6.7 Commands useful for building a boot image diff --git a/Makefile b/Makefile index f23516980796..7f293b0431fe 100644 --- a/Makefile +++ b/Makefile @@ -1648,7 +1648,6 @@ clean: $(clean-dirs) -o -name '*.dwo' -o -name '*.lst' \ -o -name '*.su' \ -o -name '.*.d' -o -name '.*.tmp' -o -name '*.mod.c' \ - -o -name '*.hdrtest.c' \ -o -name '*.lex.c' -o -name '*.tab.[ch]' \ -o -name '*.asn1.[ch]' \ -o -name '*.symtypes' -o -name 'modules.order' \ diff --git a/scripts/Makefile.build b/scripts/Makefile.build index ee0319560513..776842b7e6a3 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -294,14 +294,14 @@ quiet_cmd_cc_lst_c = MKLST $@ $(obj)/%.lst: $(src)/%.c FORCE $(call if_changed_dep,cc_lst_c) -# Dummy C sources for header test (header-test-y target) +# header test (header-test-y target) # --- -quiet_cmd_header_test = HDRTEST $@ - cmd_header_test = echo "\#include \"$*.h\"" > $@ +quiet_cmd_cc_s_h = CC $@ + cmd_cc_s_h = $(CC) $(c_flags) -S -o $@ -x c /dev/null -include $< -$(obj)/%.hdrtest.c: - $(call cmd,header_test) +$(obj)/%.h.s: $(src)/%.h FORCE + $(call if_changed_dep,cc_s_h) # Compile assembler sources (.S) # --- diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib index 3e630fcaffd1..55ae1ec65342 100644 --- a/scripts/Makefile.lib +++ b/scripts/Makefile.lib @@ -67,7 +67,7 @@ extra-$(CONFIG_OF_ALL_DTBS) += $(patsubst %.dtb,%.dt.yaml, $(dtb-)) endif # Test self-contained headers -extra-$(CONFIG_HEADER_TEST) += $(patsubst %.
[PATCH v3 0/4] Compile-test UAPI and kernel headers
1/4: Compile-test exported headers (reworked in v2) 2/4: fix a flaw I noticed when I was working on this series. Avoid generating intermediate wrappers. 3/4: maybe useful for 4/4 and in some other places. Add header-test-pattern-y syntax. 4/4: Compile-test kernel-space headers in include/. v2: compile as many headers as possible. v3: exclude more headers causing build errors Masahiro Yamada (4): kbuild: compile-test UAPI headers to ensure they are self-contained kbuild: do not create wrappers for header-test-y kbuild: support header-test-pattern-y kbuild: compile-test kernel headers to ensure they are self-contained .gitignore |1 - Documentation/dontdiff |1 - Documentation/kbuild/makefiles.txt | 13 +- Makefile |4 +- include/Kbuild | 1250 init/Kconfig | 22 + scripts/Makefile.build | 10 +- scripts/Makefile.lib | 13 +- scripts/cc-system-headers.sh |8 + usr/.gitignore |1 - usr/Makefile |2 + usr/include/.gitignore |3 + usr/include/Makefile | 134 +++ 13 files changed, 1449 insertions(+), 13 deletions(-) create mode 100644 include/Kbuild create mode 100755 scripts/cc-system-headers.sh create mode 100644 usr/include/.gitignore create mode 100644 usr/include/Makefile -- 2.17.1
Re: [PATCH v2 0/4] Compile-test UAPI and kernel headers
On Thu, Jun 27, 2019 at 8:36 PM Jani Nikula wrote: > > On Thu, 27 Jun 2019, Masahiro Yamada wrote: > > 1/4: reworked v2. > > > > 2/4: fix a flaw I noticed when I was working on this series > > > > 3/4: maybe useful for 4/4 and in some other places > > > > 4/4: v2. compile as many headers as possible. > > > > > > Changes in v2: > > - Add CONFIG_CPU_{BIG,LITTLE}_ENDIAN guard to avoid build error > > - Use 'header-test-' instead of 'no-header-test' > > - Avoid weird 'find' warning when cleaning > > - New patch > > - New patch > > - Add everything to test coverage, and exclude broken ones > > - Rename 'Makefile' to 'Kbuild' > > - Add CONFIG_KERNEL_HEADER_TEST option > > > > Masahiro Yamada (4): > > kbuild: compile-test UAPI headers to ensure they are self-contained > > kbuild: do not create wrappers for header-test-y > > kbuild: support header-test-pattern-y > > kbuild: compile-test kernel headers to ensure they are self-contained > > [responding here because I didn't receive the actual patch] > > This looks like it's doing what it's supposed to, but I ran into a bunch > of build fails with CONFIG_OF=n. Sent a fix to one [1], but stopped at > the next. Looks like you'll have to exclude more. And I'm pretty sure > we'll uncover more configurations where this will fail. Thanks for testing. I did more compile-tests, and excluded more headers in v3. Thanks. -- Best Regards Masahiro Yamada
Re: [Linux-kernel-mentees][PATCH v5 1/5] Documentation: RCU: Convert RCU basic concepts to reST
On Thu, Jun 27, 2019 at 08:34:43AM -0600, Jonathan Corbet wrote: > On Wed, 26 Jun 2019 15:07:01 -0500 > Jiunn Chang wrote: > > > RCU basic concepts reST markup. > > > > Signed-off-by: Jiunn Chang > > Reviewed-by: Joel Fernandes (Google) > > So this is a little detail but ... your signoff should be the last thing > in the set of tags on the patch. > > This isn't worth making you do yet another revision, so I went ahead and > applied the patches and fixed the tag ordering on the way in. I'll also > append a patch adding the new RCU stuff into the core-api manual so people > can actually get to it. > > Thanks, > > jon Hello Jon, I will keep this in mind for next time. I would like to thank you, Joel, Paul and everyone else who has helped me learn the Linux kernel patch process. I will send a patch for the UP systems change Paul sent me for _bh suffix. THX, Jiunn
Re: [Linux-kernel-mentees][PATCH v5 1/5] Documentation: RCU: Convert RCU basic concepts to reST
On 6/27/19 9:13 AM, Steven Rostedt wrote: On Thu, 27 Jun 2019 08:34:43 -0600 Jonathan Corbet wrote: On Wed, 26 Jun 2019 15:07:01 -0500 Jiunn Chang wrote: RCU basic concepts reST markup. Signed-off-by: Jiunn Chang Reviewed-by: Joel Fernandes (Google) So this is a little detail but ... your signoff should be the last thing in the set of tags on the patch. Note, I've been seeing this a lot lately, and then noticed, that when I downloaded a patch directly from patchwork, it placed all the Reviewed-by and Acked-by tags after the original Signed-off-by. I checked the original patch on the mailing list, and it had no other tags but the Signed-off-by. I then pulled one of my own patches, and it did it to that patch as well. I too prefer the Signed-off-by be last, but our tooling needs to do this as well, otherwise it's a failure in our procedures. Thanks Steve for pointing this out. I am seeing some odd behavior with tags. It appears some maintainers want the tags in chronological order, which is Reviewed-by after Signed-off which doesn't make sense to me. I prefer Signed-off-by last. I am working on FAQ (Frequently Answered Questions) section for mentees. I will add this to it. thanks, -- Shuah
[PATCH] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices
It has been observed, that highly-threaded, non-cpu-bound applications running under cpu.cfs_quota_us constraints can hit a high percentage of periods throttled while simultaneously not consuming the allocated amount of quota. This use case is typical of user-interactive non-cpu bound applications, such as those running in kubernetes or mesos when run on multiple cpu cores. This has been root caused to threads being allocated per cpu bandwidth slices, and then not fully using that slice within the period. At which point the slice and quota expires. This expiration of unused slice results in applications not being able to utilize the quota for which they are allocated. The expiration of per-cpu slices was recently fixed by 'commit 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")'. Prior to that it appears that this has been broken since at least 'commit 51f2176d74ac ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")' which was introduced in v3.16-rc1 in 2014. That added the following conditional which resulted in slices never being expired. if (cfs_rq->runtime_expires != cfs_b->runtime_expires) { /* extend local deadline, drift is bounded above by 2 ticks */ cfs_rq->runtime_expires += TICK_NSEC; Because this was broken for nearly 5 years, and has recently been fixed and is now being noticed by many users running kubernetes (https://github.com/kubernetes/kubernetes/issues/67577) it is my opinion that the mechanisms around expiring runtime should be removed altogether. This allows quota already allocated to per-cpu run-queues to live longer than the period boundary. This allows threads on runqueues that do not use much CPU to continue to use their remaining slice over a longer period of time than cpu.cfs_period_us. However, this helps prevents the above condition of hitting throttling while also not fully utilizing your cpu quota. This theoretically allows a machine to use slightly more than its allotted quota in some periods. This overflow would be bounded by the remaining quota left on each per-cpu runqueueu. This is typically no more than min_cfs_rq_runtime=1ms per cpu. For CPU bound tasks this will change nothing, as they should theoretically fully utilize all of their quota in each period. For user-interactive tasks as described above this provides a much better user/application experience as their cpu utilization will more closely match the amount they requested when they hit throttling. This means that cpu limits no longer strictly apply per period for non-cpu bound applications, but that they are still accurate over longer timeframes. This greatly improves performance of high-thread-count, non-cpu bound applications with low cfs_quota_us allocation on high-core-count machines. In the case of an artificial testcase (10ms/100ms of quota on 80 CPU machine), this commit resulted in almost 30x performance improvement, while still maintaining correct cpu quota restrictions. That testcase is available at https://github.com/indeedeng/fibtest. Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition") Signed-off-by: Dave Chiluk --- Documentation/scheduler/sched-bwc.txt | 73 --- kernel/sched/fair.c | 71 +++--- kernel/sched/sched.h | 4 -- 3 files changed, 65 insertions(+), 83 deletions(-) diff --git a/Documentation/scheduler/sched-bwc.txt b/Documentation/scheduler/sched-bwc.txt index f6b1873..4c19c94 100644 --- a/Documentation/scheduler/sched-bwc.txt +++ b/Documentation/scheduler/sched-bwc.txt @@ -8,15 +8,16 @@ CFS bandwidth control is a CONFIG_FAIR_GROUP_SCHED extension which allows the specification of the maximum CPU bandwidth available to a group or hierarchy. The bandwidth allowed for a group is specified using a quota and period. Within -each given "period" (microseconds), a group is allowed to consume only up to -"quota" microseconds of CPU time. When the CPU bandwidth consumption of a -group exceeds this limit (for that period), the tasks belonging to its -hierarchy will be throttled and are not allowed to run again until the next -period. - -A group's unused runtime is globally tracked, being refreshed with quota units -above at each period boundary. As threads consume this bandwidth it is -transferred to cpu-local "silos" on a demand basis. The amount transferred +each given "period" (microseconds), a task group is allocated up to "quota" +microseconds of CPU time. That quota is assigned to per cpu run queues in +slices as threads in the cgroup become runnable. Once all quota has been +assigned any additional requests for quota will result in those threads being +throttled. Throttled threads will not be able to run again until the next +period when the quota is replenished. + +A group's unassigned quota is globally tracked, being refreshed back to +cfs_quota units at each period boundary. As threads consume this bandw
[PATCH v5 0/1] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices
Changelog v5 - Based on this comment from Ben Segall's comment on v4 > If the cost of taking this global lock across all cpus without a > ratelimit was somehow not a problem, I'd much prefer to just set > min_cfs_rq_runtime = 0. (Assuming it is, I definitely prefer the "lie > and sorta have 2x period 2x runtime" solution of removing expiration) I'm resubmitting my v3 patchset, with the requested changes. - Updated Commit log given review comments - Update sched-bwc.txt give my new understanding of the slack timer. Changelog v4 - Rewrote patchset around the concept of returning all of runtime_remaining when cfs_b nears the end of available quota. Changelog v3 - Reworked documentation to better describe behavior of slice expiration per feedback from Peter Oskolkov Changelog v2 - Fixed some checkpatch errors in the commit message.
[PATCH v5 1/1] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices
It has been observed, that highly-threaded, non-cpu-bound applications running under cpu.cfs_quota_us constraints can hit a high percentage of periods throttled while simultaneously not consuming the allocated amount of quota. This use case is typical of user-interactive non-cpu bound applications, such as those running in kubernetes or mesos when run on multiple cpu cores. This has been root caused to threads being allocated per cpu bandwidth slices, and then not fully using that slice within the period. At which point the slice and quota expires. This expiration of unused slice results in applications not being able to utilize the quota for which they are allocated. The expiration of per-cpu slices was recently fixed by 'commit 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")'. Prior to that it appears that this has been broken since at least 'commit 51f2176d74ac ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")' which was introduced in v3.16-rc1 in 2014. That added the following conditional which resulted in slices never being expired. if (cfs_rq->runtime_expires != cfs_b->runtime_expires) { /* extend local deadline, drift is bounded above by 2 ticks */ cfs_rq->runtime_expires += TICK_NSEC; Because this was broken for nearly 5 years, and has recently been fixed and is now being noticed by many users running kubernetes (https://github.com/kubernetes/kubernetes/issues/67577) it is my opinion that the mechanisms around expiring runtime should be removed altogether. This allows quota already allocated to per-cpu run-queues to live longer than the period boundary. This allows threads on runqueues that do not use much CPU to continue to use their remaining slice over a longer period of time than cpu.cfs_period_us. However, this helps prevents the above condition of hitting throttling while also not fully utilizing your cpu quota. This theoretically allows a machine to use slightly more than its allotted quota in some periods. This overflow would be bounded by the remaining quota left on each per-cpu runqueueu. This is typically no more than min_cfs_rq_runtime=1ms per cpu. For CPU bound tasks this will change nothing, as they should theoretically fully utilize all of their quota in each period. For user-interactive tasks as described above this provides a much better user/application experience as their cpu utilization will more closely match the amount they requested when they hit throttling. This means that cpu limits no longer strictly apply per period for non-cpu bound applications, but that they are still accurate over longer timeframes. This greatly improves performance of high-thread-count, non-cpu bound applications with low cfs_quota_us allocation on high-core-count machines. In the case of an artificial testcase (10ms/100ms of quota on 80 CPU machine), this commit resulted in almost 30x performance improvement, while still maintaining correct cpu quota restrictions. That testcase is available at https://github.com/indeedeng/fibtest. Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition") Signed-off-by: Dave Chiluk --- Documentation/scheduler/sched-bwc.txt | 73 --- kernel/sched/fair.c | 71 +++--- kernel/sched/sched.h | 4 -- 3 files changed, 65 insertions(+), 83 deletions(-) diff --git a/Documentation/scheduler/sched-bwc.txt b/Documentation/scheduler/sched-bwc.txt index f6b1873..4c19c94 100644 --- a/Documentation/scheduler/sched-bwc.txt +++ b/Documentation/scheduler/sched-bwc.txt @@ -8,15 +8,16 @@ CFS bandwidth control is a CONFIG_FAIR_GROUP_SCHED extension which allows the specification of the maximum CPU bandwidth available to a group or hierarchy. The bandwidth allowed for a group is specified using a quota and period. Within -each given "period" (microseconds), a group is allowed to consume only up to -"quota" microseconds of CPU time. When the CPU bandwidth consumption of a -group exceeds this limit (for that period), the tasks belonging to its -hierarchy will be throttled and are not allowed to run again until the next -period. - -A group's unused runtime is globally tracked, being refreshed with quota units -above at each period boundary. As threads consume this bandwidth it is -transferred to cpu-local "silos" on a demand basis. The amount transferred +each given "period" (microseconds), a task group is allocated up to "quota" +microseconds of CPU time. That quota is assigned to per cpu run queues in +slices as threads in the cgroup become runnable. Once all quota has been +assigned any additional requests for quota will result in those threads being +throttled. Throttled threads will not be able to run again until the next +period when the quota is replenished. + +A group's unassigned quota is globally tracked, being refreshed back to +cfs_quota units at each period boundary. As threads consume this bandw
Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs
On 6/26/19 4:19 PM, Roman Gushchin wrote: >> >> +#ifdef CONFIG_MEMCG_KMEM >> +static void kmem_cache_shrink_memcg(struct mem_cgroup *memcg, >> +void __maybe_unused *arg) >> +{ >> +struct kmem_cache *s; >> + >> +if (memcg == root_mem_cgroup) >> +return; >> +mutex_lock(&slab_mutex); >> +list_for_each_entry(s, &memcg->kmem_caches, >> +memcg_params.kmem_caches_node) { >> +kmem_cache_shrink(s); >> +} >> +mutex_unlock(&slab_mutex); >> +cond_resched(); >> +} > A couple of questions: > 1) how about skipping already offlined kmem_caches? They are already shrunk, >so you probably won't get much out of them. Or isn't it true? I have been thinking about that. This patch is based on the linux tree and so don't have an easy to find out if the kmem caches have been shrinked. Rebasing this on top of linux-next, I can use the SLAB_DEACTIVATED flag as a marker for skipping the shrink. With all the latest patches, I am still seeing 121 out of a total of 726 memcg kmem caches (1/6) that are deactivated caches after system bootup one of the test systems. My system is still using cgroup v1 and so the number may be different in a v2 setup. The next step is probably to figure out why those deactivated caches are still there. > 2) what's your long-term vision here? do you think that we need to shrink >kmem_caches periodically, depending on memory pressure? how a user >will use this new sysctl? Shrinking the kmem caches under extreme memory pressure can be one way to free up extra pages, but the effect will probably be temporary. > What's the problem you're trying to solve in general? At least for the slub allocator, shrinking the caches allow the number of active objects reported in slabinfo to be more accurate. In addition, this allow to know the real slab memory consumption. I have been working on a BZ about continuous memory leaks with a container based workloads. The ability to shrink caches allow us to get a more accurate memory consumption picture. Another alternative is to turn on slub_debug which will then disables all the per-cpu slabs. Anyway, I think this can be useful to others that is why I posted the patch. Cheers, Longman
[Linux-kernel-mentees][PATCH] doc: RCU callback locks need only _bh, not necessarily _irq
The UP.rst file calls for locks acquired within RCU callback functions to use _irq variants (spin_lock_irqsave() or similar), which does work, but can be overkill. This commit therefore instead calls for _bh variants (spin_lock_bh() or similar), while noting that _irq does work. Signed-off-by: Paul E. McKenney Signed-off-by: Jiunn Chang --- Documentation/RCU/UP.rst | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/Documentation/RCU/UP.rst b/Documentation/RCU/UP.rst index 67715a47ae89..e26dda27430c 100644 --- a/Documentation/RCU/UP.rst +++ b/Documentation/RCU/UP.rst @@ -113,12 +113,13 @@ Answer to Quick Quiz #1: Answer to Quick Quiz #2: What locking restriction must RCU callbacks respect? - Any lock that is acquired within an RCU callback must be - acquired elsewhere using an _irq variant of the spinlock - primitive. For example, if "mylock" is acquired by an - RCU callback, then a process-context acquisition of this - lock must use something like spin_lock_irqsave() to - acquire the lock. + Any lock that is acquired within an RCU callback must be acquired + elsewhere using an _bh variant of the spinlock primitive. + For example, if "mylock" is acquired by an RCU callback, then + a process-context acquisition of this lock must use something + like spin_lock_bh() to acquire the lock. Please note that + it is also OK to use _irq variants of spinlocks, for example, + spin_lock_irqsave(). If the process-context code were to simply use spin_lock(), then, since RCU callbacks can be invoked from softirq context, -- 2.22.0
Re: [PATCH 1/2] mm, memcontrol: Add memcg_iterate_all()
On 6/27/19 11:07 AM, Michal Hocko wrote: > On Mon 24-06-19 13:42:18, Waiman Long wrote: >> Add a memcg_iterate_all() function for iterating all the available >> memory cgroups and call the given callback function for each of the >> memory cgruops. > Why is a trivial wrapper any better than open coded usage of the > iterator? Because the iterator is only defined within memcontrol.c. So an alternative may be to put the iterator into a header file that can be used by others. Will take a look at that. Cheers, Longman
Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs
On 6/27/19 11:15 AM, Michal Hocko wrote: > On Mon 24-06-19 13:42:19, Waiman Long wrote: >> With the slub memory allocator, the numbers of active slab objects >> reported in /proc/slabinfo are not real because they include objects >> that are held by the per-cpu slab structures whether they are actually >> used or not. The problem gets worse the more CPUs a system have. For >> instance, looking at the reported number of active task_struct objects, >> one will wonder where all the missing tasks gone. >> >> I know it is hard and costly to get a real count of active objects. > What exactly is expensive? Why cannot slabinfo reduce the number of > active objects by per-cpu cached objects? > The number of cachelines that needs to be accessed in order to get an accurate count will be much higher if we need to iterate through all the per-cpu structures. In addition, accessing the per-cpu partial list will be racy. >> So >> I am not advocating for that. Instead, this patch extends the >> /proc/sys/vm/drop_caches sysctl parameter by using a new bit (bit 3) >> to shrink all the kmem slabs which will flush out all the slabs in the >> per-cpu structures and give a more accurate view of how much memory are >> really used up by the active slab objects. This is a costly operation, >> of course, but it gives a way to have a clearer picture of the actual >> number of slab objects used, if the need arises. > drop_caches is a terrible interface. It destroys all the caching and > people are just too easy in using it to solve any kind of problem they > think they might have and cause others they might not see immediately. > I am strongly discouraging anybody - except for some tests which really > do want to see reproducible results without cache effects - from using > this interface and therefore I am not really happy to paper over > something that might be a real problem with yet another mode. If SLUB > indeed caches too aggressively on large machines then this should be > fixed. > OK, as explained in another thread, the main reason for doing this patch is to be able to do more accurate measurement of changes in kmem cache memory consumption. Yes, I do agree that drop_caches is not a general purpose interface that should be used lightly. Cheers, Longman
Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs
On Thu, Jun 27, 2019 at 04:57:50PM -0400, Waiman Long wrote: > On 6/26/19 4:19 PM, Roman Gushchin wrote: > >> > >> +#ifdef CONFIG_MEMCG_KMEM > >> +static void kmem_cache_shrink_memcg(struct mem_cgroup *memcg, > >> + void __maybe_unused *arg) > >> +{ > >> + struct kmem_cache *s; > >> + > >> + if (memcg == root_mem_cgroup) > >> + return; > >> + mutex_lock(&slab_mutex); > >> + list_for_each_entry(s, &memcg->kmem_caches, > >> + memcg_params.kmem_caches_node) { > >> + kmem_cache_shrink(s); > >> + } > >> + mutex_unlock(&slab_mutex); > >> + cond_resched(); > >> +} > > A couple of questions: > > 1) how about skipping already offlined kmem_caches? They are already shrunk, > >so you probably won't get much out of them. Or isn't it true? > > I have been thinking about that. This patch is based on the linux tree > and so don't have an easy to find out if the kmem caches have been > shrinked. Rebasing this on top of linux-next, I can use the > SLAB_DEACTIVATED flag as a marker for skipping the shrink. > > With all the latest patches, I am still seeing 121 out of a total of 726 > memcg kmem caches (1/6) that are deactivated caches after system bootup > one of the test systems. My system is still using cgroup v1 and so the > number may be different in a v2 setup. The next step is probably to > figure out why those deactivated caches are still there. It's not a secret: these kmem_caches are holding objects, which are in use. It's a drawback of the current slab accounting implementation: every object holds a whole page and the corresponding kmem_cache. It's optimized for a large number of objects, which are created and destroyed within the life of the cgroup (e.g. task_structs), and it works worse for long-living objects like vfs cache. Long-term I think we need a different implementation for long-living objects, so that objects belonging to different memory cgroups can share the same page and kmem_caches. It's a fairly big change though. > > > 2) what's your long-term vision here? do you think that we need to shrink > >kmem_caches periodically, depending on memory pressure? how a user > >will use this new sysctl? > Shrinking the kmem caches under extreme memory pressure can be one way > to free up extra pages, but the effect will probably be temporary. > > What's the problem you're trying to solve in general? > > At least for the slub allocator, shrinking the caches allow the number > of active objects reported in slabinfo to be more accurate. In addition, > this allow to know the real slab memory consumption. I have been working > on a BZ about continuous memory leaks with a container based workloads. > The ability to shrink caches allow us to get a more accurate memory > consumption picture. Another alternative is to turn on slub_debug which > will then disables all the per-cpu slabs. I see... I agree with Michal here, that extending drop_caches sysctl isn't the best idea. Isn't it possible to achieve the same effect using slub sysfs? Thanks!
Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs
On Thu, Jun 27, 2019 at 04:57:50PM -0400, Waiman Long wrote: > On 6/26/19 4:19 PM, Roman Gushchin wrote: > >> > >> +#ifdef CONFIG_MEMCG_KMEM > >> +static void kmem_cache_shrink_memcg(struct mem_cgroup *memcg, > >> + void __maybe_unused *arg) > >> +{ > >> + struct kmem_cache *s; > >> + > >> + if (memcg == root_mem_cgroup) > >> + return; > >> + mutex_lock(&slab_mutex); > >> + list_for_each_entry(s, &memcg->kmem_caches, > >> + memcg_params.kmem_caches_node) { > >> + kmem_cache_shrink(s); > >> + } > >> + mutex_unlock(&slab_mutex); > >> + cond_resched(); > >> +} > > A couple of questions: > > 1) how about skipping already offlined kmem_caches? They are already shrunk, > >so you probably won't get much out of them. Or isn't it true? > > I have been thinking about that. This patch is based on the linux tree > and so don't have an easy to find out if the kmem caches have been > shrinked. Rebasing this on top of linux-next, I can use the > SLAB_DEACTIVATED flag as a marker for skipping the shrink. > > With all the latest patches, I am still seeing 121 out of a total of 726 > memcg kmem caches (1/6) that are deactivated caches after system bootup > one of the test systems. My system is still using cgroup v1 and so the > number may be different in a v2 setup. The next step is probably to > figure out why those deactivated caches are still there. > > > 2) what's your long-term vision here? do you think that we need to shrink > >kmem_caches periodically, depending on memory pressure? how a user > >will use this new sysctl? > Shrinking the kmem caches under extreme memory pressure can be one way > to free up extra pages, but the effect will probably be temporary. > > What's the problem you're trying to solve in general? > > At least for the slub allocator, shrinking the caches allow the number > of active objects reported in slabinfo to be more accurate. In addition, > this allow to know the real slab memory consumption. I have been working > on a BZ about continuous memory leaks with a container based workloads. So.. this is still a work around? > The ability to shrink caches allow us to get a more accurate memory > consumption picture. Another alternative is to turn on slub_debug which > will then disables all the per-cpu slabs. So this is a debugging mechanism? > Anyway, I think this can be useful to others that is why I posted the patch. Since this is debug stuff, please add this to /proc/sys/debug/ instead. That would reflect the intention, and would avoid the concern that folks in production would use these things. Since we only have 2 users of /proc/sys/debug/ I am now wondering if would be best to add a new sysctl debug taint flag. This way bug reports with these stupid knobs can got to /dev/null inbox for bug reports. Masami, /proc/sys/debug/kprobes-optimization is debug. Would you be OK to add the taint for it too? Masoud, /proc/sys/debug/exception-trace seems to actually be enabled by default, and its goal seems to be to enable disabling it. So I don't think it would make sense to taint there. So.. maybe we need something /proc/sys/taints/ or /proc/sys/debug/taints/ so its *very* clear this is by no way ever expected to be used in production. May even be good to long term add a symlink for vm/drop_caches there as well? Luis
Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs
On 6/27/19 5:24 PM, Roman Gushchin wrote: >>> 2) what's your long-term vision here? do you think that we need to shrink >>>kmem_caches periodically, depending on memory pressure? how a user >>>will use this new sysctl? >> Shrinking the kmem caches under extreme memory pressure can be one way >> to free up extra pages, but the effect will probably be temporary. >>> What's the problem you're trying to solve in general? >> At least for the slub allocator, shrinking the caches allow the number >> of active objects reported in slabinfo to be more accurate. In addition, >> this allow to know the real slab memory consumption. I have been working >> on a BZ about continuous memory leaks with a container based workloads. >> The ability to shrink caches allow us to get a more accurate memory >> consumption picture. Another alternative is to turn on slub_debug which >> will then disables all the per-cpu slabs. > I see... I agree with Michal here, that extending drop_caches sysctl isn't > the best idea. Isn't it possible to achieve the same effect using slub sysfs? Yes, using the slub sysfs interface can be a possible alternative. Cheers, Longman
[tip:timers/core] hrtimer: Use a bullet for the returns bullet list
Commit-ID: 516337048fa40496ae5ca9863c367ec991a44d9a Gitweb: https://git.kernel.org/tip/516337048fa40496ae5ca9863c367ec991a44d9a Author: Mauro Carvalho Chehab AuthorDate: Mon, 24 Jun 2019 07:33:26 -0300 Committer: Thomas Gleixner CommitDate: Thu, 27 Jun 2019 23:30:04 +0200 hrtimer: Use a bullet for the returns bullet list That gets rid of this warning: ./kernel/time/hrtimer.c:1119: WARNING: Block quote ends without a blank line; unexpected unindent. and displays nicely both at the source code and at the produced documentation. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Thomas Gleixner Cc: Linux Doc Mailing List Cc: Mauro Carvalho Chehab Cc: Jonathan Corbet Link: https://lkml.kernel.org/r/74ddad7dac331b4e5ce4a90e15c8a49e3a16d2ac.1561372382.git.mchehab+sams...@kernel.org --- kernel/time/hrtimer.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index edb230aba3d1..5ee77f1a8a92 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -1114,9 +1114,10 @@ EXPORT_SYMBOL_GPL(hrtimer_start_range_ns); * @timer: hrtimer to stop * * Returns: - * 0 when the timer was not active - * 1 when the timer was active - * -1 when the timer is currently executing the callback function and + * + * * 0 when the timer was not active + * * 1 when the timer was active + * * -1 when the timer is currently executing the callback function and *cannot be stopped */ int hrtimer_try_to_cancel(struct hrtimer *timer)
Re: [Linux-kernel-mentees][PATCH] doc: RCU callback locks need only _bh, not necessarily _irq
On 6/27/19 3:01 PM, Jiunn Chang wrote: The UP.rst file calls for locks acquired within RCU callback functions to use _irq variants (spin_lock_irqsave() or similar), which does work, but can be overkill. This commit therefore instead calls for _bh variants (spin_lock_bh() or similar), while noting that _irq does work. Signed-off-by: Paul E. McKenney Should this by Suggested-by? Signed-off-by: Jiunn Chang --- Documentation/RCU/UP.rst | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/Documentation/RCU/UP.rst b/Documentation/RCU/UP.rst index 67715a47ae89..e26dda27430c 100644 --- a/Documentation/RCU/UP.rst +++ b/Documentation/RCU/UP.rst @@ -113,12 +113,13 @@ Answer to Quick Quiz #1: Answer to Quick Quiz #2: What locking restriction must RCU callbacks respect? - Any lock that is acquired within an RCU callback must be - acquired elsewhere using an _irq variant of the spinlock - primitive. For example, if "mylock" is acquired by an - RCU callback, then a process-context acquisition of this - lock must use something like spin_lock_irqsave() to - acquire the lock. + Any lock that is acquired within an RCU callback must be acquired + elsewhere using an _bh variant of the spinlock primitive. + For example, if "mylock" is acquired by an RCU callback, then + a process-context acquisition of this lock must use something + like spin_lock_bh() to acquire the lock. Please note that + it is also OK to use _irq variants of spinlocks, for example, + spin_lock_irqsave(). If the process-context code were to simply use spin_lock(), then, since RCU callbacks can be invoked from softirq context, thanks, -- Shuah
Re: [tip:timers/core] hrtimer: Use a bullet for the returns bullet list
On Thu, 2019-06-27 at 14:46 -0700, tip-bot for Mauro Carvalho Chehab wrote: > Commit-ID: 516337048fa40496ae5ca9863c367ec991a44d9a > Gitweb: > https://git.kernel.org/tip/516337048fa40496ae5ca9863c367ec991a44d9a > Author: Mauro Carvalho Chehab > AuthorDate: Mon, 24 Jun 2019 07:33:26 -0300 > Committer: Thomas Gleixner > CommitDate: Thu, 27 Jun 2019 23:30:04 +0200 > > hrtimer: Use a bullet for the returns bullet list > > That gets rid of this warning: > >./kernel/time/hrtimer.c:1119: WARNING: Block quote ends without a blank > line; unexpected unindent. Doesn't this form occur multiple dozens of times in kernel sources? For instance: $ git grep -B3 -A5 -P "^ \* Returns:?$" | \ grep -P -A8 '\-\s+\*\s*@\w+:' I think the warning is odd at best and docutils might be updated or the warning ignored or suppressed. > and displays nicely both at the source code and at the produced > documentation. > diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c [] > @@ -1114,9 +1114,10 @@ EXPORT_SYMBOL_GPL(hrtimer_start_range_ns); > * @timer: hrtimer to stop > * > * Returns: > - * 0 when the timer was not active > - * 1 when the timer was active > - * -1 when the timer is currently executing the callback function and > + * > + * * 0 when the timer was not active > + * * 1 when the timer was active > + * * -1 when the timer is currently executing the callback function and > *cannot be stopped > */ > int hrtimer_try_to_cancel(struct hrtimer *timer)
Re: [Linux-kernel-mentees][PATCH] doc: RCU callback locks need only _bh, not necessarily _irq
On Thu, Jun 27, 2019 at 04:01:35PM -0600, Shuah Khan wrote: > On 6/27/19 3:01 PM, Jiunn Chang wrote: > >The UP.rst file calls for locks acquired within RCU callback functions > >to use _irq variants (spin_lock_irqsave() or similar), which does work, > >but can be overkill. This commit therefore instead calls for _bh variants > >(spin_lock_bh() or similar), while noting that _irq does work. > > > >Signed-off-by: Paul E. McKenney > > Should this by Suggested-by? I wrote it and Jiunn converted my change to .rst, so I believe that this is correct as is. Thanx, Paul > >Signed-off-by: Jiunn Chang > >--- > > Documentation/RCU/UP.rst | 13 +++-- > > 1 file changed, 7 insertions(+), 6 deletions(-) > > > >diff --git a/Documentation/RCU/UP.rst b/Documentation/RCU/UP.rst > >index 67715a47ae89..e26dda27430c 100644 > >--- a/Documentation/RCU/UP.rst > >+++ b/Documentation/RCU/UP.rst > >@@ -113,12 +113,13 @@ Answer to Quick Quiz #1: > > Answer to Quick Quiz #2: > > What locking restriction must RCU callbacks respect? > >-Any lock that is acquired within an RCU callback must be > >-acquired elsewhere using an _irq variant of the spinlock > >-primitive. For example, if "mylock" is acquired by an > >-RCU callback, then a process-context acquisition of this > >-lock must use something like spin_lock_irqsave() to > >-acquire the lock. > >+Any lock that is acquired within an RCU callback must be acquired > >+elsewhere using an _bh variant of the spinlock primitive. > >+For example, if "mylock" is acquired by an RCU callback, then > >+a process-context acquisition of this lock must use something > >+like spin_lock_bh() to acquire the lock. Please note that > >+it is also OK to use _irq variants of spinlocks, for example, > >+spin_lock_irqsave(). > > If the process-context code were to simply use spin_lock(), > > then, since RCU callbacks can be invoked from softirq context, > > > > thanks, > -- Shuah >
Re: [Linux-kernel-mentees][PATCH] doc: RCU callback locks need only _bh, not necessarily _irq
On 6/27/19 4:10 PM, Paul E. McKenney wrote: On Thu, Jun 27, 2019 at 04:01:35PM -0600, Shuah Khan wrote: On 6/27/19 3:01 PM, Jiunn Chang wrote: The UP.rst file calls for locks acquired within RCU callback functions to use _irq variants (spin_lock_irqsave() or similar), which does work, but can be overkill. This commit therefore instead calls for _bh variants (spin_lock_bh() or similar), while noting that _irq does work. Signed-off-by: Paul E. McKenney Should this by Suggested-by? I wrote it and Jiunn converted my change to .rst, so I believe that this is correct as is. Great. thanks, -- Shuah
Re: [Linux-kernel-mentees][PATCH] doc: RCU callback locks need only _bh, not necessarily _irq
On Thu, 27 Jun 2019 15:10:45 -0700 "Paul E. McKenney" wrote: > On Thu, Jun 27, 2019 at 04:01:35PM -0600, Shuah Khan wrote: > > On 6/27/19 3:01 PM, Jiunn Chang wrote: > > >The UP.rst file calls for locks acquired within RCU callback functions > > >to use _irq variants (spin_lock_irqsave() or similar), which does work, > > >but can be overkill. This commit therefore instead calls for _bh variants > > >(spin_lock_bh() or similar), while noting that _irq does work. > > > > > >Signed-off-by: Paul E. McKenney > > > > Should this by Suggested-by? > > I wrote it and Jiunn converted my change to .rst, so I believe that > this is correct as is. Note, you did send Jiunn an explicit Signed-off-by when you wrote it, correct? As Signed-off-by is equivalent to a signature. -- Steve
Re: [Linux-kernel-mentees][PATCH] doc: RCU callback locks need only _bh, not necessarily _irq
On Thu, Jun 27, 2019 at 06:29:38PM -0400, Steven Rostedt wrote: > On Thu, 27 Jun 2019 15:10:45 -0700 > "Paul E. McKenney" wrote: > > > On Thu, Jun 27, 2019 at 04:01:35PM -0600, Shuah Khan wrote: > > > On 6/27/19 3:01 PM, Jiunn Chang wrote: > > > >The UP.rst file calls for locks acquired within RCU callback functions > > > >to use _irq variants (spin_lock_irqsave() or similar), which does work, > > > >but can be overkill. This commit therefore instead calls for _bh > > > >variants > > > >(spin_lock_bh() or similar), while noting that _irq does work. > > > > > > > >Signed-off-by: Paul E. McKenney > > > > > > Should this by Suggested-by? > > > > I wrote it and Jiunn converted my change to .rst, so I believe that > > this is correct as is. > > Note, you did send Jiunn an explicit Signed-off-by when you wrote it, > correct? As Signed-off-by is equivalent to a signature. Indeed I did, but I now see that it was via private email. Here it is again for public consumption, and Jiunn's patch is based on this one, just translated to .rst. I once again verified that the Jiunn's version is word-for-word identical to mine, so Jiunn's patch should be good. ;-) Thanx, Paul commit a293734a310b463a0dba68409943a4e6065cd39d Author: Paul E. McKenney Date: Wed Jun 26 10:16:19 2019 -0700 doc: RCU callback locks need only _bh, not necessarily _irq The UP.txt file calls for locks acquired within RCU callback functions to use _irq variants (spin_lock_irqsave() or similar), which does work, but can be overkill. This commit therefore instead calls for _bh variants (spin_lock_bh() or similar), while noting that _irq does work. Signed-off-by: Paul E. McKenney diff --git a/Documentation/RCU/UP.txt b/Documentation/RCU/UP.txt index 53bde717017b..0edd8c5af0b5 100644 --- a/Documentation/RCU/UP.txt +++ b/Documentation/RCU/UP.txt @@ -104,12 +104,13 @@ Answer to Quick Quiz #1: Answer to Quick Quiz #2: What locking restriction must RCU callbacks respect? - Any lock that is acquired within an RCU callback must be - acquired elsewhere using an _irq variant of the spinlock - primitive. For example, if "mylock" is acquired by an - RCU callback, then a process-context acquisition of this - lock must use something like spin_lock_irqsave() to - acquire the lock. + Any lock that is acquired within an RCU callback must be acquired + elsewhere using an _bh variant of the spinlock primitive. + For example, if "mylock" is acquired by an RCU callback, then + a process-context acquisition of this lock must use something + like spin_lock_bh() to acquire the lock. Please note that + it is also OK to use _irq variants of spinlocks, for example, + spin_lock_irqsave(). If the process-context code were to simply use spin_lock(), then, since RCU callbacks can be invoked from softirq context,
Re: [tip:timers/core] hrtimer: Use a bullet for the returns bullet list
Em Thu, 27 Jun 2019 15:08:59 -0700 Joe Perches escreveu: > On Thu, 2019-06-27 at 14:46 -0700, tip-bot for Mauro Carvalho Chehab > wrote: > > Commit-ID: 516337048fa40496ae5ca9863c367ec991a44d9a > > Gitweb: > > https://git.kernel.org/tip/516337048fa40496ae5ca9863c367ec991a44d9a > > Author: Mauro Carvalho Chehab > > AuthorDate: Mon, 24 Jun 2019 07:33:26 -0300 > > Committer: Thomas Gleixner > > CommitDate: Thu, 27 Jun 2019 23:30:04 +0200 > > > > hrtimer: Use a bullet for the returns bullet list > > > > That gets rid of this warning: > > > >./kernel/time/hrtimer.c:1119: WARNING: Block quote ends without a blank > > line; unexpected unindent. > > Doesn't this form occur multiple dozens of times in > kernel sources? > > For instance: > > $ git grep -B3 -A5 -P "^ \* Returns:?$" | \ > grep -P -A8 '\-\s+\*\s*@\w+:' Yes, this is a common pattern, but not all patterns that match the above regex are broken. > > I think the warning is odd at best and docutils might > be updated or the warning ignored or suppressed. > > > and displays nicely both at the source code and at the produced > > documentation. The warnings are painful - and they're the main reason why I wrote this change: - I wanted to avoid new warnings actually unrelated to my changes that were sometimes appearing while doing incremental "make htmldocs" on a big patchset that I've been rebasing almost every week over the last two months. - Yet, did you try to look how this pattern will appear at the html and pdf output? Something like this: sound/soc/codecs/wm8960.c: * Returns: sound/soc/codecs/wm8960.c- * -1, in case no sysclk frequency available found sound/soc/codecs/wm8960.c- * >=0, in case we could derive bclk and lrclk from sysclk using sound/soc/codecs/wm8960.c- * (@sysclk_idx, @dac_idx, @bclk_idx) dividers Will be displayed as: **Returns:** -1, in case no sysclk frequency available found **>=0, in case we could derive bclk and lrclk from sysclk using** (@sysclk_idx, @dac_idx, @bclk_idx) dividers (where **foo**) means that "foo" will be printed in bold. E. g. it will just merge all returns values into a single line and, if there are alignment differences, it will make the previous line bold and produce a warning. On some places, however, what's there will be properly displayed, like this one: ** * wimax_reset - Reset a WiMAX device * * @wimax_dev: WiMAX device descriptor * * Returns: * * %0 if ok and a warm reset was done (the device still exists in * the system). * * -%ENODEV if a cold/bus reset had to be done (device has * disconnected and reconnected, so current handle is not valid * any more). * * -%EINVAL if the device is not even registered. * * Any other negative error code shall be considered as * non-recoverable. * As there are blank lines between each value, making each return code a different line. This one: tools/lib/traceevent/parse-filter.c: * Returns: tools/lib/traceevent/parse-filter.c- * 1 if the two filters hold the same content. tools/lib/traceevent/parse-filter.c- * 0 if they do not. will also not mangle too much, as the dots will help for someone to understand, if reading the html/pdf output, like this: **Returns:** 1 if the two filters hold the same content. 0 if they do not. So, it all depends on the context. - While it would likely be possible to improve kernel-doc to present better results, I'm afraid that it would be too complex for simple regex expressions, and hard to tune, as it would be a hint-based approach, and doing a natural language processing would be too much effort. > > > diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c > [] > > @@ -1114,9 +1114,10 @@ EXPORT_SYMBOL_GPL(hrtimer_start_range_ns); > > * @timer: hrtimer to stop > > * > > * Returns: > > - * 0 when the timer was not active > > - * 1 when the timer was active > > - * -1 when the timer is currently executing the callback function and > > + * > > + * * 0 when the timer was not active > > + * * 1 when the timer was active > > + * * -1 when the timer is currently executing the callback function and > > *cannot be stopped > > */ > > int hrtimer_try_to_cancel(struct hrtimer *timer) > Thanks, Mauro
[PATCH 00/15] FPGA DFL updates
Hi Greg, can you please take the following patches. They're mostly new features and some cleanup of the DFL internals. They've been on the mailing list and have been reviewed. Note: I've seen that Mauro touched Documentation/fpga/dfl.rst in linux-next commit c220a1fae6c5d ("docs: fpga: convert docs to ReST and rename to *.rst") and fixed up PATCH 05/15 to apply on top of that. If you prefer the original series against char-misc-next let me know, and I'll resubmit. Thanks, Moritz Wu Hao (15): fpga: dfl-fme-mgr: fix FME_PR_INTFC_ID register address. fpga: dfl: fme: remove copy_to_user() in ioctl for PR fpga: dfl: fme: align PR buffer size per PR datawidth fpga: dfl: fme: support 512bit data width PR Documentation: fpga: dfl: add descriptions for virtualization and new interfaces. fpga: dfl: fme: add DFL_FPGA_FME_PORT_RELEASE/ASSIGN ioctl support. fpga: dfl: pci: enable SRIOV support. fpga: dfl: afu: add AFU state related sysfs interfaces fpga: dfl: afu: add userclock sysfs interfaces. fpga: dfl: add id_table for dfl private feature driver fpga: dfl: afu: export __port_enable/disable function. fpga: dfl: afu: add error reporting support. fpga: dfl: afu: add STP (SignalTap) support fpga: dfl: fme: add capability sysfs interfaces fpga: dfl: fme: add global error reporting support .../ABI/testing/sysfs-platform-dfl-fme| 98 + .../ABI/testing/sysfs-platform-dfl-port | 104 + Documentation/fpga/dfl.rst| 100 + drivers/fpga/Makefile | 3 +- drivers/fpga/dfl-afu-error.c | 225 ++ drivers/fpga/dfl-afu-main.c | 330 ++- drivers/fpga/dfl-afu.h| 7 + drivers/fpga/dfl-fme-error.c | 385 ++ drivers/fpga/dfl-fme-main.c | 120 +- drivers/fpga/dfl-fme-mgr.c| 117 +- drivers/fpga/dfl-fme-pr.c | 65 +-- drivers/fpga/dfl-fme.h| 7 +- drivers/fpga/dfl-pci.c| 40 ++ drivers/fpga/dfl.c| 169 +++- drivers/fpga/dfl.h| 54 ++- include/uapi/linux/fpga-dfl.h | 32 ++ 16 files changed, 1776 insertions(+), 80 deletions(-) create mode 100644 drivers/fpga/dfl-afu-error.c create mode 100644 drivers/fpga/dfl-fme-error.c -- 2.22.0
[PATCH 12/15] fpga: dfl: afu: add error reporting support.
From: Wu Hao Error reporting is one important private feature, it reports error detected on port and accelerated function unit (AFU). It introduces several sysfs interfaces to allow userspace to check and clear errors detected by hardware. Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull Signed-off-by: Moritz Fischer --- .../ABI/testing/sysfs-platform-dfl-port | 39 +++ drivers/fpga/Makefile | 1 + drivers/fpga/dfl-afu-error.c | 225 ++ drivers/fpga/dfl-afu-main.c | 4 + drivers/fpga/dfl-afu.h| 4 + 5 files changed, 273 insertions(+) create mode 100644 drivers/fpga/dfl-afu-error.c diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port index 04ea7f2971c7..4aeca94856b6 100644 --- a/Documentation/ABI/testing/sysfs-platform-dfl-port +++ b/Documentation/ABI/testing/sysfs-platform-dfl-port @@ -79,3 +79,42 @@ KernelVersion: 5.3 Contact: Wu Hao Description: Read-only. Read this file to get the status of issued command to userclck_freqcntrcmd. + +What: /sys/bus/platform/devices/dfl-port.0/errors/revision +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the revision of this error + reporting private feature. + +What: /sys/bus/platform/devices/dfl-port.0/errors/errors +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get errors detected on port and + Accelerated Function Unit (AFU). + +What: /sys/bus/platform/devices/dfl-port.0/errors/first_error +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the first error detected by + hardware. + +What: /sys/bus/platform/devices/dfl-port.0/errors/first_malformed_req +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the first malformed request + captured by hardware. + +What: /sys/bus/platform/devices/dfl-port.0/errors/clear +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Write-only. Write error code to this file to clear errors. + Write fails with -EINVAL if input parsing fails or input error + code doesn't match. + Write fails with -EBUSY or -ETIMEDOUT if error can't be cleared + as hardware is in low power state (-EBUSY) or not responding + (-ETIMEDOUT). diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile index 312b9371742f..72558914a29c 100644 --- a/drivers/fpga/Makefile +++ b/drivers/fpga/Makefile @@ -41,6 +41,7 @@ obj-$(CONFIG_FPGA_DFL_AFU)+= dfl-afu.o dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o dfl-afu-dma-region.o +dfl-afu-objs += dfl-afu-error.o # Drivers for FPGAs which implement DFL obj-$(CONFIG_FPGA_DFL_PCI) += dfl-pci.o diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c new file mode 100644 index ..f20dbdf5805d --- /dev/null +++ b/drivers/fpga/dfl-afu-error.c @@ -0,0 +1,225 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Driver for FPGA Accelerated Function Unit (AFU) Error Reporting + * + * Copyright 2019 Intel Corporation, Inc. + * + * Authors: + * Wu Hao + * Xiao Guangrong + * Joseph Grecco + * Enno Luebbers + * Tim Whisonant + * Ananda Ravuri + * Mitchel Henry + */ + +#include + +#include "dfl-afu.h" + +#define PORT_ERROR_MASK0x8 +#define PORT_ERROR 0x10 +#define PORT_FIRST_ERROR 0x18 +#define PORT_MALFORMED_REQ00x20 +#define PORT_MALFORMED_REQ10x28 + +#define ERROR_MASK GENMASK_ULL(63, 0) + +/* mask or unmask port errors by the error mask register. */ +static void __port_err_mask(struct device *dev, bool mask) +{ + void __iomem *base; + + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR); + + writeq(mask ? ERROR_MASK : 0, base + PORT_ERROR_MASK); +} + +/* clear port errors. */ +static int __port_err_clear(struct device *dev, u64 err) +{ + struct platform_device *pdev = to_platform_device(dev); + void __iomem *base_err, *base_hdr; + int ret; + u64 v; + + base_err = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR); + base_hdr = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); + + /* +* clear Port Errors +* +* - Check for AP6 State +* - Halt Port by keeping Port in reset +* - Set PORT Error mask to all 1 to mask errors +* - Clear all errors +* - Set Port mask to all 0 to enable errors +* - All errors start capturing
[PATCH 07/15] fpga: dfl: pci: enable SRIOV support.
From: Wu Hao This patch enables the standard sriov support. It allows user to enable SRIOV (and VFs), then user could pass through accelerators (VFs) into virtual machine or use VFs directly in host. Signed-off-by: Zhang Yi Z Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull Acked-by: Moritz Fischer Signed-off-by: Moritz Fischer --- drivers/fpga/dfl-pci.c | 40 drivers/fpga/dfl.c | 41 + drivers/fpga/dfl.h | 1 + 3 files changed, 82 insertions(+) diff --git a/drivers/fpga/dfl-pci.c b/drivers/fpga/dfl-pci.c index 66b5720582bb..2fa571b0fdea 100644 --- a/drivers/fpga/dfl-pci.c +++ b/drivers/fpga/dfl-pci.c @@ -223,8 +223,46 @@ int cci_pci_probe(struct pci_dev *pcidev, const struct pci_device_id *pcidevid) return ret; } +static int cci_pci_sriov_configure(struct pci_dev *pcidev, int num_vfs) +{ + struct cci_drvdata *drvdata = pci_get_drvdata(pcidev); + struct dfl_fpga_cdev *cdev = drvdata->cdev; + int ret = 0; + + mutex_lock(&cdev->lock); + + if (!num_vfs) { + /* +* disable SRIOV and then put released ports back to default +* PF access mode. +*/ + pci_disable_sriov(pcidev); + + __dfl_fpga_cdev_config_port_vf(cdev, false); + + } else if (cdev->released_port_num == num_vfs) { + /* +* only enable SRIOV if cdev has matched released ports, put +* released ports into VF access mode firstly. +*/ + __dfl_fpga_cdev_config_port_vf(cdev, true); + + ret = pci_enable_sriov(pcidev, num_vfs); + if (ret) + __dfl_fpga_cdev_config_port_vf(cdev, false); + } else { + ret = -EINVAL; + } + + mutex_unlock(&cdev->lock); + return ret; +} + static void cci_pci_remove(struct pci_dev *pcidev) { + if (dev_is_pf(&pcidev->dev)) + cci_pci_sriov_configure(pcidev, 0); + cci_remove_feature_devs(pcidev); pci_disable_pcie_error_reporting(pcidev); } @@ -234,6 +272,7 @@ static struct pci_driver cci_pci_driver = { .id_table = cci_pcie_id_tbl, .probe = cci_pci_probe, .remove = cci_pci_remove, + .sriov_configure = cci_pci_sriov_configure, }; module_pci_driver(cci_pci_driver); @@ -241,3 +280,4 @@ module_pci_driver(cci_pci_driver); MODULE_DESCRIPTION("FPGA DFL PCIe Device Driver"); MODULE_AUTHOR("Intel Corporation"); MODULE_LICENSE("GPL v2"); +MODULE_VERSION(DRV_VERSION); diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c index 308c80868af4..28d61b611165 100644 --- a/drivers/fpga/dfl.c +++ b/drivers/fpga/dfl.c @@ -1112,6 +1112,47 @@ int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev, } EXPORT_SYMBOL_GPL(dfl_fpga_cdev_config_port); +static void config_port_vf(struct device *fme_dev, int port_id, bool is_vf) +{ + void __iomem *base; + u64 v; + + base = dfl_get_feature_ioaddr_by_id(fme_dev, FME_FEATURE_ID_HEADER); + + v = readq(base + FME_HDR_PORT_OFST(port_id)); + + v &= ~FME_PORT_OFST_ACC_CTRL; + v |= FIELD_PREP(FME_PORT_OFST_ACC_CTRL, + is_vf ? FME_PORT_OFST_ACC_VF : FME_PORT_OFST_ACC_PF); + + writeq(v, base + FME_HDR_PORT_OFST(port_id)); +} + +/** + * __dfl_fpga_cdev_config_port_vf - configure port to VF access mode + * + * @cdev: parent container device. + * @if_vf: true for VF access mode, and false for PF access mode + * + * Return: 0 on success, negative error code otherwise. + * + * This function is needed in sriov configuration routine. It could be used to + * configures the released ports access mode to VF or PF. + * The caller needs to hold lock for protection. + */ +void __dfl_fpga_cdev_config_port_vf(struct dfl_fpga_cdev *cdev, bool is_vf) +{ + struct dfl_feature_platform_data *pdata; + + list_for_each_entry(pdata, &cdev->port_dev_list, node) { + if (device_is_registered(&pdata->dev->dev)) + continue; + + config_port_vf(cdev->fme_dev, pdata->id, is_vf); + } +} +EXPORT_SYMBOL_GPL(__dfl_fpga_cdev_config_port_vf); + static int __init dfl_fpga_init(void) { int ret; diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h index 63f39ab08905..1350e8eb9e59 100644 --- a/drivers/fpga/dfl.h +++ b/drivers/fpga/dfl.h @@ -421,5 +421,6 @@ dfl_fpga_cdev_find_port(struct dfl_fpga_cdev *cdev, void *data, int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev, u32 port_id, bool release); +void __dfl_fpga_cdev_config_port_vf(struct dfl_fpga_cdev *cdev, bool is_vf); #endif /* __FPGA_DFL_H */ -- 2.22.0
[PATCH 02/15] fpga: dfl: fme: remove copy_to_user() in ioctl for PR
From: Wu Hao This patch removes copy_to_user() code in partial reconfiguration ioctl, as it's useless as user never needs to read the data structure after ioctl. Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Moritz Fischer Acked-by: Alan Tull Signed-off-by: Moritz Fischer --- drivers/fpga/dfl-fme-pr.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c index d9ca9554844a..6ec0f09e5994 100644 --- a/drivers/fpga/dfl-fme-pr.c +++ b/drivers/fpga/dfl-fme-pr.c @@ -159,9 +159,6 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg) mutex_unlock(&pdata->lock); free_exit: vfree(buf); - if (copy_to_user((void __user *)arg, &port_pr, minsz)) - return -EFAULT; - return ret; } -- 2.22.0
[PATCH 05/15] Documentation: fpga: dfl: add descriptions for virtualization and new interfaces.
From: Wu Hao This patch adds virtualization support description for DFL based FPGA devices (based on PCIe SRIOV), and introductions to new interfaces added by new dfl private feature drivers. [m...@kernel.org: Fixed up to make it work with new reStructuredText docs] Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull Signed-off-by: Moritz Fischer --- Documentation/fpga/dfl.rst | 100 + 1 file changed, 100 insertions(+) diff --git a/Documentation/fpga/dfl.rst b/Documentation/fpga/dfl.rst index 2f125abd777f..be9929dd7251 100644 --- a/Documentation/fpga/dfl.rst +++ b/Documentation/fpga/dfl.rst @@ -87,6 +87,8 @@ The following functions are exposed through ioctls: - Get driver API version (DFL_FPGA_GET_API_VERSION) - Check for extensions (DFL_FPGA_CHECK_EXTENSION) - Program bitstream (DFL_FPGA_FME_PORT_PR) +- Assign port to PF (DFL_FPGA_FME_PORT_ASSIGN) +- Release port from PF (DFL_FPGA_FME_PORT_RELEASE) More functions are exposed through sysfs (/sys/class/fpga_region/regionX/dfl-fme.n/): @@ -143,6 +145,9 @@ More functions are exposed through sysfs: Read Accelerator GUID (afu_id) afu_id indicates which PR bitstream is programmed to this AFU. + Global error reporting management (errors/) + error reporting sysfs interfaces allow user to read errors detected by the + hardware, and clear the logged errors. DFL Framework Overview == @@ -218,6 +223,101 @@ the compat_id exposed by the target FPGA region. This check is usually done by userspace before calling the reconfiguration IOCTL. +FPGA virtualization - PCIe SRIOV + +This section describes the virtualization support on DFL based FPGA device to +enable accessing an accelerator from applications running in a virtual machine +(VM). This section only describes the PCIe based FPGA device with SRIOV support. + +Features supported by the particular FPGA device are exposed through Device +Feature Lists, as illustrated below: + +:: + ++---+ +-+ +| PF | | VF | ++---+ +-+ +^^ ^ ^ +|| | | + +-||-|--|---+ + | || | | | + | +-+ +---+ +---+ +---+ | + | | FME | | Port0 | | Port1 | | Port2 | | + | +-+ +---+ +---+ +---+ | + | ^ ^ ^ | + | | | | | + | +---+ +--+ +---+ | + | | AFU | | AFU | | AFU | | + | +---+ +--+ +---+ | + | | + |DFL based FPGA PCIe Device | + +---+ + +FME is always accessed through the physical function (PF). + +Ports (and related AFUs) are accessed via PF by default, but could be exposed +through virtual function (VF) devices via PCIe SRIOV. Each VF only contains +1 Port and 1 AFU for isolation. Users could assign individual VFs (accelerators) +created via PCIe SRIOV interface, to virtual machines. + +The driver organization in virtualization case is illustrated below: +:: + ++---++--++--+ | +| FME || FME || FME | | +| FPGA || FPGA || FPGA | | +|Manager||Bridge||Region| | ++---++--++--+ | ++---+ ++ | ++ +| FME | | AFU | | | AFU | +| Module| | Module | | | Module | ++---+ ++ | ++ + +---+ | +---+ + | FPGA Container Device | | | FPGA Container Device | + | (FPGA Base Region) | | | (FPGA Base Region) | + +---+ | +---+ ++--+ | +--+ +| FPGA PCIE Module | | Virtual | FPGA PCIE Module | ++--+ Host | Machine +--+ + -- | -- + +---+| +---+ + | PCI PF Device || | PCI VF Device | + +---+| +---+ + +FPGA PCIe device driver is always loaded first once a FPGA PCIe PF or VF device +is detected. It: + +* Finishes enumeration on both FPGA PCIe PF and VF device using common + int
[PATCH 03/15] fpga: dfl: fme: align PR buffer size per PR datawidth
From: Wu Hao Current driver checks if input bitstream file size is aligned or not per PR data width (default 32bits). It requires one additional step for end user when they generate the bitstream file, padding extra zeros to bitstream file to align its size per PR data width, but they don't have to as hardware will drop extra padding bytes automatically. In order to simplify the user steps, this patch aligns PR buffer size per PR data width in driver, to allow user to pass unaligned size bitstream files to driver. Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull Acked-by: Moritz Fischer Signed-off-by: Moritz Fischer --- drivers/fpga/dfl-fme-pr.c | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c index 6ec0f09e5994..3c71dc3faaf5 100644 --- a/drivers/fpga/dfl-fme-pr.c +++ b/drivers/fpga/dfl-fme-pr.c @@ -74,6 +74,7 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg) struct dfl_fme *fme; unsigned long minsz; void *buf = NULL; + size_t length; int ret = 0; u64 v; @@ -85,9 +86,6 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg) if (port_pr.argsz < minsz || port_pr.flags) return -EINVAL; - if (!IS_ALIGNED(port_pr.buffer_size, 4)) - return -EINVAL; - /* get fme header region */ fme_hdr = dfl_get_feature_ioaddr_by_id(&pdev->dev, FME_FEATURE_ID_HEADER); @@ -103,7 +101,13 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg) port_pr.buffer_size)) return -EFAULT; - buf = vmalloc(port_pr.buffer_size); + /* +* align PR buffer per PR bandwidth, as HW ignores the extra padding +* data automatically. +*/ + length = ALIGN(port_pr.buffer_size, 4); + + buf = vmalloc(length); if (!buf) return -ENOMEM; @@ -140,7 +144,7 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg) fpga_image_info_free(region->info); info->buf = buf; - info->count = port_pr.buffer_size; + info->count = length; info->region_id = port_pr.port_id; region->info = info; -- 2.22.0
[PATCH 08/15] fpga: dfl: afu: add AFU state related sysfs interfaces
From: Wu Hao This patch introduces more sysfs interfaces for Accelerated Function Unit (AFU). These interfaces allow users to read current AFU Power State (APx), read / clear AFU Power (APx) events which are sticky to identify transient APx state, and manage AFU's LTR (latency tolerance reporting). Signed-off-by: Ananda Ravuri Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull Signed-off-by: Moritz Fischer --- .../ABI/testing/sysfs-platform-dfl-port | 30 drivers/fpga/dfl-afu-main.c | 140 ++ drivers/fpga/dfl.h| 11 ++ 3 files changed, 181 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port index 6a92dda517b0..17b37d110618 100644 --- a/Documentation/ABI/testing/sysfs-platform-dfl-port +++ b/Documentation/ABI/testing/sysfs-platform-dfl-port @@ -14,3 +14,33 @@ Description: Read-only. User can program different PR bitstreams to FPGA Accelerator Function Unit (AFU) for different functions. It returns uuid which could be used to identify which PR bitstream is programmed in this AFU. + +What: /sys/bus/platform/devices/dfl-port.0/power_state +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. It reports the APx (AFU Power) state, different APx + means different throttling level. When reading this file, it + returns "0" - Normal / "1" - AP1 / "2" - AP2 / "6" - AP6. + +What: /sys/bus/platform/devices/dfl-port.0/ap1_event +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-write. Read or set 1 to clear AP1 (AFU Power State 1) + event. It's used to indicate transient AP1 state. + +What: /sys/bus/platform/devices/dfl-port.0/ap2_event +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-write. Read or set 1 to clear AP2 (AFU Power State 2) + event. It's used to indicate transient AP2 state. + +What: /sys/bus/platform/devices/dfl-port.0/ltr +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-write. Read and set AFU latency tolerance reporting value. + Set ltr to 1 if the AFU can tolerate latency >= 40us or set it + to 0 if it is latency sensitive. diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c index 02baa6a227c0..040ed8ad16e5 100644 --- a/drivers/fpga/dfl-afu-main.c +++ b/drivers/fpga/dfl-afu-main.c @@ -21,6 +21,8 @@ #include "dfl-afu.h" +#define DRV_VERSION"0.8" + /** * port_enable - enable a port * @pdev: port platform device. @@ -141,8 +143,145 @@ id_show(struct device *dev, struct device_attribute *attr, char *buf) } static DEVICE_ATTR_RO(id); +static ssize_t +ltr_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); + void __iomem *base; + u64 v; + + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); + + mutex_lock(&pdata->lock); + v = readq(base + PORT_HDR_CTRL); + mutex_unlock(&pdata->lock); + + return sprintf(buf, "%x\n", (u8)FIELD_GET(PORT_CTRL_LATENCY, v)); +} + +static ssize_t +ltr_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); + void __iomem *base; + u8 ltr; + u64 v; + + if (kstrtou8(buf, 0,1) + return -EINVAL; + + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); + + mutex_lock(&pdata->lock); + v = readq(base + PORT_HDR_CTRL); + v &= ~PORT_CTRL_LATENCY; + v |= FIELD_PREP(PORT_CTRL_LATENCY, ltr); + writeq(v, base + PORT_HDR_CTRL); + mutex_unlock(&pdata->lock); + + return count; +} +static DEVICE_ATTR_RW(ltr); + +static ssize_t +ap1_event_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); + void __iomem *base; + u64 v; + + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); + + mutex_lock(&pdata->lock); + v = readq(base + PORT_HDR_STS); + mutex_unlock(&pdata->lock); + + return sprintf(buf, "%x\n", (u8)FIELD_GET(PORT_STS_AP1_EVT, v)); +} + +static ssize_t +ap1_event_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); + void __iomem *base; + u8 ap1_event; + + if (kstrtou8(buf, 0, &ap1_event) || ap1_event != 1) + return -EINVAL; + + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEAT
[PATCH 10/15] fpga: dfl: add id_table for dfl private feature driver
From: Wu Hao This patch adds id_table for each dfl private feature driver, it allows to reuse same private feature driver to match and support multiple dfl private features. Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Moritz Fischer Acked-by: Alan Tull Signed-off-by: Moritz Fischer --- drivers/fpga/dfl-afu-main.c | 14 -- drivers/fpga/dfl-fme-main.c | 11 --- drivers/fpga/dfl-fme-pr.c | 7 ++- drivers/fpga/dfl-fme.h | 3 ++- drivers/fpga/dfl.c | 21 +++-- drivers/fpga/dfl.h | 21 +++-- 6 files changed, 62 insertions(+), 15 deletions(-) diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c index 8b434a405498..65b3e895e364 100644 --- a/drivers/fpga/dfl-afu-main.c +++ b/drivers/fpga/dfl-afu-main.c @@ -435,6 +435,11 @@ port_hdr_ioctl(struct platform_device *pdev, struct dfl_feature *feature, return ret; } +static const struct dfl_feature_id port_hdr_id_table[] = { + {.id = PORT_FEATURE_ID_HEADER,}, + {0,} +}; + static const struct dfl_feature_ops port_hdr_ops = { .init = port_hdr_init, .uinit = port_hdr_uinit, @@ -495,6 +500,11 @@ static void port_afu_uinit(struct platform_device *pdev, sysfs_remove_files(&pdev->dev.kobj, port_afu_attrs); } +static const struct dfl_feature_id port_afu_id_table[] = { + {.id = PORT_FEATURE_ID_AFU,}, + {0,} +}; + static const struct dfl_feature_ops port_afu_ops = { .init = port_afu_init, .uinit = port_afu_uinit, @@ -502,11 +512,11 @@ static const struct dfl_feature_ops port_afu_ops = { static struct dfl_feature_driver port_feature_drvs[] = { { - .id = PORT_FEATURE_ID_HEADER, + .id_table = port_hdr_id_table, .ops = &port_hdr_ops, }, { - .id = PORT_FEATURE_ID_AFU, + .id_table = port_afu_id_table, .ops = &port_afu_ops, }, { diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c index 8b2a33760483..38c6342e1865 100644 --- a/drivers/fpga/dfl-fme-main.c +++ b/drivers/fpga/dfl-fme-main.c @@ -158,6 +158,11 @@ static long fme_hdr_ioctl(struct platform_device *pdev, return -ENODEV; } +static const struct dfl_feature_id fme_hdr_id_table[] = { + {.id = FME_FEATURE_ID_HEADER,}, + {0,} +}; + static const struct dfl_feature_ops fme_hdr_ops = { .init = fme_hdr_init, .uinit = fme_hdr_uinit, @@ -166,12 +171,12 @@ static const struct dfl_feature_ops fme_hdr_ops = { static struct dfl_feature_driver fme_feature_drvs[] = { { - .id = FME_FEATURE_ID_HEADER, + .id_table = fme_hdr_id_table, .ops = &fme_hdr_ops, }, { - .id = FME_FEATURE_ID_PR_MGMT, - .ops = &pr_mgmt_ops, + .id_table = fme_pr_mgmt_id_table, + .ops = &fme_pr_mgmt_ops, }, { .ops = NULL, diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c index cd94ba870094..52f1745dfb25 100644 --- a/drivers/fpga/dfl-fme-pr.c +++ b/drivers/fpga/dfl-fme-pr.c @@ -483,7 +483,12 @@ static long fme_pr_ioctl(struct platform_device *pdev, return ret; } -const struct dfl_feature_ops pr_mgmt_ops = { +const struct dfl_feature_id fme_pr_mgmt_id_table[] = { + {.id = FME_FEATURE_ID_PR_MGMT,}, + {0} +}; + +const struct dfl_feature_ops fme_pr_mgmt_ops = { .init = pr_mgmt_init, .uinit = pr_mgmt_uinit, .ioctl = fme_pr_ioctl, diff --git a/drivers/fpga/dfl-fme.h b/drivers/fpga/dfl-fme.h index de207556b70a..7a021c483e9b 100644 --- a/drivers/fpga/dfl-fme.h +++ b/drivers/fpga/dfl-fme.h @@ -35,6 +35,7 @@ struct dfl_fme { struct dfl_feature_platform_data *pdata; }; -extern const struct dfl_feature_ops pr_mgmt_ops; +extern const struct dfl_feature_ops fme_pr_mgmt_ops; +extern const struct dfl_feature_id fme_pr_mgmt_id_table[]; #endif /* __DFL_FME_H */ diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c index 28d61b611165..1bb2b582e4b0 100644 --- a/drivers/fpga/dfl.c +++ b/drivers/fpga/dfl.c @@ -14,6 +14,8 @@ #include "dfl.h" +#define DRV_VERSION"0.8" + static DEFINE_MUTEX(dfl_id_mutex); /* @@ -281,6 +283,21 @@ static int dfl_feature_instance_init(struct platform_device *pdev, return ret; } +static bool dfl_feature_drv_match(struct dfl_feature *feature, + struct dfl_feature_driver *driver) +{ + const struct dfl_feature_id *ids = driver->id_table; + + if (ids) { + while (ids->id) { + if (ids->id == feature->id) + return true; + ids++; + } + } + return false; +} + /** * dfl_fpga_dev_feature_init - init for sub features of dfl feature device * @pdev: feature device. @@ -301,8 +318,7 @@ int
[PATCH 11/15] fpga: dfl: afu: export __port_enable/disable function.
From: Wu Hao As these two functions are used by other private features. e.g. in error reporting private feature, it requires to check port status and reset port for error clearing. Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Moritz Fischer Acked-by: Alan Tull Signed-off-by: Moritz Fischer --- drivers/fpga/dfl-afu-main.c | 25 ++--- drivers/fpga/dfl-afu.h | 3 +++ 2 files changed, 17 insertions(+), 11 deletions(-) diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c index 65b3e895e364..c8bc0b5d9c16 100644 --- a/drivers/fpga/dfl-afu-main.c +++ b/drivers/fpga/dfl-afu-main.c @@ -24,14 +24,16 @@ #define DRV_VERSION"0.8" /** - * port_enable - enable a port + * __port_enable - enable a port * @pdev: port platform device. * * Enable Port by clear the port soft reset bit, which is set by default. * The AFU is unable to respond to any MMIO access while in reset. - * port_enable function should only be used after port_disable function. + * __port_enable function should only be used after __port_disable function. + * + * The caller needs to hold lock for protection. */ -static void port_enable(struct platform_device *pdev) +void __port_enable(struct platform_device *pdev) { struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev); void __iomem *base; @@ -54,13 +56,14 @@ static void port_enable(struct platform_device *pdev) #define RST_POLL_TIMEOUT 1000 /* us */ /** - * port_disable - disable a port + * __port_disable - disable a port * @pdev: port platform device. * - * Disable Port by setting the port soft reset bit, it puts the port into - * reset. + * Disable Port by setting the port soft reset bit, it puts the port into reset. + * + * The caller needs to hold lock for protection. */ -static int port_disable(struct platform_device *pdev) +int __port_disable(struct platform_device *pdev) { struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev); void __iomem *base; @@ -106,9 +109,9 @@ static int __port_reset(struct platform_device *pdev) { int ret; - ret = port_disable(pdev); + ret = __port_disable(pdev); if (!ret) - port_enable(pdev); + __port_enable(pdev); return ret; } @@ -805,9 +808,9 @@ static int port_enable_set(struct platform_device *pdev, bool enable) mutex_lock(&pdata->lock); if (enable) - port_enable(pdev); + __port_enable(pdev); else - ret = port_disable(pdev); + ret = __port_disable(pdev); mutex_unlock(&pdata->lock); return ret; diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h index 0c7630ae3cda..35e60c5859a4 100644 --- a/drivers/fpga/dfl-afu.h +++ b/drivers/fpga/dfl-afu.h @@ -79,6 +79,9 @@ struct dfl_afu { struct dfl_feature_platform_data *pdata; }; +void __port_enable(struct platform_device *pdev); +int __port_disable(struct platform_device *pdev); + void afu_mmio_region_init(struct dfl_feature_platform_data *pdata); int afu_mmio_region_add(struct dfl_feature_platform_data *pdata, u32 region_index, u64 region_size, u64 phys, u32 flags); -- 2.22.0
[PATCH 14/15] fpga: dfl: fme: add capability sysfs interfaces
From: Wu Hao This patch adds 3 read-only sysfs interfaces for FPGA Management Engine (FME) block for capabilities including cache_size, fabric_version and socket_id. Signed-off-by: Luwei Kang Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull Signed-off-by: Moritz Fischer --- .../ABI/testing/sysfs-platform-dfl-fme| 23 + drivers/fpga/dfl-fme-main.c | 48 +++ 2 files changed, 71 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme index 8fa4febfa4b2..99cd3b2acff5 100644 --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme @@ -21,3 +21,26 @@ Contact: Wu Hao Description: Read-only. It returns Bitstream (static FPGA region) meta data, which includes the synthesis date, seed and other information of this static FPGA region. + +What: /sys/bus/platform/devices/dfl-fme.0/cache_size +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. It returns cache size of this FPGA device. + +What: /sys/bus/platform/devices/dfl-fme.0/fabric_version +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. It returns fabric version of this FPGA device. + Userspace applications need this information to select + best data channels per different fabric design. + +What: /sys/bus/platform/devices/dfl-fme.0/socket_id +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. It returns socket_id to indicate which socket + this FPGA belongs to, only valid for integrated solution. + User only needs this information, in case standard numa node + can't provide correct information. diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c index 38c6342e1865..2d69b8fd0137 100644 --- a/drivers/fpga/dfl-fme-main.c +++ b/drivers/fpga/dfl-fme-main.c @@ -75,10 +75,58 @@ static ssize_t bitstream_metadata_show(struct device *dev, } static DEVICE_ATTR_RO(bitstream_metadata); +static ssize_t cache_size_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + void __iomem *base; + u64 v; + + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER); + + v = readq(base + FME_HDR_CAP); + + return sprintf(buf, "%u\n", + (unsigned int)FIELD_GET(FME_CAP_CACHE_SIZE, v)); +} +static DEVICE_ATTR_RO(cache_size); + +static ssize_t fabric_version_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + void __iomem *base; + u64 v; + + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER); + + v = readq(base + FME_HDR_CAP); + + return sprintf(buf, "%u\n", + (unsigned int)FIELD_GET(FME_CAP_FABRIC_VERID, v)); +} +static DEVICE_ATTR_RO(fabric_version); + +static ssize_t socket_id_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + void __iomem *base; + u64 v; + + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER); + + v = readq(base + FME_HDR_CAP); + + return sprintf(buf, "%u\n", + (unsigned int)FIELD_GET(FME_CAP_SOCKET_ID, v)); +} +static DEVICE_ATTR_RO(socket_id); + static const struct attribute *fme_hdr_attrs[] = { &dev_attr_ports_num.attr, &dev_attr_bitstream_id.attr, &dev_attr_bitstream_metadata.attr, + &dev_attr_cache_size.attr, + &dev_attr_fabric_version.attr, + &dev_attr_socket_id.attr, NULL, }; -- 2.22.0
[PATCH 04/15] fpga: dfl: fme: support 512bit data width PR
From: Wu Hao In early partial reconfiguration private feature, it only supports 32bit data width when writing data to hardware for PR. 512bit data width PR support is an important optimization for some specific solutions (e.g. XEON with FPGA integrated), it allows driver to use AVX512 instruction to improve the performance of partial reconfiguration. e.g. programming one 100MB bitstream image via this 512bit data width PR hardware only takes ~300ms, but 32bit revision requires ~3s per test result. Please note now this optimization is only done on revision 2 of this PR private feature which is only used in integrated solution that AVX512 is always supported. This revision 2 hardware doesn't support 32bit PR. Signed-off-by: Ananda Ravuri Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull Signed-off-by: Moritz Fischer --- drivers/fpga/dfl-fme-main.c | 3 + drivers/fpga/dfl-fme-mgr.c | 113 +++- drivers/fpga/dfl-fme-pr.c | 43 +- drivers/fpga/dfl-fme.h | 2 + drivers/fpga/dfl.h | 5 ++ 5 files changed, 135 insertions(+), 31 deletions(-) diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c index 086ad2420ade..076d74f6416d 100644 --- a/drivers/fpga/dfl-fme-main.c +++ b/drivers/fpga/dfl-fme-main.c @@ -21,6 +21,8 @@ #include "dfl.h" #include "dfl-fme.h" +#define DRV_VERSION"0.8" + static ssize_t ports_num_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -277,3 +279,4 @@ MODULE_DESCRIPTION("FPGA Management Engine driver"); MODULE_AUTHOR("Intel Corporation"); MODULE_LICENSE("GPL v2"); MODULE_ALIAS("platform:dfl-fme"); +MODULE_VERSION(DRV_VERSION); diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c index b3f7eee3c93f..d1a4ba5d1d3d 100644 --- a/drivers/fpga/dfl-fme-mgr.c +++ b/drivers/fpga/dfl-fme-mgr.c @@ -22,14 +22,18 @@ #include #include +#include "dfl.h" #include "dfl-fme-pr.h" +#define DRV_VERSION"0.8" + /* FME Partial Reconfiguration Sub Feature Register Set */ #define FME_PR_DFH 0x0 #define FME_PR_CTRL0x8 #define FME_PR_STS 0x10 #define FME_PR_DATA0x18 #define FME_PR_ERR 0x20 +#define FME_PR_512_DATA0x40 /* Data Register for 512bit datawidth PR */ #define FME_PR_INTFC_ID_L 0xA8 #define FME_PR_INTFC_ID_H 0xB0 @@ -67,8 +71,43 @@ #define PR_WAIT_TIMEOUT 800 #define PR_HOST_STATUS_IDLE0 +#if defined(CONFIG_X86) && defined(CONFIG_AS_AVX512) + +#include +#include + +static inline int is_cpu_avx512_enabled(void) +{ + return cpu_feature_enabled(X86_FEATURE_AVX512F); +} + +static inline void copy512(const void *src, void __iomem *dst) +{ + kernel_fpu_begin(); + + asm volatile("vmovdqu64 (%0), %%zmm0;" +"vmovntdq %%zmm0, (%1);" +: +: "r"(src), "r"(dst) +: "memory"); + + kernel_fpu_end(); +} +#else +static inline int is_cpu_avx512_enabled(void) +{ + return 0; +} + +static inline void copy512(const void *src, void __iomem *dst) +{ + WARN_ON_ONCE(1); +} +#endif + struct fme_mgr_priv { void __iomem *ioaddr; + unsigned int pr_datawidth; u64 pr_error; }; @@ -169,7 +208,7 @@ static int fme_mgr_write(struct fpga_manager *mgr, struct fme_mgr_priv *priv = mgr->priv; void __iomem *fme_pr = priv->ioaddr; u64 pr_ctrl, pr_status, pr_data; - int delay = 0, pr_credit, i = 0; + int ret = 0, delay = 0, pr_credit; dev_dbg(dev, "start request\n"); @@ -181,9 +220,9 @@ static int fme_mgr_write(struct fpga_manager *mgr, /* * driver can push data to PR hardware using PR_DATA register once HW -* has enough pr_credit (> 1), pr_credit reduces one for every 32bit -* pr data write to PR_DATA register. If pr_credit <= 1, driver needs -* to wait for enough pr_credit from hardware by polling. +* has enough pr_credit (> 1), pr_credit reduces one for every pr data +* width write to PR_DATA register. If pr_credit <= 1, driver needs to +* wait for enough pr_credit from hardware by polling. */ pr_status = readq(fme_pr + FME_PR_STS); pr_credit = FIELD_GET(FME_PR_STS_PR_CREDIT, pr_status); @@ -192,7 +231,8 @@ static int fme_mgr_write(struct fpga_manager *mgr, while (pr_credit <= 1) { if (delay++ > PR_WAIT_TIMEOUT) { dev_err(dev, "PR_CREDIT timeout\n"); - return -ETIMEDOUT; + ret = -ETIMEDOUT; + goto done; } udelay(1); @@ -200,21 +240,27 @@ static int fme_mgr_write(struct fpga_manager *mgr, pr_credit = FIELD_GET(FME_PR_STS_PR_CREDIT, pr
[PATCH 13/15] fpga: dfl: afu: add STP (SignalTap) support
From: Wu Hao STP (SignalTap) is one of the private features under the port for debugging. This patch adds private feature driver support for it to allow userspace applications to mmap related mmio region and provide STP service. Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Moritz Fischer Acked-by: Alan Tull Signed-off-by: Moritz Fischer --- drivers/fpga/dfl-afu-main.c | 34 ++ 1 file changed, 34 insertions(+) diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c index bcf6e285a854..8241aced2d5d 100644 --- a/drivers/fpga/dfl-afu-main.c +++ b/drivers/fpga/dfl-afu-main.c @@ -513,6 +513,36 @@ static const struct dfl_feature_ops port_afu_ops = { .uinit = port_afu_uinit, }; +static int port_stp_init(struct platform_device *pdev, +struct dfl_feature *feature) +{ + struct resource *res = &pdev->resource[feature->resource_index]; + + dev_dbg(&pdev->dev, "PORT STP Init.\n"); + + return afu_mmio_region_add(dev_get_platdata(&pdev->dev), + DFL_PORT_REGION_INDEX_STP, + resource_size(res), res->start, + DFL_PORT_REGION_MMAP | DFL_PORT_REGION_READ | + DFL_PORT_REGION_WRITE); +} + +static void port_stp_uinit(struct platform_device *pdev, + struct dfl_feature *feature) +{ + dev_dbg(&pdev->dev, "PORT STP UInit.\n"); +} + +static const struct dfl_feature_id port_stp_id_table[] = { + {.id = PORT_FEATURE_ID_STP,}, + {0,} +}; + +static const struct dfl_feature_ops port_stp_ops = { + .init = port_stp_init, + .uinit = port_stp_uinit, +}; + static struct dfl_feature_driver port_feature_drvs[] = { { .id_table = port_hdr_id_table, @@ -526,6 +556,10 @@ static struct dfl_feature_driver port_feature_drvs[] = { .id_table = port_err_id_table, .ops = &port_err_ops, }, + { + .id_table = port_stp_id_table, + .ops = &port_stp_ops, + }, { .ops = NULL, } -- 2.22.0
[PATCH 09/15] fpga: dfl: afu: add userclock sysfs interfaces.
From: Wu Hao This patch introduces userclock sysfs interfaces for AFU, user could use these interfaces for clock setting to AFU. Please note that, this is only working for port header feature with revision 0, for later revisions, userclock setting is moved to a separated private feature, so one revision sysfs interface is exposed to userspace application for this purpose too. Signed-off-by: Ananda Ravuri Signed-off-by: Russ Weight Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull Signed-off-by: Moritz Fischer --- .../ABI/testing/sysfs-platform-dfl-port | 35 ++ drivers/fpga/dfl-afu-main.c | 113 +- drivers/fpga/dfl.h| 4 + 3 files changed, 151 insertions(+), 1 deletion(-) diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port index 17b37d110618..04ea7f2971c7 100644 --- a/Documentation/ABI/testing/sysfs-platform-dfl-port +++ b/Documentation/ABI/testing/sysfs-platform-dfl-port @@ -44,3 +44,38 @@ Contact: Wu Hao Description: Read-write. Read and set AFU latency tolerance reporting value. Set ltr to 1 if the AFU can tolerate latency >= 40us or set it to 0 if it is latency sensitive. + +What: /sys/bus/platform/devices/dfl-port.0/revision +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the revision of port header + feature. + +What: /sys/bus/platform/devices/dfl-port.0/userclk_freqcmd +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Write-only. User writes command to this interface to set + userclock to AFU. + +What: /sys/bus/platform/devices/dfl-port.0/userclk_freqsts +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the status of issued command + to userclck_freqcmd. + +What: /sys/bus/platform/devices/dfl-port.0/userclk_freqcntrcmd +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Write-only. User writes command to this interface to set + userclock counter. + +What: /sys/bus/platform/devices/dfl-port.0/userclk_freqcntrsts +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the status of issued command + to userclck_freqcntrcmd. diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c index 040ed8ad16e5..8b434a405498 100644 --- a/drivers/fpga/dfl-afu-main.c +++ b/drivers/fpga/dfl-afu-main.c @@ -143,6 +143,17 @@ id_show(struct device *dev, struct device_attribute *attr, char *buf) } static DEVICE_ATTR_RO(id); +static ssize_t +revision_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + void __iomem *base; + + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); + + return sprintf(buf, "%x\n", dfl_feature_revision(base)); +} +static DEVICE_ATTR_RO(revision); + static ssize_t ltr_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -278,6 +289,7 @@ static DEVICE_ATTR_RO(power_state); static const struct attribute *port_hdr_attrs[] = { &dev_attr_id.attr, + &dev_attr_revision.attr, &dev_attr_ltr.attr, &dev_attr_ap1_event.attr, &dev_attr_ap2_event.attr, @@ -285,14 +297,112 @@ static const struct attribute *port_hdr_attrs[] = { NULL, }; +static ssize_t +userclk_freqcmd_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); + u64 userclk_freq_cmd; + void __iomem *base; + + if (kstrtou64(buf, 0, &userclk_freq_cmd)) + return -EINVAL; + + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); + + mutex_lock(&pdata->lock); + writeq(userclk_freq_cmd, base + PORT_HDR_USRCLK_CMD0); + mutex_unlock(&pdata->lock); + + return count; +} +static DEVICE_ATTR_WO(userclk_freqcmd); + +static ssize_t +userclk_freqcntrcmd_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); + u64 userclk_freqcntr_cmd; + void __iomem *base; + + if (kstrtou64(buf, 0, &userclk_freqcntr_cmd)) + return -EINVAL; + + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); + + mutex_lock(&pdata->lock); + writeq(userclk_freqcntr_cmd, base + PORT_HDR_USRCLK_CMD1); + mutex_unlock(&pdata->lock); + + return count; +} +static DEVICE_ATTR_WO(userclk_freqcntrcmd); + +static ssize_t +userclk_freqsts_show(struct device *dev, struct device
[PATCH 15/15] fpga: dfl: fme: add global error reporting support
From: Wu Hao This patch adds support for global error reporting for FPGA Management Engine (FME), it introduces sysfs interfaces to report different error detected by the hardware, and allow user to clear errors or inject error for testing purpose. Signed-off-by: Luwei Kang Signed-off-by: Ananda Ravuri Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull Signed-off-by: Moritz Fischer --- .../ABI/testing/sysfs-platform-dfl-fme| 75 drivers/fpga/Makefile | 2 +- drivers/fpga/dfl-fme-error.c | 385 ++ drivers/fpga/dfl-fme-main.c | 4 + drivers/fpga/dfl-fme.h| 2 + drivers/fpga/dfl.h| 2 + 6 files changed, 469 insertions(+), 1 deletion(-) create mode 100644 drivers/fpga/dfl-fme-error.c diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme index 99cd3b2acff5..86eef83938b2 100644 --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme @@ -44,3 +44,78 @@ Description: Read-only. It returns socket_id to indicate which socket this FPGA belongs to, only valid for integrated solution. User only needs this information, in case standard numa node can't provide correct information. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/revision +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the revision of this global + error reporting private feature. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/pcie0_errors +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-Write. Read this file for errors detected on pcie0 link. + Write this file to clear errors logged in pcie0_errors. Write + fails with -EINVAL if input parsing fails or input error code + doesn't match. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/pcie1_errors +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-Write. Read this file for errors detected on pcie1 link. + Write this file to clear errors logged in pcie1_errors. Write + fails with -EINVAL if input parsing fails or input error code + doesn't match. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/nonfatal_errors +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. It returns non-fatal errors detected. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/catfatal_errors +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. It returns catastrophic and fatal errors detected. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/inject_error +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-Write. Read this file to check errors injected. Write this + file to inject errors for testing purpose. Write fails with + -EINVAL if input parsing fails or input inject error code isn't + supported. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/errors +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get errors detected by hardware. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/first_error +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the first error detected by + hardware. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/next_error +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the second error detected by + hardware. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/clear +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Write-only. Write error code to this file to clear all errors + logged in errors, first_error and next_error. Write fails with + -EINVAL if input parsing fails or input error code doesn't + match. diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile index 72558914a29c..4865b74b00a4 100644 --- a/drivers/fpga/Makefile +++ b/drivers/fpga/Makefile @@ -39,7 +39,7 @@ obj-$(CONFIG_FPGA_DFL_FME_BRIDGE) += dfl-fme-br.o obj-$(CONFIG_FPGA_DFL_FME_REGION) += dfl-fme-region.o obj-$(CONFIG_FPGA_DFL_AFU) += dfl-afu.o -dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o +dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o dfl-fme-error.o dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o
[PATCH 06/15] fpga: dfl: fme: add DFL_FPGA_FME_PORT_RELEASE/ASSIGN ioctl support.
From: Wu Hao In order to support virtualization usage via PCIe SRIOV, this patch adds two ioctls under FPGA Management Engine (FME) to release and assign back the port device. In order to safely turn Port from PF into VF and enable PCIe SRIOV, it requires user to invoke this PORT_RELEASE ioctl to release port firstly to remove userspace interfaces, and then configure the PF/VF access register in FME. After disable SRIOV, it requires user to invoke this PORT_ASSIGN ioctl to attach the port back to PF. Ioctl interfaces: * DFL_FPGA_FME_PORT_RELEASE Release platform device of given port, it deletes port platform device to remove related userspace interfaces on PF, then configures PF/VF access mode to VF. * DFL_FPGA_FME_PORT_ASSIGN Assign platform device of given port back to PF, it configures PF/VF access mode to PF, then adds port platform device back to re-enable related userspace interfaces on PF. Signed-off-by: Zhang Yi Z Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull Acked-by: Moritz Fischer Signed-off-by: Moritz Fischer --- drivers/fpga/dfl-fme-main.c | 54 + drivers/fpga/dfl.c| 107 ++ drivers/fpga/dfl.h| 10 include/uapi/linux/fpga-dfl.h | 32 ++ 4 files changed, 191 insertions(+), 12 deletions(-) diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c index 076d74f6416d..8b2a33760483 100644 --- a/drivers/fpga/dfl-fme-main.c +++ b/drivers/fpga/dfl-fme-main.c @@ -16,6 +16,7 @@ #include #include +#include #include #include "dfl.h" @@ -105,9 +106,62 @@ static void fme_hdr_uinit(struct platform_device *pdev, sysfs_remove_files(&pdev->dev.kobj, fme_hdr_attrs); } +static long fme_hdr_ioctl_release_port(struct dfl_feature_platform_data *pdata, + void __user *arg) +{ + struct dfl_fpga_cdev *cdev = pdata->dfl_cdev; + struct dfl_fpga_fme_port_release release; + unsigned long minsz; + + minsz = offsetofend(struct dfl_fpga_fme_port_release, port_id); + + if (copy_from_user(&release, arg, minsz)) + return -EFAULT; + + if (release.argsz < minsz || release.flags) + return -EINVAL; + + return dfl_fpga_cdev_config_port(cdev, release.port_id, true); +} + +static long fme_hdr_ioctl_assign_port(struct dfl_feature_platform_data *pdata, + void __user *arg) +{ + struct dfl_fpga_cdev *cdev = pdata->dfl_cdev; + struct dfl_fpga_fme_port_assign assign; + unsigned long minsz; + + minsz = offsetofend(struct dfl_fpga_fme_port_assign, port_id); + + if (copy_from_user(&assign, arg, minsz)) + return -EFAULT; + + if (assign.argsz < minsz || assign.flags) + return -EINVAL; + + return dfl_fpga_cdev_config_port(cdev, assign.port_id, false); +} + +static long fme_hdr_ioctl(struct platform_device *pdev, + struct dfl_feature *feature, + unsigned int cmd, unsigned long arg) +{ + struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev); + + switch (cmd) { + case DFL_FPGA_FME_PORT_RELEASE: + return fme_hdr_ioctl_release_port(pdata, (void __user *)arg); + case DFL_FPGA_FME_PORT_ASSIGN: + return fme_hdr_ioctl_assign_port(pdata, (void __user *)arg); + } + + return -ENODEV; +} + static const struct dfl_feature_ops fme_hdr_ops = { .init = fme_hdr_init, .uinit = fme_hdr_uinit, + .ioctl = fme_hdr_ioctl, }; static struct dfl_feature_driver fme_feature_drvs[] = { diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c index 4b66aaa32b5a..308c80868af4 100644 --- a/drivers/fpga/dfl.c +++ b/drivers/fpga/dfl.c @@ -231,16 +231,20 @@ EXPORT_SYMBOL_GPL(dfl_fpga_port_ops_del); */ int dfl_fpga_check_port_id(struct platform_device *pdev, void *pport_id) { - struct dfl_fpga_port_ops *port_ops = dfl_fpga_port_ops_get(pdev); - int port_id; + struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev); + struct dfl_fpga_port_ops *port_ops; + + if (pdata->id != FEATURE_DEV_ID_UNUSED) + return pdata->id == *(int *)pport_id; + port_ops = dfl_fpga_port_ops_get(pdev); if (!port_ops || !port_ops->get_id) return 0; - port_id = port_ops->get_id(pdev); + pdata->id = port_ops->get_id(pdev); dfl_fpga_port_ops_put(port_ops); - return port_id == *(int *)pport_id; + return pdata->id == *(int *)pport_id; } EXPORT_SYMBOL_GPL(dfl_fpga_check_port_id); @@ -474,6 +478,7 @@ static int build_info_commit_dev(struct build_feature_devs_info *binfo) pdata->dev = fdev; pdata->num = binfo->feature_num; pdata->dfl_cdev = binfo->cdev; + pdata->id = FEATURE_DEV_ID_UNUSED; mutex_init(&pdata->lock);
[PATCH 01/15] fpga: dfl-fme-mgr: fix FME_PR_INTFC_ID register address.
From: Wu Hao FME_PR_INTFC_ID is used as compat_id for fpga manager and region, but high 64 bits and low 64 bits of the compat_id are swapped by mistake. This patch fixes this problem by fixing register address. Signed-off-by: Wu Hao Acked-by: Alan Tull Acked-by: Moritz Fischer Signed-off-by: Moritz Fischer --- drivers/fpga/dfl-fme-mgr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c index 76f37709dd1a..b3f7eee3c93f 100644 --- a/drivers/fpga/dfl-fme-mgr.c +++ b/drivers/fpga/dfl-fme-mgr.c @@ -30,8 +30,8 @@ #define FME_PR_STS 0x10 #define FME_PR_DATA0x18 #define FME_PR_ERR 0x20 -#define FME_PR_INTFC_ID_H 0xA8 -#define FME_PR_INTFC_ID_L 0xB0 +#define FME_PR_INTFC_ID_L 0xA8 +#define FME_PR_INTFC_ID_H 0xB0 /* FME PR Control Register Bitfield */ #define FME_PR_CTRL_PR_RST BIT_ULL(0) /* Reset PR engine */ -- 2.22.0
[PATCH] kbuild: get rid of misleading $(AS) from documents
The assembler files in the kernel are *.S instead of *.s, so they must be preprocessed. Hence, we always use $(CC) as an assembler driver. $(AS) is almost unused in Kbuild. As of writing, there is just one user. $ git grep '$(AS)' -- :^Documentation drivers/net/wan/Makefile: AS68K = $(AS) The documentation about *_AFLAGS* sounds like the flags were passed to $(AS). This is somewhat misleading since we do not invoke $(AS) directly. Signed-off-by: Masahiro Yamada --- Documentation/kbuild/kbuild.txt| 5 ++--- Documentation/kbuild/makefiles.txt | 12 ++-- 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/Documentation/kbuild/kbuild.txt b/Documentation/kbuild/kbuild.txt index 9c230ea71963..7a7e2aa2fab5 100644 --- a/Documentation/kbuild/kbuild.txt +++ b/Documentation/kbuild/kbuild.txt @@ -31,12 +31,11 @@ Additional options to the assembler (for built-in and modules). AFLAGS_MODULE -- -Additional module specific options to use for $(AS). +Additional module specific options to use for assembler. AFLAGS_KERNEL -- -Additional options for $(AS) when used for assembler -code for code that is compiled as built-in. +Additional options when used for assembling code that is compiled as built-in. KCFLAGS -- diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index d65ad5746f94..f0b3a30b985d 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt @@ -306,7 +306,7 @@ more details, with real examples. variable $(KBUILD_CFLAGS) and uses it for compilation flags for the entire tree. - asflags-y specifies options for assembling with $(AS). + asflags-y specifies options for assembling. Example: #arch/sparc/kernel/Makefile @@ -441,7 +441,7 @@ more details, with real examples. as-instr checks if the assembler reports a specific instruction and then outputs either option1 or option2 C escapes are supported in the test instruction - Note: as-instr-option uses KBUILD_AFLAGS for $(AS) options + Note: as-instr-option uses KBUILD_AFLAGS for assembler options cc-option cc-option is used to check if $(CC) supports a given option, and if @@ -814,7 +814,7 @@ When kbuild executes, the following steps are followed (roughly): In this example, the binary $(obj)/image is a binary version of vmlinux. The usage of $(call if_changed,xxx) will be described later. -KBUILD_AFLAGS $(AS) assembler flags +KBUILD_AFLAGS assembler flags Default value - see top level Makefile Append or modify as required per architecture. @@ -853,15 +853,15 @@ When kbuild executes, the following steps are followed (roughly): The first example utilises the trick that a config option expands to 'y' when selected. -KBUILD_AFLAGS_KERNEL $(AS) options specific for built-in +KBUILD_AFLAGS_KERNEL assembler options specific for built-in $(KBUILD_AFLAGS_KERNEL) contains extra C compiler flags used to compile resident kernel code. -KBUILD_AFLAGS_MODULE Options for $(AS) when building modules +KBUILD_AFLAGS_MODULE Options for assembler when building modules $(KBUILD_AFLAGS_MODULE) is used to add arch-specific options that - are used for $(AS). + are used for assembler. From commandline AFLAGS_MODULE shall be used (see kbuild.txt). KBUILD_CFLAGS_KERNEL $(CC) options specific for built-in -- 2.17.1
Re: [tip:timers/core] hrtimer: Use a bullet for the returns bullet list
On Thu, 2019-06-27 at 21:39 -0300, Mauro Carvalho Chehab wrote: > Em Thu, 27 Jun 2019 15:08:59 -0700 > Joe Perches escreveu: [] > > > hrtimer: Use a bullet for the returns bullet list > > > > > > That gets rid of this warning: > > > > > >./kernel/time/hrtimer.c:1119: WARNING: Block quote ends without a > > > blank line; unexpected unindent. > > > > Doesn't this form occur multiple dozens of times in > > kernel sources? > > > > For instance: > > > > $ git grep -B3 -A5 -P "^ \* Returns:?$" | \ > > grep -P -A8 '\-\s+\*\s*@\w+:' > > Yes, this is a common pattern, but not all patterns that match the above > regex are broken. > > > I think the warning is odd at best and docutils might > > be updated or the warning ignored or suppressed. > > > > > and displays nicely both at the source code and at the produced > > > documentation. > > The warnings are painful - and they're the main reason why I wrote this > change: - I wanted to avoid new warnings actually unrelated to my > changes that were sometimes appearing while doing incremental > "make htmldocs" on a big patchset that I've been rebasing almost every > week over the last two months. > > - > > Yet, did you try to look how this pattern will appear at the html and pdf > output? No I did not. I just would like to avoid changing perfectly intelligible kernel-doc content into something less directly readable for the sake of external output. I don't use the externally generated formatted output docs. I read and use the source when necessary. Automatic creation of bulleted blocks from relatively unformatted content is a hard problem. I appreciate the work Mauro, I just would like to minimize the necessary changes if possible. The grep I did was trivial, I'm sure there are better tools to isolate the kernel-doc bits where the Return: block is emitted. > Something like this: > > sound/soc/codecs/wm8960.c: * Returns: > sound/soc/codecs/wm8960.c- * -1, in case no sysclk frequency available > found > sound/soc/codecs/wm8960.c- * >=0, in case we could derive bclk and > lrclk from sysclk using > sound/soc/codecs/wm8960.c- * (@sysclk_idx, @dac_idx, @bclk_idx) > dividers > > > Will be displayed as: > > **Returns:** > -1, in case no sysclk frequency available found **>=0, in case we > could derive bclk and lrclk from sysclk using** (@sysclk_idx, @dac_idx, > @bclk_idx) dividers > (where **foo**) means that "foo" will be printed in bold.> That's a yuck from me. > While it would likely be possible to improve kernel-doc to present better > results, I'm afraid that it would be too complex for simple regex > expressions, and hard to tune, as it would be a hint-based approach, > and doing a natural language processing would be too much effort. Yeah, tough problem. I don't envy it. cheers and g'luck...
Re: [PATCH 1/2] Documentation: perf: Update documentation for ThunderX2 PMU uncore driver
Hi Will, On Thu, Jun 27, 2019 at 9:52 PM Will Deacon wrote: > > On Thu, Jun 27, 2019 at 11:01:18AM +0100, Will Deacon wrote: > > On Fri, Jun 14, 2019 at 05:42:45PM +, Ganapatrao Kulkarni wrote: > > > From: Ganapatrao Kulkarni > > > > > > Add documentation for Cavium Coherent Processor Interconnect (CCPI2) PMU. > > > > > > Signed-off-by: Ganapatrao Kulkarni > > > --- > > > Documentation/perf/thunderx2-pmu.txt | 20 +++- > > > 1 file changed, 11 insertions(+), 9 deletions(-) > > > > > > diff --git a/Documentation/perf/thunderx2-pmu.txt > > > b/Documentation/perf/thunderx2-pmu.txt > > > index dffc57143736..62243230abc3 100644 > > > --- a/Documentation/perf/thunderx2-pmu.txt > > > +++ b/Documentation/perf/thunderx2-pmu.txt > > > @@ -2,24 +2,26 @@ Cavium ThunderX2 SoC Performance Monitoring Unit (PMU > > > UNCORE) > > > = > > > > > > The ThunderX2 SoC PMU consists of independent, system-wide, per-socket > > > -PMUs such as the Level 3 Cache (L3C) and DDR4 Memory Controller (DMC). > > > +PMUs such as the Level 3 Cache (L3C), DDR4 Memory Controller (DMC) and > > > +Cavium Coherent Processor Interconnect (CCPI2). > > > > > > The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles. > > > Events are counted for the default channel (i.e. channel 0) and prorated > > > to the total number of channels/tiles. > > > > > > -The DMC and L3C support up to 4 counters. Counters are independently > > > -programmable and can be started and stopped individually. Each counter > > > -can be set to a different event. Counters are 32-bit and do not support > > > -an overflow interrupt; they are read every 2 seconds. > > > +The DMC, L3C support up to 4 counters and CCPI2 support up to 8 counters. > > > > The DMC and L3C support up to 4 counters, while the CCPI2 supports up to 8 > > counters. > > > > > +Counters are independently programmable and can be started and stopped > > > +individually. Each counter can be set to a different event. DMC and L3C > > > +Counters are 32-bit and do not support an overflow interrupt; they are > > > read > > > > Counters -> counters > > > > > +every 2 seconds. CCPI2 counters are 64-bit. > > > > Assuming CCPI2 also doesn't support an overflow interrupt, I'd reword these > > two sentences as: > > > > None of the counters support an overflow interrupt and therefore sampling > > events are unsupported. The DMC and L3C counters are 32-bit and read every > > 2 seconds. The CCPI2 counters are 64-bit and assumed not to overflow in > > normal operation. > Thanks for the comments, will update in v2. Yes, CCPI2 is 64bit counter and there is no overflow issue. > Mark reminded me that these are system PMUs and therefore sampling is > unsupported irrespective of the presence of an overflow interrupt, so you > can drop that part from the text. sure. > > Sorry for the confusion, > > Will Thanks, Ganapat
Re: [PATCH v3 0/4] Compile-test UAPI and kernel headers
On Fri, Jun 28, 2019 at 1:41 AM Masahiro Yamada wrote: > > 1/4: Compile-test exported headers (reworked in v2) > > 2/4: fix a flaw I noticed when I was working on this series. > Avoid generating intermediate wrappers. > > 3/4: maybe useful for 4/4 and in some other places. > Add header-test-pattern-y syntax. > > 4/4: Compile-test kernel-space headers in include/. > v2: compile as many headers as possible. > v3: exclude more headers causing build errors I push this series to git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git header-test-v3 for somebody who wants to test it. > > Masahiro Yamada (4): > kbuild: compile-test UAPI headers to ensure they are self-contained > kbuild: do not create wrappers for header-test-y > kbuild: support header-test-pattern-y > kbuild: compile-test kernel headers to ensure they are self-contained > > .gitignore |1 - > Documentation/dontdiff |1 - > Documentation/kbuild/makefiles.txt | 13 +- > Makefile |4 +- > include/Kbuild | 1250 > init/Kconfig | 22 + > scripts/Makefile.build | 10 +- > scripts/Makefile.lib | 13 +- > scripts/cc-system-headers.sh |8 + > usr/.gitignore |1 - > usr/Makefile |2 + > usr/include/.gitignore |3 + > usr/include/Makefile | 134 +++ > 13 files changed, 1449 insertions(+), 13 deletions(-) > create mode 100644 include/Kbuild > create mode 100755 scripts/cc-system-headers.sh > create mode 100644 usr/include/.gitignore > create mode 100644 usr/include/Makefile > > -- > 2.17.1 > > > ___ > linux-arm-kernel mailing list > linux-arm-ker...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel -- Best Regards Masahiro Yamada
Re: [PATCH] kbuild: get rid of misleading $(AS) from documents
On Fri, Jun 28, 2019 at 11:04:33AM +0900, Masahiro Yamada wrote: > The assembler files in the kernel are *.S instead of *.s, so they must > be preprocessed. Hence, we always use $(CC) as an assembler driver. > > $(AS) is almost unused in Kbuild. As of writing, there is just one user. > > $ git grep '$(AS)' -- :^Documentation > drivers/net/wan/Makefile: AS68K = $(AS) > > The documentation about *_AFLAGS* sounds like the flags were passed > to $(AS). This is somewhat misleading since we do not invoke $(AS) > directly. > > Signed-off-by: Masahiro Yamada I did notice this when I grepped for $(AS) in the tree. Certainly makes sense and looks good to me. Reviewed-by: Nathan Chancellor > --- > > Documentation/kbuild/kbuild.txt| 5 ++--- > Documentation/kbuild/makefiles.txt | 12 ++-- > 2 files changed, 8 insertions(+), 9 deletions(-) > > diff --git a/Documentation/kbuild/kbuild.txt b/Documentation/kbuild/kbuild.txt > index 9c230ea71963..7a7e2aa2fab5 100644 > --- a/Documentation/kbuild/kbuild.txt > +++ b/Documentation/kbuild/kbuild.txt > @@ -31,12 +31,11 @@ Additional options to the assembler (for built-in and > modules). > > AFLAGS_MODULE > -- > -Additional module specific options to use for $(AS). > +Additional module specific options to use for assembler. > > AFLAGS_KERNEL > -- > -Additional options for $(AS) when used for assembler > -code for code that is compiled as built-in. > +Additional options when used for assembling code that is compiled as > built-in. > > KCFLAGS > -- > diff --git a/Documentation/kbuild/makefiles.txt > b/Documentation/kbuild/makefiles.txt > index d65ad5746f94..f0b3a30b985d 100644 > --- a/Documentation/kbuild/makefiles.txt > +++ b/Documentation/kbuild/makefiles.txt > @@ -306,7 +306,7 @@ more details, with real examples. > variable $(KBUILD_CFLAGS) and uses it for compilation flags for the > entire tree. > > - asflags-y specifies options for assembling with $(AS). > + asflags-y specifies options for assembling. > > Example: > #arch/sparc/kernel/Makefile > @@ -441,7 +441,7 @@ more details, with real examples. > as-instr checks if the assembler reports a specific instruction > and then outputs either option1 or option2 > C escapes are supported in the test instruction > - Note: as-instr-option uses KBUILD_AFLAGS for $(AS) options > + Note: as-instr-option uses KBUILD_AFLAGS for assembler options > > cc-option > cc-option is used to check if $(CC) supports a given option, and if > @@ -814,7 +814,7 @@ When kbuild executes, the following steps are followed > (roughly): > In this example, the binary $(obj)/image is a binary version of > vmlinux. The usage of $(call if_changed,xxx) will be described later. > > -KBUILD_AFLAGS$(AS) assembler flags > +KBUILD_AFLAGSassembler flags > > Default value - see top level Makefile > Append or modify as required per architecture. > @@ -853,15 +853,15 @@ When kbuild executes, the following steps are followed > (roughly): > The first example utilises the trick that a config option expands > to 'y' when selected. > > -KBUILD_AFLAGS_KERNEL $(AS) options specific for built-in > +KBUILD_AFLAGS_KERNEL assembler options specific for built-in > > $(KBUILD_AFLAGS_KERNEL) contains extra C compiler flags used to compile > resident kernel code. > > -KBUILD_AFLAGS_MODULE Options for $(AS) when building modules > +KBUILD_AFLAGS_MODULE Options for assembler when building modules > > $(KBUILD_AFLAGS_MODULE) is used to add arch-specific options that > - are used for $(AS). > + are used for assembler. > From commandline AFLAGS_MODULE shall be used (see kbuild.txt). > > KBUILD_CFLAGS_KERNEL $(CC) options specific for built-in > -- > 2.17.1 >
Re: [PATCH 2/2] drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver.
Hi will, On Thu, Jun 27, 2019 at 3:27 PM Will Deacon wrote: > > Hi Ganapat, > > On Fri, Jun 14, 2019 at 05:42:46PM +, Ganapatrao Kulkarni wrote: > > CCPI2 is a low-latency high-bandwidth serial interface for connecting > > ThunderX2 processors. This patch adds support to capture CCPI2 perf events. > > > > Signed-off-by: Ganapatrao Kulkarni > > --- > > drivers/perf/thunderx2_pmu.c | 179 ++- > > 1 file changed, 157 insertions(+), 22 deletions(-) > > > > diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c > > index 43d76c85da56..3791ac9b897d 100644 > > --- a/drivers/perf/thunderx2_pmu.c > > +++ b/drivers/perf/thunderx2_pmu.c > > @@ -16,7 +16,9 @@ > > * they need to be sampled before overflow(i.e, at every 2 seconds). > > */ > > > > -#define TX2_PMU_MAX_COUNTERS 4 > > +#define TX2_PMU_DMC_L3C_MAX_COUNTERS 4 > > I find it odd that you rename this... i am not sure, how to avoid this since dmc/l3c have 4 counters and ccpi2 has 8. i will try to make this better in v2. > > > +#define TX2_PMU_CCPI2_MAX_COUNTERS 8 > > + > > #define TX2_PMU_DMC_CHANNELS 8 > > #define TX2_PMU_L3_TILES 16 > > > > @@ -28,11 +30,22 @@ > >*/ > > #define DMC_EVENT_CFG(idx, val) ((val) << (((idx) * 8) + 1)) > > > > +#define GET_EVENTID_CCPI2(ev)((ev->hw.config) & 0x1ff) > > +/* bits[3:0] to select counters, starts from 8, bit[3] set always. */ > > +#define GET_COUNTERID_CCPI2(ev) ((ev->hw.idx) & 0x7) > > +#define CCPI2_COUNTER_OFFSET 8 > > > ... but leave GET_EVENTID alone, even though it only applies to DMC/L3C > events. Saying that, if you have the event you can figure out its type, > so could you avoid the need for additional macros entirely and just use > the correct masks based on the corresponding PMU type? That might also > avoid some of the conditional control flow you're introducing elsewhere. sure, i will add mask as argument to macro. > > > #define L3C_COUNTER_CTL 0xA8 > > #define L3C_COUNTER_DATA 0xAC > > #define DMC_COUNTER_CTL 0x234 > > #define DMC_COUNTER_DATA 0x240 > > > > +#define CCPI2_PERF_CTL 0x108 > > +#define CCPI2_COUNTER_CTL0x10C > > +#define CCPI2_COUNTER_SEL0x12c > > +#define CCPI2_COUNTER_DATA_L 0x130 > > +#define CCPI2_COUNTER_DATA_H 0x134 > > + > > /* L3C event IDs */ > > #define L3_EVENT_READ_REQ0xD > > #define L3_EVENT_WRITEBACK_REQ 0xE > > @@ -51,9 +64,16 @@ > > #define DMC_EVENT_READ_TXNS 0xF > > #define DMC_EVENT_MAX0x10 > > > > +#define CCPI2_EVENT_REQ_PKT_SENT 0x3D > > +#define CCPI2_EVENT_SNOOP_PKT_SENT 0x65 > > +#define CCPI2_EVENT_DATA_PKT_SENT0x105 > > +#define CCPI2_EVENT_GIC_PKT_SENT 0x12D > > +#define CCPI2_EVENT_MAX 0x200 > > + > > enum tx2_uncore_type { > > PMU_TYPE_L3C, > > PMU_TYPE_DMC, > > + PMU_TYPE_CCPI2, > > PMU_TYPE_INVALID, > > }; > > > > @@ -73,8 +93,8 @@ struct tx2_uncore_pmu { > > u32 max_events; > > u64 hrtimer_interval; > > void __iomem *base; > > - DECLARE_BITMAP(active_counters, TX2_PMU_MAX_COUNTERS); > > - struct perf_event *events[TX2_PMU_MAX_COUNTERS]; > > + DECLARE_BITMAP(active_counters, TX2_PMU_CCPI2_MAX_COUNTERS); > > + struct perf_event *events[TX2_PMU_DMC_L3C_MAX_COUNTERS]; > > Hmm, that looks very odd. Why are they now different sizes? events[ ] is used to hold reference to active events to use in timer callback, which is not applicable to ccpi2, hence 4. active_counters is set to max of both. i.e, 8. i will try to make it better readable in v2. > > > struct device *dev; > > struct hrtimer hrtimer; > > const struct attribute_group **attr_groups; > > @@ -92,7 +112,21 @@ static inline struct tx2_uncore_pmu > > *pmu_to_tx2_pmu(struct pmu *pmu) > > return container_of(pmu, struct tx2_uncore_pmu, pmu); > > } > > > > -PMU_FORMAT_ATTR(event, "config:0-4"); > > +#define TX2_PMU_FORMAT_ATTR(_var, _name, _format)\ > > +static ssize_t > > \ > > +__tx2_pmu_##_var##_show(struct device *dev, \ > > +struct device_attribute *attr, \ > > +char *page) \ > > +{\ > > + BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \ > > + return sprintf(page, _format "\n"); \ > > +}\ > > + \ > > +static struct device_attribute format_attr_##_var = \ > > + __ATTR(_name
Re: [PATCH v7 1/2] fTPM: firmware TPM running in TEE
Hi Jarkko and Sasha, On Thu, 27 Jun 2019 at 18:47, Jarkko Sakkinen wrote: > > On Wed, 2019-06-26 at 19:56 -0400, Sasha Levin wrote: > > > You've used so much on this so shouldn't this have that somewhat new > > > co-developed-by tag? I'm also wondering can this work at all > > > > Honestly, I've just been massaging this patch more than "authoring" it. > > If you feel strongly about it feel free to add a Co-authored patch with > > my name, but in my mind this is just Thiru's work. > > This is just my subjective view but writing code is easier than making > it work in the mainline in 99% of cases. If this patch was doing > something revolutional, lets say a new outstanding scheduling algorithm, > then I would think otherwise. It is not. You without question deserve > both credit and also the blame (if this breaks everything) :-) > > > > process-wise if the original author of the patch is also the only tester > > > of the patch? > > > > There's not much we can do about this... Linaro folks have tested this > > without the fTPM firmware, so at the very least it won't explode for > > everyone. If for some reason non-microsoft folks see issues then we can > > submit patches on top to fix this, we're not just throwing this at you > > and running away. > > So why any of those Linaro folks can't do it? I can add after tested-by > tag parentheses something explaining that context of testing. It is > reasonable given the circumstances. Simply because the hardware I have (Developerbox) doesn't provide enough flash space (as per current memory map) for this fTPM driver to be loaded as early TA along with OP-TEE binary. So I can't get any further point than sanity probe failure check for which I think a tested-by won't be appropriate. -Sumit > > I can also give an explanation in my next PR along the lines what you > are saying. This would definitely work for me. > > /Jarkko >
Re: [PATCH] kbuild: get rid of misleading $(AS) from documents
On Fri, Jun 28, 2019 at 11:06 AM Masahiro Yamada wrote: > > The assembler files in the kernel are *.S instead of *.s, so they must > be preprocessed. Hence, we always use $(CC) as an assembler driver. > > $(AS) is almost unused in Kbuild. As of writing, there is just one user. > > $ git grep '$(AS)' -- :^Documentation > drivers/net/wan/Makefile: AS68K = $(AS) In Makefile, a variable is also referenced in a curly brace form like ${AS}. In shell scripts, it can be referenced in the form of $AR. I grepped more patterns. $ git grep -e '$(AS)' -e '${AS}' -e '$AS' -e '$(AS:' -e '${AS:' -- :^Documentation drivers/net/wan/Makefile: AS68K = $(AS) I found only user still. > The documentation about *_AFLAGS* sounds like the flags were passed > to $(AS). This is somewhat misleading since we do not invoke $(AS) > directly. > > Signed-off-by: Masahiro Yamada > --- > > Documentation/kbuild/kbuild.txt| 5 ++--- > Documentation/kbuild/makefiles.txt | 12 ++-- > 2 files changed, 8 insertions(+), 9 deletions(-) > > diff --git a/Documentation/kbuild/kbuild.txt b/Documentation/kbuild/kbuild.txt > index 9c230ea71963..7a7e2aa2fab5 100644 > --- a/Documentation/kbuild/kbuild.txt > +++ b/Documentation/kbuild/kbuild.txt > @@ -31,12 +31,11 @@ Additional options to the assembler (for built-in and > modules). > > AFLAGS_MODULE > -- > -Additional module specific options to use for $(AS). > +Additional module specific options to use for assembler. > > AFLAGS_KERNEL > -- > -Additional options for $(AS) when used for assembler > -code for code that is compiled as built-in. > +Additional options when used for assembling code that is compiled as > built-in. > > KCFLAGS > -- > diff --git a/Documentation/kbuild/makefiles.txt > b/Documentation/kbuild/makefiles.txt > index d65ad5746f94..f0b3a30b985d 100644 > --- a/Documentation/kbuild/makefiles.txt > +++ b/Documentation/kbuild/makefiles.txt > @@ -306,7 +306,7 @@ more details, with real examples. > variable $(KBUILD_CFLAGS) and uses it for compilation flags for the > entire tree. > > - asflags-y specifies options for assembling with $(AS). > + asflags-y specifies options for assembling. > > Example: > #arch/sparc/kernel/Makefile > @@ -441,7 +441,7 @@ more details, with real examples. > as-instr checks if the assembler reports a specific instruction > and then outputs either option1 or option2 > C escapes are supported in the test instruction > - Note: as-instr-option uses KBUILD_AFLAGS for $(AS) options > + Note: as-instr-option uses KBUILD_AFLAGS for assembler options > > cc-option > cc-option is used to check if $(CC) supports a given option, and if > @@ -814,7 +814,7 @@ When kbuild executes, the following steps are followed > (roughly): > In this example, the binary $(obj)/image is a binary version of > vmlinux. The usage of $(call if_changed,xxx) will be described later. > > -KBUILD_AFLAGS $(AS) assembler flags > +KBUILD_AFLAGS assembler flags > > Default value - see top level Makefile > Append or modify as required per architecture. > @@ -853,15 +853,15 @@ When kbuild executes, the following steps are followed > (roughly): > The first example utilises the trick that a config option expands > to 'y' when selected. > > -KBUILD_AFLAGS_KERNEL $(AS) options specific for built-in > +KBUILD_AFLAGS_KERNEL assembler options specific for built-in > > $(KBUILD_AFLAGS_KERNEL) contains extra C compiler flags used to > compile > resident kernel code. > > -KBUILD_AFLAGS_MODULE Options for $(AS) when building modules > +KBUILD_AFLAGS_MODULE Options for assembler when building modules > > $(KBUILD_AFLAGS_MODULE) is used to add arch-specific options that > - are used for $(AS). > + are used for assembler. > From commandline AFLAGS_MODULE shall be used (see kbuild.txt). > > KBUILD_CFLAGS_KERNEL $(CC) options specific for built-in > -- > 2.17.1 > -- Best Regards Masahiro Yamada
[PATCH] Doc : doc-guide : Fix a typo
fix the disjunction by replacing "of" with "or". Signed-off-by: Sheriff Esseson --- Documentation/doc-guide/kernel-doc.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/doc-guide/kernel-doc.rst b/Documentation/doc-guide/kernel-doc.rst index f96059767..192c36af3 100644 --- a/Documentation/doc-guide/kernel-doc.rst +++ b/Documentation/doc-guide/kernel-doc.rst @@ -359,7 +359,7 @@ Domain`_ references. ``monospaced font``. Useful if you need to use special characters that would otherwise have some - meaning either by kernel-doc script of by reStructuredText. + meaning either by kernel-doc script or by reStructuredText. This is particularly useful if you need to use things like ``%ph`` inside a function description. -- 2.22.0
[linux-kernel-mentees] [PATCH] Doc : doc-guide : Fix a typo
fix the disjunction by replacing "of" with "or". Signed-off-by: Sheriff Esseson --- Documentation/doc-guide/kernel-doc.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/doc-guide/kernel-doc.rst b/Documentation/doc-guide/kernel-doc.rst index f96059767..192c36af3 100644 --- a/Documentation/doc-guide/kernel-doc.rst +++ b/Documentation/doc-guide/kernel-doc.rst @@ -359,7 +359,7 @@ Domain`_ references. ``monospaced font``. Useful if you need to use special characters that would otherwise have some - meaning either by kernel-doc script of by reStructuredText. + meaning either by kernel-doc script or by reStructuredText. This is particularly useful if you need to use things like ``%ph`` inside a function description. -- 2.22.0
[linux-kernel-mentees] [PATCH v2] Doc : doc-guide : Fix a typo
fix the disjunction by replacing "of" with "or". Signed-off-by: Sheriff Esseson --- changes in v2: - cc-ed Corbet. Documentation/doc-guide/kernel-doc.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/doc-guide/kernel-doc.rst b/Documentation/doc-guide/kernel-doc.rst index f96059767..192c36af3 100644 --- a/Documentation/doc-guide/kernel-doc.rst +++ b/Documentation/doc-guide/kernel-doc.rst @@ -359,7 +359,7 @@ Domain`_ references. ``monospaced font``. Useful if you need to use special characters that would otherwise have some - meaning either by kernel-doc script of by reStructuredText. + meaning either by kernel-doc script or by reStructuredText. This is particularly useful if you need to use things like ``%ph`` inside a function description. -- 2.22.0
Re: [linux-kernel-mentees] [PATCH v2] Doc : doc-guide : Fix a typo
fix the disjunction by replacing "of" with "or". Signed-off-by: Sheriff Esseson --- changes in v2: - cc-ed Corbet. Documentation/doc-guide/kernel-doc.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/doc-guide/kernel-doc.rst b/Documentation/doc-guide/kernel-doc.rst index f96059767..192c36af3 100644 --- a/Documentation/doc-guide/kernel-doc.rst +++ b/Documentation/doc-guide/kernel-doc.rst @@ -359,7 +359,7 @@ Domain`_ references. ``monospaced font``. Useful if you need to use special characters that would otherwise have some - meaning either by kernel-doc script of by reStructuredText. + meaning either by kernel-doc script or by reStructuredText. This is particularly useful if you need to use things like ``%ph`` inside a function description. -- 2.22.0
Re: [PATCH next v2] softirq: enable MAX_SOFTIRQ_TIME tuning with sysctl, max_softirq_time_msecs
Friendly ping ... On 2019/6/25 11:13, Zhiqiang Liu wrote: > From: Zhiqiang liu > > In __do_softirq func, MAX_SOFTIRQ_TIME was set to 2ms via experimentation by > commit c10d73671 ("softirq: reduce latencies") in 2013, which was designed > to reduce latencies for various network workloads. The key reason is that the > maximum number of microseconds in one NAPI polling cycle in net_rx_action func > was set to 2 jiffies, so different HZ settting will lead to different > latencies. > > However, commit 7acf8a1e8 ("Replace 2 jiffies with sysctl netdev_budget_usecs > to enable softirq tuning") adopts netdev_budget_usecs to tun maximum number of > microseconds in one NAPI polling cycle. So the latencies of net_rx_action can > be > controlled by sysadmins to copy with hardware changes over time. > > Correspondingly, the MAX_SOFTIRQ_TIME should be able to be tunned by > sysadmins, > who knows best about hardware performance, for excepted tradeoff between > latence > and fairness. Here, we add sysctl variable max_softirq_time_msecs to replace > MAX_SOFTIRQ_TIME with 2ms default value. > > Note: max_softirq_time_msecs will be coverted to jiffies, and any budget > value will be rounded up to the next jiffies, which relates to CONFIG_HZ. > The time accuracy of jiffies will result in a certain difference > between the setting jiffies of max_softirq_time_msecs and the actual > value, which is in one jiffies range. > > Signed-off-by: Zhiqiang liu > --- > Documentation/sysctl/kernel.txt | 17 + > kernel/softirq.c| 8 +--- > kernel/sysctl.c | 9 + > 3 files changed, 31 insertions(+), 3 deletions(-) > > diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt > index f0c86fbb3b48..23b36393f150 100644 > --- a/Documentation/sysctl/kernel.txt > +++ b/Documentation/sysctl/kernel.txt > @@ -44,6 +44,7 @@ show up in /proc/sys/kernel: > - kexec_load_disabled > - kptr_restrict > - l2cr[ PPC only ] > +- max_softirq_time_msecs > - modprobe==> Documentation/debugging-modules.txt > - modules_disabled > - msg_next_id [ sysv ipc ] > @@ -445,6 +446,22 @@ This flag controls the L2 cache of G3 processor boards. > If > > == > > +max_softirq_time_msecs: > + > +Maximum number of milliseconds to break the loop of restarting softirq > +processing for at most MAX_SOFTIRQ_RESTART times in __do_softirq(). > +max_softirq_time_msecs will be coverted to jiffies, and any budget > +value will be rounded up to the next jiffies, which relates to CONFIG_HZ. > +The time accuracy of jiffies will result in a certain difference > +between the setting jiffies of max_softirq_time_msecs and the actual > +value, which is in one jiffies range. > + > +max_softirq_time_msecs is a non-negative integer value, and setting > +negative value is meaningless and will return error. > +Default: 2 > + > +== > + > modules_disabled: > > A toggle value indicating if modules are allowed to be loaded > diff --git a/kernel/softirq.c b/kernel/softirq.c > index a6b81c6b6bff..1e456db70093 100644 > --- a/kernel/softirq.c > +++ b/kernel/softirq.c > @@ -199,7 +199,8 @@ EXPORT_SYMBOL(__local_bh_enable_ip); > > /* > * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times, > - * but break the loop if need_resched() is set or after 2 ms. > + * but break the loop if need_resched() is set or after > + * max_softirq_time_msecs msecs. > * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in > * certain cases, such as stop_machine(), jiffies may cease to > * increment and so we need the MAX_SOFTIRQ_RESTART limit as > @@ -210,7 +211,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip); > * we want to handle softirqs as soon as possible, but they > * should not be able to lock up the box. > */ > -#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2) > +unsigned int __read_mostly max_softirq_time_msecs = 2; > #define MAX_SOFTIRQ_RESTART 10 > > #ifdef CONFIG_TRACE_IRQFLAGS > @@ -248,7 +249,8 @@ static inline void lockdep_softirq_end(bool in_hardirq) { > } > > asmlinkage __visible void __softirq_entry __do_softirq(void) > { > - unsigned long end = jiffies + MAX_SOFTIRQ_TIME; > + unsigned long end = jiffies + > + msecs_to_jiffies(max_softirq_time_msecs); > unsigned long old_flags = current->flags; > int max_restart = MAX_SOFTIRQ_RESTART; > struct softirq_action *h; > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index 1beca96fb625..96ff292ce7f6 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -118,6 +118,7 @@ extern unsigned int sysctl_nr_open_min, > sysctl_nr_open_max; > #ifndef CONFIG_MMU > extern int sysctl_nr_trim_pages; > #endif > +extern unsigned int max_softirq_time_msecs; > > /* Constants used for minimum and maximu