Re: [PATCH v12 04/13] mfd: Add Ingenic TCU driver

2019-06-27 Thread Paul Cercueil




Le jeu. 27 juin 2019 à 8:58, Lee Jones  a écrit 
:

On Wed, 26 Jun 2019, Paul Cercueil wrote:
 Le mer. 26 juin 2019 à 15:18, Lee Jones  a 
écrit :

 > On Tue, 21 May 2019, Paul Cercueil wrote:
 >
 > >  This driver will provide a regmap that can be retrieved very 
early

 > > in
 > >  the boot process through the API function 
ingenic_tcu_get_regmap().

 > >
 > >  Additionally, it will call devm_of_platform_populate() so that 
all

 > > the
 > >  children devices will be probed.
 > >
 > >  Signed-off-by: Paul Cercueil 
 > >  ---
 > >
 > >  Notes:
 > >  v12: New patch
 > >
 > >   drivers/mfd/Kconfig |   8 +++
 > >   drivers/mfd/Makefile|   1 +
 > >   drivers/mfd/ingenic-tcu.c   | 113
 > > 
 > >   include/linux/mfd/ingenic-tcu.h |   8 +++
 > >   4 files changed, 130 insertions(+)
 > >   create mode 100644 drivers/mfd/ingenic-tcu.c


[...]


 > >  +static struct regmap * __init ingenic_tcu_create_regmap(struct
 > > device_node *np)
 > >  +{
 > >  +   struct resource res;
 > >  +   void __iomem *base;
 > >  +   struct regmap *map;
 > >  +
 > >  +   if (!of_match_node(ingenic_tcu_of_match, np))
 > >  +   return ERR_PTR(-EINVAL);


Drop this check.


 > >  +   base = of_io_request_and_map(np, 0, "TCU");
 > >  +   if (IS_ERR(base))
 > >  +   return ERR_PTR(PTR_ERR(base));
 > >  +
 > >  +	map = regmap_init_mmio(NULL, base, 
&ingenic_tcu_regmap_config);

 > >  +   if (IS_ERR(map))
 > >  +   goto err_iounmap;


Place this inside probe().


 > >  +   return map;
 > >  +
 > >  +err_iounmap:
 > >  +   iounmap(base);
 > >  +   of_address_to_resource(np, 0, &res);
 > >  +   release_mem_region(res.start, resource_size(&res));
 > >  +
 > >  +   return map;
 > >  +}
 >
 > Why does this need to be set-up earlier than probe()?

 See the explanation below.


I think the answer is, it doesn't.

 > >  +static int __init ingenic_tcu_probe(struct platform_device 
*pdev)

 > >  +{
 > >  +	struct regmap *map = 
ingenic_tcu_get_regmap(pdev->dev.of_node);

 > >  +
 > >  +   platform_set_drvdata(pdev, map);
 > >  +
 > >  +	regmap_attach_dev(&pdev->dev, map, 
&ingenic_tcu_regmap_config);

 > >  +
 > >  +   return devm_of_platform_populate(&pdev->dev);
 > >  +}
 > >  +
 > >  +static struct platform_driver ingenic_tcu_driver = {
 > >  +   .driver = {
 > >  +   .name = "ingenic-tcu",
 > >  +   .of_match_table = ingenic_tcu_of_match,
 > >  +   },
 > >  +};
 > >  +
 > >  +static int __init ingenic_tcu_platform_init(void)
 > >  +{
 > >  +   return platform_driver_probe(&ingenic_tcu_driver,
 > >  +ingenic_tcu_probe);
 >
 > What?  Why?

 The device driver probed here will populate the children devices,
 which will be able to retrieve the pointer to the regmap through
 device_get_regmap(dev->parent).


I've never heard of this call.  Where is it?


dev_get_regmap, in .


 The children devices are normal platform drivers that can be probed
 the normal way. These are the PWM driver, the watchdog driver, and 
the
 OST (OS Timer) clocksource driver, all part of the same hardware 
block

 (the Timer/Counter Unit or TCU).


If they are normal devices, then there is no need to roll your own
regmap-getter implementation like this.


 > >  +}
 > >  +subsys_initcall(ingenic_tcu_platform_init);
 > >  +
 > >  +struct regmap * __init ingenic_tcu_get_regmap(struct 
device_node

 > > *np)
 > >  +{
 > >  +   if (!tcu_regmap)
 > >  +   tcu_regmap = ingenic_tcu_create_regmap(np);
 > >  +
 > >  +   return tcu_regmap;
 > >  +}
 >
 > This makes me pretty uncomfortable.
 >
 > What calls it?

 The TCU IRQ driver (patch [06/13]), clocks driver (patch [05/13]), 
and the
 non-OST clocksource driver (patch [07/13]) all probe very early in 
the boot
 process, and share the same devicetree node. They call this 
function to get

 a pointer to the regmap.


Horrible!

Instead, you should send it through platform_set_drvdata() and collect
it in the child drivers with platform_get_drvdata(dev->parent).


The IRQ, clocks and clocksource driver do NOT have a "struct device" to
begin with. They are not platform drivers, and cannot be platform 
drivers,
as they must register so early in the boot process, before "struct 
device"

is even a thing.

All they get is a pointer to the same devicetree node. Since all of 
these

have to use the same registers, they need to use a shared regmap, which
they obtain by calling ingenic_tcu_get_regmap() below.

Then, when this driver's probe gets called, the regmap is retrieved and
attached to the struct device, and then the children devices will be
probed: the watchdog device, the PWM device, the OST device. These three
will retrieve the regmap by calling dev_get_regmap(dev->parent, NULL).

 > >  +bool ingenic_tcu_pwm_can_use_chn(struct device *dev, unsigned 
int

 > > channel)
 > >  +{
 > >  +   const struct ingenic_soc_info *soc =
 > > device_get_match_data(dev->parent);
 > >  +
 > >  +	/* Enable al

Re: [PATCH v12 04/13] mfd: Add Ingenic TCU driver

2019-06-27 Thread Lee Jones
On Thu, 27 Jun 2019, Paul Cercueil wrote:
> Le jeu. 27 juin 2019 à 8:58, Lee Jones  a écrit :
> > On Wed, 26 Jun 2019, Paul Cercueil wrote:
> > >  Le mer. 26 juin 2019 à 15:18, Lee Jones  a
> > > écrit :
> > >  > On Tue, 21 May 2019, Paul Cercueil wrote:
> > >  >
> > >  > >  This driver will provide a regmap that can be retrieved very
> > > early
> > >  > > in
> > >  > >  the boot process through the API function
> > > ingenic_tcu_get_regmap().
> > >  > >
> > >  > >  Additionally, it will call devm_of_platform_populate() so that
> > > all
> > >  > > the
> > >  > >  children devices will be probed.
> > >  > >
> > >  > >  Signed-off-by: Paul Cercueil 
> > >  > >  ---
> > >  > >
> > >  > >  Notes:
> > >  > >  v12: New patch
> > >  > >
> > >  > >   drivers/mfd/Kconfig |   8 +++
> > >  > >   drivers/mfd/Makefile|   1 +
> > >  > >   drivers/mfd/ingenic-tcu.c   | 113
> > >  > > 
> > >  > >   include/linux/mfd/ingenic-tcu.h |   8 +++
> > >  > >   4 files changed, 130 insertions(+)
> > >  > >   create mode 100644 drivers/mfd/ingenic-tcu.c
> > 
> > [...]
> > 
> > >  > >  +static struct regmap * __init ingenic_tcu_create_regmap(struct
> > >  > > device_node *np)
> > >  > >  +{
> > >  > >  +   struct resource res;
> > >  > >  +   void __iomem *base;
> > >  > >  +   struct regmap *map;
> > >  > >  +
> > >  > >  +   if (!of_match_node(ingenic_tcu_of_match, np))
> > >  > >  +   return ERR_PTR(-EINVAL);
> > 
> > Drop this check.
> > 
> > >  > >  +   base = of_io_request_and_map(np, 0, "TCU");
> > >  > >  +   if (IS_ERR(base))
> > >  > >  +   return ERR_PTR(PTR_ERR(base));
> > >  > >  +
> > >  > >  +   map = regmap_init_mmio(NULL, base,
> > > &ingenic_tcu_regmap_config);
> > >  > >  +   if (IS_ERR(map))
> > >  > >  +   goto err_iounmap;
> > 
> > Place this inside probe().
> > 
> > >  > >  +   return map;
> > >  > >  +
> > >  > >  +err_iounmap:
> > >  > >  +   iounmap(base);
> > >  > >  +   of_address_to_resource(np, 0, &res);
> > >  > >  +   release_mem_region(res.start, resource_size(&res));
> > >  > >  +
> > >  > >  +   return map;
> > >  > >  +}
> > >  >
> > >  > Why does this need to be set-up earlier than probe()?
> > > 
> > >  See the explanation below.
> > 
> > I think the answer is, it doesn't.
> > 
> > >  > >  +static int __init ingenic_tcu_probe(struct platform_device
> > > *pdev)
> > >  > >  +{
> > >  > >  +   struct regmap *map =
> > > ingenic_tcu_get_regmap(pdev->dev.of_node);
> > >  > >  +
> > >  > >  +   platform_set_drvdata(pdev, map);
> > >  > >  +
> > >  > >  +   regmap_attach_dev(&pdev->dev, map,
> > > &ingenic_tcu_regmap_config);
> > >  > >  +
> > >  > >  +   return devm_of_platform_populate(&pdev->dev);
> > >  > >  +}
> > >  > >  +
> > >  > >  +static struct platform_driver ingenic_tcu_driver = {
> > >  > >  +   .driver = {
> > >  > >  +   .name = "ingenic-tcu",
> > >  > >  +   .of_match_table = ingenic_tcu_of_match,
> > >  > >  +   },
> > >  > >  +};
> > >  > >  +
> > >  > >  +static int __init ingenic_tcu_platform_init(void)
> > >  > >  +{
> > >  > >  +   return platform_driver_probe(&ingenic_tcu_driver,
> > >  > >  +ingenic_tcu_probe);
> > >  >
> > >  > What?  Why?
> > > 
> > >  The device driver probed here will populate the children devices,
> > >  which will be able to retrieve the pointer to the regmap through
> > >  device_get_regmap(dev->parent).
> > 
> > I've never heard of this call.  Where is it?
> 
> dev_get_regmap, in .
> 
> > >  The children devices are normal platform drivers that can be probed
> > >  the normal way. These are the PWM driver, the watchdog driver, and
> > > the
> > >  OST (OS Timer) clocksource driver, all part of the same hardware
> > > block
> > >  (the Timer/Counter Unit or TCU).
> > 
> > If they are normal devices, then there is no need to roll your own
> > regmap-getter implementation like this.
> > 
> > >  > >  +}
> > >  > >  +subsys_initcall(ingenic_tcu_platform_init);
> > >  > >  +
> > >  > >  +struct regmap * __init ingenic_tcu_get_regmap(struct
> > > device_node
> > >  > > *np)
> > >  > >  +{
> > >  > >  +   if (!tcu_regmap)
> > >  > >  +   tcu_regmap = ingenic_tcu_create_regmap(np);
> > >  > >  +
> > >  > >  +   return tcu_regmap;
> > >  > >  +}
> > >  >
> > >  > This makes me pretty uncomfortable.
> > >  >
> > >  > What calls it?
> > > 
> > >  The TCU IRQ driver (patch [06/13]), clocks driver (patch [05/13]),
> > > and the
> > >  non-OST clocksource driver (patch [07/13]) all probe very early in
> > > the boot
> > >  process, and share the same devicetree node. They call this
> > > function to get
> > >  a pointer to the regmap.
> > 
> > Horrible!
> > 
> > Instead, you should send it through platform_set_drvdata() and collect
> > it in the child drivers with platform_get_drvdata(dev->parent).
> 
> The IRQ, clocks and clocksource driver do NOT have a "struct device" to
> begin with. They are not platform drivers, and c

Re: [PATCH v12 04/13] mfd: Add Ingenic TCU driver

2019-06-27 Thread Paul Cercueil




Le jeu. 27 juin 2019 à 11:01, Lee Jones  a 
écrit :

On Thu, 27 Jun 2019, Paul Cercueil wrote:
 Le jeu. 27 juin 2019 à 8:58, Lee Jones  a 
écrit :

 > On Wed, 26 Jun 2019, Paul Cercueil wrote:
 > >  Le mer. 26 juin 2019 à 15:18, Lee Jones 
 a

 > > écrit :
 > >  > On Tue, 21 May 2019, Paul Cercueil wrote:
 > >  >
 > >  > >  This driver will provide a regmap that can be retrieved 
very

 > > early
 > >  > > in
 > >  > >  the boot process through the API function
 > > ingenic_tcu_get_regmap().
 > >  > >
 > >  > >  Additionally, it will call devm_of_platform_populate() so 
that

 > > all
 > >  > > the
 > >  > >  children devices will be probed.
 > >  > >
 > >  > >  Signed-off-by: Paul Cercueil 
 > >  > >  ---
 > >  > >
 > >  > >  Notes:
 > >  > >  v12: New patch
 > >  > >
 > >  > >   drivers/mfd/Kconfig |   8 +++
 > >  > >   drivers/mfd/Makefile|   1 +
 > >  > >   drivers/mfd/ingenic-tcu.c   | 113
 > >  > > 
 > >  > >   include/linux/mfd/ingenic-tcu.h |   8 +++
 > >  > >   4 files changed, 130 insertions(+)
 > >  > >   create mode 100644 drivers/mfd/ingenic-tcu.c
 >
 > [...]
 >
 > >  > >  +static struct regmap * __init 
ingenic_tcu_create_regmap(struct

 > >  > > device_node *np)
 > >  > >  +{
 > >  > >  +struct resource res;
 > >  > >  +void __iomem *base;
 > >  > >  +struct regmap *map;
 > >  > >  +
 > >  > >  +if (!of_match_node(ingenic_tcu_of_match, np))
 > >  > >  +return ERR_PTR(-EINVAL);
 >
 > Drop this check.
 >
 > >  > >  +base = of_io_request_and_map(np, 0, "TCU");
 > >  > >  +if (IS_ERR(base))
 > >  > >  +return ERR_PTR(PTR_ERR(base));
 > >  > >  +
 > >  > >  +map = regmap_init_mmio(NULL, base,
 > > &ingenic_tcu_regmap_config);
 > >  > >  +if (IS_ERR(map))
 > >  > >  +goto err_iounmap;
 >
 > Place this inside probe().
 >
 > >  > >  +return map;
 > >  > >  +
 > >  > >  +err_iounmap:
 > >  > >  +iounmap(base);
 > >  > >  +of_address_to_resource(np, 0, &res);
 > >  > >  +release_mem_region(res.start, resource_size(&res));
 > >  > >  +
 > >  > >  +return map;
 > >  > >  +}
 > >  >
 > >  > Why does this need to be set-up earlier than probe()?
 > >
 > >  See the explanation below.
 >
 > I think the answer is, it doesn't.
 >
 > >  > >  +static int __init ingenic_tcu_probe(struct 
platform_device

 > > *pdev)
 > >  > >  +{
 > >  > >  +struct regmap *map =
 > > ingenic_tcu_get_regmap(pdev->dev.of_node);
 > >  > >  +
 > >  > >  +platform_set_drvdata(pdev, map);
 > >  > >  +
 > >  > >  +regmap_attach_dev(&pdev->dev, map,
 > > &ingenic_tcu_regmap_config);
 > >  > >  +
 > >  > >  +return devm_of_platform_populate(&pdev->dev);
 > >  > >  +}
 > >  > >  +
 > >  > >  +static struct platform_driver ingenic_tcu_driver = {
 > >  > >  +.driver = {
 > >  > >  +.name = "ingenic-tcu",
 > >  > >  +.of_match_table = ingenic_tcu_of_match,
 > >  > >  +},
 > >  > >  +};
 > >  > >  +
 > >  > >  +static int __init ingenic_tcu_platform_init(void)
 > >  > >  +{
 > >  > >  +return platform_driver_probe(&ingenic_tcu_driver,
 > >  > >  + ingenic_tcu_probe);
 > >  >
 > >  > What?  Why?
 > >
 > >  The device driver probed here will populate the children 
devices,
 > >  which will be able to retrieve the pointer to the regmap 
through

 > >  device_get_regmap(dev->parent).
 >
 > I've never heard of this call.  Where is it?

 dev_get_regmap, in .

 > >  The children devices are normal platform drivers that can be 
probed
 > >  the normal way. These are the PWM driver, the watchdog driver, 
and

 > > the
 > >  OST (OS Timer) clocksource driver, all part of the same 
hardware

 > > block
 > >  (the Timer/Counter Unit or TCU).
 >
 > If they are normal devices, then there is no need to roll your own
 > regmap-getter implementation like this.
 >
 > >  > >  +}
 > >  > >  +subsys_initcall(ingenic_tcu_platform_init);
 > >  > >  +
 > >  > >  +struct regmap * __init ingenic_tcu_get_regmap(struct
 > > device_node
 > >  > > *np)
 > >  > >  +{
 > >  > >  +if (!tcu_regmap)
 > >  > >  +tcu_regmap = ingenic_tcu_create_regmap(np);
 > >  > >  +
 > >  > >  +return tcu_regmap;
 > >  > >  +}
 > >  >
 > >  > This makes me pretty uncomfortable.
 > >  >
 > >  > What calls it?
 > >
 > >  The TCU IRQ driver (patch [06/13]), clocks driver (patch 
[05/13]),

 > > and the
 > >  non-OST clocksource driver (patch [07/13]) all probe very 
early in

 > > the boot
 > >  process, and share the same devicetree node. They call this
 > > function to get
 > >  a pointer to the regmap.
 >
 > Horrible!
 >
 > Instead, you should send it through platform_set_drvdata() and 
collect

 > it in the child drivers with platform_get_drvdata(dev->parent).

 The IRQ, clocks and clocksource driver do NOT have a "struct 

Re: [PATCH] binfmt_elf: Extract .note.gnu.property from an ELF file

2019-06-27 Thread Dave Martin
On Wed, Jun 26, 2019 at 10:14:07AM -0700, Andy Lutomirski wrote:
> On Thu, May 2, 2019 at 4:10 AM Dave Martin  wrote:

[...]

> > A couple of questions before I look in more detail:
> >
> > 1) Can we rely on PT_GNU_PROPERTY being present in the phdrs to describe
> > the NT_GNU_PROPERTY_TYPE_0 note?  If so, we can avoid trying to parse
> > irrelevant PT_NOTE segments.
> >
> >
> > 2) Are there standard types for things like the program property header?
> > If not, can we add something in elf.h?  We should try to coordinate with
> > libc on that.  Something like
> >
> 
> Where did PT_GNU_PROPERTY come from?  Are there actual docs for it?
> Can someone here tell us what the actual semantics of this new ELF
> thingy are?  From some searching, it seems like it's kind of an ELF
> note but kind of not.  An actual description would be fantastic.

https://github.com/hjl-tools/linux-abi/wiki/linux-abi-draft.pdf

I don't know _when_ it was added, and the description is minimal, but
it's there.

(I'd say it's fairly obvious how it should be used, but it could do with
some clarification...)

> Also, I don't think there's any actual requirement that the upstream
> kernel recognize existing CET-enabled RHEL 8 binaries as being
> CET-enabled.  I tend to think that RHEL 8 jumped the gun here.  While
> the upstream kernel should make some reasonble effort to make sure
> that RHEL 8 binaries will continue to run, I don't see why we need to
> go out of our way to keep the full set of mitigations available for
> binaries that were developed against a non-upstream kernel.

If that's an accpetable approach, it should certainly make our life
easier.

> In fact, if we handle the legacy bitmap differently from RHEL 8, we
> may *have* to make sure that we don't recognize existing RHEL 8
> binaries as CET-enabled.

Can't comment on that.  If the existing RHEL 8 binaries strictly don't
have the PT_GNU_PROPERTY phdr, then this might serve a dual purpose ...
otherwise, x86 might need some additional annotation for new binaries.

I'll leave it for others to comment.

Cheers
---Dave


Re: [PATCH] binfmt_elf: Extract .note.gnu.property from an ELF file

2019-06-27 Thread Florian Weimer
* Andy Lutomirski:

> Also, I don't think there's any actual requirement that the upstream
> kernel recognize existing CET-enabled RHEL 8 binaries as being
> CET-enabled.  I tend to think that RHEL 8 jumped the gun here.

The ABI was supposed to be finalized and everyone involved thought it
had been reviewed by the GNU gABI community and other interested
parties.  It had been included in binutils for several releases.

>From my point of view, the kernel is just a consumer of the ABI.  The
kernel would not change an instruction encoding if it doesn't like it
for some reason, either.

> While the upstream kernel should make some reasonble effort to make
> sure that RHEL 8 binaries will continue to run, I don't see why we
> need to go out of our way to keep the full set of mitigations
> available for binaries that were developed against a non-upstream
> kernel.

They were developed against the ABI specification.

I do not have a strong opinion what the kernel should do going forward.
I just want to make clear what happened.

Thanks,
Florian


Re: [PATCH 12/14] doc-rst: add ABI documentation to the admin-guide book

2019-06-27 Thread Jani Nikula
On Fri, 21 Jun 2019, Mauro Carvalho Chehab  wrote:
> Em Wed, 19 Jun 2019 13:37:39 -0300
> Mauro Carvalho Chehab  escreveu:
>
>> Em Tue, 18 Jun 2019 11:47:32 +0300
>> Jani Nikula  escreveu:
>> 
>> > On Mon, 17 Jun 2019, Mauro Carvalho Chehab  
>> > wrote:  
>> > > Yeah, I guess it should be possible to do that. How a python script
>> > > can identify if it was called by Sphinx, or if it was called directly?   
>> > >  
>> > 
>> > if __name__ == '__main__':
>> ># run on the command-line, not imported  
>> 
>> Ok, when I have some spare time, I may try to convert one script
>> to python and see how it behaves. 
>
> Did a quick test here... 
>
> Probably I'm doing something wrong (as I'm a rookie with Python), but,
> in order to be able to use the same script as command line and as an Sphinx
> extension, everything that it is currently there should be "escaped"
> by an:
>
>   if __name__ != '__main__':
>
> As event the class definition:
>
> class KernelCmd(Directive):
>
> depends on:
>
>   from docutils.parsers.rst import directives, Directive
>
> With is only required when one needs to parse ReST - e. g. only
> when the script runs as a Sphinx extension.
>
> If this is right, as we want a script that can run by command line
> to parse and search inside ABI files, at the end of the day, it will
> be a lot easier to maintain if the parser script is different from the
> Sphinx extension. 

Split it into two files, one has the nuts and bolts of parsing and has
the "if __name__ == '__main__':" bit to run on the command line, and the
other interfaces with Sphinx and imports the parser.

> If so, as the Sphinx extension script will need to call a parsing script
> anyway, it doesn't matter the language of the script with will be
> doing the ABI file parsing.

Calling the parser using an API will be easier to use, maintain and
extend than using pipes, with all the interleaved sideband information
to adjust line numbers and whatnot.

BR,
Jani.



>
> See the enclosing file. I didn't add anything from the ABI parsing
> script yet. It was just changed in order to not generate an error
> when trying to run it from command line.
>
>
> Thanks,
> Mauro
> #!/usr/bin/env python3
> # coding=utf-8
> # SPDX-License-Identifier: GPL-2.0
> #
> u"""
> kernel-abi
> ~~
>
> Implementation of the ``kernel-abi`` reST-directive.
>
> :copyright:  Copyright (C) 2016  Markus Heiser
> :copyright:  Copyright (C) 2016-2019  Mauro Carvalho Chehab
> :maintained-by: Mauro Carvalho Chehab 
> :license:GPL Version 2, June 1991 see Linux/COPYING for details.
>
> The ``kernel-abi`` (:py:class:`KernelCmd`) directive calls the
> scripts/get_abi.pl script to parse the Kernel ABI files.
>
> Overview of directive's argument and options.
>
> .. code-block:: rst
>
> .. kernel-abi:: 
> :debug:
>
> The argument  is required. It contains the
> location of the ABI files to be parsed.
>
> ``debug``
>   Inserts a code-block with the *raw* reST. Sometimes it is helpful to see
>   what reST is generated.
>
> """
>
> import codecs
> import os
> import subprocess
> import sys
>
> from os import path
>
> if __name__ != '__main__':
> from docutils import nodes, statemachine
> from docutils.statemachine import ViewList
> from docutils.parsers.rst import directives, Directive
> from docutils.utils.error_reporting import ErrorString
>
> #
> # AutodocReporter is only good up to Sphinx 1.7
> #
> import sphinx
>
> Use_SSI = sphinx.__version__[:3] >= '1.7'
> if Use_SSI:
> from sphinx.util.docutils import switch_source_input
> else:
> from sphinx.ext.autodoc import AutodocReporter
>
> __version__  = '1.0'
>
> if __name__ != '__main__':
> def setup(app):
>
> app.add_directive("kernel-abi", KernelCmd)
> return dict(
> version = __version__
> , parallel_read_safe = True
> , parallel_write_safe = True
> )
>
> class KernelCmd(Directive):
>
> u"""KernelABI (``kernel-abi``) directive"""
>
> required_arguments = 1
> optional_arguments = 2
> has_content = False
> final_argument_whitespace = True
>
> option_spec = {
> "debug" : directives.flag,
> "rst"   : directives.unchanged
> }
>
> def warn(self, message, **replace):
> replace["fname"]   = self.state.document.current_source
> replace["line_no"] = replace.get("line_no", self.lineno)
> message = ("%(fname)s:%(line_no)s: [kernel-abi WARN] : " + 
> message) % replace
> self.state.document.settings.env.app.warn(message, prefix="")
>
> def run(self):
>
> doc = self.state.document
> if not doc.settings.file_insertion_enabled:
> raise self.warning("docutils: file insertion disabled")
>
> env = doc.setting

Re: [PATCH 2/2] drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver.

2019-06-27 Thread Will Deacon
Hi Ganapat,

On Fri, Jun 14, 2019 at 05:42:46PM +, Ganapatrao Kulkarni wrote:
> CCPI2 is a low-latency high-bandwidth serial interface for connecting
> ThunderX2 processors. This patch adds support to capture CCPI2 perf events.
> 
> Signed-off-by: Ganapatrao Kulkarni 
> ---
>  drivers/perf/thunderx2_pmu.c | 179 ++-
>  1 file changed, 157 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
> index 43d76c85da56..3791ac9b897d 100644
> --- a/drivers/perf/thunderx2_pmu.c
> +++ b/drivers/perf/thunderx2_pmu.c
> @@ -16,7 +16,9 @@
>   * they need to be sampled before overflow(i.e, at every 2 seconds).
>   */
>  
> -#define TX2_PMU_MAX_COUNTERS 4
> +#define TX2_PMU_DMC_L3C_MAX_COUNTERS 4

I find it odd that you rename this...

> +#define TX2_PMU_CCPI2_MAX_COUNTERS   8
> +
>  #define TX2_PMU_DMC_CHANNELS 8
>  #define TX2_PMU_L3_TILES 16
>  
> @@ -28,11 +30,22 @@
>*/
>  #define DMC_EVENT_CFG(idx, val)  ((val) << (((idx) * 8) + 1))
>  
> +#define GET_EVENTID_CCPI2(ev)((ev->hw.config) & 0x1ff)
> +/* bits[3:0] to select counters, starts from 8, bit[3] set always. */
> +#define GET_COUNTERID_CCPI2(ev)  ((ev->hw.idx) & 0x7)
> +#define CCPI2_COUNTER_OFFSET 8


... but leave GET_EVENTID alone, even though it only applies to DMC/L3C
events. Saying that, if you have the event you can figure out its type,
so could you avoid the need for additional macros entirely and just use
the correct masks based on the corresponding PMU type? That might also
avoid some of the conditional control flow you're introducing elsewhere.

>  #define L3C_COUNTER_CTL  0xA8
>  #define L3C_COUNTER_DATA 0xAC
>  #define DMC_COUNTER_CTL  0x234
>  #define DMC_COUNTER_DATA 0x240
>  
> +#define CCPI2_PERF_CTL   0x108
> +#define CCPI2_COUNTER_CTL0x10C
> +#define CCPI2_COUNTER_SEL0x12c
> +#define CCPI2_COUNTER_DATA_L 0x130
> +#define CCPI2_COUNTER_DATA_H 0x134
> +
>  /* L3C event IDs */
>  #define L3_EVENT_READ_REQ0xD
>  #define L3_EVENT_WRITEBACK_REQ   0xE
> @@ -51,9 +64,16 @@
>  #define DMC_EVENT_READ_TXNS  0xF
>  #define DMC_EVENT_MAX0x10
>  
> +#define CCPI2_EVENT_REQ_PKT_SENT 0x3D
> +#define CCPI2_EVENT_SNOOP_PKT_SENT   0x65
> +#define CCPI2_EVENT_DATA_PKT_SENT0x105
> +#define CCPI2_EVENT_GIC_PKT_SENT 0x12D
> +#define CCPI2_EVENT_MAX  0x200
> +
>  enum tx2_uncore_type {
>   PMU_TYPE_L3C,
>   PMU_TYPE_DMC,
> + PMU_TYPE_CCPI2,
>   PMU_TYPE_INVALID,
>  };
>  
> @@ -73,8 +93,8 @@ struct tx2_uncore_pmu {
>   u32 max_events;
>   u64 hrtimer_interval;
>   void __iomem *base;
> - DECLARE_BITMAP(active_counters, TX2_PMU_MAX_COUNTERS);
> - struct perf_event *events[TX2_PMU_MAX_COUNTERS];
> + DECLARE_BITMAP(active_counters, TX2_PMU_CCPI2_MAX_COUNTERS);
> + struct perf_event *events[TX2_PMU_DMC_L3C_MAX_COUNTERS];

Hmm, that looks very odd. Why are they now different sizes?

>   struct device *dev;
>   struct hrtimer hrtimer;
>   const struct attribute_group **attr_groups;
> @@ -92,7 +112,21 @@ static inline struct tx2_uncore_pmu 
> *pmu_to_tx2_pmu(struct pmu *pmu)
>   return container_of(pmu, struct tx2_uncore_pmu, pmu);
>  }
>  
> -PMU_FORMAT_ATTR(event,   "config:0-4");
> +#define TX2_PMU_FORMAT_ATTR(_var, _name, _format)\
> +static ssize_t   
> \
> +__tx2_pmu_##_var##_show(struct device *dev,  \
> +struct device_attribute *attr,   \
> +char *page)  \
> +{\
> + BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \
> + return sprintf(page, _format "\n"); \
> +}\
> + \
> +static struct device_attribute format_attr_##_var =  \
> + __ATTR(_name, 0444, __tx2_pmu_##_var##_show, NULL)
> +
> +TX2_PMU_FORMAT_ATTR(event, event, "config:0-4");
> +TX2_PMU_FORMAT_ATTR(event_ccpi2, event, "config:0-9");

This doesn't look right. Can perf deal with overlapping fields? It seems
wrong that we'd allow the user to specify both, for example.

>  
>  static struct attribute *l3c_pmu_format_attrs[] = {
>   &format_attr_event.attr,
> @@ -104,6 +138,11 @@ static struct attribute *dmc_pmu_format_attrs[] = {
>   NULL,
>  };
>  
> +static struct attribute *ccpi2_pmu_format_attrs[] = {
> + &format_attr_event_ccpi2.attr,
> + NULL,
> +};
> +
>  static const struct attribute_group

Re: [PATCH 1/2] Documentation: perf: Update documentation for ThunderX2 PMU uncore driver

2019-06-27 Thread Will Deacon
On Fri, Jun 14, 2019 at 05:42:45PM +, Ganapatrao Kulkarni wrote:
> From: Ganapatrao Kulkarni 
> 
> Add documentation for Cavium Coherent Processor Interconnect (CCPI2) PMU.
> 
> Signed-off-by: Ganapatrao Kulkarni 
> ---
>  Documentation/perf/thunderx2-pmu.txt | 20 +++-
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/perf/thunderx2-pmu.txt 
> b/Documentation/perf/thunderx2-pmu.txt
> index dffc57143736..62243230abc3 100644
> --- a/Documentation/perf/thunderx2-pmu.txt
> +++ b/Documentation/perf/thunderx2-pmu.txt
> @@ -2,24 +2,26 @@ Cavium ThunderX2 SoC Performance Monitoring Unit (PMU 
> UNCORE)
>  =
>  
>  The ThunderX2 SoC PMU consists of independent, system-wide, per-socket
> -PMUs such as the Level 3 Cache (L3C) and DDR4 Memory Controller (DMC).
> +PMUs such as the Level 3 Cache (L3C), DDR4 Memory Controller (DMC) and
> +Cavium Coherent Processor Interconnect (CCPI2).
>  
>  The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles.
>  Events are counted for the default channel (i.e. channel 0) and prorated
>  to the total number of channels/tiles.
>  
> -The DMC and L3C support up to 4 counters. Counters are independently
> -programmable and can be started and stopped individually. Each counter
> -can be set to a different event. Counters are 32-bit and do not support
> -an overflow interrupt; they are read every 2 seconds.
> +The DMC, L3C support up to 4 counters and CCPI2 support up to 8 counters.

The DMC and L3C support up to 4 counters, while the CCPI2 supports up to 8
counters.

> +Counters are independently programmable and can be started and stopped
> +individually. Each counter can be set to a different event. DMC and L3C
> +Counters are 32-bit and do not support an overflow interrupt; they are read

Counters -> counters

> +every 2 seconds. CCPI2 counters are 64-bit.

Assuming CCPI2 also doesn't support an overflow interrupt, I'd reword these
two sentences as:

  None of the counters support an overflow interrupt and therefore sampling
  events are unsupported. The DMC and L3C counters are 32-bit and read every
  2 seconds. The CCPI2 counters are 64-bit and assumed not to overflow in
  normal operation.

>  PMU UNCORE (perf) driver:
>  
>  The thunderx2_pmu driver registers per-socket perf PMUs for the DMC and
> -L3C devices.  Each PMU can be used to count up to 4 events
> -simultaneously. The PMUs provide a description of their available events
> -and configuration options under sysfs, see
> -/sys/devices/uncore_; S is the socket id.
> +L3C devices.  Each PMU can be used to count up to 4(DMC/L3C) or up to 8

Space between 4 and (

Will


Re: [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush instruction within the same inner shareable domain

2019-06-27 Thread Will Deacon
On Mon, Jun 24, 2019 at 10:34:02AM +, qi.f...@fujitsu.com wrote:
> On 6/18/19 2:03 AM, Will Deacon wrote:
> > On Mon, Jun 17, 2019 at 11:32:53PM +0900, Takao Indoh wrote:
> >> From: Takao Indoh 
> >>
> >> I found a performance issue related on the implementation of Linux's TLB
> >> flush for arm64.
> >>
> >> When I run a single-threaded test program on moderate environment, it
> >> usually takes 39ms to finish its work. However, when I put a small
> >> apprication, which just calls mprotest() continuously, on one of sibling
> >> cores and run it simultaneously, the test program slows down significantly.
> >> It becomes 49ms(125%) on ThunderX2. I also detected the same problem on
> >> ThunderX1 and Fujitsu A64FX.
> > This is a problem for any applications that share hardware resources with
> > each other, so I don't think it's something we should be too concerned about
> > addressing unless there is a practical DoS scenario, which there doesn't
> > appear to be in this case. It may be that the real answer is "don't call
> > mprotect() in a loop".
> I think there has been a misunderstanding, please let me explain.
> This application is just an example using for reproducing the 
> performance issue we found.
> Our original purpose is reducing OS jitter by this series.
> The OS jitter on massively parallel processing systems have been known 
> and studied for many years.
> The 2.5% OS jitter can result in over a factor of 20 slowdown for the 
> same application [1].

I think it's worth pointing out that the system in question was neither
ARM-based nor running Linux, so I'd be cautious in applying the conclusions
of that paper directly to our TLB invalidation code. Furthermore, the noise
being generated in their experiments uses a timer interrupt, which has a
/vastly/ different profile to a DVM message in terms of both system impact
and frequency.

> Though it may be an extreme example, reducing the OS jitter has been an 
> issue in HPC environment.
> 
> [1] Ferreira, Kurt B., Patrick Bridges, and Ron Brightwell. 
> "Characterizing application sensitivity to OS interference using 
> kernel-level noise injection." Proceedings of the 2008 ACM/IEEE 
> conference on Supercomputing. IEEE Press, 2008.
> 
> >> I suppose the root cause of this issue is the implementation of Linux's TLB
> >> flush for arm64, especially use of TLBI-is instruction which is a broadcast
> >> to all processor core on the system. In case of the above situation,
> >> TLBI-is is called by mprotect().
> > On the flip side, Linux is providing the hardware with enough information
> > not to broadcast to cores for which the remote TLBs don't have entries
> > allocated for the ASID being invalidated. I would say that the root cause
> > of the issue is that this filtering is not taking place.
> 
> Do you mean that the filter should be implemented in hardware?

Yes. If you're building a large system and you care about "jitter", then
you either need to partition it in such a way that sources of noise are
contained, or you need to introduce filters to limit their scope. Rewriting
the low-level memory-management parts of the operating system is a red
herring and imposes a needless burden on everybody else without solving
the real problem, which is that contended use of shared resources doesn't
scale.

Will


Re: [PATCH 12/14] doc-rst: add ABI documentation to the admin-guide book

2019-06-27 Thread Mauro Carvalho Chehab
Em Thu, 27 Jun 2019 12:48:12 +0300
Jani Nikula  escreveu:

> On Fri, 21 Jun 2019, Mauro Carvalho Chehab  wrote:
> > Em Wed, 19 Jun 2019 13:37:39 -0300
> > Mauro Carvalho Chehab  escreveu:
> >  
> >> Em Tue, 18 Jun 2019 11:47:32 +0300
> >> Jani Nikula  escreveu:
> >>   
> >> > On Mon, 17 Jun 2019, Mauro Carvalho Chehab  
> >> > wrote:
> >> > > Yeah, I guess it should be possible to do that. How a python script
> >> > > can identify if it was called by Sphinx, or if it was called directly? 
> >> > >  
> >> > 
> >> > if __name__ == '__main__':
> >> >  # run on the command-line, not imported
> >> 
> >> Ok, when I have some spare time, I may try to convert one script
> >> to python and see how it behaves.   
> >
> > Did a quick test here... 
> >
> > Probably I'm doing something wrong (as I'm a rookie with Python), but,
> > in order to be able to use the same script as command line and as an Sphinx
> > extension, everything that it is currently there should be "escaped"
> > by an:
> >
> > if __name__ != '__main__':
> >
> > As event the class definition:
> >
> > class KernelCmd(Directive):
> >
> > depends on:
> >
> > from docutils.parsers.rst import directives, Directive
> >
> > With is only required when one needs to parse ReST - e. g. only
> > when the script runs as a Sphinx extension.
> >
> > If this is right, as we want a script that can run by command line
> > to parse and search inside ABI files, at the end of the day, it will
> > be a lot easier to maintain if the parser script is different from the
> > Sphinx extension.   
> 
> Split it into two files, one has the nuts and bolts of parsing and has
> the "if __name__ == '__main__':" bit to run on the command line, and the
> other interfaces with Sphinx and imports the parser.

It seems we have an agreement here: the best is indeed to have two
files, one with the Documentation/ABI parser, and another one with the 
Sphinx extension...

> 
> > If so, as the Sphinx extension script will need to call a parsing script
> > anyway, it doesn't matter the language of the script with will be
> > doing the ABI file parsing.  
> 
> Calling the parser using an API will be easier to use, maintain and
> extend than using pipes, with all the interleaved sideband information
> to adjust line numbers and whatnot.

... and here is where we have two different views.

From debug PoV, the Documentation/ABI parser script should be able to
print the results by a command line call. This is also a feature
that it is useful for the users: to be able to seek for a symbol
and output its ABI description. So, the "stdout" output will be
there anyway.

The only extra data for the extension side is the file name where
the information came and the line number.

From maintainership PoV, adding the sideband API for file+line is
one line at the parser script (a print) and two lines at the Sphinx
extension (a regex expression and a match line). That's simple to
maintain.

It is also simple to verify both sides independently, as what
you see when running the parser script is what you get at the
extension.

If we add a new ABI between the parser script and the extension
script, this would require to also maintain the ABI, and would
make harder to identify problems on ABI problems.

-

Another advantage of having those independent is that the
language of the parsing script can be different. Not being
python is a big advantage for me, as perl is a lot more
intuitive and easier to write parser scripts for my eyes.

I can write a perl parsing script in a matter of minutes.
It takes me a lot more time to do the same with python, and then
ensure that it will work with two similar but different languages
(python2 and python3) [1].

[1] btw, with that regards, I still don't know how to teach a
python script that it should "prefer" to run with python3 but would
fall back to python2. Adding this shebang:
# /usr/bin/env python
just do the opposite - at least on Fedora


Thanks,
Mauro


Re: [PATCH v2 2/4] kbuild: do not create wrappers for header-test-y

2019-06-27 Thread Jani Nikula
On Thu, 27 Jun 2019, Masahiro Yamada  wrote:
> header-test-y does not work with headers in sub-directories.
>
> For example, you can write a Makefile, like this:
>
> include/linux/Kbuild:
>
>   header-test-y += mtd/nand.h
>
> This entry creates a wrapper include/linux/mtd/nand.hdrtest.c with
> the following content:
>
>   #include "mtd/nand.h"
>
> To make this work, we need to add $(srctree)/include/linux to the
> header search path. It would be tedious to add ccflags-y.
>
> We could change the *.hdrtest.c rule to wrap:
>
>   #include "nand.h"
>
> This works for in-tree build since #include "..." searches in the
> relative path from the header with this directive. For O=... build,
> we need to add $(srctree)/include/linux/mtd to the header search path,
> which will be even more tedious.
>
> After all, I thought it would be handier to compile headers directly
> without creating wrappers.
>
> I added a new build rule to compile %.h into %.h.s
>
> I chose %.h.s instead of %.h.o because it was a little bit faster.
> Also, for GCC, an empty assembly is smaller than an empty object.
>
> I wrote the build rule:
>
>   $(CC) $(c_flags) -S -o $@ -x c /dev/null -include $<
>
> instead of:
>
>   $(CC) $(c_flags) -S -o $@ -x c $<
>
> Both work fine with GCC, but the latter is not good for Clang.
>
> This comes down to the difference in the -Wunused-function policy.
> GCC does not warn about unused 'static inline' functions at all.
> Clang does not warn about the ones in included headers, but does
> about the ones in the source. So, we should handle headers as
> headers, not as source files.
>
> In fact, this has been hidden since commit abb2ea7dfd82 ("compiler,
> clang: suppress warning for unused static inline functions"), but we
> should not rely on that.
>
> Signed-off-by: Masahiro Yamada 
> ---
>
> Changes in v2:
>   - New patch
>
>  .gitignore |  1 -
>  Documentation/dontdiff |  1 -
>  Documentation/kbuild/makefiles.txt |  3 +--
>  Makefile   |  1 -
>  scripts/Makefile.build | 10 +-
>  scripts/Makefile.lib   |  2 +-
>  6 files changed, 7 insertions(+), 11 deletions(-)
>
> diff --git a/.gitignore b/.gitignore
> index 4bb60f0fa23b..7587ef56b92d 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -22,7 +22,6 @@
>  *.elf
>  *.gcno
>  *.gz
> -*.hdrtest.c
>  *.i
>  *.ko
>  *.lex.c
> diff --git a/Documentation/dontdiff b/Documentation/dontdiff
> index 554dfe4883d2..5eba889ea84d 100644
> --- a/Documentation/dontdiff
> +++ b/Documentation/dontdiff
> @@ -19,7 +19,6 @@
>  *.grep
>  *.grp
>  *.gz
> -*.hdrtest.c
>  *.html
>  *.i
>  *.jpeg
> diff --git a/Documentation/kbuild/makefiles.txt 
> b/Documentation/kbuild/makefiles.txt
> index ca4b24ec0399..5080fec34609 100644
> --- a/Documentation/kbuild/makefiles.txt
> +++ b/Documentation/kbuild/makefiles.txt
> @@ -1023,8 +1023,7 @@ When kbuild executes, the following steps are followed 
> (roughly):
>   header-test-y specifies headers (*.h) in the current directory that
>   should be compile tested to ensure they are self-contained,
>   i.e. compilable as standalone units. If CONFIG_HEADER_TEST is enabled,
> - this autogenerates dummy sources to include the headers, and builds them
> - as part of extra-y.
> + this builds them as part of extra-y.
>  
>  --- 6.7 Commands useful for building a boot image
>  
> diff --git a/Makefile b/Makefile
> index f23516980796..7f293b0431fe 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1648,7 +1648,6 @@ clean: $(clean-dirs)
>   -o -name '*.dwo' -o -name '*.lst' \
>   -o -name '*.su'  \
>   -o -name '.*.d' -o -name '.*.tmp' -o -name '*.mod.c' \
> - -o -name '*.hdrtest.c' \
>   -o -name '*.lex.c' -o -name '*.tab.[ch]' \
>   -o -name '*.asn1.[ch]' \
>   -o -name '*.symtypes' -o -name 'modules.order' \
> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> index ee0319560513..776842b7e6a3 100644
> --- a/scripts/Makefile.build
> +++ b/scripts/Makefile.build
> @@ -294,14 +294,14 @@ quiet_cmd_cc_lst_c = MKLST   $@
>  $(obj)/%.lst: $(src)/%.c FORCE
>   $(call if_changed_dep,cc_lst_c)
>  
> -# Dummy C sources for header test (header-test-y target)
> +# header test (header-test-y target)
>  # ---
>  
> -quiet_cmd_header_test = HDRTEST $@
> -  cmd_header_test = echo "\#include \"$*.h\"" > $@
> +quiet_cmd_cc_s_h = CC  $@
> +  cmd_cc_s_h = $(CC) $(c_flags) -S -o $@ -x c /dev/null -include $<

I think I'd prefer HDRTEST or something more informative than just CC in
the quiet cmd to distinguish this from the usual build lines. Especially
now that the file name is also just .h.s.

Other than that, good job getting rid of the intermediate file. I
couldn't figure it out when I did the original.

Acked-by: Jani Nikula 
Tested-by: Jani Nikula 

>  
> -$(obj)/%.hdrtest.c:
> -   

Re: [PATCH v2 3/4] kbuild: support header-test-pattern-y

2019-06-27 Thread Jani Nikula
On Thu, 27 Jun 2019, Masahiro Yamada  wrote:
> In my view, most of headers can be self-contained. So, it would be
> tedious to add every header to header-test-y explicitly. We usually
> end up with "all headers with some exceptions".
>
> There are two types in exceptions:
>
> [1] headers that are never compiled as standalone units
>
>   For examples, include/linux/compiler-gcc.h is not intended to be
>   included directly. We should always exclude such ones.
>
> [2] headers that are conditionally compiled as standalone units
>
>   Some headers can be compiled only for particular architectures.
>   For example, include/linux/arm-cci.h can be compiled only for
>   arm/arm64 because it requires  to exist.
>   Clang can compile include/soc/nps/mtm.h only for arc because
>   it contains an arch-specific register in inline assembler.
>
> For [2], we can write Makefile like this:
>
>   header-test-$(CONFIG_ARM) += linux/arm-cci.h
>
> The new syntax header-test-pattern-y will be useful to specify
> "the rest".
>
> The typical usage is like this:
>
>   header-test-pattern-y += */*.h
>
> This adds all the headers in sub-directories to the test coverage,
> but headers added to header-test- are excluded. In this regards,
> header-test-pattern-y behaves like a weaker variant of header-test-y.
>
> Caveat:
> The patterns in header-test-pattern-y are prefixed with $(srctree)/$(src)/
> but not $(objtree)/$(obj)/. Stale generated patterns are often left over.
> For example, you will have ones when you traverse the git history for
> 'git bisect' without cleaning. If a wildcard is used for generated
> headers, it may match to stale headers.
>
> If you really want to compile-test generated headers, I recommend to
> add them to header-test-y explicitly. One pitfall is $(srctree)/$(src)/
> and $(objtree)/$(obj)/ point to the same directory for in-tree building.
> So, header-test-pattern-y should be used with care. It can potentially
> match to generated headers, which may be stale and fail to compile.
>
> Caveat2:
> You could use wildcard for header-test-. For example,
>
>   header-test- += asm-generic/%
>
> ... will exclude headers in asm-generic directory. Unfortunately, the
> wildcard character is '%' instead of '*' because this is evaluated by
> $(filter-out ...) whereas header-test-pattern-y is evaluated by
> $(wildcard ...). This is a kludge, but seems useful in some places...
>
> Signed-off-by: Masahiro Yamada 

Awesome! This will let us get rid of our local $(wildcard) hacks.

Tested-by: Jani Nikula 

> ---
>
> Changes in v2:
>   - New patch
>
>  Documentation/kbuild/makefiles.txt | 10 ++
>  scripts/Makefile.lib   | 10 ++
>  2 files changed, 20 insertions(+)
>
> diff --git a/Documentation/kbuild/makefiles.txt 
> b/Documentation/kbuild/makefiles.txt
> index 5080fec34609..b817e6cefb77 100644
> --- a/Documentation/kbuild/makefiles.txt
> +++ b/Documentation/kbuild/makefiles.txt
> @@ -1025,6 +1025,16 @@ When kbuild executes, the following steps are followed 
> (roughly):
>   i.e. compilable as standalone units. If CONFIG_HEADER_TEST is enabled,
>   this builds them as part of extra-y.
>  
> +header-test-pattern-y
> +
> + This works as a weaker version of header-test-y, and accepts wildcard
> + patterns. The typical usage is:
> +
> +   header-test-pattern-y += *.h
> +
> + This specifies all the files that matches to '*.h' in the current
> + directory, but the files in 'header-test-' are excluded.
> +
>  --- 6.7 Commands useful for building a boot image
>  
>   Kbuild provides a few macros that are useful when building a
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 55ae1ec65342..5933bbab 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -67,6 +67,16 @@ extra-$(CONFIG_OF_ALL_DTBS) += $(patsubst %.dtb,%.dt.yaml, 
> $(dtb-))
>  endif
>  
>  # Test self-contained headers
> +
> +# Wildcard searches in $(srctree)/$(src)/, but not in $(objtree)/$(obj)/.
> +# Stale generated headers are often left over, so wildcard matching should
> +# be avoided. Please notice $(srctree)/$(src)/ and $(objtree)/$(obj) point
> +# to the same location for in-tree building.
> +header-test-y+= $(filter-out $(header-test-), \
> + $(patsubst $(srctree)/$(src)/%, %, \
> + $(wildcard $(addprefix $(srctree)/$(src)/, \
> + $(header-test-pattern-y)
> +
>  extra-$(CONFIG_HEADER_TEST) += $(addsuffix .s, $(header-test-y))
>  
>  # Add subdir path

-- 
Jani Nikula, Intel Open Source Graphics Center


Re: [PATCH v2 0/4] Compile-test UAPI and kernel headers

2019-06-27 Thread Jani Nikula
On Thu, 27 Jun 2019, Masahiro Yamada  wrote:
> 1/4: reworked v2.
>
> 2/4: fix a flaw I noticed when I was working on this series
>
> 3/4: maybe useful for 4/4 and in some other places
>
> 4/4: v2. compile as many headers as possible.
>
>
> Changes in v2:
>  - Add CONFIG_CPU_{BIG,LITTLE}_ENDIAN guard to avoid build error
>  - Use 'header-test-' instead of 'no-header-test'
>  - Avoid weird 'find' warning when cleaning
>   - New patch
>   - New patch
>   - Add everything to test coverage, and exclude broken ones
>   - Rename 'Makefile' to 'Kbuild'
>   - Add CONFIG_KERNEL_HEADER_TEST option
>
> Masahiro Yamada (4):
>   kbuild: compile-test UAPI headers to ensure they are self-contained
>   kbuild: do not create wrappers for header-test-y
>   kbuild: support header-test-pattern-y
>   kbuild: compile-test kernel headers to ensure they are self-contained

[responding here because I didn't receive the actual patch]

This looks like it's doing what it's supposed to, but I ran into a bunch
of build fails with CONFIG_OF=n. Sent a fix to one [1], but stopped at
the next. Looks like you'll have to exclude more. And I'm pretty sure
we'll uncover more configurations where this will fail.

But I do applaud the goal, and I'm committed to making all include/drm
headers self-contained. I wouldn't block this based on the issues, it's
pretty much the only way to expose them and get them fixed/excluded, and
it's behind a config knob after all.

With the caveat that I didn't finish the build, but OTOH tested the
rainy day scenario and had the patch find issues it's meant to find:

Tested-by: Jani Nikula 


[1] 
http://patchwork.freedesktop.org/patch/msgid/20190627110103.7539-1-jani.nik...@intel.com

>
>  .gitignore |1 -
>  Documentation/dontdiff |1 -
>  Documentation/kbuild/makefiles.txt |   13 +-
>  Makefile   |4 +-
>  include/Kbuild | 1134 
>  init/Kconfig   |   22 +
>  scripts/Makefile.build |   10 +-
>  scripts/Makefile.lib   |   12 +-
>  scripts/cc-system-headers.sh   |8 +
>  usr/.gitignore |1 -
>  usr/Makefile   |2 +
>  usr/include/.gitignore |3 +
>  usr/include/Makefile   |  133 
>  13 files changed, 1331 insertions(+), 13 deletions(-)
>  create mode 100644 include/Kbuild
>  create mode 100755 scripts/cc-system-headers.sh
>  create mode 100644 usr/include/.gitignore
>  create mode 100644 usr/include/Makefile

-- 
Jani Nikula, Intel Open Source Graphics Center


Re: [PATCH v7 1/2] fTPM: firmware TPM running in TEE

2019-06-27 Thread Jarkko Sakkinen
On Wed, 2019-06-26 at 19:56 -0400, Sasha Levin wrote:
> > You've used so much on this so shouldn't this have that somewhat new
> > co-developed-by tag? I'm also wondering can this work at all
> 
> Honestly, I've just been massaging this patch more than "authoring" it.
> If you feel strongly about it feel free to add a Co-authored patch with
> my name, but in my mind this is just Thiru's work.

This is just my subjective view but writing code is easier than making
it work in the mainline in 99% of cases. If this patch was doing
something revolutional, lets say a new outstanding scheduling algorithm,
then I would think otherwise. It is not. You without question deserve
both credit and also the blame (if this breaks everything) :-)

> > process-wise if the original author of the patch is also the only tester
> > of the patch?
> 
> There's not much we can do about this... Linaro folks have tested this
> without the fTPM firmware, so at the very least it won't explode for
> everyone. If for some reason non-microsoft folks see issues then we can
> submit patches on top to fix this, we're not just throwing this at you
> and running away.

So why any of those Linaro folks can't do it? I can add after tested-by
tag parentheses something explaining that context of testing. It is
reasonable given the circumstances.

I can also give an explanation in my next PR along the lines what you
are saying. This would definitely work for me.

/Jarkko



Re: [PATCH v7 1/2] fTPM: firmware TPM running in TEE

2019-06-27 Thread Jarkko Sakkinen
On Thu, 2019-06-27 at 16:17 +0300, Jarkko Sakkinen wrote:
> On Wed, 2019-06-26 at 19:56 -0400, Sasha Levin wrote:
> > > You've used so much on this so shouldn't this have that somewhat new
> > > co-developed-by tag? I'm also wondering can this work at all
> > 
> > Honestly, I've just been massaging this patch more than "authoring" it.
> > If you feel strongly about it feel free to add a Co-authored patch with
> > my name, but in my mind this is just Thiru's work.
> 
> This is just my subjective view but writing code is easier than making
> it work in the mainline in 99% of cases. If this patch was doing
> something revolutional, lets say a new outstanding scheduling algorithm,
> then I would think otherwise. It is not. You without question deserve
> both credit and also the blame (if this breaks everything) :-)

Not like I'm putting the pressure on this. You make the call
with the tag. Put it if you wwant. I'm cool with either.

/Jarkko



Re: [PATCH v7 1/2] fTPM: firmware TPM running in TEE

2019-06-27 Thread Ilias Apalodimas
Hi Jarkko,
> On Wed, 2019-06-26 at 19:56 -0400, Sasha Levin wrote:
> > > You've used so much on this so shouldn't this have that somewhat new
> > > co-developed-by tag? I'm also wondering can this work at all
> > 
> > Honestly, I've just been massaging this patch more than "authoring" it.
> > If you feel strongly about it feel free to add a Co-authored patch with
> > my name, but in my mind this is just Thiru's work.
> 
> This is just my subjective view but writing code is easier than making
> it work in the mainline in 99% of cases. If this patch was doing
> something revolutional, lets say a new outstanding scheduling algorithm,
> then I would think otherwise. It is not. You without question deserve
> both credit and also the blame (if this breaks everything) :-)
> 
> > > process-wise if the original author of the patch is also the only tester
> > > of the patch?
> > 
> > There's not much we can do about this... Linaro folks have tested this
> > without the fTPM firmware, so at the very least it won't explode for
> > everyone. If for some reason non-microsoft folks see issues then we can
> > submit patches on top to fix this, we're not just throwing this at you
> > and running away.
> 
> So why any of those Linaro folks can't do it? I can add after tested-by
> tag parentheses something explaining that context of testing. It is
> reasonable given the circumstances.
There's 2 teams from Microsoft trying to do this [1]. We tested the previous
implementation (which problems on probing as built-in). We had to change some
stuff in the OP-TEE fTPM implementation [2] and test it in QEMU.

What i quickly did with this module was to replace the kernel of the previous
build with the new one. Unfortunately i couldn't get it to work, but i don't
know if it's the module or the changes in the fTPM OP-TEE part. Since you have
tested it my guess is that it has something to do with the OP-TEE part. I don't
have any objections in this going in. On the contrary i think the functionality
is really useful. I don't have hardware to test this at the moment, but once i
get it, i'll give it a spin.

The part i tested is that the probing works as expected when no fTPM
implementation is running on secure world.
Since it has been tested and doesn't break anything we can always fix corner,
cases afterwards with more extensive testing

[1]
https://github.com/ms-iot/linux/blob/ms-optee-ocalls-merge/drivers/char/tpm/tpm_ftpm_optee.c
[2] https://github.com/jbech-linaro/manifest/blob/ftpm/README.md

Thanks
/Ilias

> 
> I can also give an explanation in my next PR along the lines what you
> are saying. This would definitely work for me.
> 
> /Jarkko
> 


[PATCH] docs: format kernel-parameters -- as code

2019-06-27 Thread Stephen Kitt
The current ReStructuredText formatting results in "--", used to
indicate the end of the kernel command-line parameters, appearing as
an en-dash instead of two hyphens; this patch formats them as code,
"``--``", as done elsewhere in the documentation.

Signed-off-by: Stephen Kitt 
---
 Documentation/admin-guide/kernel-parameters.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.rst 
b/Documentation/admin-guide/kernel-parameters.rst
index 0124980dca2d..b8d479b76648 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -9,11 +9,11 @@ and sorted into English Dictionary order (defined as ignoring 
all
 punctuation and sorting digits before letters in a case insensitive
 manner), and with descriptions where known.
 
-The kernel parses parameters from the kernel command line up to "--";
+The kernel parses parameters from the kernel command line up to "``--``";
 if it doesn't recognize a parameter and it doesn't contain a '.', the
 parameter gets passed to init: parameters with '=' go into init's
 environment, others are passed as command line arguments to init.
-Everything after "--" is passed as an argument to init.
+Everything after "``--``" is passed as an argument to init.
 
 Module parameters can be specified in two ways: via the kernel command
 line with a module name prefix, or via modprobe, e.g.::
-- 
2.11.0



Re: [Linux-kernel-mentees][PATCH v5 1/5] Documentation: RCU: Convert RCU basic concepts to reST

2019-06-27 Thread Jonathan Corbet
On Wed, 26 Jun 2019 15:07:01 -0500
Jiunn Chang  wrote:

> RCU basic concepts reST markup.
> 
> Signed-off-by: Jiunn Chang 
> Reviewed-by: Joel Fernandes (Google) 

So this is a little detail but ... your signoff should be the last thing
in the set of tags on the patch.

This isn't worth making you do yet another revision, so I went ahead and
applied the patches and fixed the tag ordering on the way in.  I'll also
append a patch adding the new RCU stuff into the core-api manual so people
can actually get to it.

Thanks,

jon


Re: [PATCH 1/2] mm, memcontrol: Add memcg_iterate_all()

2019-06-27 Thread Michal Hocko
On Mon 24-06-19 13:42:18, Waiman Long wrote:
> Add a memcg_iterate_all() function for iterating all the available
> memory cgroups and call the given callback function for each of the
> memory cgruops.

Why is a trivial wrapper any better than open coded usage of the
iterator?

> Signed-off-by: Waiman Long 
> ---
>  include/linux/memcontrol.h |  3 +++
>  mm/memcontrol.c| 13 +
>  2 files changed, 16 insertions(+)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 1dcb763bb610..0e31418e5a47 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1268,6 +1268,9 @@ static inline bool 
> mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
>  struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep);
>  void memcg_kmem_put_cache(struct kmem_cache *cachep);
>  
> +extern void memcg_iterate_all(void (*callback)(struct mem_cgroup *memcg,
> +void *arg), void *arg);
> +
>  #ifdef CONFIG_MEMCG_KMEM
>  int __memcg_kmem_charge(struct page *page, gfp_t gfp, int order);
>  void __memcg_kmem_uncharge(struct page *page, int order);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ba9138a4a1de..c1c4706f7696 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -443,6 +443,19 @@ static int memcg_alloc_shrinker_maps(struct mem_cgroup 
> *memcg)
>  static void memcg_free_shrinker_maps(struct mem_cgroup *memcg) { }
>  #endif /* CONFIG_MEMCG_KMEM */
>  
> +/*
> + * Iterate all the memory cgroups and call the given callback function
> + * for each of the memory cgroups.
> + */
> +void memcg_iterate_all(void (*callback)(struct mem_cgroup *memcg, void *arg),
> +void *arg)
> +{
> + struct mem_cgroup *memcg;
> +
> + for_each_mem_cgroup(memcg)
> + callback(memcg, arg);
> +}
> +
>  /**
>   * mem_cgroup_css_from_page - css of the memcg associated with a page
>   * @page: page of interest
> -- 
> 2.18.1

-- 
Michal Hocko
SUSE Labs


Re: [Linux-kernel-mentees][PATCH v5 1/5] Documentation: RCU: Convert RCU basic concepts to reST

2019-06-27 Thread Steven Rostedt
On Thu, 27 Jun 2019 08:34:43 -0600
Jonathan Corbet  wrote:

> On Wed, 26 Jun 2019 15:07:01 -0500
> Jiunn Chang  wrote:
> 
> > RCU basic concepts reST markup.
> > 
> > Signed-off-by: Jiunn Chang 
> > Reviewed-by: Joel Fernandes (Google)   
> 
> So this is a little detail but ... your signoff should be the last thing
> in the set of tags on the patch.

Note, I've been seeing this a lot lately, and then noticed, that when I
downloaded a patch directly from patchwork, it placed all the
Reviewed-by and Acked-by tags after the original Signed-off-by. I
checked the original patch on the mailing list, and it had no other
tags but the Signed-off-by. I then pulled one of my own patches, and it
did it to that patch as well.

I too prefer the Signed-off-by be last, but our tooling needs to do
this as well, otherwise it's a failure in our procedures.

-- Steve


Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs

2019-06-27 Thread Michal Hocko
On Mon 24-06-19 13:42:19, Waiman Long wrote:
> With the slub memory allocator, the numbers of active slab objects
> reported in /proc/slabinfo are not real because they include objects
> that are held by the per-cpu slab structures whether they are actually
> used or not.  The problem gets worse the more CPUs a system have. For
> instance, looking at the reported number of active task_struct objects,
> one will wonder where all the missing tasks gone.
> 
> I know it is hard and costly to get a real count of active objects.

What exactly is expensive? Why cannot slabinfo reduce the number of
active objects by per-cpu cached objects?

> So
> I am not advocating for that. Instead, this patch extends the
> /proc/sys/vm/drop_caches sysctl parameter by using a new bit (bit 3)
> to shrink all the kmem slabs which will flush out all the slabs in the
> per-cpu structures and give a more accurate view of how much memory are
> really used up by the active slab objects. This is a costly operation,
> of course, but it gives a way to have a clearer picture of the actual
> number of slab objects used, if the need arises.

drop_caches is a terrible interface. It destroys all the caching and
people are just too easy in using it to solve any kind of problem they
think they might have and cause others they might not see immediately.
I am strongly discouraging anybody - except for some tests which really
do want to see reproducible results without cache effects - from using
this interface and therefore I am not really happy to paper over
something that might be a real problem with yet another mode. If SLUB
indeed caches too aggressively on large machines then this should be
fixed.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2] Documentation: perf: Update documentation for ThunderX2 PMU uncore driver

2019-06-27 Thread Will Deacon
On Thu, Jun 27, 2019 at 11:01:18AM +0100, Will Deacon wrote:
> On Fri, Jun 14, 2019 at 05:42:45PM +, Ganapatrao Kulkarni wrote:
> > From: Ganapatrao Kulkarni 
> > 
> > Add documentation for Cavium Coherent Processor Interconnect (CCPI2) PMU.
> > 
> > Signed-off-by: Ganapatrao Kulkarni 
> > ---
> >  Documentation/perf/thunderx2-pmu.txt | 20 +++-
> >  1 file changed, 11 insertions(+), 9 deletions(-)
> > 
> > diff --git a/Documentation/perf/thunderx2-pmu.txt 
> > b/Documentation/perf/thunderx2-pmu.txt
> > index dffc57143736..62243230abc3 100644
> > --- a/Documentation/perf/thunderx2-pmu.txt
> > +++ b/Documentation/perf/thunderx2-pmu.txt
> > @@ -2,24 +2,26 @@ Cavium ThunderX2 SoC Performance Monitoring Unit (PMU 
> > UNCORE)
> >  =
> >  
> >  The ThunderX2 SoC PMU consists of independent, system-wide, per-socket
> > -PMUs such as the Level 3 Cache (L3C) and DDR4 Memory Controller (DMC).
> > +PMUs such as the Level 3 Cache (L3C), DDR4 Memory Controller (DMC) and
> > +Cavium Coherent Processor Interconnect (CCPI2).
> >  
> >  The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles.
> >  Events are counted for the default channel (i.e. channel 0) and prorated
> >  to the total number of channels/tiles.
> >  
> > -The DMC and L3C support up to 4 counters. Counters are independently
> > -programmable and can be started and stopped individually. Each counter
> > -can be set to a different event. Counters are 32-bit and do not support
> > -an overflow interrupt; they are read every 2 seconds.
> > +The DMC, L3C support up to 4 counters and CCPI2 support up to 8 counters.
> 
> The DMC and L3C support up to 4 counters, while the CCPI2 supports up to 8
> counters.
> 
> > +Counters are independently programmable and can be started and stopped
> > +individually. Each counter can be set to a different event. DMC and L3C
> > +Counters are 32-bit and do not support an overflow interrupt; they are read
> 
> Counters -> counters
> 
> > +every 2 seconds. CCPI2 counters are 64-bit.
> 
> Assuming CCPI2 also doesn't support an overflow interrupt, I'd reword these
> two sentences as:
> 
>   None of the counters support an overflow interrupt and therefore sampling
>   events are unsupported. The DMC and L3C counters are 32-bit and read every
>   2 seconds. The CCPI2 counters are 64-bit and assumed not to overflow in
>   normal operation.

Mark reminded me that these are system PMUs and therefore sampling is
unsupported irrespective of the presence of an overflow interrupt, so you
can drop that part from the text.

Sorry for the confusion,

Will


Re: [Linux-kernel-mentees][PATCH v5 1/5] Documentation: RCU: Convert RCU basic concepts to reST

2019-06-27 Thread Paul E. McKenney
On Thu, Jun 27, 2019 at 08:34:43AM -0600, Jonathan Corbet wrote:
> On Wed, 26 Jun 2019 15:07:01 -0500
> Jiunn Chang  wrote:
> 
> > RCU basic concepts reST markup.
> > 
> > Signed-off-by: Jiunn Chang 
> > Reviewed-by: Joel Fernandes (Google) 
> 
> So this is a little detail but ... your signoff should be the last thing
> in the set of tags on the patch.
> 
> This isn't worth making you do yet another revision, so I went ahead and
> applied the patches and fixed the tag ordering on the way in.  I'll also
> append a patch adding the new RCU stuff into the core-api manual so people
> can actually get to it.

Please feel free to add my ack:

Acked-by: Paul E. McKenney 

Thanx, Paul



Re: [PATCH v7 1/2] fTPM: firmware TPM running in TEE

2019-06-27 Thread Jarkko Sakkinen
On Thu, 2019-06-27 at 16:30 +0300, Ilias Apalodimas wrote:
> is really useful. I don't have hardware to test this at the moment, but once i
> get it, i'll give it a spin.

Thank you for responding, really appreciate it.

Please note, however, that I already did my v5.3 PR so there is a lot of
time to give it a spin. In all cases, we will find a way to put this to
my v5.4 PR. I don't see any reason why not.

As soon as the cosmetic stuff is fixed that I remarked in v7 I'm ready
to take this to my tree and after that soonish make it available on
linux-next.

/Jarkko



[PATCH v3 3/4] kbuild: support header-test-pattern-y

2019-06-27 Thread Masahiro Yamada
In my view, most of headers can be self-contained. So, it would be
tedious to add every header to header-test-y explicitly. We usually
end up with "all headers with some exceptions".

There are two types in exceptions:

[1] headers that are never compiled as standalone units

  For examples, include/linux/compiler-gcc.h is not intended for
  direct inclusion. We should always exclude such ones.

[2] headers that are conditionally compiled as standalone units

  Some headers can be compiled only for particular architectures.
  For example, include/linux/arm-cci.h can be compiled only for
  arm/arm64 because it requires  to exist.
  Clang can compile include/soc/nps/mtm.h only for arc because
  it contains an arch-specific register in inline assembler.

So, you can write Makefile like this:

  header-test-+= linux/compiler-gcc.h
  header-test-$(CONFIG_ARM)   += linux/arm-cci.h
  header-test-$(CONFIG_ARM64) += linux/arm-cci.h
  header-test-$(CONFIG_ARC)   += soc/nps/mtm.h

The new syntax header-test-pattern-y will be useful to specify
"the rest".

The typical usage is like this:

  header-test-pattern-y += */*.h

This will add all the headers in sub-directories to the test coverage,
excluding $(header-test-). In this regards, header-test-pattern-y
behaves like a weaker variant of header-test-y.

Caveat:
The patterns in header-test-pattern-y are prefixed with $(srctree)/$(src)/
but not $(objtree)/$(obj)/. Stale generated headers are often left over
when you traverse the git history without cleaning. Wildcard patterns for
$(objtree) may match to stale headers, which could fail to compile.
One pitfall is $(srctree)/$(src)/ and $(objtree)/$(obj)/ point to the
same directory for in-tree building. So, header-test-pattern-y should
be used with care since it can potentially match to stale headers.

Caveat2:
You could use wildcard for header-test-. For example,

  header-test- += asm-generic/%

... will exclude headers in asm-generic directory. Unfortunately, the
wildcard character is '%' instead of '*' here because this is evaluated
by $(filter-out ...) whereas header-test-pattern-y is evaluated by
$(wildcard ...). This is a kludge, but seems useful in some places...

Signed-off-by: Masahiro Yamada 
Tested-by: Jani Nikula 
---

Changes in v3: None
Changes in v2:
  - New patch

 Documentation/kbuild/makefiles.txt | 10 ++
 scripts/Makefile.lib   | 11 +++
 2 files changed, 21 insertions(+)

diff --git a/Documentation/kbuild/makefiles.txt 
b/Documentation/kbuild/makefiles.txt
index 5080fec34609..b817e6cefb77 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -1025,6 +1025,16 @@ When kbuild executes, the following steps are followed 
(roughly):
i.e. compilable as standalone units. If CONFIG_HEADER_TEST is enabled,
this builds them as part of extra-y.
 
+header-test-pattern-y
+
+   This works as a weaker version of header-test-y, and accepts wildcard
+   patterns. The typical usage is:
+
+ header-test-pattern-y += *.h
+
+   This specifies all the files that matches to '*.h' in the current
+   directory, but the files in 'header-test-' are excluded.
+
 --- 6.7 Commands useful for building a boot image
 
Kbuild provides a few macros that are useful when building a
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 55ae1ec65342..281864fcf0fe 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -67,6 +67,17 @@ extra-$(CONFIG_OF_ALL_DTBS) += $(patsubst %.dtb,%.dt.yaml, 
$(dtb-))
 endif
 
 # Test self-contained headers
+
+# Wildcard searches in $(srctree)/$(src)/, but not in $(objtree)/$(obj)/.
+# Stale generated headers are often left over, so pattern matching should
+# be avoided. Please notice $(srctree)/$(src)/ and $(objtree)/$(obj) point
+# to the same location for in-tree building. So, header-test-pattern-y should
+# be used with care.
+header-test-y  += $(filter-out $(header-test-), \
+   $(patsubst $(srctree)/$(src)/%, %, \
+   $(wildcard $(addprefix $(srctree)/$(src)/, \
+   $(header-test-pattern-y)
+
 extra-$(CONFIG_HEADER_TEST) += $(addsuffix .s, $(header-test-y))
 
 # Add subdir path
-- 
2.17.1



[PATCH v3 2/4] kbuild: do not create wrappers for header-test-y

2019-06-27 Thread Masahiro Yamada
header-test-y does not work with headers in sub-directories.

For example, you may want to write a Makefile, like this:

include/linux/Kbuild:

  header-test-y += mtd/nand.h

This entry will create a wrapper include/linux/mtd/nand.hdrtest.c
with the following content:

  #include "mtd/nand.h"

To make this work, we need to add $(srctree)/include/linux to the
header search path. It would be tedious to add ccflags-y.

Instead, we could change the *.hdrtest.c rule to wrap:

  #include "nand.h"

This works for in-tree build since #include "..." searches in the
relative path from the header with this directive. For O=... build,
we need to add $(srctree)/include/linux/mtd to the header search path,
which will be even more tedious.

After all, I thought it would be handier to compile headers directly
without creating wrappers.

I added a new build rule to compile %.h into %.h.s

The target is %.h.s instead of %.h.o because it is slightly faster.
Also, as for GCC, an empty assembly is smaller than an empty object.

I wrote the build rule:

  $(CC) $(c_flags) -S -o $@ -x c /dev/null -include $<

instead of:

  $(CC) $(c_flags) -S -o $@ -x c $<

Both work fine with GCC, but the latter is bad for Clang.

This comes down to the difference in the -Wunused-function policy.
GCC does not warn about unused 'static inline' functions at all.
Clang does not warn about the ones in included headers, but does
about the ones in the source. So, we should handle headers as
headers, not as source files.

In fact, this has been hidden since commit abb2ea7dfd82 ("compiler,
clang: suppress warning for unused static inline functions"), but we
should not rely on that.

Signed-off-by: Masahiro Yamada 
Acked-by: Jani Nikula 
Tested-by: Jani Nikula 
---

Changes in v3: None
Changes in v2:
  - New patch

 .gitignore |  1 -
 Documentation/dontdiff |  1 -
 Documentation/kbuild/makefiles.txt |  3 +--
 Makefile   |  1 -
 scripts/Makefile.build | 10 +-
 scripts/Makefile.lib   |  2 +-
 6 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/.gitignore b/.gitignore
index 4bb60f0fa23b..7587ef56b92d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -22,7 +22,6 @@
 *.elf
 *.gcno
 *.gz
-*.hdrtest.c
 *.i
 *.ko
 *.lex.c
diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index 554dfe4883d2..5eba889ea84d 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -19,7 +19,6 @@
 *.grep
 *.grp
 *.gz
-*.hdrtest.c
 *.html
 *.i
 *.jpeg
diff --git a/Documentation/kbuild/makefiles.txt 
b/Documentation/kbuild/makefiles.txt
index ca4b24ec0399..5080fec34609 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -1023,8 +1023,7 @@ When kbuild executes, the following steps are followed 
(roughly):
header-test-y specifies headers (*.h) in the current directory that
should be compile tested to ensure they are self-contained,
i.e. compilable as standalone units. If CONFIG_HEADER_TEST is enabled,
-   this autogenerates dummy sources to include the headers, and builds them
-   as part of extra-y.
+   this builds them as part of extra-y.
 
 --- 6.7 Commands useful for building a boot image
 
diff --git a/Makefile b/Makefile
index f23516980796..7f293b0431fe 100644
--- a/Makefile
+++ b/Makefile
@@ -1648,7 +1648,6 @@ clean: $(clean-dirs)
-o -name '*.dwo' -o -name '*.lst' \
-o -name '*.su'  \
-o -name '.*.d' -o -name '.*.tmp' -o -name '*.mod.c' \
-   -o -name '*.hdrtest.c' \
-o -name '*.lex.c' -o -name '*.tab.[ch]' \
-o -name '*.asn1.[ch]' \
-o -name '*.symtypes' -o -name 'modules.order' \
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index ee0319560513..776842b7e6a3 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -294,14 +294,14 @@ quiet_cmd_cc_lst_c = MKLST   $@
 $(obj)/%.lst: $(src)/%.c FORCE
$(call if_changed_dep,cc_lst_c)
 
-# Dummy C sources for header test (header-test-y target)
+# header test (header-test-y target)
 # ---
 
-quiet_cmd_header_test = HDRTEST $@
-  cmd_header_test = echo "\#include \"$*.h\"" > $@
+quiet_cmd_cc_s_h = CC  $@
+  cmd_cc_s_h = $(CC) $(c_flags) -S -o $@ -x c /dev/null -include $<
 
-$(obj)/%.hdrtest.c:
-   $(call cmd,header_test)
+$(obj)/%.h.s: $(src)/%.h FORCE
+   $(call if_changed_dep,cc_s_h)
 
 # Compile assembler sources (.S)
 # ---
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 3e630fcaffd1..55ae1ec65342 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -67,7 +67,7 @@ extra-$(CONFIG_OF_ALL_DTBS) += $(patsubst %.dtb,%.dt.yaml, 
$(dtb-))
 endif
 
 # Test self-contained headers
-extra-$(CONFIG_HEADER_TEST) += $(patsubst %.

[PATCH v3 0/4] Compile-test UAPI and kernel headers

2019-06-27 Thread Masahiro Yamada
1/4: Compile-test exported headers (reworked in v2)

2/4: fix a flaw I noticed when I was working on this series.
 Avoid generating intermediate wrappers.

3/4: maybe useful for 4/4 and in some other places.
 Add header-test-pattern-y syntax.

4/4: Compile-test kernel-space headers in include/.
 v2: compile as many headers as possible.
 v3: exclude more headers causing build errors


Masahiro Yamada (4):
  kbuild: compile-test UAPI headers to ensure they are self-contained
  kbuild: do not create wrappers for header-test-y
  kbuild: support header-test-pattern-y
  kbuild: compile-test kernel headers to ensure they are self-contained

 .gitignore |1 -
 Documentation/dontdiff |1 -
 Documentation/kbuild/makefiles.txt |   13 +-
 Makefile   |4 +-
 include/Kbuild | 1250 
 init/Kconfig   |   22 +
 scripts/Makefile.build |   10 +-
 scripts/Makefile.lib   |   13 +-
 scripts/cc-system-headers.sh   |8 +
 usr/.gitignore |1 -
 usr/Makefile   |2 +
 usr/include/.gitignore |3 +
 usr/include/Makefile   |  134 +++
 13 files changed, 1449 insertions(+), 13 deletions(-)
 create mode 100644 include/Kbuild
 create mode 100755 scripts/cc-system-headers.sh
 create mode 100644 usr/include/.gitignore
 create mode 100644 usr/include/Makefile

-- 
2.17.1



Re: [PATCH v2 0/4] Compile-test UAPI and kernel headers

2019-06-27 Thread Masahiro Yamada
On Thu, Jun 27, 2019 at 8:36 PM Jani Nikula  wrote:
>
> On Thu, 27 Jun 2019, Masahiro Yamada  wrote:
> > 1/4: reworked v2.
> >
> > 2/4: fix a flaw I noticed when I was working on this series
> >
> > 3/4: maybe useful for 4/4 and in some other places
> >
> > 4/4: v2. compile as many headers as possible.
> >
> >
> > Changes in v2:
> >  - Add CONFIG_CPU_{BIG,LITTLE}_ENDIAN guard to avoid build error
> >  - Use 'header-test-' instead of 'no-header-test'
> >  - Avoid weird 'find' warning when cleaning
> >   - New patch
> >   - New patch
> >   - Add everything to test coverage, and exclude broken ones
> >   - Rename 'Makefile' to 'Kbuild'
> >   - Add CONFIG_KERNEL_HEADER_TEST option
> >
> > Masahiro Yamada (4):
> >   kbuild: compile-test UAPI headers to ensure they are self-contained
> >   kbuild: do not create wrappers for header-test-y
> >   kbuild: support header-test-pattern-y
> >   kbuild: compile-test kernel headers to ensure they are self-contained
>
> [responding here because I didn't receive the actual patch]
>
> This looks like it's doing what it's supposed to, but I ran into a bunch
> of build fails with CONFIG_OF=n. Sent a fix to one [1], but stopped at
> the next. Looks like you'll have to exclude more. And I'm pretty sure
> we'll uncover more configurations where this will fail.

Thanks for testing.

I did more compile-tests, and excluded more headers in v3.

Thanks.



-- 
Best Regards
Masahiro Yamada


Re: [Linux-kernel-mentees][PATCH v5 1/5] Documentation: RCU: Convert RCU basic concepts to reST

2019-06-27 Thread Jiunn Chang
On Thu, Jun 27, 2019 at 08:34:43AM -0600, Jonathan Corbet wrote:
> On Wed, 26 Jun 2019 15:07:01 -0500
> Jiunn Chang  wrote:
> 
> > RCU basic concepts reST markup.
> > 
> > Signed-off-by: Jiunn Chang 
> > Reviewed-by: Joel Fernandes (Google) 
> 
> So this is a little detail but ... your signoff should be the last thing
> in the set of tags on the patch.
> 
> This isn't worth making you do yet another revision, so I went ahead and
> applied the patches and fixed the tag ordering on the way in.  I'll also
> append a patch adding the new RCU stuff into the core-api manual so people
> can actually get to it.
> 
> Thanks,
> 
> jon

Hello Jon,

I will keep this in mind for next time.  I would like to thank you, Joel, Paul
and everyone else who has helped me learn the Linux kernel patch process.

I will send a patch for the UP systems change Paul sent me for _bh suffix.

THX,

Jiunn


Re: [Linux-kernel-mentees][PATCH v5 1/5] Documentation: RCU: Convert RCU basic concepts to reST

2019-06-27 Thread Shuah Khan

On 6/27/19 9:13 AM, Steven Rostedt wrote:

On Thu, 27 Jun 2019 08:34:43 -0600
Jonathan Corbet  wrote:


On Wed, 26 Jun 2019 15:07:01 -0500
Jiunn Chang  wrote:


RCU basic concepts reST markup.

Signed-off-by: Jiunn Chang 
Reviewed-by: Joel Fernandes (Google) 


So this is a little detail but ... your signoff should be the last thing
in the set of tags on the patch.


Note, I've been seeing this a lot lately, and then noticed, that when I
downloaded a patch directly from patchwork, it placed all the
Reviewed-by and Acked-by tags after the original Signed-off-by. I
checked the original patch on the mailing list, and it had no other
tags but the Signed-off-by. I then pulled one of my own patches, and it
did it to that patch as well.

I too prefer the Signed-off-by be last, but our tooling needs to do
this as well, otherwise it's a failure in our procedures.



Thanks Steve for pointing this out. I am seeing some odd behavior with tags.

It appears some maintainers want the tags in chronological order, which
is Reviewed-by after Signed-off which doesn't make sense to me.

I prefer Signed-off-by last.

I am working on FAQ (Frequently Answered Questions) section for mentees.
I will add this to it.

thanks,
-- Shuah




[PATCH] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices

2019-06-27 Thread Dave Chiluk
It has been observed, that highly-threaded, non-cpu-bound applications
running under cpu.cfs_quota_us constraints can hit a high percentage of
periods throttled while simultaneously not consuming the allocated
amount of quota. This use case is typical of user-interactive non-cpu
bound applications, such as those running in kubernetes or mesos when
run on multiple cpu cores.

This has been root caused to threads being allocated per cpu bandwidth
slices, and then not fully using that slice within the period. At which
point the slice and quota expires. This expiration of unused slice
results in applications not being able to utilize the quota for which
they are allocated.

The expiration of per-cpu slices was recently fixed by
'commit 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift
condition")'. Prior to that it appears that this has been broken since
at least 'commit 51f2176d74ac ("sched/fair: Fix unlocked reads of some
cfs_b->quota/period")' which was introduced in v3.16-rc1 in 2014. That
added the following conditional which resulted in slices never being
expired.

if (cfs_rq->runtime_expires != cfs_b->runtime_expires) {
/* extend local deadline, drift is bounded above by 2 ticks */
cfs_rq->runtime_expires += TICK_NSEC;

Because this was broken for nearly 5 years, and has recently been fixed
and is now being noticed by many users running kubernetes
(https://github.com/kubernetes/kubernetes/issues/67577) it is my opinion
that the mechanisms around expiring runtime should be removed
altogether.

This allows quota already allocated to per-cpu run-queues to live longer
than the period boundary. This allows threads on runqueues that do not
use much CPU to continue to use their remaining slice over a longer
period of time than cpu.cfs_period_us. However, this helps prevents the
above condition of hitting throttling while also not fully utilizing
your cpu quota.

This theoretically allows a machine to use slightly more than its
allotted quota in some periods. This overflow would be bounded by the
remaining quota left on each per-cpu runqueueu. This is typically no
more than min_cfs_rq_runtime=1ms per cpu. For CPU bound tasks this will
change nothing, as they should theoretically fully utilize all of their
quota in each period. For user-interactive tasks as described above this
provides a much better user/application experience as their cpu
utilization will more closely match the amount they requested when they
hit throttling. This means that cpu limits no longer strictly apply per
period for non-cpu bound applications, but that they are still accurate
over longer timeframes.

This greatly improves performance of high-thread-count, non-cpu bound
applications with low cfs_quota_us allocation on high-core-count
machines. In the case of an artificial testcase (10ms/100ms of quota on
80 CPU machine), this commit resulted in almost 30x performance
improvement, while still maintaining correct cpu quota restrictions.
That testcase is available at https://github.com/indeedeng/fibtest.

Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")
Signed-off-by: Dave Chiluk 
---
 Documentation/scheduler/sched-bwc.txt | 73 ---
 kernel/sched/fair.c   | 71 +++---
 kernel/sched/sched.h  |  4 --
 3 files changed, 65 insertions(+), 83 deletions(-)

diff --git a/Documentation/scheduler/sched-bwc.txt 
b/Documentation/scheduler/sched-bwc.txt
index f6b1873..4c19c94 100644
--- a/Documentation/scheduler/sched-bwc.txt
+++ b/Documentation/scheduler/sched-bwc.txt
@@ -8,15 +8,16 @@ CFS bandwidth control is a CONFIG_FAIR_GROUP_SCHED extension 
which allows the
 specification of the maximum CPU bandwidth available to a group or hierarchy.
 
 The bandwidth allowed for a group is specified using a quota and period. Within
-each given "period" (microseconds), a group is allowed to consume only up to
-"quota" microseconds of CPU time.  When the CPU bandwidth consumption of a
-group exceeds this limit (for that period), the tasks belonging to its
-hierarchy will be throttled and are not allowed to run again until the next
-period.
-
-A group's unused runtime is globally tracked, being refreshed with quota units
-above at each period boundary.  As threads consume this bandwidth it is
-transferred to cpu-local "silos" on a demand basis.  The amount transferred
+each given "period" (microseconds), a task group is allocated up to "quota"
+microseconds of CPU time. That quota is assigned to per cpu run queues in
+slices as threads in the cgroup become runnable. Once all quota has been
+assigned any additional requests for quota will result in those threads being
+throttled. Throttled threads will not be able to run again until the next
+period when the quota is replenished.
+
+A group's unassigned quota is globally tracked, being refreshed back to
+cfs_quota units at each period boundary. As threads consume this bandw

[PATCH v5 0/1] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices

2019-06-27 Thread Dave Chiluk
Changelog v5
- Based on this comment from Ben Segall's comment on v4
> If the cost of taking this global lock across all cpus without a
> ratelimit was somehow not a problem, I'd much prefer to just set
> min_cfs_rq_runtime = 0. (Assuming it is, I definitely prefer the "lie
> and sorta have 2x period 2x runtime" solution of removing expiration)
I'm resubmitting my v3 patchset, with the requested changes.
- Updated Commit log given review comments
- Update sched-bwc.txt give my new understanding of the slack timer.

Changelog v4
- Rewrote patchset around the concept of returning all of runtime_remaining
when cfs_b nears the end of available quota.

Changelog v3
- Reworked documentation to better describe behavior of slice expiration per
feedback from Peter Oskolkov

Changelog v2
- Fixed some checkpatch errors in the commit message.


[PATCH v5 1/1] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices

2019-06-27 Thread Dave Chiluk
It has been observed, that highly-threaded, non-cpu-bound applications
running under cpu.cfs_quota_us constraints can hit a high percentage of
periods throttled while simultaneously not consuming the allocated
amount of quota. This use case is typical of user-interactive non-cpu
bound applications, such as those running in kubernetes or mesos when
run on multiple cpu cores.

This has been root caused to threads being allocated per cpu bandwidth
slices, and then not fully using that slice within the period. At which
point the slice and quota expires. This expiration of unused slice
results in applications not being able to utilize the quota for which
they are allocated.

The expiration of per-cpu slices was recently fixed by
'commit 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift
condition")'. Prior to that it appears that this has been broken since
at least 'commit 51f2176d74ac ("sched/fair: Fix unlocked reads of some
cfs_b->quota/period")' which was introduced in v3.16-rc1 in 2014. That
added the following conditional which resulted in slices never being
expired.

if (cfs_rq->runtime_expires != cfs_b->runtime_expires) {
/* extend local deadline, drift is bounded above by 2 ticks */
cfs_rq->runtime_expires += TICK_NSEC;

Because this was broken for nearly 5 years, and has recently been fixed
and is now being noticed by many users running kubernetes
(https://github.com/kubernetes/kubernetes/issues/67577) it is my opinion
that the mechanisms around expiring runtime should be removed
altogether.

This allows quota already allocated to per-cpu run-queues to live longer
than the period boundary. This allows threads on runqueues that do not
use much CPU to continue to use their remaining slice over a longer
period of time than cpu.cfs_period_us. However, this helps prevents the
above condition of hitting throttling while also not fully utilizing
your cpu quota.

This theoretically allows a machine to use slightly more than its
allotted quota in some periods. This overflow would be bounded by the
remaining quota left on each per-cpu runqueueu. This is typically no
more than min_cfs_rq_runtime=1ms per cpu. For CPU bound tasks this will
change nothing, as they should theoretically fully utilize all of their
quota in each period. For user-interactive tasks as described above this
provides a much better user/application experience as their cpu
utilization will more closely match the amount they requested when they
hit throttling. This means that cpu limits no longer strictly apply per
period for non-cpu bound applications, but that they are still accurate
over longer timeframes.

This greatly improves performance of high-thread-count, non-cpu bound
applications with low cfs_quota_us allocation on high-core-count
machines. In the case of an artificial testcase (10ms/100ms of quota on
80 CPU machine), this commit resulted in almost 30x performance
improvement, while still maintaining correct cpu quota restrictions.
That testcase is available at https://github.com/indeedeng/fibtest.

Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")
Signed-off-by: Dave Chiluk 
---
 Documentation/scheduler/sched-bwc.txt | 73 ---
 kernel/sched/fair.c   | 71 +++---
 kernel/sched/sched.h  |  4 --
 3 files changed, 65 insertions(+), 83 deletions(-)

diff --git a/Documentation/scheduler/sched-bwc.txt 
b/Documentation/scheduler/sched-bwc.txt
index f6b1873..4c19c94 100644
--- a/Documentation/scheduler/sched-bwc.txt
+++ b/Documentation/scheduler/sched-bwc.txt
@@ -8,15 +8,16 @@ CFS bandwidth control is a CONFIG_FAIR_GROUP_SCHED extension 
which allows the
 specification of the maximum CPU bandwidth available to a group or hierarchy.
 
 The bandwidth allowed for a group is specified using a quota and period. Within
-each given "period" (microseconds), a group is allowed to consume only up to
-"quota" microseconds of CPU time.  When the CPU bandwidth consumption of a
-group exceeds this limit (for that period), the tasks belonging to its
-hierarchy will be throttled and are not allowed to run again until the next
-period.
-
-A group's unused runtime is globally tracked, being refreshed with quota units
-above at each period boundary.  As threads consume this bandwidth it is
-transferred to cpu-local "silos" on a demand basis.  The amount transferred
+each given "period" (microseconds), a task group is allocated up to "quota"
+microseconds of CPU time. That quota is assigned to per cpu run queues in
+slices as threads in the cgroup become runnable. Once all quota has been
+assigned any additional requests for quota will result in those threads being
+throttled. Throttled threads will not be able to run again until the next
+period when the quota is replenished.
+
+A group's unassigned quota is globally tracked, being refreshed back to
+cfs_quota units at each period boundary. As threads consume this bandw

Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs

2019-06-27 Thread Waiman Long
On 6/26/19 4:19 PM, Roman Gushchin wrote:
>>  
>> +#ifdef CONFIG_MEMCG_KMEM
>> +static void kmem_cache_shrink_memcg(struct mem_cgroup *memcg,
>> +void __maybe_unused *arg)
>> +{
>> +struct kmem_cache *s;
>> +
>> +if (memcg == root_mem_cgroup)
>> +return;
>> +mutex_lock(&slab_mutex);
>> +list_for_each_entry(s, &memcg->kmem_caches,
>> +memcg_params.kmem_caches_node) {
>> +kmem_cache_shrink(s);
>> +}
>> +mutex_unlock(&slab_mutex);
>> +cond_resched();
>> +}
> A couple of questions:
> 1) how about skipping already offlined kmem_caches? They are already shrunk,
>so you probably won't get much out of them. Or isn't it true?

I have been thinking about that. This patch is based on the linux tree
and so don't have an easy to find out if the kmem caches have been
shrinked. Rebasing this on top of linux-next, I can use the
SLAB_DEACTIVATED flag as a marker for skipping the shrink.

With all the latest patches, I am still seeing 121 out of a total of 726
memcg kmem caches (1/6) that are deactivated caches after system bootup
one of the test systems. My system is still using cgroup v1 and so the
number may be different in a v2 setup. The next step is probably to
figure out why those deactivated caches are still there.

> 2) what's your long-term vision here? do you think that we need to shrink
>kmem_caches periodically, depending on memory pressure? how a user
>will use this new sysctl?
Shrinking the kmem caches under extreme memory pressure can be one way
to free up extra pages, but the effect will probably be temporary.
> What's the problem you're trying to solve in general?

At least for the slub allocator, shrinking the caches allow the number
of active objects reported in slabinfo to be more accurate. In addition,
this allow to know the real slab memory consumption. I have been working
on a BZ about continuous memory leaks with a container based workloads.
The ability to shrink caches allow us to get a more accurate memory
consumption picture. Another alternative is to turn on slub_debug which
will then disables all the per-cpu slabs.

Anyway, I think this can be useful to others that is why I posted the patch.

Cheers,
Longman



[Linux-kernel-mentees][PATCH] doc: RCU callback locks need only _bh, not necessarily _irq

2019-06-27 Thread Jiunn Chang
The UP.rst file calls for locks acquired within RCU callback functions
to use _irq variants (spin_lock_irqsave() or similar), which does work,
but can be overkill.  This commit therefore instead calls for _bh variants
(spin_lock_bh() or similar), while noting that _irq does work.

Signed-off-by: Paul E. McKenney 
Signed-off-by: Jiunn Chang 
---
 Documentation/RCU/UP.rst | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/Documentation/RCU/UP.rst b/Documentation/RCU/UP.rst
index 67715a47ae89..e26dda27430c 100644
--- a/Documentation/RCU/UP.rst
+++ b/Documentation/RCU/UP.rst
@@ -113,12 +113,13 @@ Answer to Quick Quiz #1:
 Answer to Quick Quiz #2:
What locking restriction must RCU callbacks respect?
 
-   Any lock that is acquired within an RCU callback must be
-   acquired elsewhere using an _irq variant of the spinlock
-   primitive.  For example, if "mylock" is acquired by an
-   RCU callback, then a process-context acquisition of this
-   lock must use something like spin_lock_irqsave() to
-   acquire the lock.
+   Any lock that is acquired within an RCU callback must be acquired
+   elsewhere using an _bh variant of the spinlock primitive.
+   For example, if "mylock" is acquired by an RCU callback, then
+   a process-context acquisition of this lock must use something
+   like spin_lock_bh() to acquire the lock.  Please note that
+   it is also OK to use _irq variants of spinlocks, for example,
+   spin_lock_irqsave().
 
If the process-context code were to simply use spin_lock(),
then, since RCU callbacks can be invoked from softirq context,
-- 
2.22.0



Re: [PATCH 1/2] mm, memcontrol: Add memcg_iterate_all()

2019-06-27 Thread Waiman Long
On 6/27/19 11:07 AM, Michal Hocko wrote:
> On Mon 24-06-19 13:42:18, Waiman Long wrote:
>> Add a memcg_iterate_all() function for iterating all the available
>> memory cgroups and call the given callback function for each of the
>> memory cgruops.
> Why is a trivial wrapper any better than open coded usage of the
> iterator?

Because the iterator is only defined within memcontrol.c. So an
alternative may be to put the iterator into a header file that can be
used by others. Will take a look at that.

Cheers,
Longman



Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs

2019-06-27 Thread Waiman Long
On 6/27/19 11:15 AM, Michal Hocko wrote:
> On Mon 24-06-19 13:42:19, Waiman Long wrote:
>> With the slub memory allocator, the numbers of active slab objects
>> reported in /proc/slabinfo are not real because they include objects
>> that are held by the per-cpu slab structures whether they are actually
>> used or not.  The problem gets worse the more CPUs a system have. For
>> instance, looking at the reported number of active task_struct objects,
>> one will wonder where all the missing tasks gone.
>>
>> I know it is hard and costly to get a real count of active objects.
> What exactly is expensive? Why cannot slabinfo reduce the number of
> active objects by per-cpu cached objects?
>
The number of cachelines that needs to be accessed in order to get an
accurate count will be much higher if we need to iterate through all the
per-cpu structures. In addition, accessing the per-cpu partial list will
be racy.


>> So
>> I am not advocating for that. Instead, this patch extends the
>> /proc/sys/vm/drop_caches sysctl parameter by using a new bit (bit 3)
>> to shrink all the kmem slabs which will flush out all the slabs in the
>> per-cpu structures and give a more accurate view of how much memory are
>> really used up by the active slab objects. This is a costly operation,
>> of course, but it gives a way to have a clearer picture of the actual
>> number of slab objects used, if the need arises.
> drop_caches is a terrible interface. It destroys all the caching and
> people are just too easy in using it to solve any kind of problem they
> think they might have and cause others they might not see immediately.
> I am strongly discouraging anybody - except for some tests which really
> do want to see reproducible results without cache effects - from using
> this interface and therefore I am not really happy to paper over
> something that might be a real problem with yet another mode. If SLUB
> indeed caches too aggressively on large machines then this should be
> fixed.
>
OK, as explained in another thread, the main reason for doing this patch
is to be able to do more accurate measurement of changes in kmem cache
memory consumption. Yes, I do agree that drop_caches is not a general
purpose interface that should be used lightly.

Cheers,
Longman



Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs

2019-06-27 Thread Roman Gushchin
On Thu, Jun 27, 2019 at 04:57:50PM -0400, Waiman Long wrote:
> On 6/26/19 4:19 PM, Roman Gushchin wrote:
> >>  
> >> +#ifdef CONFIG_MEMCG_KMEM
> >> +static void kmem_cache_shrink_memcg(struct mem_cgroup *memcg,
> >> +  void __maybe_unused *arg)
> >> +{
> >> +  struct kmem_cache *s;
> >> +
> >> +  if (memcg == root_mem_cgroup)
> >> +  return;
> >> +  mutex_lock(&slab_mutex);
> >> +  list_for_each_entry(s, &memcg->kmem_caches,
> >> +  memcg_params.kmem_caches_node) {
> >> +  kmem_cache_shrink(s);
> >> +  }
> >> +  mutex_unlock(&slab_mutex);
> >> +  cond_resched();
> >> +}
> > A couple of questions:
> > 1) how about skipping already offlined kmem_caches? They are already shrunk,
> >so you probably won't get much out of them. Or isn't it true?
> 
> I have been thinking about that. This patch is based on the linux tree
> and so don't have an easy to find out if the kmem caches have been
> shrinked. Rebasing this on top of linux-next, I can use the
> SLAB_DEACTIVATED flag as a marker for skipping the shrink.
> 
> With all the latest patches, I am still seeing 121 out of a total of 726
> memcg kmem caches (1/6) that are deactivated caches after system bootup
> one of the test systems. My system is still using cgroup v1 and so the
> number may be different in a v2 setup. The next step is probably to
> figure out why those deactivated caches are still there.

It's not a secret: these kmem_caches are holding objects, which are in use.
It's a drawback of the current slab accounting implementation: every
object holds a whole page and the corresponding kmem_cache. It's optimized
for a large number of objects, which are created and destroyed within
the life of the cgroup (e.g. task_structs), and it works worse for long-living
objects like vfs cache.

Long-term I think we need a different implementation for long-living objects,
so that objects belonging to different memory cgroups can share the same page
and kmem_caches.

It's a fairly big change though.

> 
> > 2) what's your long-term vision here? do you think that we need to shrink
> >kmem_caches periodically, depending on memory pressure? how a user
> >will use this new sysctl?
> Shrinking the kmem caches under extreme memory pressure can be one way
> to free up extra pages, but the effect will probably be temporary.
> > What's the problem you're trying to solve in general?
> 
> At least for the slub allocator, shrinking the caches allow the number
> of active objects reported in slabinfo to be more accurate. In addition,
> this allow to know the real slab memory consumption. I have been working
> on a BZ about continuous memory leaks with a container based workloads.
> The ability to shrink caches allow us to get a more accurate memory
> consumption picture. Another alternative is to turn on slub_debug which
> will then disables all the per-cpu slabs.

I see... I agree with Michal here, that extending drop_caches sysctl isn't
the best idea. Isn't it possible to achieve the same effect using slub sysfs?

Thanks!


Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs

2019-06-27 Thread Luis Chamberlain
On Thu, Jun 27, 2019 at 04:57:50PM -0400, Waiman Long wrote:
> On 6/26/19 4:19 PM, Roman Gushchin wrote:
> >>  
> >> +#ifdef CONFIG_MEMCG_KMEM
> >> +static void kmem_cache_shrink_memcg(struct mem_cgroup *memcg,
> >> +  void __maybe_unused *arg)
> >> +{
> >> +  struct kmem_cache *s;
> >> +
> >> +  if (memcg == root_mem_cgroup)
> >> +  return;
> >> +  mutex_lock(&slab_mutex);
> >> +  list_for_each_entry(s, &memcg->kmem_caches,
> >> +  memcg_params.kmem_caches_node) {
> >> +  kmem_cache_shrink(s);
> >> +  }
> >> +  mutex_unlock(&slab_mutex);
> >> +  cond_resched();
> >> +}
> > A couple of questions:
> > 1) how about skipping already offlined kmem_caches? They are already shrunk,
> >so you probably won't get much out of them. Or isn't it true?
> 
> I have been thinking about that. This patch is based on the linux tree
> and so don't have an easy to find out if the kmem caches have been
> shrinked. Rebasing this on top of linux-next, I can use the
> SLAB_DEACTIVATED flag as a marker for skipping the shrink.
> 
> With all the latest patches, I am still seeing 121 out of a total of 726
> memcg kmem caches (1/6) that are deactivated caches after system bootup
> one of the test systems. My system is still using cgroup v1 and so the
> number may be different in a v2 setup. The next step is probably to
> figure out why those deactivated caches are still there.
> 
> > 2) what's your long-term vision here? do you think that we need to shrink
> >kmem_caches periodically, depending on memory pressure? how a user
> >will use this new sysctl?
> Shrinking the kmem caches under extreme memory pressure can be one way
> to free up extra pages, but the effect will probably be temporary.
> > What's the problem you're trying to solve in general?
> 
> At least for the slub allocator, shrinking the caches allow the number
> of active objects reported in slabinfo to be more accurate. In addition,
> this allow to know the real slab memory consumption. I have been working
> on a BZ about continuous memory leaks with a container based workloads.

So.. this is still a work around?

> The ability to shrink caches allow us to get a more accurate memory
> consumption picture. Another alternative is to turn on slub_debug which
> will then disables all the per-cpu slabs.

So this is a debugging mechanism?

> Anyway, I think this can be useful to others that is why I posted the patch.

Since this is debug stuff, please add this to /proc/sys/debug/ instead.
That would reflect the intention, and would avoid the concern that folks
in production would use these things.

Since we only have 2 users of /proc/sys/debug/ I am now wondering if
would be best to add a new sysctl debug taint flag. This way bug
reports with these stupid knobs can got to /dev/null inbox for bug
reports.

Masami, /proc/sys/debug/kprobes-optimization is debug. Would you be OK
to add the taint for it too?

Masoud, /proc/sys/debug/exception-trace seems to actually be enabled
by default, and its goal seems to be to enable disabling it. So I
don't think it would make sense to taint there.

So.. maybe we need something /proc/sys/taints/ or
/proc/sys/debug/taints/ so its *very* clear this is by no way ever
expected to be used in production.

May even be good to long term add a symlink for vm/drop_caches there
as well?

  Luis


Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs

2019-06-27 Thread Waiman Long
On 6/27/19 5:24 PM, Roman Gushchin wrote:
>>> 2) what's your long-term vision here? do you think that we need to shrink
>>>kmem_caches periodically, depending on memory pressure? how a user
>>>will use this new sysctl?
>> Shrinking the kmem caches under extreme memory pressure can be one way
>> to free up extra pages, but the effect will probably be temporary.
>>> What's the problem you're trying to solve in general?
>> At least for the slub allocator, shrinking the caches allow the number
>> of active objects reported in slabinfo to be more accurate. In addition,
>> this allow to know the real slab memory consumption. I have been working
>> on a BZ about continuous memory leaks with a container based workloads.
>> The ability to shrink caches allow us to get a more accurate memory
>> consumption picture. Another alternative is to turn on slub_debug which
>> will then disables all the per-cpu slabs.
> I see... I agree with Michal here, that extending drop_caches sysctl isn't
> the best idea. Isn't it possible to achieve the same effect using slub sysfs?

Yes, using the slub sysfs interface can be a possible alternative.

Cheers,
Longman



[tip:timers/core] hrtimer: Use a bullet for the returns bullet list

2019-06-27 Thread tip-bot for Mauro Carvalho Chehab
Commit-ID:  516337048fa40496ae5ca9863c367ec991a44d9a
Gitweb: https://git.kernel.org/tip/516337048fa40496ae5ca9863c367ec991a44d9a
Author: Mauro Carvalho Chehab 
AuthorDate: Mon, 24 Jun 2019 07:33:26 -0300
Committer:  Thomas Gleixner 
CommitDate: Thu, 27 Jun 2019 23:30:04 +0200

hrtimer: Use a bullet for the returns bullet list

That gets rid of this warning:

   ./kernel/time/hrtimer.c:1119: WARNING: Block quote ends without a blank 
line; unexpected unindent.

and displays nicely both at the source code and at the produced
documentation.

Signed-off-by: Mauro Carvalho Chehab 
Signed-off-by: Thomas Gleixner 
Cc: Linux Doc Mailing List 
Cc: Mauro Carvalho Chehab 
Cc: Jonathan Corbet 
Link: 
https://lkml.kernel.org/r/74ddad7dac331b4e5ce4a90e15c8a49e3a16d2ac.1561372382.git.mchehab+sams...@kernel.org

---
 kernel/time/hrtimer.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index edb230aba3d1..5ee77f1a8a92 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1114,9 +1114,10 @@ EXPORT_SYMBOL_GPL(hrtimer_start_range_ns);
  * @timer: hrtimer to stop
  *
  * Returns:
- *  0 when the timer was not active
- *  1 when the timer was active
- * -1 when the timer is currently executing the callback function and
+ *
+ *  *  0 when the timer was not active
+ *  *  1 when the timer was active
+ *  * -1 when the timer is currently executing the callback function and
  *cannot be stopped
  */
 int hrtimer_try_to_cancel(struct hrtimer *timer)


Re: [Linux-kernel-mentees][PATCH] doc: RCU callback locks need only _bh, not necessarily _irq

2019-06-27 Thread Shuah Khan

On 6/27/19 3:01 PM, Jiunn Chang wrote:

The UP.rst file calls for locks acquired within RCU callback functions
to use _irq variants (spin_lock_irqsave() or similar), which does work,
but can be overkill.  This commit therefore instead calls for _bh variants
(spin_lock_bh() or similar), while noting that _irq does work.

Signed-off-by: Paul E. McKenney 


Should this by Suggested-by?


Signed-off-by: Jiunn Chang 
---
  Documentation/RCU/UP.rst | 13 +++--
  1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/Documentation/RCU/UP.rst b/Documentation/RCU/UP.rst
index 67715a47ae89..e26dda27430c 100644
--- a/Documentation/RCU/UP.rst
+++ b/Documentation/RCU/UP.rst
@@ -113,12 +113,13 @@ Answer to Quick Quiz #1:
  Answer to Quick Quiz #2:
What locking restriction must RCU callbacks respect?
  
-	Any lock that is acquired within an RCU callback must be

-   acquired elsewhere using an _irq variant of the spinlock
-   primitive.  For example, if "mylock" is acquired by an
-   RCU callback, then a process-context acquisition of this
-   lock must use something like spin_lock_irqsave() to
-   acquire the lock.
+   Any lock that is acquired within an RCU callback must be acquired
+   elsewhere using an _bh variant of the spinlock primitive.
+   For example, if "mylock" is acquired by an RCU callback, then
+   a process-context acquisition of this lock must use something
+   like spin_lock_bh() to acquire the lock.  Please note that
+   it is also OK to use _irq variants of spinlocks, for example,
+   spin_lock_irqsave().
  
  	If the process-context code were to simply use spin_lock(),

then, since RCU callbacks can be invoked from softirq context,



thanks,
-- Shuah


Re: [tip:timers/core] hrtimer: Use a bullet for the returns bullet list

2019-06-27 Thread Joe Perches
On Thu, 2019-06-27 at 14:46 -0700, tip-bot for Mauro Carvalho Chehab
wrote:
> Commit-ID:  516337048fa40496ae5ca9863c367ec991a44d9a
> Gitweb: 
> https://git.kernel.org/tip/516337048fa40496ae5ca9863c367ec991a44d9a
> Author: Mauro Carvalho Chehab 
> AuthorDate: Mon, 24 Jun 2019 07:33:26 -0300
> Committer:  Thomas Gleixner 
> CommitDate: Thu, 27 Jun 2019 23:30:04 +0200
> 
> hrtimer: Use a bullet for the returns bullet list
> 
> That gets rid of this warning:
> 
>./kernel/time/hrtimer.c:1119: WARNING: Block quote ends without a blank 
> line; unexpected unindent.

Doesn't this form occur multiple dozens of times in
kernel sources?

For instance:

$ git grep -B3 -A5 -P "^ \* Returns:?$" | \
  grep -P -A8 '\-\s+\*\s*@\w+:'

I think the warning is odd at best and docutils might
be updated or the warning ignored or suppressed.

> and displays nicely both at the source code and at the produced
> documentation.

> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
[]
> @@ -1114,9 +1114,10 @@ EXPORT_SYMBOL_GPL(hrtimer_start_range_ns);
>   * @timer:   hrtimer to stop
>   *
>   * Returns:
> - *  0 when the timer was not active
> - *  1 when the timer was active
> - * -1 when the timer is currently executing the callback function and
> + *
> + *  *  0 when the timer was not active
> + *  *  1 when the timer was active
> + *  * -1 when the timer is currently executing the callback function and
>   *cannot be stopped
>   */
>  int hrtimer_try_to_cancel(struct hrtimer *timer)



Re: [Linux-kernel-mentees][PATCH] doc: RCU callback locks need only _bh, not necessarily _irq

2019-06-27 Thread Paul E. McKenney
On Thu, Jun 27, 2019 at 04:01:35PM -0600, Shuah Khan wrote:
> On 6/27/19 3:01 PM, Jiunn Chang wrote:
> >The UP.rst file calls for locks acquired within RCU callback functions
> >to use _irq variants (spin_lock_irqsave() or similar), which does work,
> >but can be overkill.  This commit therefore instead calls for _bh variants
> >(spin_lock_bh() or similar), while noting that _irq does work.
> >
> >Signed-off-by: Paul E. McKenney 
> 
> Should this by Suggested-by?

I wrote it and Jiunn converted my change to .rst, so I believe that
this is correct as is.

Thanx, Paul

> >Signed-off-by: Jiunn Chang 
> >---
> >  Documentation/RCU/UP.rst | 13 +++--
> >  1 file changed, 7 insertions(+), 6 deletions(-)
> >
> >diff --git a/Documentation/RCU/UP.rst b/Documentation/RCU/UP.rst
> >index 67715a47ae89..e26dda27430c 100644
> >--- a/Documentation/RCU/UP.rst
> >+++ b/Documentation/RCU/UP.rst
> >@@ -113,12 +113,13 @@ Answer to Quick Quiz #1:
> >  Answer to Quick Quiz #2:
> > What locking restriction must RCU callbacks respect?
> >-Any lock that is acquired within an RCU callback must be
> >-acquired elsewhere using an _irq variant of the spinlock
> >-primitive.  For example, if "mylock" is acquired by an
> >-RCU callback, then a process-context acquisition of this
> >-lock must use something like spin_lock_irqsave() to
> >-acquire the lock.
> >+Any lock that is acquired within an RCU callback must be acquired
> >+elsewhere using an _bh variant of the spinlock primitive.
> >+For example, if "mylock" is acquired by an RCU callback, then
> >+a process-context acquisition of this lock must use something
> >+like spin_lock_bh() to acquire the lock.  Please note that
> >+it is also OK to use _irq variants of spinlocks, for example,
> >+spin_lock_irqsave().
> > If the process-context code were to simply use spin_lock(),
> > then, since RCU callbacks can be invoked from softirq context,
> >
> 
> thanks,
> -- Shuah
> 



Re: [Linux-kernel-mentees][PATCH] doc: RCU callback locks need only _bh, not necessarily _irq

2019-06-27 Thread Shuah Khan

On 6/27/19 4:10 PM, Paul E. McKenney wrote:

On Thu, Jun 27, 2019 at 04:01:35PM -0600, Shuah Khan wrote:

On 6/27/19 3:01 PM, Jiunn Chang wrote:

The UP.rst file calls for locks acquired within RCU callback functions
to use _irq variants (spin_lock_irqsave() or similar), which does work,
but can be overkill.  This commit therefore instead calls for _bh variants
(spin_lock_bh() or similar), while noting that _irq does work.

Signed-off-by: Paul E. McKenney 


Should this by Suggested-by?


I wrote it and Jiunn converted my change to .rst, so I believe that
this is correct as is.



Great.

thanks,
-- Shuah


Re: [Linux-kernel-mentees][PATCH] doc: RCU callback locks need only _bh, not necessarily _irq

2019-06-27 Thread Steven Rostedt
On Thu, 27 Jun 2019 15:10:45 -0700
"Paul E. McKenney"  wrote:

> On Thu, Jun 27, 2019 at 04:01:35PM -0600, Shuah Khan wrote:
> > On 6/27/19 3:01 PM, Jiunn Chang wrote:  
> > >The UP.rst file calls for locks acquired within RCU callback functions
> > >to use _irq variants (spin_lock_irqsave() or similar), which does work,
> > >but can be overkill.  This commit therefore instead calls for _bh variants
> > >(spin_lock_bh() or similar), while noting that _irq does work.
> > >
> > >Signed-off-by: Paul E. McKenney   
> > 
> > Should this by Suggested-by?  
> 
> I wrote it and Jiunn converted my change to .rst, so I believe that
> this is correct as is.

Note, you did send Jiunn an explicit Signed-off-by when you wrote it,
correct? As Signed-off-by is equivalent to a signature.

-- Steve


Re: [Linux-kernel-mentees][PATCH] doc: RCU callback locks need only _bh, not necessarily _irq

2019-06-27 Thread Paul E. McKenney
On Thu, Jun 27, 2019 at 06:29:38PM -0400, Steven Rostedt wrote:
> On Thu, 27 Jun 2019 15:10:45 -0700
> "Paul E. McKenney"  wrote:
> 
> > On Thu, Jun 27, 2019 at 04:01:35PM -0600, Shuah Khan wrote:
> > > On 6/27/19 3:01 PM, Jiunn Chang wrote:  
> > > >The UP.rst file calls for locks acquired within RCU callback functions
> > > >to use _irq variants (spin_lock_irqsave() or similar), which does work,
> > > >but can be overkill.  This commit therefore instead calls for _bh 
> > > >variants
> > > >(spin_lock_bh() or similar), while noting that _irq does work.
> > > >
> > > >Signed-off-by: Paul E. McKenney   
> > > 
> > > Should this by Suggested-by?  
> > 
> > I wrote it and Jiunn converted my change to .rst, so I believe that
> > this is correct as is.
> 
> Note, you did send Jiunn an explicit Signed-off-by when you wrote it,
> correct? As Signed-off-by is equivalent to a signature.

Indeed I did, but I now see that it was via private email.  Here it is
again for public consumption, and Jiunn's patch is based on this one,
just translated to .rst.  I once again verified that the Jiunn's version
is word-for-word identical to mine, so Jiunn's patch should be good.  ;-)

Thanx, Paul



commit a293734a310b463a0dba68409943a4e6065cd39d
Author: Paul E. McKenney 
Date:   Wed Jun 26 10:16:19 2019 -0700

doc: RCU callback locks need only _bh, not necessarily _irq

The UP.txt file calls for locks acquired within RCU callback functions
to use _irq variants (spin_lock_irqsave() or similar), which does work,
but can be overkill.  This commit therefore instead calls for _bh variants
(spin_lock_bh() or similar), while noting that _irq does work.

Signed-off-by: Paul E. McKenney 

diff --git a/Documentation/RCU/UP.txt b/Documentation/RCU/UP.txt
index 53bde717017b..0edd8c5af0b5 100644
--- a/Documentation/RCU/UP.txt
+++ b/Documentation/RCU/UP.txt
@@ -104,12 +104,13 @@ Answer to Quick Quiz #1:
 Answer to Quick Quiz #2:
What locking restriction must RCU callbacks respect?
 
-   Any lock that is acquired within an RCU callback must be
-   acquired elsewhere using an _irq variant of the spinlock
-   primitive.  For example, if "mylock" is acquired by an
-   RCU callback, then a process-context acquisition of this
-   lock must use something like spin_lock_irqsave() to
-   acquire the lock.
+   Any lock that is acquired within an RCU callback must be acquired
+   elsewhere using an _bh variant of the spinlock primitive.
+   For example, if "mylock" is acquired by an RCU callback, then
+   a process-context acquisition of this lock must use something
+   like spin_lock_bh() to acquire the lock.  Please note that
+   it is also OK to use _irq variants of spinlocks, for example,
+   spin_lock_irqsave().
 
If the process-context code were to simply use spin_lock(),
then, since RCU callbacks can be invoked from softirq context,


Re: [tip:timers/core] hrtimer: Use a bullet for the returns bullet list

2019-06-27 Thread Mauro Carvalho Chehab
Em Thu, 27 Jun 2019 15:08:59 -0700
Joe Perches  escreveu:

> On Thu, 2019-06-27 at 14:46 -0700, tip-bot for Mauro Carvalho Chehab
> wrote:
> > Commit-ID:  516337048fa40496ae5ca9863c367ec991a44d9a
> > Gitweb: 
> > https://git.kernel.org/tip/516337048fa40496ae5ca9863c367ec991a44d9a
> > Author: Mauro Carvalho Chehab 
> > AuthorDate: Mon, 24 Jun 2019 07:33:26 -0300
> > Committer:  Thomas Gleixner 
> > CommitDate: Thu, 27 Jun 2019 23:30:04 +0200
> > 
> > hrtimer: Use a bullet for the returns bullet list
> > 
> > That gets rid of this warning:
> > 
> >./kernel/time/hrtimer.c:1119: WARNING: Block quote ends without a blank 
> > line; unexpected unindent.  
> 
> Doesn't this form occur multiple dozens of times in
> kernel sources?
> 
> For instance:
> 
> $ git grep -B3 -A5 -P "^ \* Returns:?$" | \
>   grep -P -A8 '\-\s+\*\s*@\w+:'

Yes, this is a common pattern, but not all patterns that match the above
regex are broken.

> 
> I think the warning is odd at best and docutils might
> be updated or the warning ignored or suppressed.
> 
> > and displays nicely both at the source code and at the produced
> > documentation.  

The warnings are painful - and they're the main reason why I wrote this
change: - I wanted to avoid new warnings actually unrelated to my
changes that were sometimes appearing while doing incremental
"make htmldocs" on a big patchset that I've been rebasing almost every
week over the last two months.

-

Yet, did you try to look how this pattern will appear at the html and pdf
output? Something like this:

sound/soc/codecs/wm8960.c: * Returns:
sound/soc/codecs/wm8960.c- *  -1, in case no sysclk frequency available 
found
sound/soc/codecs/wm8960.c- * >=0, in case we could derive bclk and 
lrclk from sysclk using
sound/soc/codecs/wm8960.c- *  (@sysclk_idx, @dac_idx, @bclk_idx) 
dividers


Will be displayed as:

**Returns:**
  -1, in case no sysclk frequency available found **>=0, in case we 
could derive bclk and lrclk from sysclk using** (@sysclk_idx, @dac_idx, 
@bclk_idx) dividers


(where **foo**) means that "foo" will be printed in bold.

E. g. it will just merge all returns values into a single line and, if 
there are alignment differences, it will make the previous line bold
and produce a warning.

On some places, however, what's there will be properly displayed,
like this one:
**
 * wimax_reset - Reset a WiMAX device
 *
 * @wimax_dev: WiMAX device descriptor
 *
 * Returns:
 *
 * %0 if ok and a warm reset was done (the device still exists in
 * the system).
 *
 * -%ENODEV if a cold/bus reset had to be done (device has
 * disconnected and reconnected, so current handle is not valid
 * any more).
 *
 * -%EINVAL if the device is not even registered.
 *
 * Any other negative error code shall be considered as
 * non-recoverable.
 *

As there are blank lines between each value, making each return code a
different line.

This one:

tools/lib/traceevent/parse-filter.c: * Returns:
tools/lib/traceevent/parse-filter.c- *  1 if the two filters hold the same 
content.
tools/lib/traceevent/parse-filter.c- *  0 if they do not.

will also not mangle too much, as the dots will help for someone to 
understand, if reading the html/pdf output, like this:

**Returns:**
  1 if the two filters hold the same content. 0 if they do not.

So, it all depends on the context.

-

While it would likely be possible to improve kernel-doc to present better
results, I'm afraid that it would be too complex for simple regex
expressions, and hard to tune, as it would be a hint-based approach,
and doing a natural language processing would be too much effort.


> 
> > diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c  
> []
> > @@ -1114,9 +1114,10 @@ EXPORT_SYMBOL_GPL(hrtimer_start_range_ns);
> >   * @timer: hrtimer to stop
> >   *
> >   * Returns:
> > - *  0 when the timer was not active
> > - *  1 when the timer was active
> > - * -1 when the timer is currently executing the callback function and
> > + *
> > + *  *  0 when the timer was not active
> > + *  *  1 when the timer was active
> > + *  * -1 when the timer is currently executing the callback function and
> >   *cannot be stopped
> >   */
> >  int hrtimer_try_to_cancel(struct hrtimer *timer)  
> 



Thanks,
Mauro


[PATCH 00/15] FPGA DFL updates

2019-06-27 Thread Moritz Fischer
Hi Greg,

can you please take the following patches. They're mostly new features
and some cleanup of the DFL internals.

They've been on the mailing list and have been reviewed.

Note: I've seen that Mauro touched Documentation/fpga/dfl.rst in linux-next
  commit c220a1fae6c5d ("docs: fpga: convert docs to ReST and rename to 
*.rst")
  and fixed up PATCH 05/15 to apply on top of that.

If you prefer the original series against char-misc-next let me know,
and I'll resubmit.

Thanks,

Moritz

Wu Hao (15):
  fpga: dfl-fme-mgr: fix FME_PR_INTFC_ID register address.
  fpga: dfl: fme: remove copy_to_user() in ioctl for PR
  fpga: dfl: fme: align PR buffer size per PR datawidth
  fpga: dfl: fme: support 512bit data width PR
  Documentation: fpga: dfl: add descriptions for virtualization and new
interfaces.
  fpga: dfl: fme: add DFL_FPGA_FME_PORT_RELEASE/ASSIGN ioctl support.
  fpga: dfl: pci: enable SRIOV support.
  fpga: dfl: afu: add AFU state related sysfs interfaces
  fpga: dfl: afu: add userclock sysfs interfaces.
  fpga: dfl: add id_table for dfl private feature driver
  fpga: dfl: afu: export __port_enable/disable function.
  fpga: dfl: afu: add error reporting support.
  fpga: dfl: afu: add STP (SignalTap) support
  fpga: dfl: fme: add capability sysfs interfaces
  fpga: dfl: fme: add global error reporting support

 .../ABI/testing/sysfs-platform-dfl-fme|  98 +
 .../ABI/testing/sysfs-platform-dfl-port   | 104 +
 Documentation/fpga/dfl.rst| 100 +
 drivers/fpga/Makefile |   3 +-
 drivers/fpga/dfl-afu-error.c  | 225 ++
 drivers/fpga/dfl-afu-main.c   | 330 ++-
 drivers/fpga/dfl-afu.h|   7 +
 drivers/fpga/dfl-fme-error.c  | 385 ++
 drivers/fpga/dfl-fme-main.c   | 120 +-
 drivers/fpga/dfl-fme-mgr.c| 117 +-
 drivers/fpga/dfl-fme-pr.c |  65 +--
 drivers/fpga/dfl-fme.h|   7 +-
 drivers/fpga/dfl-pci.c|  40 ++
 drivers/fpga/dfl.c| 169 +++-
 drivers/fpga/dfl.h|  54 ++-
 include/uapi/linux/fpga-dfl.h |  32 ++
 16 files changed, 1776 insertions(+), 80 deletions(-)
 create mode 100644 drivers/fpga/dfl-afu-error.c
 create mode 100644 drivers/fpga/dfl-fme-error.c

-- 
2.22.0



[PATCH 12/15] fpga: dfl: afu: add error reporting support.

2019-06-27 Thread Moritz Fischer
From: Wu Hao 

Error reporting is one important private feature, it reports error
detected on port and accelerated function unit (AFU). It introduces
several sysfs interfaces to allow userspace to check and clear
errors detected by hardware.

Signed-off-by: Xu Yilun 
Signed-off-by: Wu Hao 
Acked-by: Alan Tull 
Signed-off-by: Moritz Fischer 
---
 .../ABI/testing/sysfs-platform-dfl-port   |  39 +++
 drivers/fpga/Makefile |   1 +
 drivers/fpga/dfl-afu-error.c  | 225 ++
 drivers/fpga/dfl-afu-main.c   |   4 +
 drivers/fpga/dfl-afu.h|   4 +
 5 files changed, 273 insertions(+)
 create mode 100644 drivers/fpga/dfl-afu-error.c

diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port 
b/Documentation/ABI/testing/sysfs-platform-dfl-port
index 04ea7f2971c7..4aeca94856b6 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-port
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-port
@@ -79,3 +79,42 @@ KernelVersion:   5.3
 Contact:   Wu Hao 
 Description:   Read-only. Read this file to get the status of issued command
to userclck_freqcntrcmd.
+
+What:  /sys/bus/platform/devices/dfl-port.0/errors/revision
+Date:  June 2019
+KernelVersion: 5.3
+Contact:   Wu Hao 
+Description:   Read-only. Read this file to get the revision of this error
+   reporting private feature.
+
+What:  /sys/bus/platform/devices/dfl-port.0/errors/errors
+Date:  June 2019
+KernelVersion: 5.3
+Contact:   Wu Hao 
+Description:   Read-only. Read this file to get errors detected on port and
+   Accelerated Function Unit (AFU).
+
+What:  /sys/bus/platform/devices/dfl-port.0/errors/first_error
+Date:  June 2019
+KernelVersion: 5.3
+Contact:   Wu Hao 
+Description:   Read-only. Read this file to get the first error detected by
+   hardware.
+
+What:  /sys/bus/platform/devices/dfl-port.0/errors/first_malformed_req
+Date:  June 2019
+KernelVersion: 5.3
+Contact:   Wu Hao 
+Description:   Read-only. Read this file to get the first malformed request
+   captured by hardware.
+
+What:  /sys/bus/platform/devices/dfl-port.0/errors/clear
+Date:  June 2019
+KernelVersion: 5.3
+Contact:   Wu Hao 
+Description:   Write-only. Write error code to this file to clear errors.
+   Write fails with -EINVAL if input parsing fails or input error
+   code doesn't match.
+   Write fails with -EBUSY or -ETIMEDOUT if error can't be cleared
+   as hardware is in low power state (-EBUSY) or not responding
+   (-ETIMEDOUT).
diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
index 312b9371742f..72558914a29c 100644
--- a/drivers/fpga/Makefile
+++ b/drivers/fpga/Makefile
@@ -41,6 +41,7 @@ obj-$(CONFIG_FPGA_DFL_AFU)+= dfl-afu.o
 
 dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o
 dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o dfl-afu-dma-region.o
+dfl-afu-objs += dfl-afu-error.o
 
 # Drivers for FPGAs which implement DFL
 obj-$(CONFIG_FPGA_DFL_PCI) += dfl-pci.o
diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
new file mode 100644
index ..f20dbdf5805d
--- /dev/null
+++ b/drivers/fpga/dfl-afu-error.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for FPGA Accelerated Function Unit (AFU) Error Reporting
+ *
+ * Copyright 2019 Intel Corporation, Inc.
+ *
+ * Authors:
+ *   Wu Hao 
+ *   Xiao Guangrong 
+ *   Joseph Grecco 
+ *   Enno Luebbers 
+ *   Tim Whisonant 
+ *   Ananda Ravuri 
+ *   Mitchel Henry 
+ */
+
+#include 
+
+#include "dfl-afu.h"
+
+#define PORT_ERROR_MASK0x8
+#define PORT_ERROR 0x10
+#define PORT_FIRST_ERROR   0x18
+#define PORT_MALFORMED_REQ00x20
+#define PORT_MALFORMED_REQ10x28
+
+#define ERROR_MASK GENMASK_ULL(63, 0)
+
+/* mask or unmask port errors by the error mask register. */
+static void __port_err_mask(struct device *dev, bool mask)
+{
+   void __iomem *base;
+
+   base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
+
+   writeq(mask ? ERROR_MASK : 0, base + PORT_ERROR_MASK);
+}
+
+/* clear port errors. */
+static int __port_err_clear(struct device *dev, u64 err)
+{
+   struct platform_device *pdev = to_platform_device(dev);
+   void __iomem *base_err, *base_hdr;
+   int ret;
+   u64 v;
+
+   base_err = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
+   base_hdr = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+   /*
+* clear Port Errors
+*
+* - Check for AP6 State
+* - Halt Port by keeping Port in reset
+* - Set PORT Error mask to all 1 to mask errors
+* - Clear all errors
+* - Set Port mask to all 0 to enable errors
+* - All errors start capturing

[PATCH 07/15] fpga: dfl: pci: enable SRIOV support.

2019-06-27 Thread Moritz Fischer
From: Wu Hao 

This patch enables the standard sriov support. It allows user to
enable SRIOV (and VFs), then user could pass through accelerators
(VFs) into virtual machine or use VFs directly in host.

Signed-off-by: Zhang Yi Z 
Signed-off-by: Xu Yilun 
Signed-off-by: Wu Hao 
Acked-by: Alan Tull 
Acked-by: Moritz Fischer 
Signed-off-by: Moritz Fischer 
---
 drivers/fpga/dfl-pci.c | 40 
 drivers/fpga/dfl.c | 41 +
 drivers/fpga/dfl.h |  1 +
 3 files changed, 82 insertions(+)

diff --git a/drivers/fpga/dfl-pci.c b/drivers/fpga/dfl-pci.c
index 66b5720582bb..2fa571b0fdea 100644
--- a/drivers/fpga/dfl-pci.c
+++ b/drivers/fpga/dfl-pci.c
@@ -223,8 +223,46 @@ int cci_pci_probe(struct pci_dev *pcidev, const struct 
pci_device_id *pcidevid)
return ret;
 }
 
+static int cci_pci_sriov_configure(struct pci_dev *pcidev, int num_vfs)
+{
+   struct cci_drvdata *drvdata = pci_get_drvdata(pcidev);
+   struct dfl_fpga_cdev *cdev = drvdata->cdev;
+   int ret = 0;
+
+   mutex_lock(&cdev->lock);
+
+   if (!num_vfs) {
+   /*
+* disable SRIOV and then put released ports back to default
+* PF access mode.
+*/
+   pci_disable_sriov(pcidev);
+
+   __dfl_fpga_cdev_config_port_vf(cdev, false);
+
+   } else if (cdev->released_port_num == num_vfs) {
+   /*
+* only enable SRIOV if cdev has matched released ports, put
+* released ports into VF access mode firstly.
+*/
+   __dfl_fpga_cdev_config_port_vf(cdev, true);
+
+   ret = pci_enable_sriov(pcidev, num_vfs);
+   if (ret)
+   __dfl_fpga_cdev_config_port_vf(cdev, false);
+   } else {
+   ret = -EINVAL;
+   }
+
+   mutex_unlock(&cdev->lock);
+   return ret;
+}
+
 static void cci_pci_remove(struct pci_dev *pcidev)
 {
+   if (dev_is_pf(&pcidev->dev))
+   cci_pci_sriov_configure(pcidev, 0);
+
cci_remove_feature_devs(pcidev);
pci_disable_pcie_error_reporting(pcidev);
 }
@@ -234,6 +272,7 @@ static struct pci_driver cci_pci_driver = {
.id_table = cci_pcie_id_tbl,
.probe = cci_pci_probe,
.remove = cci_pci_remove,
+   .sriov_configure = cci_pci_sriov_configure,
 };
 
 module_pci_driver(cci_pci_driver);
@@ -241,3 +280,4 @@ module_pci_driver(cci_pci_driver);
 MODULE_DESCRIPTION("FPGA DFL PCIe Device Driver");
 MODULE_AUTHOR("Intel Corporation");
 MODULE_LICENSE("GPL v2");
+MODULE_VERSION(DRV_VERSION);
diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
index 308c80868af4..28d61b611165 100644
--- a/drivers/fpga/dfl.c
+++ b/drivers/fpga/dfl.c
@@ -1112,6 +1112,47 @@ int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev,
 }
 EXPORT_SYMBOL_GPL(dfl_fpga_cdev_config_port);
 
+static void config_port_vf(struct device *fme_dev, int port_id, bool is_vf)
+{
+   void __iomem *base;
+   u64 v;
+
+   base = dfl_get_feature_ioaddr_by_id(fme_dev, FME_FEATURE_ID_HEADER);
+
+   v = readq(base + FME_HDR_PORT_OFST(port_id));
+
+   v &= ~FME_PORT_OFST_ACC_CTRL;
+   v |= FIELD_PREP(FME_PORT_OFST_ACC_CTRL,
+   is_vf ? FME_PORT_OFST_ACC_VF : FME_PORT_OFST_ACC_PF);
+
+   writeq(v, base + FME_HDR_PORT_OFST(port_id));
+}
+
+/**
+ * __dfl_fpga_cdev_config_port_vf - configure port to VF access mode
+ *
+ * @cdev: parent container device.
+ * @if_vf: true for VF access mode, and false for PF access mode
+ *
+ * Return: 0 on success, negative error code otherwise.
+ *
+ * This function is needed in sriov configuration routine. It could be used to
+ * configures the released ports access mode to VF or PF.
+ * The caller needs to hold lock for protection.
+ */
+void __dfl_fpga_cdev_config_port_vf(struct dfl_fpga_cdev *cdev, bool is_vf)
+{
+   struct dfl_feature_platform_data *pdata;
+
+   list_for_each_entry(pdata, &cdev->port_dev_list, node) {
+   if (device_is_registered(&pdata->dev->dev))
+   continue;
+
+   config_port_vf(cdev->fme_dev, pdata->id, is_vf);
+   }
+}
+EXPORT_SYMBOL_GPL(__dfl_fpga_cdev_config_port_vf);
+
 static int __init dfl_fpga_init(void)
 {
int ret;
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index 63f39ab08905..1350e8eb9e59 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -421,5 +421,6 @@ dfl_fpga_cdev_find_port(struct dfl_fpga_cdev *cdev, void 
*data,
 
 int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev,
  u32 port_id, bool release);
+void __dfl_fpga_cdev_config_port_vf(struct dfl_fpga_cdev *cdev, bool is_vf);
 
 #endif /* __FPGA_DFL_H */
-- 
2.22.0



[PATCH 02/15] fpga: dfl: fme: remove copy_to_user() in ioctl for PR

2019-06-27 Thread Moritz Fischer
From: Wu Hao 

This patch removes copy_to_user() code in partial reconfiguration
ioctl, as it's useless as user never needs to read the data
structure after ioctl.

Signed-off-by: Xu Yilun 
Signed-off-by: Wu Hao 
Acked-by: Moritz Fischer 
Acked-by: Alan Tull 
Signed-off-by: Moritz Fischer 
---
 drivers/fpga/dfl-fme-pr.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c
index d9ca9554844a..6ec0f09e5994 100644
--- a/drivers/fpga/dfl-fme-pr.c
+++ b/drivers/fpga/dfl-fme-pr.c
@@ -159,9 +159,6 @@ static int fme_pr(struct platform_device *pdev, unsigned 
long arg)
mutex_unlock(&pdata->lock);
 free_exit:
vfree(buf);
-   if (copy_to_user((void __user *)arg, &port_pr, minsz))
-   return -EFAULT;
-
return ret;
 }
 
-- 
2.22.0



[PATCH 05/15] Documentation: fpga: dfl: add descriptions for virtualization and new interfaces.

2019-06-27 Thread Moritz Fischer
From: Wu Hao 

This patch adds virtualization support description for DFL based
FPGA devices (based on PCIe SRIOV), and introductions to new
interfaces added by new dfl private feature drivers.

[m...@kernel.org: Fixed up to make it work with new reStructuredText docs]
Signed-off-by: Xu Yilun 
Signed-off-by: Wu Hao 
Acked-by: Alan Tull 
Signed-off-by: Moritz Fischer 
---
 Documentation/fpga/dfl.rst | 100 +
 1 file changed, 100 insertions(+)

diff --git a/Documentation/fpga/dfl.rst b/Documentation/fpga/dfl.rst
index 2f125abd777f..be9929dd7251 100644
--- a/Documentation/fpga/dfl.rst
+++ b/Documentation/fpga/dfl.rst
@@ -87,6 +87,8 @@ The following functions are exposed through ioctls:
 - Get driver API version (DFL_FPGA_GET_API_VERSION)
 - Check for extensions (DFL_FPGA_CHECK_EXTENSION)
 - Program bitstream (DFL_FPGA_FME_PORT_PR)
+- Assign port to PF (DFL_FPGA_FME_PORT_ASSIGN)
+- Release port from PF (DFL_FPGA_FME_PORT_RELEASE)
 
 More functions are exposed through sysfs
 (/sys/class/fpga_region/regionX/dfl-fme.n/):
@@ -143,6 +145,9 @@ More functions are exposed through sysfs:
  Read Accelerator GUID (afu_id)
  afu_id indicates which PR bitstream is programmed to this AFU.
 
+ Global error reporting management (errors/)
+ error reporting sysfs interfaces allow user to read errors detected by the
+ hardware, and clear the logged errors.
 
 DFL Framework Overview
 ==
@@ -218,6 +223,101 @@ the compat_id exposed by the target FPGA region. This 
check is usually done by
 userspace before calling the reconfiguration IOCTL.
 
 
+FPGA virtualization - PCIe SRIOV
+
+This section describes the virtualization support on DFL based FPGA device to
+enable accessing an accelerator from applications running in a virtual machine
+(VM). This section only describes the PCIe based FPGA device with SRIOV 
support.
+
+Features supported by the particular FPGA device are exposed through Device
+Feature Lists, as illustrated below:
+
+::
+
++---+  +-+
+|  PF   |  | VF  |
++---+  +-+
+^^ ^  ^
+|| |  |
+  +-||-|--|---+
+  | || |  |   |
+  |  +-+ +---+ +---+  +---+   |
+  |  | FME | | Port0 | | Port1 |  | Port2 |   |
+  |  +-+ +---+ +---+  +---+   |
+  |  ^ ^  ^   |
+  |  | |  |   |
+  |  +---+ +--+   +---+   |
+  |  |  AFU  | |  AFU |   |  AFU  |   |
+  |  +---+ +--+   +---+   |
+  |   |
+  |DFL based FPGA PCIe Device |
+  +---+
+
+FME is always accessed through the physical function (PF).
+
+Ports (and related AFUs) are accessed via PF by default, but could be exposed
+through virtual function (VF) devices via PCIe SRIOV. Each VF only contains
+1 Port and 1 AFU for isolation. Users could assign individual VFs 
(accelerators)
+created via PCIe SRIOV interface, to virtual machines.
+
+The driver organization in virtualization case is illustrated below:
+::
+
++---++--++--+ |
+| FME   || FME  || FME  | |
+| FPGA  || FPGA || FPGA | |
+|Manager||Bridge||Region| |
++---++--++--+ |
++---+  ++ | ++
+|  FME  |  |  AFU   | | |  AFU   |
+| Module|  | Module | | | Module |
++---+  ++ | ++
+  +---+   |   +---+
+  | FPGA Container Device |   |   | FPGA Container Device |
+  |  (FPGA Base Region)   |   |   |  (FPGA Base Region)   |
+  +---+   |   +---+
++--+  | +--+
+| FPGA PCIE Module |  | Virtual | FPGA PCIE Module |
++--+   Host   | Machine +--+
+   -- | --
+ +---+|  +---+
+ | PCI PF Device ||  | PCI VF Device |
+ +---+|  +---+
+
+FPGA PCIe device driver is always loaded first once a FPGA PCIe PF or VF device
+is detected. It:
+
+* Finishes enumeration on both FPGA PCIe PF and VF device using common
+  int

[PATCH 03/15] fpga: dfl: fme: align PR buffer size per PR datawidth

2019-06-27 Thread Moritz Fischer
From: Wu Hao 

Current driver checks if input bitstream file size is aligned or
not per PR data width (default 32bits). It requires one additional
step for end user when they generate the bitstream file, padding
extra zeros to bitstream file to align its size per PR data width,
but they don't have to as hardware will drop extra padding bytes
automatically.

In order to simplify the user steps, this patch aligns PR buffer
size per PR data width in driver, to allow user to pass unaligned
size bitstream files to driver.

Signed-off-by: Xu Yilun 
Signed-off-by: Wu Hao 
Acked-by: Alan Tull 
Acked-by: Moritz Fischer 
Signed-off-by: Moritz Fischer 
---
 drivers/fpga/dfl-fme-pr.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c
index 6ec0f09e5994..3c71dc3faaf5 100644
--- a/drivers/fpga/dfl-fme-pr.c
+++ b/drivers/fpga/dfl-fme-pr.c
@@ -74,6 +74,7 @@ static int fme_pr(struct platform_device *pdev, unsigned long 
arg)
struct dfl_fme *fme;
unsigned long minsz;
void *buf = NULL;
+   size_t length;
int ret = 0;
u64 v;
 
@@ -85,9 +86,6 @@ static int fme_pr(struct platform_device *pdev, unsigned long 
arg)
if (port_pr.argsz < minsz || port_pr.flags)
return -EINVAL;
 
-   if (!IS_ALIGNED(port_pr.buffer_size, 4))
-   return -EINVAL;
-
/* get fme header region */
fme_hdr = dfl_get_feature_ioaddr_by_id(&pdev->dev,
   FME_FEATURE_ID_HEADER);
@@ -103,7 +101,13 @@ static int fme_pr(struct platform_device *pdev, unsigned 
long arg)
   port_pr.buffer_size))
return -EFAULT;
 
-   buf = vmalloc(port_pr.buffer_size);
+   /*
+* align PR buffer per PR bandwidth, as HW ignores the extra padding
+* data automatically.
+*/
+   length = ALIGN(port_pr.buffer_size, 4);
+
+   buf = vmalloc(length);
if (!buf)
return -ENOMEM;
 
@@ -140,7 +144,7 @@ static int fme_pr(struct platform_device *pdev, unsigned 
long arg)
fpga_image_info_free(region->info);
 
info->buf = buf;
-   info->count = port_pr.buffer_size;
+   info->count = length;
info->region_id = port_pr.port_id;
region->info = info;
 
-- 
2.22.0



[PATCH 08/15] fpga: dfl: afu: add AFU state related sysfs interfaces

2019-06-27 Thread Moritz Fischer
From: Wu Hao 

This patch introduces more sysfs interfaces for Accelerated
Function Unit (AFU). These interfaces allow users to read
current AFU Power State (APx), read / clear AFU Power (APx)
events which are sticky to identify transient APx state,
and manage AFU's LTR (latency tolerance reporting).

Signed-off-by: Ananda Ravuri 
Signed-off-by: Xu Yilun 
Signed-off-by: Wu Hao 
Acked-by: Alan Tull 
Signed-off-by: Moritz Fischer 
---
 .../ABI/testing/sysfs-platform-dfl-port   |  30 
 drivers/fpga/dfl-afu-main.c   | 140 ++
 drivers/fpga/dfl.h|  11 ++
 3 files changed, 181 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port 
b/Documentation/ABI/testing/sysfs-platform-dfl-port
index 6a92dda517b0..17b37d110618 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-port
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-port
@@ -14,3 +14,33 @@ Description: Read-only. User can program different PR 
bitstreams to FPGA
Accelerator Function Unit (AFU) for different functions. It
returns uuid which could be used to identify which PR bitstream
is programmed in this AFU.
+
+What:  /sys/bus/platform/devices/dfl-port.0/power_state
+Date:  June 2019
+KernelVersion: 5.3
+Contact:   Wu Hao 
+Description:   Read-only. It reports the APx (AFU Power) state, different APx
+   means different throttling level. When reading this file, it
+   returns "0" - Normal / "1" - AP1 / "2" - AP2 / "6" - AP6.
+
+What:  /sys/bus/platform/devices/dfl-port.0/ap1_event
+Date:  June 2019
+KernelVersion: 5.3
+Contact:   Wu Hao 
+Description:   Read-write. Read or set 1 to clear AP1 (AFU Power State 1)
+   event. It's used to indicate transient AP1 state.
+
+What:  /sys/bus/platform/devices/dfl-port.0/ap2_event
+Date:  June 2019
+KernelVersion: 5.3
+Contact:   Wu Hao 
+Description:   Read-write. Read or set 1 to clear AP2 (AFU Power State 2)
+   event. It's used to indicate transient AP2 state.
+
+What:  /sys/bus/platform/devices/dfl-port.0/ltr
+Date:  June 2019
+KernelVersion: 5.3
+Contact:   Wu Hao 
+Description:   Read-write. Read and set AFU latency tolerance reporting value.
+   Set ltr to 1 if the AFU can tolerate latency >= 40us or set it
+   to 0 if it is latency sensitive.
diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index 02baa6a227c0..040ed8ad16e5 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -21,6 +21,8 @@
 
 #include "dfl-afu.h"
 
+#define DRV_VERSION"0.8"
+
 /**
  * port_enable - enable a port
  * @pdev: port platform device.
@@ -141,8 +143,145 @@ id_show(struct device *dev, struct device_attribute 
*attr, char *buf)
 }
 static DEVICE_ATTR_RO(id);
 
+static ssize_t
+ltr_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+   struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+   void __iomem *base;
+   u64 v;
+
+   base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+   mutex_lock(&pdata->lock);
+   v = readq(base + PORT_HDR_CTRL);
+   mutex_unlock(&pdata->lock);
+
+   return sprintf(buf, "%x\n", (u8)FIELD_GET(PORT_CTRL_LATENCY, v));
+}
+
+static ssize_t
+ltr_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+   struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+   void __iomem *base;
+   u8 ltr;
+   u64 v;
+
+   if (kstrtou8(buf, 0,  1)
+   return -EINVAL;
+
+   base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+   mutex_lock(&pdata->lock);
+   v = readq(base + PORT_HDR_CTRL);
+   v &= ~PORT_CTRL_LATENCY;
+   v |= FIELD_PREP(PORT_CTRL_LATENCY, ltr);
+   writeq(v, base + PORT_HDR_CTRL);
+   mutex_unlock(&pdata->lock);
+
+   return count;
+}
+static DEVICE_ATTR_RW(ltr);
+
+static ssize_t
+ap1_event_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+   struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+   void __iomem *base;
+   u64 v;
+
+   base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+   mutex_lock(&pdata->lock);
+   v = readq(base + PORT_HDR_STS);
+   mutex_unlock(&pdata->lock);
+
+   return sprintf(buf, "%x\n", (u8)FIELD_GET(PORT_STS_AP1_EVT, v));
+}
+
+static ssize_t
+ap1_event_store(struct device *dev, struct device_attribute *attr,
+   const char *buf, size_t count)
+{
+   struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+   void __iomem *base;
+   u8 ap1_event;
+
+   if (kstrtou8(buf, 0, &ap1_event) || ap1_event != 1)
+   return -EINVAL;
+
+   base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEAT

[PATCH 10/15] fpga: dfl: add id_table for dfl private feature driver

2019-06-27 Thread Moritz Fischer
From: Wu Hao 

This patch adds id_table for each dfl private feature driver,
it allows to reuse same private feature driver to match and support
multiple dfl private features.

Signed-off-by: Xu Yilun 
Signed-off-by: Wu Hao 
Acked-by: Moritz Fischer 
Acked-by: Alan Tull 
Signed-off-by: Moritz Fischer 
---
 drivers/fpga/dfl-afu-main.c | 14 --
 drivers/fpga/dfl-fme-main.c | 11 ---
 drivers/fpga/dfl-fme-pr.c   |  7 ++-
 drivers/fpga/dfl-fme.h  |  3 ++-
 drivers/fpga/dfl.c  | 21 +++--
 drivers/fpga/dfl.h  | 21 +++--
 6 files changed, 62 insertions(+), 15 deletions(-)

diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index 8b434a405498..65b3e895e364 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -435,6 +435,11 @@ port_hdr_ioctl(struct platform_device *pdev, struct 
dfl_feature *feature,
return ret;
 }
 
+static const struct dfl_feature_id port_hdr_id_table[] = {
+   {.id = PORT_FEATURE_ID_HEADER,},
+   {0,}
+};
+
 static const struct dfl_feature_ops port_hdr_ops = {
.init = port_hdr_init,
.uinit = port_hdr_uinit,
@@ -495,6 +500,11 @@ static void port_afu_uinit(struct platform_device *pdev,
sysfs_remove_files(&pdev->dev.kobj, port_afu_attrs);
 }
 
+static const struct dfl_feature_id port_afu_id_table[] = {
+   {.id = PORT_FEATURE_ID_AFU,},
+   {0,}
+};
+
 static const struct dfl_feature_ops port_afu_ops = {
.init = port_afu_init,
.uinit = port_afu_uinit,
@@ -502,11 +512,11 @@ static const struct dfl_feature_ops port_afu_ops = {
 
 static struct dfl_feature_driver port_feature_drvs[] = {
{
-   .id = PORT_FEATURE_ID_HEADER,
+   .id_table = port_hdr_id_table,
.ops = &port_hdr_ops,
},
{
-   .id = PORT_FEATURE_ID_AFU,
+   .id_table = port_afu_id_table,
.ops = &port_afu_ops,
},
{
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index 8b2a33760483..38c6342e1865 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -158,6 +158,11 @@ static long fme_hdr_ioctl(struct platform_device *pdev,
return -ENODEV;
 }
 
+static const struct dfl_feature_id fme_hdr_id_table[] = {
+   {.id = FME_FEATURE_ID_HEADER,},
+   {0,}
+};
+
 static const struct dfl_feature_ops fme_hdr_ops = {
.init = fme_hdr_init,
.uinit = fme_hdr_uinit,
@@ -166,12 +171,12 @@ static const struct dfl_feature_ops fme_hdr_ops = {
 
 static struct dfl_feature_driver fme_feature_drvs[] = {
{
-   .id = FME_FEATURE_ID_HEADER,
+   .id_table = fme_hdr_id_table,
.ops = &fme_hdr_ops,
},
{
-   .id = FME_FEATURE_ID_PR_MGMT,
-   .ops = &pr_mgmt_ops,
+   .id_table = fme_pr_mgmt_id_table,
+   .ops = &fme_pr_mgmt_ops,
},
{
.ops = NULL,
diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c
index cd94ba870094..52f1745dfb25 100644
--- a/drivers/fpga/dfl-fme-pr.c
+++ b/drivers/fpga/dfl-fme-pr.c
@@ -483,7 +483,12 @@ static long fme_pr_ioctl(struct platform_device *pdev,
return ret;
 }
 
-const struct dfl_feature_ops pr_mgmt_ops = {
+const struct dfl_feature_id fme_pr_mgmt_id_table[] = {
+   {.id = FME_FEATURE_ID_PR_MGMT,},
+   {0}
+};
+
+const struct dfl_feature_ops fme_pr_mgmt_ops = {
.init = pr_mgmt_init,
.uinit = pr_mgmt_uinit,
.ioctl = fme_pr_ioctl,
diff --git a/drivers/fpga/dfl-fme.h b/drivers/fpga/dfl-fme.h
index de207556b70a..7a021c483e9b 100644
--- a/drivers/fpga/dfl-fme.h
+++ b/drivers/fpga/dfl-fme.h
@@ -35,6 +35,7 @@ struct dfl_fme {
struct dfl_feature_platform_data *pdata;
 };
 
-extern const struct dfl_feature_ops pr_mgmt_ops;
+extern const struct dfl_feature_ops fme_pr_mgmt_ops;
+extern const struct dfl_feature_id fme_pr_mgmt_id_table[];
 
 #endif /* __DFL_FME_H */
diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
index 28d61b611165..1bb2b582e4b0 100644
--- a/drivers/fpga/dfl.c
+++ b/drivers/fpga/dfl.c
@@ -14,6 +14,8 @@
 
 #include "dfl.h"
 
+#define DRV_VERSION"0.8"
+
 static DEFINE_MUTEX(dfl_id_mutex);
 
 /*
@@ -281,6 +283,21 @@ static int dfl_feature_instance_init(struct 
platform_device *pdev,
return ret;
 }
 
+static bool dfl_feature_drv_match(struct dfl_feature *feature,
+ struct dfl_feature_driver *driver)
+{
+   const struct dfl_feature_id *ids = driver->id_table;
+
+   if (ids) {
+   while (ids->id) {
+   if (ids->id == feature->id)
+   return true;
+   ids++;
+   }
+   }
+   return false;
+}
+
 /**
  * dfl_fpga_dev_feature_init - init for sub features of dfl feature device
  * @pdev: feature device.
@@ -301,8 +318,7 @@ int 

[PATCH 11/15] fpga: dfl: afu: export __port_enable/disable function.

2019-06-27 Thread Moritz Fischer
From: Wu Hao 

As these two functions are used by other private features. e.g.
in error reporting private feature, it requires to check port status
and reset port for error clearing.

Signed-off-by: Xu Yilun 
Signed-off-by: Wu Hao 
Acked-by: Moritz Fischer 
Acked-by: Alan Tull 
Signed-off-by: Moritz Fischer 
---
 drivers/fpga/dfl-afu-main.c | 25 ++---
 drivers/fpga/dfl-afu.h  |  3 +++
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index 65b3e895e364..c8bc0b5d9c16 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -24,14 +24,16 @@
 #define DRV_VERSION"0.8"
 
 /**
- * port_enable - enable a port
+ * __port_enable - enable a port
  * @pdev: port platform device.
  *
  * Enable Port by clear the port soft reset bit, which is set by default.
  * The AFU is unable to respond to any MMIO access while in reset.
- * port_enable function should only be used after port_disable function.
+ * __port_enable function should only be used after __port_disable function.
+ *
+ * The caller needs to hold lock for protection.
  */
-static void port_enable(struct platform_device *pdev)
+void __port_enable(struct platform_device *pdev)
 {
struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
void __iomem *base;
@@ -54,13 +56,14 @@ static void port_enable(struct platform_device *pdev)
 #define RST_POLL_TIMEOUT 1000 /* us */
 
 /**
- * port_disable - disable a port
+ * __port_disable - disable a port
  * @pdev: port platform device.
  *
- * Disable Port by setting the port soft reset bit, it puts the port into
- * reset.
+ * Disable Port by setting the port soft reset bit, it puts the port into 
reset.
+ *
+ * The caller needs to hold lock for protection.
  */
-static int port_disable(struct platform_device *pdev)
+int __port_disable(struct platform_device *pdev)
 {
struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
void __iomem *base;
@@ -106,9 +109,9 @@ static int __port_reset(struct platform_device *pdev)
 {
int ret;
 
-   ret = port_disable(pdev);
+   ret = __port_disable(pdev);
if (!ret)
-   port_enable(pdev);
+   __port_enable(pdev);
 
return ret;
 }
@@ -805,9 +808,9 @@ static int port_enable_set(struct platform_device *pdev, 
bool enable)
 
mutex_lock(&pdata->lock);
if (enable)
-   port_enable(pdev);
+   __port_enable(pdev);
else
-   ret = port_disable(pdev);
+   ret = __port_disable(pdev);
mutex_unlock(&pdata->lock);
 
return ret;
diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
index 0c7630ae3cda..35e60c5859a4 100644
--- a/drivers/fpga/dfl-afu.h
+++ b/drivers/fpga/dfl-afu.h
@@ -79,6 +79,9 @@ struct dfl_afu {
struct dfl_feature_platform_data *pdata;
 };
 
+void __port_enable(struct platform_device *pdev);
+int __port_disable(struct platform_device *pdev);
+
 void afu_mmio_region_init(struct dfl_feature_platform_data *pdata);
 int afu_mmio_region_add(struct dfl_feature_platform_data *pdata,
u32 region_index, u64 region_size, u64 phys, u32 flags);
-- 
2.22.0



[PATCH 14/15] fpga: dfl: fme: add capability sysfs interfaces

2019-06-27 Thread Moritz Fischer
From: Wu Hao 

This patch adds 3 read-only sysfs interfaces for FPGA Management Engine
(FME) block for capabilities including cache_size, fabric_version and
socket_id.

Signed-off-by: Luwei Kang 
Signed-off-by: Xu Yilun 
Signed-off-by: Wu Hao 
Acked-by: Alan Tull 
Signed-off-by: Moritz Fischer 
---
 .../ABI/testing/sysfs-platform-dfl-fme| 23 +
 drivers/fpga/dfl-fme-main.c   | 48 +++
 2 files changed, 71 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme 
b/Documentation/ABI/testing/sysfs-platform-dfl-fme
index 8fa4febfa4b2..99cd3b2acff5 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
@@ -21,3 +21,26 @@ Contact: Wu Hao 
 Description:   Read-only. It returns Bitstream (static FPGA region) meta
data, which includes the synthesis date, seed and other
information of this static FPGA region.
+
+What:  /sys/bus/platform/devices/dfl-fme.0/cache_size
+Date:  June 2019
+KernelVersion:  5.3
+Contact:   Wu Hao 
+Description:   Read-only. It returns cache size of this FPGA device.
+
+What:  /sys/bus/platform/devices/dfl-fme.0/fabric_version
+Date:  June 2019
+KernelVersion:  5.3
+Contact:   Wu Hao 
+Description:   Read-only. It returns fabric version of this FPGA device.
+   Userspace applications need this information to select
+   best data channels per different fabric design.
+
+What:  /sys/bus/platform/devices/dfl-fme.0/socket_id
+Date:  June 2019
+KernelVersion:  5.3
+Contact:   Wu Hao 
+Description:   Read-only. It returns socket_id to indicate which socket
+   this FPGA belongs to, only valid for integrated solution.
+   User only needs this information, in case standard numa node
+   can't provide correct information.
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index 38c6342e1865..2d69b8fd0137 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -75,10 +75,58 @@ static ssize_t bitstream_metadata_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(bitstream_metadata);
 
+static ssize_t cache_size_show(struct device *dev,
+  struct device_attribute *attr, char *buf)
+{
+   void __iomem *base;
+   u64 v;
+
+   base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER);
+
+   v = readq(base + FME_HDR_CAP);
+
+   return sprintf(buf, "%u\n",
+  (unsigned int)FIELD_GET(FME_CAP_CACHE_SIZE, v));
+}
+static DEVICE_ATTR_RO(cache_size);
+
+static ssize_t fabric_version_show(struct device *dev,
+  struct device_attribute *attr, char *buf)
+{
+   void __iomem *base;
+   u64 v;
+
+   base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER);
+
+   v = readq(base + FME_HDR_CAP);
+
+   return sprintf(buf, "%u\n",
+  (unsigned int)FIELD_GET(FME_CAP_FABRIC_VERID, v));
+}
+static DEVICE_ATTR_RO(fabric_version);
+
+static ssize_t socket_id_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+   void __iomem *base;
+   u64 v;
+
+   base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER);
+
+   v = readq(base + FME_HDR_CAP);
+
+   return sprintf(buf, "%u\n",
+  (unsigned int)FIELD_GET(FME_CAP_SOCKET_ID, v));
+}
+static DEVICE_ATTR_RO(socket_id);
+
 static const struct attribute *fme_hdr_attrs[] = {
&dev_attr_ports_num.attr,
&dev_attr_bitstream_id.attr,
&dev_attr_bitstream_metadata.attr,
+   &dev_attr_cache_size.attr,
+   &dev_attr_fabric_version.attr,
+   &dev_attr_socket_id.attr,
NULL,
 };
 
-- 
2.22.0



[PATCH 04/15] fpga: dfl: fme: support 512bit data width PR

2019-06-27 Thread Moritz Fischer
From: Wu Hao 

In early partial reconfiguration private feature, it only
supports 32bit data width when writing data to hardware for
PR. 512bit data width PR support is an important optimization
for some specific solutions (e.g. XEON with FPGA integrated),
it allows driver to use AVX512 instruction to improve the
performance of partial reconfiguration. e.g. programming one
100MB bitstream image via this 512bit data width PR hardware
only takes ~300ms, but 32bit revision requires ~3s per test
result.

Please note now this optimization is only done on revision 2
of this PR private feature which is only used in integrated
solution that AVX512 is always supported. This revision 2
hardware doesn't support 32bit PR.

Signed-off-by: Ananda Ravuri 
Signed-off-by: Xu Yilun 
Signed-off-by: Wu Hao 
Acked-by: Alan Tull 
Signed-off-by: Moritz Fischer 
---
 drivers/fpga/dfl-fme-main.c |   3 +
 drivers/fpga/dfl-fme-mgr.c  | 113 +++-
 drivers/fpga/dfl-fme-pr.c   |  43 +-
 drivers/fpga/dfl-fme.h  |   2 +
 drivers/fpga/dfl.h  |   5 ++
 5 files changed, 135 insertions(+), 31 deletions(-)

diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index 086ad2420ade..076d74f6416d 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -21,6 +21,8 @@
 #include "dfl.h"
 #include "dfl-fme.h"
 
+#define DRV_VERSION"0.8"
+
 static ssize_t ports_num_show(struct device *dev,
  struct device_attribute *attr, char *buf)
 {
@@ -277,3 +279,4 @@ MODULE_DESCRIPTION("FPGA Management Engine driver");
 MODULE_AUTHOR("Intel Corporation");
 MODULE_LICENSE("GPL v2");
 MODULE_ALIAS("platform:dfl-fme");
+MODULE_VERSION(DRV_VERSION);
diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c
index b3f7eee3c93f..d1a4ba5d1d3d 100644
--- a/drivers/fpga/dfl-fme-mgr.c
+++ b/drivers/fpga/dfl-fme-mgr.c
@@ -22,14 +22,18 @@
 #include 
 #include 
 
+#include "dfl.h"
 #include "dfl-fme-pr.h"
 
+#define DRV_VERSION"0.8"
+
 /* FME Partial Reconfiguration Sub Feature Register Set */
 #define FME_PR_DFH 0x0
 #define FME_PR_CTRL0x8
 #define FME_PR_STS 0x10
 #define FME_PR_DATA0x18
 #define FME_PR_ERR 0x20
+#define FME_PR_512_DATA0x40 /* Data Register for 512bit 
datawidth PR */
 #define FME_PR_INTFC_ID_L  0xA8
 #define FME_PR_INTFC_ID_H  0xB0
 
@@ -67,8 +71,43 @@
 #define PR_WAIT_TIMEOUT   800
 #define PR_HOST_STATUS_IDLE0
 
+#if defined(CONFIG_X86) && defined(CONFIG_AS_AVX512)
+
+#include 
+#include 
+
+static inline int is_cpu_avx512_enabled(void)
+{
+   return cpu_feature_enabled(X86_FEATURE_AVX512F);
+}
+
+static inline void copy512(const void *src, void __iomem *dst)
+{
+   kernel_fpu_begin();
+
+   asm volatile("vmovdqu64 (%0), %%zmm0;"
+"vmovntdq %%zmm0, (%1);"
+:
+: "r"(src), "r"(dst)
+: "memory");
+
+   kernel_fpu_end();
+}
+#else
+static inline int is_cpu_avx512_enabled(void)
+{
+   return 0;
+}
+
+static inline void copy512(const void *src, void __iomem *dst)
+{
+   WARN_ON_ONCE(1);
+}
+#endif
+
 struct fme_mgr_priv {
void __iomem *ioaddr;
+   unsigned int pr_datawidth;
u64 pr_error;
 };
 
@@ -169,7 +208,7 @@ static int fme_mgr_write(struct fpga_manager *mgr,
struct fme_mgr_priv *priv = mgr->priv;
void __iomem *fme_pr = priv->ioaddr;
u64 pr_ctrl, pr_status, pr_data;
-   int delay = 0, pr_credit, i = 0;
+   int ret = 0, delay = 0, pr_credit;
 
dev_dbg(dev, "start request\n");
 
@@ -181,9 +220,9 @@ static int fme_mgr_write(struct fpga_manager *mgr,
 
/*
 * driver can push data to PR hardware using PR_DATA register once HW
-* has enough pr_credit (> 1), pr_credit reduces one for every 32bit
-* pr data write to PR_DATA register. If pr_credit <= 1, driver needs
-* to wait for enough pr_credit from hardware by polling.
+* has enough pr_credit (> 1), pr_credit reduces one for every pr data
+* width write to PR_DATA register. If pr_credit <= 1, driver needs to
+* wait for enough pr_credit from hardware by polling.
 */
pr_status = readq(fme_pr + FME_PR_STS);
pr_credit = FIELD_GET(FME_PR_STS_PR_CREDIT, pr_status);
@@ -192,7 +231,8 @@ static int fme_mgr_write(struct fpga_manager *mgr,
while (pr_credit <= 1) {
if (delay++ > PR_WAIT_TIMEOUT) {
dev_err(dev, "PR_CREDIT timeout\n");
-   return -ETIMEDOUT;
+   ret = -ETIMEDOUT;
+   goto done;
}
udelay(1);
 
@@ -200,21 +240,27 @@ static int fme_mgr_write(struct fpga_manager *mgr,
pr_credit = FIELD_GET(FME_PR_STS_PR_CREDIT, pr

[PATCH 13/15] fpga: dfl: afu: add STP (SignalTap) support

2019-06-27 Thread Moritz Fischer
From: Wu Hao 

STP (SignalTap) is one of the private features under the port for
debugging. This patch adds private feature driver support for it
to allow userspace applications to mmap related mmio region and
provide STP service.

Signed-off-by: Xu Yilun 
Signed-off-by: Wu Hao 
Acked-by: Moritz Fischer 
Acked-by: Alan Tull 
Signed-off-by: Moritz Fischer 
---
 drivers/fpga/dfl-afu-main.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index bcf6e285a854..8241aced2d5d 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -513,6 +513,36 @@ static const struct dfl_feature_ops port_afu_ops = {
.uinit = port_afu_uinit,
 };
 
+static int port_stp_init(struct platform_device *pdev,
+struct dfl_feature *feature)
+{
+   struct resource *res = &pdev->resource[feature->resource_index];
+
+   dev_dbg(&pdev->dev, "PORT STP Init.\n");
+
+   return afu_mmio_region_add(dev_get_platdata(&pdev->dev),
+  DFL_PORT_REGION_INDEX_STP,
+  resource_size(res), res->start,
+  DFL_PORT_REGION_MMAP | DFL_PORT_REGION_READ |
+  DFL_PORT_REGION_WRITE);
+}
+
+static void port_stp_uinit(struct platform_device *pdev,
+  struct dfl_feature *feature)
+{
+   dev_dbg(&pdev->dev, "PORT STP UInit.\n");
+}
+
+static const struct dfl_feature_id port_stp_id_table[] = {
+   {.id = PORT_FEATURE_ID_STP,},
+   {0,}
+};
+
+static const struct dfl_feature_ops port_stp_ops = {
+   .init = port_stp_init,
+   .uinit = port_stp_uinit,
+};
+
 static struct dfl_feature_driver port_feature_drvs[] = {
{
.id_table = port_hdr_id_table,
@@ -526,6 +556,10 @@ static struct dfl_feature_driver port_feature_drvs[] = {
.id_table = port_err_id_table,
.ops = &port_err_ops,
},
+   {
+   .id_table = port_stp_id_table,
+   .ops = &port_stp_ops,
+   },
{
.ops = NULL,
}
-- 
2.22.0



[PATCH 09/15] fpga: dfl: afu: add userclock sysfs interfaces.

2019-06-27 Thread Moritz Fischer
From: Wu Hao 

This patch introduces userclock sysfs interfaces for AFU, user
could use these interfaces for clock setting to AFU.

Please note that, this is only working for port header feature
with revision 0, for later revisions, userclock setting is moved
to a separated private feature, so one revision sysfs interface
is exposed to userspace application for this purpose too.

Signed-off-by: Ananda Ravuri 
Signed-off-by: Russ Weight 
Signed-off-by: Xu Yilun 
Signed-off-by: Wu Hao 
Acked-by: Alan Tull 
Signed-off-by: Moritz Fischer 
---
 .../ABI/testing/sysfs-platform-dfl-port   |  35 ++
 drivers/fpga/dfl-afu-main.c   | 113 +-
 drivers/fpga/dfl.h|   4 +
 3 files changed, 151 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port 
b/Documentation/ABI/testing/sysfs-platform-dfl-port
index 17b37d110618..04ea7f2971c7 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-port
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-port
@@ -44,3 +44,38 @@ Contact: Wu Hao 
 Description:   Read-write. Read and set AFU latency tolerance reporting value.
Set ltr to 1 if the AFU can tolerate latency >= 40us or set it
to 0 if it is latency sensitive.
+
+What:  /sys/bus/platform/devices/dfl-port.0/revision
+Date:  June 2019
+KernelVersion: 5.3
+Contact:   Wu Hao 
+Description:   Read-only. Read this file to get the revision of port header
+   feature.
+
+What:  /sys/bus/platform/devices/dfl-port.0/userclk_freqcmd
+Date:  June 2019
+KernelVersion: 5.3
+Contact:   Wu Hao 
+Description:   Write-only. User writes command to this interface to set
+   userclock to AFU.
+
+What:  /sys/bus/platform/devices/dfl-port.0/userclk_freqsts
+Date:  June 2019
+KernelVersion: 5.3
+Contact:   Wu Hao 
+Description:   Read-only. Read this file to get the status of issued command
+   to userclck_freqcmd.
+
+What:  /sys/bus/platform/devices/dfl-port.0/userclk_freqcntrcmd
+Date:  June 2019
+KernelVersion: 5.3
+Contact:   Wu Hao 
+Description:   Write-only. User writes command to this interface to set
+   userclock counter.
+
+What:  /sys/bus/platform/devices/dfl-port.0/userclk_freqcntrsts
+Date:  June 2019
+KernelVersion: 5.3
+Contact:   Wu Hao 
+Description:   Read-only. Read this file to get the status of issued command
+   to userclck_freqcntrcmd.
diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index 040ed8ad16e5..8b434a405498 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -143,6 +143,17 @@ id_show(struct device *dev, struct device_attribute *attr, 
char *buf)
 }
 static DEVICE_ATTR_RO(id);
 
+static ssize_t
+revision_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+   void __iomem *base;
+
+   base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+   return sprintf(buf, "%x\n", dfl_feature_revision(base));
+}
+static DEVICE_ATTR_RO(revision);
+
 static ssize_t
 ltr_show(struct device *dev, struct device_attribute *attr, char *buf)
 {
@@ -278,6 +289,7 @@ static DEVICE_ATTR_RO(power_state);
 
 static const struct attribute *port_hdr_attrs[] = {
&dev_attr_id.attr,
+   &dev_attr_revision.attr,
&dev_attr_ltr.attr,
&dev_attr_ap1_event.attr,
&dev_attr_ap2_event.attr,
@@ -285,14 +297,112 @@ static const struct attribute *port_hdr_attrs[] = {
NULL,
 };
 
+static ssize_t
+userclk_freqcmd_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+   struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+   u64 userclk_freq_cmd;
+   void __iomem *base;
+
+   if (kstrtou64(buf, 0, &userclk_freq_cmd))
+   return -EINVAL;
+
+   base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+   mutex_lock(&pdata->lock);
+   writeq(userclk_freq_cmd, base + PORT_HDR_USRCLK_CMD0);
+   mutex_unlock(&pdata->lock);
+
+   return count;
+}
+static DEVICE_ATTR_WO(userclk_freqcmd);
+
+static ssize_t
+userclk_freqcntrcmd_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+   struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+   u64 userclk_freqcntr_cmd;
+   void __iomem *base;
+
+   if (kstrtou64(buf, 0, &userclk_freqcntr_cmd))
+   return -EINVAL;
+
+   base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+   mutex_lock(&pdata->lock);
+   writeq(userclk_freqcntr_cmd, base + PORT_HDR_USRCLK_CMD1);
+   mutex_unlock(&pdata->lock);
+
+   return count;
+}
+static DEVICE_ATTR_WO(userclk_freqcntrcmd);
+
+static ssize_t
+userclk_freqsts_show(struct device *dev, struct device

[PATCH 15/15] fpga: dfl: fme: add global error reporting support

2019-06-27 Thread Moritz Fischer
From: Wu Hao 

This patch adds support for global error reporting for FPGA
Management Engine (FME), it introduces sysfs interfaces to
report different error detected by the hardware, and allow
user to clear errors or inject error for testing purpose.

Signed-off-by: Luwei Kang 
Signed-off-by: Ananda Ravuri 
Signed-off-by: Xu Yilun 
Signed-off-by: Wu Hao 
Acked-by: Alan Tull 
Signed-off-by: Moritz Fischer 
---
 .../ABI/testing/sysfs-platform-dfl-fme|  75 
 drivers/fpga/Makefile |   2 +-
 drivers/fpga/dfl-fme-error.c  | 385 ++
 drivers/fpga/dfl-fme-main.c   |   4 +
 drivers/fpga/dfl-fme.h|   2 +
 drivers/fpga/dfl.h|   2 +
 6 files changed, 469 insertions(+), 1 deletion(-)
 create mode 100644 drivers/fpga/dfl-fme-error.c

diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme 
b/Documentation/ABI/testing/sysfs-platform-dfl-fme
index 99cd3b2acff5..86eef83938b2 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
@@ -44,3 +44,78 @@ Description: Read-only. It returns socket_id to indicate 
which socket
this FPGA belongs to, only valid for integrated solution.
User only needs this information, in case standard numa node
can't provide correct information.
+
+What:  /sys/bus/platform/devices/dfl-fme.0/errors/revision
+Date:  June 2019
+KernelVersion: 5.3
+Contact:   Wu Hao 
+Description:   Read-only. Read this file to get the revision of this global
+   error reporting private feature.
+
+What:  /sys/bus/platform/devices/dfl-fme.0/errors/pcie0_errors
+Date:  June 2019
+KernelVersion:  5.3
+Contact:   Wu Hao 
+Description:   Read-Write. Read this file for errors detected on pcie0 link.
+   Write this file to clear errors logged in pcie0_errors. Write
+   fails with -EINVAL if input parsing fails or input error code
+   doesn't match.
+
+What:  /sys/bus/platform/devices/dfl-fme.0/errors/pcie1_errors
+Date:  June 2019
+KernelVersion:  5.3
+Contact:   Wu Hao 
+Description:   Read-Write. Read this file for errors detected on pcie1 link.
+   Write this file to clear errors logged in pcie1_errors. Write
+   fails with -EINVAL if input parsing fails or input error code
+   doesn't match.
+
+What:  /sys/bus/platform/devices/dfl-fme.0/errors/nonfatal_errors
+Date:  June 2019
+KernelVersion:  5.3
+Contact:   Wu Hao 
+Description:   Read-only. It returns non-fatal errors detected.
+
+What:  /sys/bus/platform/devices/dfl-fme.0/errors/catfatal_errors
+Date:  June 2019
+KernelVersion:  5.3
+Contact:   Wu Hao 
+Description:   Read-only. It returns catastrophic and fatal errors detected.
+
+What:  /sys/bus/platform/devices/dfl-fme.0/errors/inject_error
+Date:  June 2019
+KernelVersion:  5.3
+Contact:   Wu Hao 
+Description:   Read-Write. Read this file to check errors injected. Write this
+   file to inject errors for testing purpose. Write fails with
+   -EINVAL if input parsing fails or input inject error code isn't
+   supported.
+
+What:  /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/errors
+Date:  June 2019
+KernelVersion:  5.3
+Contact:   Wu Hao 
+Description:   Read-only. Read this file to get errors detected by hardware.
+
+What:  
/sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/first_error
+Date:  June 2019
+KernelVersion:  5.3
+Contact:   Wu Hao 
+Description:   Read-only. Read this file to get the first error detected by
+   hardware.
+
+What:  /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/next_error
+Date:  June 2019
+KernelVersion:  5.3
+Contact:   Wu Hao 
+Description:   Read-only. Read this file to get the second error detected by
+   hardware.
+
+What:  /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/clear
+Date:  June 2019
+KernelVersion:  5.3
+Contact:   Wu Hao 
+Description:   Write-only. Write error code to this file to clear all errors
+   logged in errors, first_error and next_error. Write fails with
+   -EINVAL if input parsing fails or input error code doesn't
+   match.
diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
index 72558914a29c..4865b74b00a4 100644
--- a/drivers/fpga/Makefile
+++ b/drivers/fpga/Makefile
@@ -39,7 +39,7 @@ obj-$(CONFIG_FPGA_DFL_FME_BRIDGE) += dfl-fme-br.o
 obj-$(CONFIG_FPGA_DFL_FME_REGION)  += dfl-fme-region.o
 obj-$(CONFIG_FPGA_DFL_AFU) += dfl-afu.o
 
-dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o
+dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o dfl-fme-error.o
 dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o

[PATCH 06/15] fpga: dfl: fme: add DFL_FPGA_FME_PORT_RELEASE/ASSIGN ioctl support.

2019-06-27 Thread Moritz Fischer
From: Wu Hao 

In order to support virtualization usage via PCIe SRIOV, this patch
adds two ioctls under FPGA Management Engine (FME) to release and
assign back the port device. In order to safely turn Port from PF
into VF and enable PCIe SRIOV, it requires user to invoke this
PORT_RELEASE ioctl to release port firstly to remove userspace
interfaces, and then configure the PF/VF access register in FME.
After disable SRIOV, it requires user to invoke this PORT_ASSIGN
ioctl to attach the port back to PF.

 Ioctl interfaces:
 * DFL_FPGA_FME_PORT_RELEASE
   Release platform device of given port, it deletes port platform
   device to remove related userspace interfaces on PF, then
   configures PF/VF access mode to VF.

 * DFL_FPGA_FME_PORT_ASSIGN
   Assign platform device of given port back to PF, it configures
   PF/VF access mode to PF, then adds port platform device back to
   re-enable related userspace interfaces on PF.

Signed-off-by: Zhang Yi Z 
Signed-off-by: Xu Yilun 
Signed-off-by: Wu Hao 
Acked-by: Alan Tull 
Acked-by: Moritz Fischer 
Signed-off-by: Moritz Fischer 
---
 drivers/fpga/dfl-fme-main.c   |  54 +
 drivers/fpga/dfl.c| 107 ++
 drivers/fpga/dfl.h|  10 
 include/uapi/linux/fpga-dfl.h |  32 ++
 4 files changed, 191 insertions(+), 12 deletions(-)

diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index 076d74f6416d..8b2a33760483 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -16,6 +16,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include "dfl.h"
@@ -105,9 +106,62 @@ static void fme_hdr_uinit(struct platform_device *pdev,
sysfs_remove_files(&pdev->dev.kobj, fme_hdr_attrs);
 }
 
+static long fme_hdr_ioctl_release_port(struct dfl_feature_platform_data *pdata,
+  void __user *arg)
+{
+   struct dfl_fpga_cdev *cdev = pdata->dfl_cdev;
+   struct dfl_fpga_fme_port_release release;
+   unsigned long minsz;
+
+   minsz = offsetofend(struct dfl_fpga_fme_port_release, port_id);
+
+   if (copy_from_user(&release, arg, minsz))
+   return -EFAULT;
+
+   if (release.argsz < minsz || release.flags)
+   return -EINVAL;
+
+   return dfl_fpga_cdev_config_port(cdev, release.port_id, true);
+}
+
+static long fme_hdr_ioctl_assign_port(struct dfl_feature_platform_data *pdata,
+ void __user *arg)
+{
+   struct dfl_fpga_cdev *cdev = pdata->dfl_cdev;
+   struct dfl_fpga_fme_port_assign assign;
+   unsigned long minsz;
+
+   minsz = offsetofend(struct dfl_fpga_fme_port_assign, port_id);
+
+   if (copy_from_user(&assign, arg, minsz))
+   return -EFAULT;
+
+   if (assign.argsz < minsz || assign.flags)
+   return -EINVAL;
+
+   return dfl_fpga_cdev_config_port(cdev, assign.port_id, false);
+}
+
+static long fme_hdr_ioctl(struct platform_device *pdev,
+ struct dfl_feature *feature,
+ unsigned int cmd, unsigned long arg)
+{
+   struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
+
+   switch (cmd) {
+   case DFL_FPGA_FME_PORT_RELEASE:
+   return fme_hdr_ioctl_release_port(pdata, (void __user *)arg);
+   case DFL_FPGA_FME_PORT_ASSIGN:
+   return fme_hdr_ioctl_assign_port(pdata, (void __user *)arg);
+   }
+
+   return -ENODEV;
+}
+
 static const struct dfl_feature_ops fme_hdr_ops = {
.init = fme_hdr_init,
.uinit = fme_hdr_uinit,
+   .ioctl = fme_hdr_ioctl,
 };
 
 static struct dfl_feature_driver fme_feature_drvs[] = {
diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
index 4b66aaa32b5a..308c80868af4 100644
--- a/drivers/fpga/dfl.c
+++ b/drivers/fpga/dfl.c
@@ -231,16 +231,20 @@ EXPORT_SYMBOL_GPL(dfl_fpga_port_ops_del);
  */
 int dfl_fpga_check_port_id(struct platform_device *pdev, void *pport_id)
 {
-   struct dfl_fpga_port_ops *port_ops = dfl_fpga_port_ops_get(pdev);
-   int port_id;
+   struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
+   struct dfl_fpga_port_ops *port_ops;
+
+   if (pdata->id != FEATURE_DEV_ID_UNUSED)
+   return pdata->id == *(int *)pport_id;
 
+   port_ops = dfl_fpga_port_ops_get(pdev);
if (!port_ops || !port_ops->get_id)
return 0;
 
-   port_id = port_ops->get_id(pdev);
+   pdata->id = port_ops->get_id(pdev);
dfl_fpga_port_ops_put(port_ops);
 
-   return port_id == *(int *)pport_id;
+   return pdata->id == *(int *)pport_id;
 }
 EXPORT_SYMBOL_GPL(dfl_fpga_check_port_id);
 
@@ -474,6 +478,7 @@ static int build_info_commit_dev(struct 
build_feature_devs_info *binfo)
pdata->dev = fdev;
pdata->num = binfo->feature_num;
pdata->dfl_cdev = binfo->cdev;
+   pdata->id = FEATURE_DEV_ID_UNUSED;
mutex_init(&pdata->lock);

[PATCH 01/15] fpga: dfl-fme-mgr: fix FME_PR_INTFC_ID register address.

2019-06-27 Thread Moritz Fischer
From: Wu Hao 

FME_PR_INTFC_ID is used as compat_id for fpga manager and region,
but high 64 bits and low 64 bits of the compat_id are swapped by
mistake. This patch fixes this problem by fixing register address.

Signed-off-by: Wu Hao 
Acked-by: Alan Tull 
Acked-by: Moritz Fischer 
Signed-off-by: Moritz Fischer 
---
 drivers/fpga/dfl-fme-mgr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c
index 76f37709dd1a..b3f7eee3c93f 100644
--- a/drivers/fpga/dfl-fme-mgr.c
+++ b/drivers/fpga/dfl-fme-mgr.c
@@ -30,8 +30,8 @@
 #define FME_PR_STS 0x10
 #define FME_PR_DATA0x18
 #define FME_PR_ERR 0x20
-#define FME_PR_INTFC_ID_H  0xA8
-#define FME_PR_INTFC_ID_L  0xB0
+#define FME_PR_INTFC_ID_L  0xA8
+#define FME_PR_INTFC_ID_H  0xB0
 
 /* FME PR Control Register Bitfield */
 #define FME_PR_CTRL_PR_RST BIT_ULL(0)  /* Reset PR engine */
-- 
2.22.0



[PATCH] kbuild: get rid of misleading $(AS) from documents

2019-06-27 Thread Masahiro Yamada
The assembler files in the kernel are *.S instead of *.s, so they must
be preprocessed. Hence, we always use $(CC) as an assembler driver.

$(AS) is almost unused in Kbuild. As of writing, there is just one user.

  $ git grep '$(AS)' -- :^Documentation
  drivers/net/wan/Makefile:  AS68K = $(AS)

The documentation about *_AFLAGS* sounds like the flags were passed
to $(AS). This is somewhat misleading since we do not invoke $(AS)
directly.

Signed-off-by: Masahiro Yamada 
---

 Documentation/kbuild/kbuild.txt|  5 ++---
 Documentation/kbuild/makefiles.txt | 12 ++--
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/Documentation/kbuild/kbuild.txt b/Documentation/kbuild/kbuild.txt
index 9c230ea71963..7a7e2aa2fab5 100644
--- a/Documentation/kbuild/kbuild.txt
+++ b/Documentation/kbuild/kbuild.txt
@@ -31,12 +31,11 @@ Additional options to the assembler (for built-in and 
modules).
 
 AFLAGS_MODULE
 --
-Additional module specific options to use for $(AS).
+Additional module specific options to use for assembler.
 
 AFLAGS_KERNEL
 --
-Additional options for $(AS) when used for assembler
-code for code that is compiled as built-in.
+Additional options when used for assembling code that is compiled as built-in.
 
 KCFLAGS
 --
diff --git a/Documentation/kbuild/makefiles.txt 
b/Documentation/kbuild/makefiles.txt
index d65ad5746f94..f0b3a30b985d 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -306,7 +306,7 @@ more details, with real examples.
variable $(KBUILD_CFLAGS) and uses it for compilation flags for the
entire tree.
 
-   asflags-y specifies options for assembling with $(AS).
+   asflags-y specifies options for assembling.
 
Example:
#arch/sparc/kernel/Makefile
@@ -441,7 +441,7 @@ more details, with real examples.
as-instr checks if the assembler reports a specific instruction
and then outputs either option1 or option2
C escapes are supported in the test instruction
-   Note: as-instr-option uses KBUILD_AFLAGS for $(AS) options
+   Note: as-instr-option uses KBUILD_AFLAGS for assembler options
 
 cc-option
cc-option is used to check if $(CC) supports a given option, and if
@@ -814,7 +814,7 @@ When kbuild executes, the following steps are followed 
(roughly):
In this example, the binary $(obj)/image is a binary version of
vmlinux. The usage of $(call if_changed,xxx) will be described later.
 
-KBUILD_AFLAGS  $(AS) assembler flags
+KBUILD_AFLAGS  assembler flags
 
Default value - see top level Makefile
Append or modify as required per architecture.
@@ -853,15 +853,15 @@ When kbuild executes, the following steps are followed 
(roughly):
The first example utilises the trick that a config option expands
to 'y' when selected.
 
-KBUILD_AFLAGS_KERNEL   $(AS) options specific for built-in
+KBUILD_AFLAGS_KERNEL   assembler options specific for built-in
 
$(KBUILD_AFLAGS_KERNEL) contains extra C compiler flags used to compile
resident kernel code.
 
-KBUILD_AFLAGS_MODULE   Options for $(AS) when building modules
+KBUILD_AFLAGS_MODULE   Options for assembler when building modules
 
$(KBUILD_AFLAGS_MODULE) is used to add arch-specific options that
-   are used for $(AS).
+   are used for assembler.
From commandline AFLAGS_MODULE shall be used (see kbuild.txt).
 
 KBUILD_CFLAGS_KERNEL   $(CC) options specific for built-in
-- 
2.17.1



Re: [tip:timers/core] hrtimer: Use a bullet for the returns bullet list

2019-06-27 Thread Joe Perches
On Thu, 2019-06-27 at 21:39 -0300, Mauro Carvalho Chehab wrote:
> Em Thu, 27 Jun 2019 15:08:59 -0700
> Joe Perches  escreveu:
[]
> > > hrtimer: Use a bullet for the returns bullet list
> > > 
> > > That gets rid of this warning:
> > > 
> > >./kernel/time/hrtimer.c:1119: WARNING: Block quote ends without a 
> > > blank line; unexpected unindent.  
> > 
> > Doesn't this form occur multiple dozens of times in
> > kernel sources?
> > 
> > For instance:
> > 
> > $ git grep -B3 -A5 -P "^ \* Returns:?$" | \
> >   grep -P -A8 '\-\s+\*\s*@\w+:'
> 
> Yes, this is a common pattern, but not all patterns that match the above
> regex are broken.
> 
> > I think the warning is odd at best and docutils might
> > be updated or the warning ignored or suppressed.
> > 
> > > and displays nicely both at the source code and at the produced
> > > documentation.  
> 
> The warnings are painful - and they're the main reason why I wrote this
> change: - I wanted to avoid new warnings actually unrelated to my
> changes that were sometimes appearing while doing incremental
> "make htmldocs" on a big patchset that I've been rebasing almost every
> week over the last two months.
> 
> -
> 
> Yet, did you try to look how this pattern will appear at the html and pdf
> output?

No I did not.

I just would like to avoid changing perfectly intelligible
kernel-doc content into something less directly readable for
the sake of external output.

I don't use the externally generated formatted output docs.
I read and use the source when necessary.

Automatic creation of bulleted blocks from relatively
unformatted content is a hard problem.

I appreciate the work Mauro, I just would like to minimize
the necessary changes if possible.

The grep I did was trivial, I'm sure there are better tools
to isolate the kernel-doc bits where the Return: block
is emitted.


>  Something like this:
> 
>   sound/soc/codecs/wm8960.c: * Returns:
>   sound/soc/codecs/wm8960.c- *  -1, in case no sysclk frequency available 
> found
>   sound/soc/codecs/wm8960.c- * >=0, in case we could derive bclk and 
> lrclk from sysclk using
>   sound/soc/codecs/wm8960.c- *  (@sysclk_idx, @dac_idx, @bclk_idx) 
> dividers
> 
> 
> Will be displayed as:
> 
>   **Returns:**
> -1, in case no sysclk frequency available found **>=0, in case we 
> could derive bclk and lrclk from sysclk using** (@sysclk_idx, @dac_idx, 
> @bclk_idx) dividers
> (where **foo**) means that "foo" will be printed in bold.> 

That's a yuck from me.

> While it would likely be possible to improve kernel-doc to present better
> results, I'm afraid that it would be too complex for simple regex
> expressions, and hard to tune, as it would be a hint-based approach,
> and doing a natural language processing would be too much effort.

Yeah, tough problem.  I don't envy it.

cheers and g'luck...



Re: [PATCH 1/2] Documentation: perf: Update documentation for ThunderX2 PMU uncore driver

2019-06-27 Thread Ganapatrao Kulkarni
Hi Will,

On Thu, Jun 27, 2019 at 9:52 PM Will Deacon  wrote:
>
> On Thu, Jun 27, 2019 at 11:01:18AM +0100, Will Deacon wrote:
> > On Fri, Jun 14, 2019 at 05:42:45PM +, Ganapatrao Kulkarni wrote:
> > > From: Ganapatrao Kulkarni 
> > >
> > > Add documentation for Cavium Coherent Processor Interconnect (CCPI2) PMU.
> > >
> > > Signed-off-by: Ganapatrao Kulkarni 
> > > ---
> > >  Documentation/perf/thunderx2-pmu.txt | 20 +++-
> > >  1 file changed, 11 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/Documentation/perf/thunderx2-pmu.txt 
> > > b/Documentation/perf/thunderx2-pmu.txt
> > > index dffc57143736..62243230abc3 100644
> > > --- a/Documentation/perf/thunderx2-pmu.txt
> > > +++ b/Documentation/perf/thunderx2-pmu.txt
> > > @@ -2,24 +2,26 @@ Cavium ThunderX2 SoC Performance Monitoring Unit (PMU 
> > > UNCORE)
> > >  =
> > >
> > >  The ThunderX2 SoC PMU consists of independent, system-wide, per-socket
> > > -PMUs such as the Level 3 Cache (L3C) and DDR4 Memory Controller (DMC).
> > > +PMUs such as the Level 3 Cache (L3C), DDR4 Memory Controller (DMC) and
> > > +Cavium Coherent Processor Interconnect (CCPI2).
> > >
> > >  The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles.
> > >  Events are counted for the default channel (i.e. channel 0) and prorated
> > >  to the total number of channels/tiles.
> > >
> > > -The DMC and L3C support up to 4 counters. Counters are independently
> > > -programmable and can be started and stopped individually. Each counter
> > > -can be set to a different event. Counters are 32-bit and do not support
> > > -an overflow interrupt; they are read every 2 seconds.
> > > +The DMC, L3C support up to 4 counters and CCPI2 support up to 8 counters.
> >
> > The DMC and L3C support up to 4 counters, while the CCPI2 supports up to 8
> > counters.
> >
> > > +Counters are independently programmable and can be started and stopped
> > > +individually. Each counter can be set to a different event. DMC and L3C
> > > +Counters are 32-bit and do not support an overflow interrupt; they are 
> > > read
> >
> > Counters -> counters
> >
> > > +every 2 seconds. CCPI2 counters are 64-bit.
> >
> > Assuming CCPI2 also doesn't support an overflow interrupt, I'd reword these
> > two sentences as:
> >
> >   None of the counters support an overflow interrupt and therefore sampling
> >   events are unsupported. The DMC and L3C counters are 32-bit and read every
> >   2 seconds. The CCPI2 counters are 64-bit and assumed not to overflow in
> >   normal operation.
>

Thanks for the comments, will update in v2.
Yes, CCPI2 is 64bit counter and there is no overflow issue.

> Mark reminded me that these are system PMUs and therefore sampling is
> unsupported irrespective of the presence of an overflow interrupt, so you
> can drop that part from the text.

sure.
>
> Sorry for the confusion,
>
> Will

Thanks,
Ganapat


Re: [PATCH v3 0/4] Compile-test UAPI and kernel headers

2019-06-27 Thread Masahiro Yamada
On Fri, Jun 28, 2019 at 1:41 AM Masahiro Yamada
 wrote:
>
> 1/4: Compile-test exported headers (reworked in v2)
>
> 2/4: fix a flaw I noticed when I was working on this series.
>  Avoid generating intermediate wrappers.
>
> 3/4: maybe useful for 4/4 and in some other places.
>  Add header-test-pattern-y syntax.
>
> 4/4: Compile-test kernel-space headers in include/.
>  v2: compile as many headers as possible.
>  v3: exclude more headers causing build errors


I push this series to
 git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git
 header-test-v3
for somebody who wants to test it.



>
> Masahiro Yamada (4):
>   kbuild: compile-test UAPI headers to ensure they are self-contained
>   kbuild: do not create wrappers for header-test-y
>   kbuild: support header-test-pattern-y
>   kbuild: compile-test kernel headers to ensure they are self-contained
>
>  .gitignore |1 -
>  Documentation/dontdiff |1 -
>  Documentation/kbuild/makefiles.txt |   13 +-
>  Makefile   |4 +-
>  include/Kbuild | 1250 
>  init/Kconfig   |   22 +
>  scripts/Makefile.build |   10 +-
>  scripts/Makefile.lib   |   13 +-
>  scripts/cc-system-headers.sh   |8 +
>  usr/.gitignore |1 -
>  usr/Makefile   |2 +
>  usr/include/.gitignore |3 +
>  usr/include/Makefile   |  134 +++
>  13 files changed, 1449 insertions(+), 13 deletions(-)
>  create mode 100644 include/Kbuild
>  create mode 100755 scripts/cc-system-headers.sh
>  create mode 100644 usr/include/.gitignore
>  create mode 100644 usr/include/Makefile
>
> --
> 2.17.1
>
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



--
Best Regards
Masahiro Yamada


Re: [PATCH] kbuild: get rid of misleading $(AS) from documents

2019-06-27 Thread Nathan Chancellor
On Fri, Jun 28, 2019 at 11:04:33AM +0900, Masahiro Yamada wrote:
> The assembler files in the kernel are *.S instead of *.s, so they must
> be preprocessed. Hence, we always use $(CC) as an assembler driver.
> 
> $(AS) is almost unused in Kbuild. As of writing, there is just one user.
> 
>   $ git grep '$(AS)' -- :^Documentation
>   drivers/net/wan/Makefile:  AS68K = $(AS)
> 
> The documentation about *_AFLAGS* sounds like the flags were passed
> to $(AS). This is somewhat misleading since we do not invoke $(AS)
> directly.
> 
> Signed-off-by: Masahiro Yamada 

I did notice this when I grepped for $(AS) in the tree. Certainly makes
sense and looks good to me.

Reviewed-by: Nathan Chancellor 

> ---
> 
>  Documentation/kbuild/kbuild.txt|  5 ++---
>  Documentation/kbuild/makefiles.txt | 12 ++--
>  2 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/kbuild/kbuild.txt b/Documentation/kbuild/kbuild.txt
> index 9c230ea71963..7a7e2aa2fab5 100644
> --- a/Documentation/kbuild/kbuild.txt
> +++ b/Documentation/kbuild/kbuild.txt
> @@ -31,12 +31,11 @@ Additional options to the assembler (for built-in and 
> modules).
>  
>  AFLAGS_MODULE
>  --
> -Additional module specific options to use for $(AS).
> +Additional module specific options to use for assembler.
>  
>  AFLAGS_KERNEL
>  --
> -Additional options for $(AS) when used for assembler
> -code for code that is compiled as built-in.
> +Additional options when used for assembling code that is compiled as 
> built-in.
>  
>  KCFLAGS
>  --
> diff --git a/Documentation/kbuild/makefiles.txt 
> b/Documentation/kbuild/makefiles.txt
> index d65ad5746f94..f0b3a30b985d 100644
> --- a/Documentation/kbuild/makefiles.txt
> +++ b/Documentation/kbuild/makefiles.txt
> @@ -306,7 +306,7 @@ more details, with real examples.
>   variable $(KBUILD_CFLAGS) and uses it for compilation flags for the
>   entire tree.
>  
> - asflags-y specifies options for assembling with $(AS).
> + asflags-y specifies options for assembling.
>  
>   Example:
>   #arch/sparc/kernel/Makefile
> @@ -441,7 +441,7 @@ more details, with real examples.
>   as-instr checks if the assembler reports a specific instruction
>   and then outputs either option1 or option2
>   C escapes are supported in the test instruction
> - Note: as-instr-option uses KBUILD_AFLAGS for $(AS) options
> + Note: as-instr-option uses KBUILD_AFLAGS for assembler options
>  
>  cc-option
>   cc-option is used to check if $(CC) supports a given option, and if
> @@ -814,7 +814,7 @@ When kbuild executes, the following steps are followed 
> (roughly):
>   In this example, the binary $(obj)/image is a binary version of
>   vmlinux. The usage of $(call if_changed,xxx) will be described later.
>  
> -KBUILD_AFLAGS$(AS) assembler flags
> +KBUILD_AFLAGSassembler flags
>  
>   Default value - see top level Makefile
>   Append or modify as required per architecture.
> @@ -853,15 +853,15 @@ When kbuild executes, the following steps are followed 
> (roughly):
>   The first example utilises the trick that a config option expands
>   to 'y' when selected.
>  
> -KBUILD_AFLAGS_KERNEL $(AS) options specific for built-in
> +KBUILD_AFLAGS_KERNEL assembler options specific for built-in
>  
>   $(KBUILD_AFLAGS_KERNEL) contains extra C compiler flags used to compile
>   resident kernel code.
>  
> -KBUILD_AFLAGS_MODULE   Options for $(AS) when building modules
> +KBUILD_AFLAGS_MODULE   Options for assembler when building modules
>  
>   $(KBUILD_AFLAGS_MODULE) is used to add arch-specific options that
> - are used for $(AS).
> + are used for assembler.
>   From commandline AFLAGS_MODULE shall be used (see kbuild.txt).
>  
>  KBUILD_CFLAGS_KERNEL $(CC) options specific for built-in
> -- 
> 2.17.1
> 


Re: [PATCH 2/2] drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver.

2019-06-27 Thread Ganapatrao Kulkarni
Hi will,

On Thu, Jun 27, 2019 at 3:27 PM Will Deacon  wrote:
>
> Hi Ganapat,
>
> On Fri, Jun 14, 2019 at 05:42:46PM +, Ganapatrao Kulkarni wrote:
> > CCPI2 is a low-latency high-bandwidth serial interface for connecting
> > ThunderX2 processors. This patch adds support to capture CCPI2 perf events.
> >
> > Signed-off-by: Ganapatrao Kulkarni 
> > ---
> >  drivers/perf/thunderx2_pmu.c | 179 ++-
> >  1 file changed, 157 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
> > index 43d76c85da56..3791ac9b897d 100644
> > --- a/drivers/perf/thunderx2_pmu.c
> > +++ b/drivers/perf/thunderx2_pmu.c
> > @@ -16,7 +16,9 @@
> >   * they need to be sampled before overflow(i.e, at every 2 seconds).
> >   */
> >
> > -#define TX2_PMU_MAX_COUNTERS 4
> > +#define TX2_PMU_DMC_L3C_MAX_COUNTERS 4
>
> I find it odd that you rename this...

i am not sure, how to avoid this since dmc/l3c have 4 counters and ccpi2 has 8.
i will try to make this better in v2.
>
> > +#define TX2_PMU_CCPI2_MAX_COUNTERS   8
> > +
> >  #define TX2_PMU_DMC_CHANNELS 8
> >  #define TX2_PMU_L3_TILES 16
> >
> > @@ -28,11 +30,22 @@
> >*/
> >  #define DMC_EVENT_CFG(idx, val)  ((val) << (((idx) * 8) + 1))
> >
> > +#define GET_EVENTID_CCPI2(ev)((ev->hw.config) & 0x1ff)
> > +/* bits[3:0] to select counters, starts from 8, bit[3] set always. */
> > +#define GET_COUNTERID_CCPI2(ev)  ((ev->hw.idx) & 0x7)
> > +#define CCPI2_COUNTER_OFFSET 8
>
>
> ... but leave GET_EVENTID alone, even though it only applies to DMC/L3C
> events. Saying that, if you have the event you can figure out its type,
> so could you avoid the need for additional macros entirely and just use
> the correct masks based on the corresponding PMU type? That might also
> avoid some of the conditional control flow you're introducing elsewhere.

sure, i will add mask as argument to macro.
>
> >  #define L3C_COUNTER_CTL  0xA8
> >  #define L3C_COUNTER_DATA 0xAC
> >  #define DMC_COUNTER_CTL  0x234
> >  #define DMC_COUNTER_DATA 0x240
> >
> > +#define CCPI2_PERF_CTL   0x108
> > +#define CCPI2_COUNTER_CTL0x10C
> > +#define CCPI2_COUNTER_SEL0x12c
> > +#define CCPI2_COUNTER_DATA_L 0x130
> > +#define CCPI2_COUNTER_DATA_H 0x134
> > +
> >  /* L3C event IDs */
> >  #define L3_EVENT_READ_REQ0xD
> >  #define L3_EVENT_WRITEBACK_REQ   0xE
> > @@ -51,9 +64,16 @@
> >  #define DMC_EVENT_READ_TXNS  0xF
> >  #define DMC_EVENT_MAX0x10
> >
> > +#define CCPI2_EVENT_REQ_PKT_SENT 0x3D
> > +#define CCPI2_EVENT_SNOOP_PKT_SENT   0x65
> > +#define CCPI2_EVENT_DATA_PKT_SENT0x105
> > +#define CCPI2_EVENT_GIC_PKT_SENT 0x12D
> > +#define CCPI2_EVENT_MAX  0x200
> > +
> >  enum tx2_uncore_type {
> >   PMU_TYPE_L3C,
> >   PMU_TYPE_DMC,
> > + PMU_TYPE_CCPI2,
> >   PMU_TYPE_INVALID,
> >  };
> >
> > @@ -73,8 +93,8 @@ struct tx2_uncore_pmu {
> >   u32 max_events;
> >   u64 hrtimer_interval;
> >   void __iomem *base;
> > - DECLARE_BITMAP(active_counters, TX2_PMU_MAX_COUNTERS);
> > - struct perf_event *events[TX2_PMU_MAX_COUNTERS];
> > + DECLARE_BITMAP(active_counters, TX2_PMU_CCPI2_MAX_COUNTERS);
> > + struct perf_event *events[TX2_PMU_DMC_L3C_MAX_COUNTERS];
>
> Hmm, that looks very odd. Why are they now different sizes?

events[ ] is used to hold reference to active events to use in timer
callback, which is not applicable to ccpi2, hence 4.
active_counters is set to max of both. i.e, 8. i will try to make it
better readable in v2.

>
> >   struct device *dev;
> >   struct hrtimer hrtimer;
> >   const struct attribute_group **attr_groups;
> > @@ -92,7 +112,21 @@ static inline struct tx2_uncore_pmu 
> > *pmu_to_tx2_pmu(struct pmu *pmu)
> >   return container_of(pmu, struct tx2_uncore_pmu, pmu);
> >  }
> >
> > -PMU_FORMAT_ATTR(event,   "config:0-4");
> > +#define TX2_PMU_FORMAT_ATTR(_var, _name, _format)\
> > +static ssize_t 
> >   \
> > +__tx2_pmu_##_var##_show(struct device *dev,  \
> > +struct device_attribute *attr,   \
> > +char *page)  \
> > +{\
> > + BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \
> > + return sprintf(page, _format "\n"); \
> > +}\
> > + \
> > +static struct device_attribute format_attr_##_var =  \
> > + __ATTR(_name

Re: [PATCH v7 1/2] fTPM: firmware TPM running in TEE

2019-06-27 Thread Sumit Garg
Hi Jarkko and Sasha,

On Thu, 27 Jun 2019 at 18:47, Jarkko Sakkinen
 wrote:
>
> On Wed, 2019-06-26 at 19:56 -0400, Sasha Levin wrote:
> > > You've used so much on this so shouldn't this have that somewhat new
> > > co-developed-by tag? I'm also wondering can this work at all
> >
> > Honestly, I've just been massaging this patch more than "authoring" it.
> > If you feel strongly about it feel free to add a Co-authored patch with
> > my name, but in my mind this is just Thiru's work.
>
> This is just my subjective view but writing code is easier than making
> it work in the mainline in 99% of cases. If this patch was doing
> something revolutional, lets say a new outstanding scheduling algorithm,
> then I would think otherwise. It is not. You without question deserve
> both credit and also the blame (if this breaks everything) :-)
>
> > > process-wise if the original author of the patch is also the only tester
> > > of the patch?
> >
> > There's not much we can do about this... Linaro folks have tested this
> > without the fTPM firmware, so at the very least it won't explode for
> > everyone. If for some reason non-microsoft folks see issues then we can
> > submit patches on top to fix this, we're not just throwing this at you
> > and running away.
>
> So why any of those Linaro folks can't do it? I can add after tested-by
> tag parentheses something explaining that context of testing. It is
> reasonable given the circumstances.

Simply because the hardware I have (Developerbox) doesn't provide
enough flash space (as per current memory map) for this fTPM driver to
be loaded as early TA along with OP-TEE binary. So I can't get any
further point than sanity probe failure check for which I think a
tested-by won't be appropriate.

-Sumit

>
> I can also give an explanation in my next PR along the lines what you
> are saying. This would definitely work for me.
>
> /Jarkko
>


Re: [PATCH] kbuild: get rid of misleading $(AS) from documents

2019-06-27 Thread Masahiro Yamada
On Fri, Jun 28, 2019 at 11:06 AM Masahiro Yamada
 wrote:
>
> The assembler files in the kernel are *.S instead of *.s, so they must
> be preprocessed. Hence, we always use $(CC) as an assembler driver.
>
> $(AS) is almost unused in Kbuild. As of writing, there is just one user.
>
>   $ git grep '$(AS)' -- :^Documentation
>   drivers/net/wan/Makefile:  AS68K = $(AS)


In Makefile, a variable is also referenced in a curly brace form like ${AS}.
In shell scripts, it can be referenced in the form of $AR.

I grepped more patterns.

$ git grep -e '$(AS)' -e '${AS}' -e '$AS' -e '$(AS:' -e '${AS:' --
:^Documentation
drivers/net/wan/Makefile:  AS68K = $(AS)

I found only user still.



> The documentation about *_AFLAGS* sounds like the flags were passed
> to $(AS). This is somewhat misleading since we do not invoke $(AS)
> directly.
>
> Signed-off-by: Masahiro Yamada 
> ---
>
>  Documentation/kbuild/kbuild.txt|  5 ++---
>  Documentation/kbuild/makefiles.txt | 12 ++--
>  2 files changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/kbuild/kbuild.txt b/Documentation/kbuild/kbuild.txt
> index 9c230ea71963..7a7e2aa2fab5 100644
> --- a/Documentation/kbuild/kbuild.txt
> +++ b/Documentation/kbuild/kbuild.txt
> @@ -31,12 +31,11 @@ Additional options to the assembler (for built-in and 
> modules).
>
>  AFLAGS_MODULE
>  --
> -Additional module specific options to use for $(AS).
> +Additional module specific options to use for assembler.
>
>  AFLAGS_KERNEL
>  --
> -Additional options for $(AS) when used for assembler
> -code for code that is compiled as built-in.
> +Additional options when used for assembling code that is compiled as 
> built-in.
>
>  KCFLAGS
>  --
> diff --git a/Documentation/kbuild/makefiles.txt 
> b/Documentation/kbuild/makefiles.txt
> index d65ad5746f94..f0b3a30b985d 100644
> --- a/Documentation/kbuild/makefiles.txt
> +++ b/Documentation/kbuild/makefiles.txt
> @@ -306,7 +306,7 @@ more details, with real examples.
> variable $(KBUILD_CFLAGS) and uses it for compilation flags for the
> entire tree.
>
> -   asflags-y specifies options for assembling with $(AS).
> +   asflags-y specifies options for assembling.
>
> Example:
> #arch/sparc/kernel/Makefile
> @@ -441,7 +441,7 @@ more details, with real examples.
> as-instr checks if the assembler reports a specific instruction
> and then outputs either option1 or option2
> C escapes are supported in the test instruction
> -   Note: as-instr-option uses KBUILD_AFLAGS for $(AS) options
> +   Note: as-instr-option uses KBUILD_AFLAGS for assembler options
>
>  cc-option
> cc-option is used to check if $(CC) supports a given option, and if
> @@ -814,7 +814,7 @@ When kbuild executes, the following steps are followed 
> (roughly):
> In this example, the binary $(obj)/image is a binary version of
> vmlinux. The usage of $(call if_changed,xxx) will be described later.
>
> -KBUILD_AFLAGS  $(AS) assembler flags
> +KBUILD_AFLAGS  assembler flags
>
> Default value - see top level Makefile
> Append or modify as required per architecture.
> @@ -853,15 +853,15 @@ When kbuild executes, the following steps are followed 
> (roughly):
> The first example utilises the trick that a config option expands
> to 'y' when selected.
>
> -KBUILD_AFLAGS_KERNEL   $(AS) options specific for built-in
> +KBUILD_AFLAGS_KERNEL   assembler options specific for built-in
>
> $(KBUILD_AFLAGS_KERNEL) contains extra C compiler flags used to 
> compile
> resident kernel code.
>
> -KBUILD_AFLAGS_MODULE   Options for $(AS) when building modules
> +KBUILD_AFLAGS_MODULE   Options for assembler when building modules
>
> $(KBUILD_AFLAGS_MODULE) is used to add arch-specific options that
> -   are used for $(AS).
> +   are used for assembler.
> From commandline AFLAGS_MODULE shall be used (see kbuild.txt).
>
>  KBUILD_CFLAGS_KERNEL   $(CC) options specific for built-in
> --
> 2.17.1
>


-- 
Best Regards
Masahiro Yamada


[PATCH] Doc : doc-guide : Fix a typo

2019-06-27 Thread Sheriff Esseson
fix the disjunction by replacing "of" with "or".

Signed-off-by: Sheriff Esseson 
---
 Documentation/doc-guide/kernel-doc.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/doc-guide/kernel-doc.rst 
b/Documentation/doc-guide/kernel-doc.rst
index f96059767..192c36af3 100644
--- a/Documentation/doc-guide/kernel-doc.rst
+++ b/Documentation/doc-guide/kernel-doc.rst
@@ -359,7 +359,7 @@ Domain`_ references.
   ``monospaced font``.
 
   Useful if you need to use special characters that would otherwise have some
-  meaning either by kernel-doc script of by reStructuredText.
+  meaning either by kernel-doc script or by reStructuredText.
 
   This is particularly useful if you need to use things like ``%ph`` inside
   a function description.
-- 
2.22.0



[linux-kernel-mentees] [PATCH] Doc : doc-guide : Fix a typo

2019-06-27 Thread Sheriff Esseson
fix the disjunction by replacing "of" with "or".

Signed-off-by: Sheriff Esseson 
---
 Documentation/doc-guide/kernel-doc.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/doc-guide/kernel-doc.rst 
b/Documentation/doc-guide/kernel-doc.rst
index f96059767..192c36af3 100644
--- a/Documentation/doc-guide/kernel-doc.rst
+++ b/Documentation/doc-guide/kernel-doc.rst
@@ -359,7 +359,7 @@ Domain`_ references.
   ``monospaced font``.
 
   Useful if you need to use special characters that would otherwise have some
-  meaning either by kernel-doc script of by reStructuredText.
+  meaning either by kernel-doc script or by reStructuredText.
 
   This is particularly useful if you need to use things like ``%ph`` inside
   a function description.
-- 
2.22.0



[linux-kernel-mentees] [PATCH v2] Doc : doc-guide : Fix a typo

2019-06-27 Thread Sheriff Esseson
fix the disjunction by replacing "of" with "or".

Signed-off-by: Sheriff Esseson 
---

changes in v2:
- cc-ed Corbet.

 Documentation/doc-guide/kernel-doc.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/doc-guide/kernel-doc.rst 
b/Documentation/doc-guide/kernel-doc.rst
index f96059767..192c36af3 100644
--- a/Documentation/doc-guide/kernel-doc.rst
+++ b/Documentation/doc-guide/kernel-doc.rst
@@ -359,7 +359,7 @@ Domain`_ references.
   ``monospaced font``.
 
   Useful if you need to use special characters that would otherwise have some
-  meaning either by kernel-doc script of by reStructuredText.
+  meaning either by kernel-doc script or by reStructuredText.
 
   This is particularly useful if you need to use things like ``%ph`` inside
   a function description.
-- 
2.22.0



Re: [linux-kernel-mentees] [PATCH v2] Doc : doc-guide : Fix a typo

2019-06-27 Thread Sheriff Esseson
fix the disjunction by replacing "of" with "or".

Signed-off-by: Sheriff Esseson 
---

changes in v2:
- cc-ed Corbet.

 Documentation/doc-guide/kernel-doc.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/doc-guide/kernel-doc.rst 
b/Documentation/doc-guide/kernel-doc.rst
index f96059767..192c36af3 100644
--- a/Documentation/doc-guide/kernel-doc.rst
+++ b/Documentation/doc-guide/kernel-doc.rst
@@ -359,7 +359,7 @@ Domain`_ references.
   ``monospaced font``.
 
   Useful if you need to use special characters that would otherwise have some
-  meaning either by kernel-doc script of by reStructuredText.
+  meaning either by kernel-doc script or by reStructuredText.
 
   This is particularly useful if you need to use things like ``%ph`` inside
   a function description.
-- 
2.22.0



Re: [PATCH next v2] softirq: enable MAX_SOFTIRQ_TIME tuning with sysctl, max_softirq_time_msecs

2019-06-27 Thread Zhiqiang Liu
Friendly ping ...

On 2019/6/25 11:13, Zhiqiang Liu wrote:
> From: Zhiqiang liu 
> 
> In __do_softirq func, MAX_SOFTIRQ_TIME was set to 2ms via experimentation by
> commit c10d73671 ("softirq: reduce latencies") in 2013, which was designed
> to reduce latencies for various network workloads. The key reason is that the
> maximum number of microseconds in one NAPI polling cycle in net_rx_action func
> was set to 2 jiffies, so different HZ settting will lead to different 
> latencies.
> 
> However, commit 7acf8a1e8 ("Replace 2 jiffies with sysctl netdev_budget_usecs
> to enable softirq tuning") adopts netdev_budget_usecs to tun maximum number of
> microseconds in one NAPI polling cycle. So the latencies of net_rx_action can 
> be
> controlled by sysadmins to copy with hardware changes over time.
> 
> Correspondingly, the MAX_SOFTIRQ_TIME should be able to be tunned by 
> sysadmins,
> who knows best about hardware performance, for excepted tradeoff between 
> latence
> and fairness. Here, we add sysctl variable max_softirq_time_msecs to replace
> MAX_SOFTIRQ_TIME with 2ms default value.
> 
> Note: max_softirq_time_msecs will be coverted to jiffies, and any budget
> value will be rounded up to the next jiffies, which relates to CONFIG_HZ.
> The time accuracy of jiffies will result in a certain difference
> between the setting jiffies of max_softirq_time_msecs and the actual
> value, which is in one jiffies range.
> 
> Signed-off-by: Zhiqiang liu 
> ---
>  Documentation/sysctl/kernel.txt | 17 +
>  kernel/softirq.c|  8 +---
>  kernel/sysctl.c |  9 +
>  3 files changed, 31 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index f0c86fbb3b48..23b36393f150 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -44,6 +44,7 @@ show up in /proc/sys/kernel:
>  - kexec_load_disabled
>  - kptr_restrict
>  - l2cr[ PPC only ]
> +- max_softirq_time_msecs
>  - modprobe==> Documentation/debugging-modules.txt
>  - modules_disabled
>  - msg_next_id  [ sysv ipc ]
> @@ -445,6 +446,22 @@ This flag controls the L2 cache of G3 processor boards. 
> If
> 
>  ==
> 
> +max_softirq_time_msecs:
> +
> +Maximum number of milliseconds to break the loop of restarting softirq
> +processing for at most MAX_SOFTIRQ_RESTART times in __do_softirq().
> +max_softirq_time_msecs will be coverted to jiffies, and any budget
> +value will be rounded up to the next jiffies, which relates to CONFIG_HZ.
> +The time accuracy of jiffies will result in a certain difference
> +between the setting jiffies of max_softirq_time_msecs and the actual
> +value, which is in one jiffies range.
> +
> +max_softirq_time_msecs is a non-negative integer value, and setting
> +negative value is meaningless and will return error.
> +Default: 2
> +
> +==
> +
>  modules_disabled:
> 
>  A toggle value indicating if modules are allowed to be loaded
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index a6b81c6b6bff..1e456db70093 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -199,7 +199,8 @@ EXPORT_SYMBOL(__local_bh_enable_ip);
> 
>  /*
>   * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
> - * but break the loop if need_resched() is set or after 2 ms.
> + * but break the loop if need_resched() is set or after
> + * max_softirq_time_msecs msecs.
>   * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
>   * certain cases, such as stop_machine(), jiffies may cease to
>   * increment and so we need the MAX_SOFTIRQ_RESTART limit as
> @@ -210,7 +211,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip);
>   * we want to handle softirqs as soon as possible, but they
>   * should not be able to lock up the box.
>   */
> -#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
> +unsigned int __read_mostly max_softirq_time_msecs = 2;
>  #define MAX_SOFTIRQ_RESTART 10
> 
>  #ifdef CONFIG_TRACE_IRQFLAGS
> @@ -248,7 +249,8 @@ static inline void lockdep_softirq_end(bool in_hardirq) { 
> }
> 
>  asmlinkage __visible void __softirq_entry __do_softirq(void)
>  {
> - unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
> + unsigned long end = jiffies +
> + msecs_to_jiffies(max_softirq_time_msecs);
>   unsigned long old_flags = current->flags;
>   int max_restart = MAX_SOFTIRQ_RESTART;
>   struct softirq_action *h;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 1beca96fb625..96ff292ce7f6 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -118,6 +118,7 @@ extern unsigned int sysctl_nr_open_min, 
> sysctl_nr_open_max;
>  #ifndef CONFIG_MMU
>  extern int sysctl_nr_trim_pages;
>  #endif
> +extern unsigned int max_softirq_time_msecs;
> 
>  /* Constants used for minimum and  maximu