Re: [PATCH v8 11/27] x86/mm: Introduce _PAGE_DIRTY_SW

2019-08-28 Thread Peter Zijlstra
On Tue, Aug 27, 2019 at 03:37:12PM -0700, Yu-cheng Yu wrote:
> On Fri, 2019-08-23 at 16:02 +0200, Peter Zijlstra wrote:
> > On Tue, Aug 13, 2019 at 01:52:09PM -0700, Yu-cheng Yu wrote:
> > 
> > > +static inline pte_t pte_move_flags(pte_t pte, pteval_t from, pteval_t to)
> > > +{
> > > +	if (pte_flags(pte) & from)
> > > +		pte = pte_set_flags(pte_clear_flags(pte, from), to);
> > > + return pte;
> > > +}
> > 
> > Aside of the whole conditional thing (I agree it would be better to have
> > this unconditionally); the function doesn't really do as advertised.
> > 
> > That is, if @from is clear, it doesn't endeavour to make sure @to is
> > also clear.
> > 
> > Now it might be sufficient, but in that case it really needs a comment
> > and/or a different name.
> > 
> > An implementation that actually moves the bit is something like:
> > 
> > pteval_t a,b;
> > 
> > a = native_pte_value(pte);
> > b = (a >> from_bit) & 1;
> > a &= ~((1ULL << from_bit) | (1ULL << to_bit));
> > a |= b << to_bit;
> > return make_native_pte(a);
> 
> There can be places calling pte_wrprotect() on a PTE that is already RO +
> DIRTY_SW.  Then in pte_move_flags(pte, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW) we do
> not want to clear _PAGE_DIRTY_SW.  But, I will look into this and make it more
> obvious.

Well, then the name 'move' is just wrong, because that is not the
semantics you're looking for.

So the thing is: if you provide a generic function that 'munges' two
bits, then its name had better be accurate. But AFAICT you only ever
used this for the DIRTY bits, so it might be better to have a function
specifically for that and with a comment that spells out the exact
semantics and reasons for them.


Re: [PATCH V15 1/5] dt-bindings: fsl: scu: add thermal binding

2019-08-28 Thread Zhang Rui
Hi, Anson,

We're missing ACKs from the maintainers for patches 4/5 and 5/5 if we
want to ship the patches via the thermal tree.

For patch 2/5, as it introduces a new API for OF_THERMAL, I'd like to
get Eduardo's feedback before taking them.

thanks,
rui

On Wed, 2019-07-24 at 03:16 +, Anson Huang wrote:
> Ping...
> 
> > Hi, Daniel/Rui/Eduardo
> > Could you please take a look at this patch series?
> > 
> > Anson
> > 
> > > From: Anson Huang 
> > > 
> > > NXP i.MX8QXP is an ARMv8 SoC with a Cortex-M4 core inside as system
> > > controller. The system controller is in charge of system power, clock
> > > and thermal sensor management; the Linux kernel has to communicate
> > > with the system controller via MU (message unit) IPC to get the
> > > temperature from thermal sensors. This patch adds the binding doc for
> > > the i.MX system controller thermal driver.
> > > 
> > > Signed-off-by: Anson Huang 
> > > Reviewed-by: Rob Herring 
> > > Reviewed-by: Dong Aisheng 
> > > ---
> > > No change.
> > > ---
> > >  .../devicetree/bindings/arm/freescale/fsl,scu.txt | 16 ++++++++++++++++
> > >  1 file changed, 16 insertions(+)
> > > 
> > > diff --git a/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt b/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt
> > > index a575e42..fc3844e 100644
> > > --- a/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt
> > > +++ b/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt
> > > @@ -155,6 +155,17 @@ Required properties:
> > >  Optional properties:
> > >  - timeout-sec: contains the watchdog timeout in seconds.
> > > 
> > > +Thermal bindings based on SCU Message Protocol
> > > +
> > > +
> > > +Required properties:
> > > +- compatible:Should be :
> > > +   "fsl,imx8qxp-sc-thermal"
> > > + followed by "fsl,imx-sc-thermal";
> > > +
> > > +- #thermal-sensor-cells: See Documentation/devicetree/bindings/thermal/thermal.txt
> > > +	for a description.
> > > +
> > >  Example (imx8qxp):
> > >  -
> > >  aliases {
> > > @@ -222,6 +233,11 @@ firmware {
> > > 		compatible = "fsl,imx8qxp-sc-wdt", "fsl,imx-sc-wdt";
> > >   timeout-sec = <60>;
> > >   };
> > > +
> > > +	tsens: thermal-sensor {
> > > +		compatible = "fsl,imx8qxp-sc-thermal", "fsl,imx-sc-thermal";
> > > +		#thermal-sensor-cells = <1>;
> > > + };
> > >   };
> > >  };
> > > 
> > > --
> > > 2.7.4
> 
> 



RE: [PATCH V15 1/5] dt-bindings: fsl: scu: add thermal binding

2019-08-28 Thread Anson Huang
Hi, Rui

> Subject: Re: [PATCH V15 1/5] dt-bindings: fsl: scu: add thermal binding
> 
> Hi, Anson,
> 
> We're missing ACKs from the maintainers for patches 4/5 and 5/5 if we want to
> ship the patches via the thermal tree.

I think 4/5 and 5/5 can be taken by Shawn Guo once the driver part is taken; he
normally does it that way. Once the driver and dt-binding are taken, I will
notify him.

> 
> For patch 2/5, as it introduces a new API for OF_THERMAL, I'd like to get
> Eduardo's feedback before taking them.

OK. Not sure when he can take a look at it; we have been pending on this for
some time, so let's wait. If you have a chance to talk to him, please ask him
to help with this patch series, thanks a lot!

thanks,
Anson




Re: [PATCH 4.19 72/98] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-28 Thread Pavel Machek
On Tue 2019-08-27 15:30:30, Thomas Gleixner wrote:
> On Tue, 27 Aug 2019, Pavel Machek wrote:
> 
> > On Tue 2019-08-27 09:50:51, Greg Kroah-Hartman wrote:
> > > From: Tom Lendacky 
> > > 
> > > commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24 upstream.
> > > 
> > > There have been reports of RDRAND issues after resuming from suspend on
> > > some AMD family 15h and family 16h systems. This issue stems from a BIOS
> > > not performing the proper steps during resume to ensure RDRAND continues
> > > to function properly.
> > 
> > Yes. And instead of reinitializing the RDRAND on resume, this patch
> > breaks support even for people with properly functioning BIOSes...
> 
> There is no way to reinitialize RDRAND from the kernel otherwise we would
> have exactly done that. If you know how to do that please tell.

Would they? AMD is not exactly doing a good job with communication
here. If the BIOS can do it, the kernel can do it, too... or do you have
information saying otherwise?

> Also disabling it for every BIOS is the only way which can be done because
> there is no way to know whether the BIOS is fixed or not at cold boot
> time. And it has to be known there because applications cache the

I'm pretty sure a DMI-based whitelist would help here. It should be
reasonable to fill it with the common machines at least.

Plus, where is the CVE, and does AMD do anything to make BIOS vendors
fix them?

Best regards,
Pavel
-- 
DENX Software Engineering GmbH,  Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


signature.asc
Description: Digital signature


Re: [PATCH 4.19 72/98] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-28 Thread Thomas Gleixner
Pavel,

On Wed, 28 Aug 2019, Pavel Machek wrote:
> On Tue 2019-08-27 15:30:30, Thomas Gleixner wrote:
> > There is no way to reinitialize RDRAND from the kernel otherwise we would
> > have exactly done that. If you know how to do that please tell.
> 
> Would they? AMD is not exactly doing a good job with communication

Yes they would. Stop making up weird conspiracy theories.

> here. If the BIOS can do it, the kernel can do it, too...

May I recommend to read up on SMM and BIOS being able to lock down access
to certain facilities?

> or do you have information saying otherwise?

Yes. It was clearly stated by Tom that it can only be done in the BIOS.

> > Also disabling it for every BIOS is the only way which can be done because
> > there is no way to know whether the BIOS is fixed or not at cold boot
> > time. And it has to be known there because applications cache the
> 
> I'm pretty sure a DMI-based whitelist would help here. It should be
> reasonable to fill it with the common machines at least.

Send patches to that effect.
 
> Plus, where is the CVE, and does AMD do anything to make BIOS vendors
> fix them?

May I redirect you to: https://www.amd.com/en/corporate/contact

Thanks,

tglx


Re: [PATCH v2] vsprintf: introduce %dE for error constants

2019-08-28 Thread Petr Mladek
On Tue 2019-08-27 23:12:44, Uwe Kleine-König  wrote:
> Petr Mladek had some concerns:
> > The array is long, created by cpu&paste, the index of each code
> > is not obvious.
> 
> Yeah right, the array is long. This cannot really be changed because we
> have that many error codes. I don't understand your concern about the
> index not being obvious. The array was just a list of (number, string)
> pairs where the position in the array didn't have any semantic.

I missed that the number was stored in the array as well. I somehow
expected that it was an array of strings.


> > There are ideas to make the code even more tricky to reduce
> > the size, keep it fast.
> 
> I think Enrico Weigelt's suggestion to use a case is the best
> performance-wise so that's what I picked up. Also I hope that
> performance isn't that important because the need to print an error
> should not be so common that it really hurts in production.

I personally do not like switch/case. It is a lot of code.
I wonder if it even saved some space.

If you want to save space, I would use u16 to store the numbers.
Or I would use an array of strings. There will be only a few holes.

You might also consider handling only the most commonly
used codes from errno.h and errno-base.h (1..133). There will
be no holes and the codes are stable.


> > Both the %dE modifier and the output format (ECODE) are non-standard.
> 
> Yeah, obviously right. The problem is that the new modifier does
> something that wasn't implemented before, so it cannot match any
> standard. %pI is only known on Linux either, so I think being
> non-standard is a weak argument.

I am not completely sure that %p modifiers were a good idea.
They came before I started maintaining printk(). They add more
complex algorithms into paths where we could not report problems
easily (printk recursion). Also they are causing problems with
unit testing that might be done in userspace. These non-standard
formats mean that printk() cannot simply be substituted by printf().

I am not keen to spread these problems over more formats.
Also, the %d format is more complicated: it is often used with
already-existing modifiers.


> > Upper letters gain a lot of attention. But the error code is
> > only helper information. Also many error codes are misleading because
> > they are used either wrongly or there was no better available.
> 
> This isn't really an argument against the patch I think. Sure, if a
> function returned (say) EIO while ETIMEDOUT would be better, my patch
> doesn't improve that detail. Still
>
> mydev: Failed to initialize blablub: EIO
>
> is more expressive than
> 
> mydev: Failed to initialize blablub: -5

OK, upper letters probably are not a problem.

But what about EWOULDBLOCK and EDEADLOCK? They have the same
error codes as EAGAIN and EDEADLK. It might cause a lot of confusion.
People might spend a lot of time searching for EAGAIN before they
notice that EWOULDBLOCK was used in the code instead.

Also you still did not answer the question where the idea came from.
Did it just look nice? Anyone asked for it? Who? Why?


> > There is no proof that this approach would be widely acceptable for
> > subsystem maintainers. Some might not like mass and "blind" code
> > changes. Some might not like the output at all.
> 
> I don't intend to mass convert existing code. I would restrict myself to
> updating the documentation and then maybe send a patch per subsystem as an
> example to let maintainers know and judge for themselves if they like it or
> not. And if it doesn't get picked up, we can just remove the feature again
> next year (or so).

It looks like a lot of potentially useless work.


> I dropped the example conversion, I think the idea should be clear now
> even without an explicit example.

Please, do the opposite: add conversions of a few subsystems into the
patchset and add more people to CC. We will see immediately whether
it makes sense to spend time on this.

I personally think that this feature is not worth the code, data,
and bikeshedding.

Best Regards,
Petr


Re: [PATCH 4.19 72/98] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-28 Thread Pavel Machek
Hi!

> > > There is no way to reinitialize RDRAND from the kernel otherwise we would
> > > have exactly done that. If you know how to do that please tell.
> > 
> > Would they? AMD is not exactly doing a good job with communication
> 
> Yes they would. Stop making up weird conspiracy theories.

> > here. If the BIOS can do it, the kernel can do it, too...
> 
> May I recommend to read up on SMM and BIOS being able to lock down access
> to certain facilities?
> 
> > or do you have information saying otherwise?
> 
> Yes. It was clearly stated by Tom that it can only be done in the
> BIOS.

Do you have a link for that? Because I don't think I've seen that one.

> > > Also disabling it for every BIOS is the only way which can be done because
> > > there is no way to know whether the BIOS is fixed or not at cold boot
> > > time. And it has to be known there because applications cache the
> > 
> > I'm pretty sure a DMI-based whitelist would help here. It should be
> > reasonable to fill it with the common machines at least.
> 
> Send patches to that effect.

Why should it be my job? AMD screwed this up; they should fix it
properly. And you should insist on a proper fix.

> > Plus, where is the CVE, and does AMD do anything to make BIOS vendors
> > fix them?
> 
> May I redirect you to: https://www.amd.com/en/corporate/contact

That will certainly make communication easier, right.

Pavel
-- 
DENX Software Engineering GmbH,  Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany




Re: [PATCH v2] vsprintf: introduce %dE for error constants

2019-08-28 Thread Jani Nikula
On Wed, 28 Aug 2019, Petr Mladek  wrote:
[..]
> I personally think that this feature is not worth the code, data,
> and bikeshedding.

The obvious alternative, I think already mentioned, is to just add
strerror() or similar as a function. I doubt there'd be much opposition
to that. Folks could use %s and strerror(ret). And a follow-up could add
the special format specifier if needed.

BR,
Jani.



-- 
Jani Nikula, Intel Open Source Graphics Center


Re: [PATCH 4.19 72/98] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-28 Thread Borislav Petkov
On Wed, Aug 28, 2019 at 01:49:47PM +0200, Pavel Machek wrote:
> AMD screwed this up,

Except that it wasn't AMD who screwed up but BIOS on *some* laptops.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v2] vsprintf: introduce %dE for error constants

2019-08-28 Thread Sergey Senozhatsky
On (08/28/19 14:54), Jani Nikula wrote:
[..]
> > I personally think that this feature is not worth the code, data,
> > and bikeshedding.
> 
> The obvious alternative, I think already mentioned, is to just add
> strerror() or similar as a function. I doubt there'd be much opposition
> to that. Folks could use %s and strerr(ret). And a follow-up could add
> the special format specifier if needed.

Yeah, I'd say that strerror() would be a better alternative
to a vsprintf() specifier (if we decide to add such functionality).

-ss


Re: [PATCH 4.19 72/98] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-28 Thread Pavel Machek
On Wed 2019-08-28 14:00:24, Borislav Petkov wrote:
> On Wed, Aug 28, 2019 at 01:49:47PM +0200, Pavel Machek wrote:
> > AMD screwed this up,
> 
> Except that it wasn't AMD who screwed up but BIOS on *some* laptops.

Yes, and now AMD has a patch to break it on *all* machines.

Hmm. A "get random data" instruction that fails to return random data,
depending on the BIOS... (and even indicates success, IIRC)... Sounds like
a CPU vendor screwup to me.

And they can also fix it up. If re-initing the random generator is
impossible from the kernel, surely the CPU microcode can either be updated
to do the init automatically, or at least allow the kernel to do the init.

Best regards,
Pavel
-- 
DENX Software Engineering GmbH,  Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany




Re: [PATCH 4.19 72/98] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-28 Thread Borislav Petkov
On Wed, Aug 28, 2019 at 02:09:36PM +0200, Pavel Machek wrote:
> Yes, and now AMD has a patch to break it on *all* machines.

It doesn't break all machines - you need to look at that patch again.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH 4.19 72/98] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-28 Thread Pavel Machek
On Wed 2019-08-28 14:16:28, Borislav Petkov wrote:
> On Wed, Aug 28, 2019 at 02:09:36PM +0200, Pavel Machek wrote:
> > Yes, and now AMD has a patch to break it on *all* machines.
> 
> It doesn't break all machines - you need to look at that patch again.

This is not a way to have an intelligent conversation.

The patch clearly breaks more machines than it has to. It is known to
cause regressions. You snipped the rest of the email, which had better
ways to solve this.

Best regards,
Pavel
-- 
DENX Software Engineering GmbH,  Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany




Re: [PATCH 4.19 72/98] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-28 Thread Borislav Petkov
On Wed, Aug 28, 2019 at 02:29:13PM +0200, Pavel Machek wrote:
> This is not a way to have an intelligent conversation.

No, this *is* the way to keep the conversation sane, without veering
off into some absurd claims.

So, to cut to the chase: you can simply add "rdrand=force" to your
cmdline parameters and get back to using RDRAND.

And yet if you still feel this fix does not meet your expectations,
you were already told either to produce patches or whom to contact. I'm
afraid complaining on this thread won't get you anywhere, but that's your
call.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v2] vsprintf: introduce %dE for error constants

2019-08-28 Thread Rasmus Villemoes
On 28/08/2019 14.02, Sergey Senozhatsky wrote:
> On (08/28/19 14:54), Jani Nikula wrote:
> [..]
>>> I personally think that this feature is not worth the code, data,
>>> and bikeshedding.
>>
>> The obvious alternative, I think already mentioned, is to just add
>> strerror() or similar as a function. I doubt there'd be much opposition
>> to that. Folks could use %s and strerr(ret). And a follow-up could add
>> the special format specifier if needed.
> 
> Yeah, I'd say that strerror() would be a better alternative
> to vsprintf() specifier. (if we decide to add such functionality).

Please no. The .text footprint of the changes at the call sites to do
pr_err("...%s...", errcode(err)) instead of the current
pr_err("...%d...", err) would very soon dwarf whatever is necessary to
implement %pE or %dE. Also that would prevent any kind of graceful
fallback in case err doesn't have a known value - errcode() would have
to return some fixed "huh, unknown error number" string, so we'd lose
the actual number.

Rasmus


Re: [PATCH v2] vsprintf: introduce %dE for error constants

2019-08-28 Thread Sergey Senozhatsky
On (08/28/19 14:49), Rasmus Villemoes wrote:
[..]
> Please no. The .text footprint of the changes at the call sites to do
> pr_err("...%s...", errcode(err)) instead of the current
> pr_err("...%d...", err) would very soon dwarf whatever is necessary to
> implement %pE or %dE.

New vsprintf() specifiers have some downsides as well. Should %dE
accidentally (via a backport) make it into a -stable kernel that does
not support %dE, we would lose the actual error code value as well.

-ss


Re: [PATCH 4.19 72/98] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-28 Thread Pavel Machek
On Wed 2019-08-28 14:46:21, Borislav Petkov wrote:
> On Wed, Aug 28, 2019 at 02:29:13PM +0200, Pavel Machek wrote:
> > This is not a way to have an inteligent conversation.
> 
> No, this *is* the way to keep the conversation sane, without veering
> off into some absurd claims.
> 
> So, to cut to the chase: you can simply add "rdrand=force" to your
> cmdline parameters and get back to using RDRAND.
> 
> And yet if you still feel this fix does not meet your expectations,
> you were told already to either produce patches or who to contact. I'm
> afraid complaining on this thread won't get you anywhere but that's your
> call.

No, this does not meet my expectations: it violates the stable kernel
rules and will cause regressions for some users, while a better solution
is known to be available.

Best regards,

Pavel
-- 
DENX Software Engineering GmbH,  Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany




Re: [PATCH 2/5] rcu/tree: Add multiple in-flight batches of kfree_rcu work

2019-08-28 Thread Joel Fernandes
On Wed, Aug 28, 2019 at 07:52:53AM +0800, Boqun Feng wrote:
> Hi Joel,
> 
> On Tue, Aug 27, 2019 at 03:01:56PM -0400, Joel Fernandes (Google) wrote:
> > During testing, it was observed that the amount of memory consumed due to
> > kfree_rcu() batching is 300-400MB. Previously we had only a single
> > head_free pointer pointing to the list of rcu_head(s) that are to be
> > freed after a grace period. Until this list is drained, we cannot queue
> > any more objects on it since such objects may not be ready to be
> > reclaimed when the worker thread eventually gets to draining the
> > head_free list.
> > 
> > We can do better by maintaining multiple lists as done by this patch.
> > Testing shows that memory consumption came down by around 100-150MB with
> > just adding another list. Adding more than 1 additional list did not
> > show any improvement.
> > 
> > Suggested-by: Paul E. McKenney 
> > Signed-off-by: Joel Fernandes (Google) 
> > ---
> >  kernel/rcu/tree.c | 64 +--
> >  1 file changed, 45 insertions(+), 19 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 4f7c3096d786..9b9ae4db1c2d 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2688,28 +2688,38 @@ EXPORT_SYMBOL_GPL(call_rcu);
> >  
> >  /* Maximum number of jiffies to wait before draining a batch. */
> >  #define KFREE_DRAIN_JIFFIES (HZ / 50)
> > +#define KFREE_N_BATCHES 2
> > +
> > +struct kfree_rcu_work {
> > +   /* The rcu_work node for queuing work with queue_rcu_work(). The work
> > +* is done after a grace period.
> > +*/
> > +   struct rcu_work rcu_work;
> > +
> > +   /* The list of objects that have now left ->head and are queued for
> > +* freeing after a grace period.
> > +*/
> > +   struct rcu_head *head_free;
> > +
> > +   struct kfree_rcu_cpu *krcp;
> > +};
> > > +static DEFINE_PER_CPU(__typeof__(struct kfree_rcu_work)[KFREE_N_BATCHES], krw);
> >  
> 
> Why not
> 
>   static DEFINE_PER_CPU(struct kfree_rcu_work[KFREE_N_BATCHES], krw);
> 
> here? Am I missing something?

Yes, that's better.

> Further, given "struct kfree_rcu_cpu" is only for defining percpu
> variables, how about organizing the data structure like:
> 
>   struct kfree_rcu_cpu {
>   ...
>   struct kfree_rcu_work krws[KFREE_N_BATCHES];
>   ...
>   }
> 
> This could save one pointer in kfree_rcu_cpu, and I think it provides
> better cache locality for accessing _cpu and _work on the same cpu.
> 
> Thoughts?

Yes, that's better. Thanks, Boqun! Following is the diff which I will fold
into this patch:

---8<---

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b3259306b7a5..fac5ae96d8b1 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2717,7 +2717,6 @@ struct kfree_rcu_work {
 
struct kfree_rcu_cpu *krcp;
 };
-static DEFINE_PER_CPU(__typeof__(struct kfree_rcu_work)[KFREE_N_BATCHES], krw);
 
 /*
  * Maximum number of kfree(s) to batch, if this limit is hit then the batch of
@@ -2731,7 +2730,7 @@ struct kfree_rcu_cpu {
struct rcu_head *head;
 
/* Pointer to the per-cpu array of kfree_rcu_work structures */
-   struct kfree_rcu_work *krwp;
+   struct kfree_rcu_work krw_arr[KFREE_N_BATCHES];
 
/* Protect concurrent access to this structure and kfree_rcu_work. */
spinlock_t lock;
@@ -2800,8 +2799,8 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
 
lockdep_assert_held(&krcp->lock);
while (i < KFREE_N_BATCHES) {
-   if (!krcp->krwp[i].head_free) {
-   krwp = &(krcp->krwp[i]);
+   if (!krcp->krw_arr[i].head_free) {
+   krwp = &(krcp->krw_arr[i]);
break;
}
i++;
@@ -3780,13 +3779,11 @@ static void __init kfree_rcu_batch_init(void)
 
for_each_possible_cpu(cpu) {
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
-   struct kfree_rcu_work *krwp = &(per_cpu(krw, cpu)[0]);
int i = KFREE_N_BATCHES;
 
spin_lock_init(&krcp->lock);
-   krcp->krwp = krwp;
while (i--)
-   krwp[i].krcp = krcp;
+   krcp->krw_arr[i].krcp = krcp;
INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
}
 }


[PATCH v2] rcu/tree: Add multiple in-flight batches of kfree_rcu work

2019-08-28 Thread Joel Fernandes (Google)
During testing, it was observed that the amount of memory consumed due to
kfree_rcu() batching is 300-400MB. Previously we had only a single
head_free pointer pointing to the list of rcu_head(s) that are to be
freed after a grace period. Until this list is drained, we cannot queue
any more objects on it since such objects may not be ready to be
reclaimed when the worker thread eventually gets to draining the
head_free list.

We can do better by maintaining multiple lists as done by this patch.
Testing shows that memory consumption came down by around 100-150MB with
just adding another list. Adding more than 1 additional list did not
show any improvement.

Suggested-by: Paul E. McKenney 
Signed-off-by: Joel Fernandes (Google) 
---
 kernel/rcu/tree.c | 61 ---
 1 file changed, 42 insertions(+), 19 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4f7c3096d786..5bf8f7e793ea 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2688,28 +2688,37 @@ EXPORT_SYMBOL_GPL(call_rcu);
 
 /* Maximum number of jiffies to wait before draining a batch. */
 #define KFREE_DRAIN_JIFFIES (HZ / 50)
+#define KFREE_N_BATCHES 2
+
+struct kfree_rcu_work {
+   /* The rcu_work node for queuing work with queue_rcu_work(). The work
+* is done after a grace period.
+*/
+   struct rcu_work rcu_work;
+
+   /* The list of objects that have now left ->head and are queued for
+* freeing after a grace period.
+*/
+   struct rcu_head *head_free;
+
+   struct kfree_rcu_cpu *krcp;
+};
 
 /*
  * Maximum number of kfree(s) to batch, if this limit is hit then the batch of
  * kfree(s) is queued for freeing after a grace period, right away.
  */
 struct kfree_rcu_cpu {
-   /* The rcu_work node for queuing work with queue_rcu_work(). The work
-* is done after a grace period.
-*/
-   struct rcu_work rcu_work;
 
/* The list of objects being queued in a batch but are not yet
 * scheduled to be freed.
 */
struct rcu_head *head;
 
-   /* The list of objects that have now left ->head and are queued for
-* freeing after a grace period.
-*/
-   struct rcu_head *head_free;
+   /* Pointer to the per-cpu array of kfree_rcu_work structures */
+   struct kfree_rcu_work krw_arr[KFREE_N_BATCHES];
 
-   /* Protect concurrent access to this structure. */
+   /* Protect concurrent access to this structure and kfree_rcu_work. */
spinlock_t lock;
 
/* The delayed work that flushes ->head to ->head_free incase ->head
@@ -2730,12 +2739,14 @@ static void kfree_rcu_work(struct work_struct *work)
 {
unsigned long flags;
struct rcu_head *head, *next;
-   struct kfree_rcu_cpu *krcp = container_of(to_rcu_work(work),
-   struct kfree_rcu_cpu, rcu_work);
+   struct kfree_rcu_work *krwp = container_of(to_rcu_work(work),
+   struct kfree_rcu_work, rcu_work);
+   struct kfree_rcu_cpu *krcp;
+
+   krcp = krwp->krcp;
 
spin_lock_irqsave(&krcp->lock, flags);
-   head = krcp->head_free;
-   krcp->head_free = NULL;
+   head = xchg(&krwp->head_free, NULL);
spin_unlock_irqrestore(&krcp->lock, flags);
 
/*
@@ -2758,19 +2769,28 @@ static void kfree_rcu_work(struct work_struct *work)
  */
 static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
 {
+   int i = 0;
+   struct kfree_rcu_work *krwp = NULL;
+
lockdep_assert_held(&krcp->lock);
+   while (i < KFREE_N_BATCHES) {
+   if (!krcp->krw_arr[i].head_free) {
+   krwp = &(krcp->krw_arr[i]);
+   break;
+   }
+   i++;
+   }
 
-   /* If a previous RCU batch work is already in progress, we cannot queue
+   /* If both RCU batches are already in progress, we cannot queue
 * another one, just refuse the optimization and it will be retried
 * again in KFREE_DRAIN_JIFFIES time.
 */
-   if (krcp->head_free)
+   if (!krwp)
return false;
 
-   krcp->head_free = krcp->head;
-   krcp->head = NULL;
-   INIT_RCU_WORK(&krcp->rcu_work, kfree_rcu_work);
-   queue_rcu_work(system_wq, &krcp->rcu_work);
+   krwp->head_free = xchg(&krcp->head, NULL);
+   INIT_RCU_WORK(&krwp->rcu_work, kfree_rcu_work);
+   queue_rcu_work(system_wq, &krwp->rcu_work);
 
return true;
 }
@@ -3736,8 +3756,11 @@ static void __init kfree_rcu_batch_init(void)
 
for_each_possible_cpu(cpu) {
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
+   int i = KFREE_N_BATCHES;
 
spin_lock_init(&krcp->lock);
+   while (i--)
+   krcp->krw_arr[i].krcp = krcp;
INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
}
 }
-- 
2.23.0.18

Re: [PATCH 4.19 72/98] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-28 Thread Thomas Gleixner
On Wed, 28 Aug 2019, Pavel Machek wrote:
> On Wed 2019-08-28 14:46:21, Borislav Petkov wrote:
> > On Wed, Aug 28, 2019 at 02:29:13PM +0200, Pavel Machek wrote:
> > > This is not a way to have an intelligent conversation.
> > 
> > No, this *is* the way to keep the conversation sane, without veering
> > off into some absurd claims.
> > 
> > So, to cut to the chase: you can simply add "rdrand=force" to your
> > cmdline parameters and get back to using RDRAND.
> > 
> > And yet if you still feel this fix does not meet your expectations,
> > you were told already to either produce patches or who to contact. I'm
> > afraid complaining on this thread won't get you anywhere but that's your
> > call.
> 
> No, this does not meet my expectations, it violates stable kernel
> rules, and will cause regression to some users, while better solution
> is known to be available.

Your unqualified ranting does not meet my expectation either and it
violates any rule of common sense.

For the record:

  Neither AMD nor we have any idea which particular machines have a fixed
  BIOS and which have not. There is no technical indicator either at boot
  time as the wreckage manifests itself only after resume.

  So in the interest of users the only sensible decision is to disable
  RDRAND for this class of CPUs.

  If you have a list of machines which have a fixed BIOS, then provide it
  in form of patches. If not then stop claiming that there is a better
  solution available.

Anyway, I'm done with that and further rants of yours go directly to
/dev/null.

Thanks for wasting everyone's time

   tglx


Re: [PATCH v8 11/27] x86/mm: Introduce _PAGE_DIRTY_SW

2019-08-28 Thread Yu-cheng Yu
On Wed, 2019-08-28 at 09:03 +0200, Peter Zijlstra wrote:
> On Tue, Aug 27, 2019 at 03:37:12PM -0700, Yu-cheng Yu wrote:
> > On Fri, 2019-08-23 at 16:02 +0200, Peter Zijlstra wrote:
> > > On Tue, Aug 13, 2019 at 01:52:09PM -0700, Yu-cheng Yu wrote:
> > > 
> > > > +static inline pte_t pte_move_flags(pte_t pte, pteval_t from, pteval_t to)
> > > > +{
> > > > +   if (pte_flags(pte) & from)
> > > > +   pte = pte_set_flags(pte_clear_flags(pte, from), to);
> > > > +   return pte;
> > > > +}
> > > 
> > > Aside of the whole conditional thing (I agree it would be better to have
> > > this unconditionally); the function doesn't really do as advertised.
> > > 
> > > That is, if @from is clear, it doesn't endeavour to make sure @to is
> > > also clear.
> > > 
> > > Now it might be sufficient, but in that case it really needs a comment
> > > and or different name.
> > > 
> > > An implementation that actually moves the bit is something like:
> > > 
> > >   pteval_t a,b;
> > > 
> > >   a = native_pte_value(pte);
> > >   b = (a >> from_bit) & 1;
> > >   a &= ~((1ULL << from_bit) | (1ULL << to_bit));
> > >   a |= b << to_bit;
> > >   return make_native_pte(a);
> > 
> > There can be places calling pte_wrprotect() on a PTE that is already RO +
> > DIRTY_SW.  Then in pte_move_flags(pte, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW) we do not
> >  want to clear _PAGE_DIRTY_SW.  But, I will look into this and make it more
> > obvious.
> 
> Well, then the name 'move' is just wrong, because that is not the
> semantics you're looking for.
> 
> So the thing is; if you provide a generic function that 'munges' two
> bits, then its name had better be accurate. But AFAICT you only ever
> used this for the DIRTY bits, so it might be better to have a function
> specifically for that and with a comment that spells out the exact
> semantics and reasons for them.

Yes, I will work on that.

Yu-cheng
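For reference, Peter's bit-move snippet above, wrapped into a self-contained helper, makes the strict "move" semantics concrete. The bit positions below are made-up illustration values, not the real _PAGE_BIT_* definitions from arch/x86/include/asm/pgtable_types.h. Note how a clear source bit also clears the destination — exactly the behaviour that pte_wrprotect() on an already RO+DIRTY_SW PTE must avoid, which is why a dirty-bit-specific helper with documented semantics was suggested instead:

```c
#include <stdint.h>
#include <assert.h>

typedef uint64_t pteval_t;

/* Illustrative bit positions only -- NOT the real kernel values. */
#define FROM_BIT 6	/* stand-in for the _PAGE_BIT_DIRTY_HW position */
#define TO_BIT   11	/* stand-in for the _PAGE_BIT_DIRTY_SW position */

/* Strict "move" semantics as Peter sketched: copy the source bit to the
 * destination and clear the source, so a clear source bit also clears
 * the destination bit.
 */
static pteval_t pteval_move_bit(pteval_t a, int from_bit, int to_bit)
{
	pteval_t b = (a >> from_bit) & 1;

	a &= ~((1ULL << from_bit) | (1ULL << to_bit));
	a |= b << to_bit;
	return a;
}
```

Compare with the conditional version quoted at the top of the thread, which only sets the destination when the source is set and otherwise leaves the destination untouched.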


Re: [PATCH v2] vsprintf: introduce %dE for error constants

2019-08-28 Thread Uwe Kleine-König
On 8/28/19 2:59 PM, Sergey Senozhatsky wrote:
> On (08/28/19 14:49), Rasmus Villemoes wrote:
>> On 28/08/2019 14.02, Sergey Senozhatsky wrote:
>>> On (08/28/19 14:54), Jani Nikula wrote:
>>> [..]
> I personally think that this feature is not worth the code, data,
> and bikeshedding.

 The obvious alternative, I think already mentioned, is to just add
 strerror() or similar as a function. I doubt there'd be much opposition
 to that. Folks could use %s and strerr(ret). And a follow-up could add
 the special format specifier if needed.
>>>
>>> Yeah, I'd say that strerror() would be a better alternative
>>> to vsprintf() specifier. (if we decide to add such functionality).
>>
>> Please no. The .text footprint of the changes at the call sites to do
>> pr_err("...%s...", errcode(err)) instead of the current
>> pr_err("...%d...", err) would very soon dwarf whatever is necessary to
>> implement %pE or %dE.

Yeah, that's what I think, too. I cannot imagine a user of strerror()
who needs the string representation for something different than to feed
it to one of the family members of printk. That's also why I think that
the other already existing format specifiers are a good idea.

It might not be the nicest part of the printk code, but this way it is
at least concentrated in one place only.

> New vsprintf() specifiers have some downsides as well. Should %dE
> accidentally (via backport) make it to the -stable kernel, which
> does not support %dE, and we are going to lose the actual error
> code value as well.

That is wrong. When you do

pr_err("There are no round tuits to give out: %dE\n", -ENOENT);

in a kernel that doesn't support %dE you get:

There are no round tuits to give out: -2E

That's a bit ugly but I can still work out what the original value was.

Best regards
Uwe



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] docs: kbuild: fix invalid ReST syntax

2019-08-28 Thread Masahiro Yamada
On Wed, Aug 14, 2019 at 7:54 PM Masahiro Yamada
 wrote:
>
> I see the following warnings when I open this document with a ReST
> viewer, retext:
>
> /home/masahiro/ref/linux/Documentation/kbuild/makefiles.rst:1142: (WARNING/2) Inline emphasis start-string without end-string.
> /home/masahiro/ref/linux/Documentation/kbuild/makefiles.rst:1152: (WARNING/2) Inline emphasis start-string without end-string.
> /home/masahiro/ref/linux/Documentation/kbuild/makefiles.rst:1154: (WARNING/2) Inline emphasis start-string without end-string.
>
> These hunks were added by commit e846f0dc57f4 ("kbuild: add support
> for ensuring headers are self-contained") and commit 1e21cbfada87
> ("kbuild: support header-test-pattern-y"), respectively. They were
> written not for ReST but for the plain text, and merged via the
> kbuild tree.
>
> In the same development cycle, this document was converted to ReST
> by commit cd238effefa2 ("docs: kbuild: convert docs to ReST and rename
> to *.rst"), and merged via the doc sub-system.
>
> Merging them together into Linus' tree resulted in the current situation.
>
> To fix the syntax, surround the asterisks with back-quotes, and
> use :: for the code sample.
>
> Fixes: 39ceda5ce1b0 ("Merge tag 'kbuild-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild")
> Signed-off-by: Masahiro Yamada 
> ---
>


Both applied to linux-kbuild.





>  Documentation/kbuild/makefiles.rst | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/kbuild/makefiles.rst b/Documentation/kbuild/makefiles.rst
> index f4f0f7ffde2b..b4c28c543d72 100644
> --- a/Documentation/kbuild/makefiles.rst
> +++ b/Documentation/kbuild/makefiles.rst
> @@ -1139,7 +1139,7 @@ When kbuild executes, the following steps are followed (roughly):
>
>  header-test-y
>
> -   header-test-y specifies headers (*.h) in the current directory that
> +   header-test-y specifies headers (`*.h`) in the current directory that
> should be compile tested to ensure they are self-contained,
> i.e. compilable as standalone units. If CONFIG_HEADER_TEST is enabled,
> this builds them as part of extra-y.
> @@ -1147,11 +1147,11 @@ When kbuild executes, the following steps are followed (roughly):
>  header-test-pattern-y
>
> This works as a weaker version of header-test-y, and accepts wildcard
> -   patterns. The typical usage is:
> +   patterns. The typical usage is::
>
> - header-test-pattern-y += *.h
> +   header-test-pattern-y += *.h
>
> -   This specifies all the files that matches to '*.h' in the current
> +   This specifies all the files that matches to `*.h` in the current
> directory, but the files in 'header-test-' are excluded.
>
>  6.7 Commands useful for building a boot image
> --
> 2.17.1
>


-- 
Best Regards
Masahiro Yamada


[PATCH] rcu/dyntick-idle: Add better tracing

2019-08-28 Thread Joel Fernandes (Google)
The dyntick-idle traces are a bit confusing. This patch makes them simpler
and adds some missing cases, such as EQS-enter from user vs. idle mode.

Following are the changes:
(1) Add a new context field to trace_rcu_dyntick tracepoint. This
context field can be "USER", "IDLE" or "IRQ".

(2) Remove the "++=" and "--=" strings and replace them with
   "StillNonIdle". This is much easier on the eyes, and the -- and ++
   are easily apparent in the dynticks_nesting counters we are printing
   anyway.

This patch is based on the previous patches to simplify rcu_dyntick
counters [1] and with these traces, I have verified the counters are
working properly.

[1]
Link: https://lore.kernel.org/patchwork/patch/1120021/
Link: https://lore.kernel.org/patchwork/patch/1120022/

Signed-off-by: Joel Fernandes (Google) 
---
 include/trace/events/rcu.h | 13 -
 kernel/rcu/tree.c  | 17 +++--
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 66122602bd08..474c1f7e7104 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -449,12 +449,14 @@ TRACE_EVENT_RCU(rcu_fqs,
  */
 TRACE_EVENT_RCU(rcu_dyntick,
 
-   TP_PROTO(const char *polarity, long oldnesting, long newnesting, atomic_t dynticks),
+   TP_PROTO(const char *polarity, const char *context, long oldnesting,
+long newnesting, atomic_t dynticks),
 
-   TP_ARGS(polarity, oldnesting, newnesting, dynticks),
+   TP_ARGS(polarity, context, oldnesting, newnesting, dynticks),
 
TP_STRUCT__entry(
__field(const char *, polarity)
+   __field(const char *, context)
__field(long, oldnesting)
__field(long, newnesting)
__field(int, dynticks)
@@ -462,14 +464,15 @@ TRACE_EVENT_RCU(rcu_dyntick,
 
TP_fast_assign(
__entry->polarity = polarity;
+   __entry->context = context;
__entry->oldnesting = oldnesting;
__entry->newnesting = newnesting;
__entry->dynticks = atomic_read(&dynticks);
),
 
-   TP_printk("%s %lx %lx %#3x", __entry->polarity,
- __entry->oldnesting, __entry->newnesting,
- __entry->dynticks & 0xfff)
+   TP_printk("%s %s %lx %lx %#3x", __entry->polarity,
+   __entry->context, __entry->oldnesting, __entry->newnesting,
+   __entry->dynticks & 0xfff)
 );
 
 /*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1465a3e406f8..1a65919ec800 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -570,7 +570,8 @@ static void rcu_eqs_enter(bool user)
}
 
lockdep_assert_irqs_disabled();
-   trace_rcu_dyntick(TPS("Start"), rdp->dynticks_nesting, 0, rdp->dynticks);
+   trace_rcu_dyntick(TPS("Start"), (user ? TPS("USER") : TPS("IDLE")),
+ rdp->dynticks_nesting, 0, rdp->dynticks);
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
rdp = this_cpu_ptr(&rcu_data);
do_nocb_deferred_wakeup(rdp);
@@ -642,15 +643,17 @@ static __always_inline void rcu_nmi_exit_common(bool irq)
 * leave it in non-RCU-idle state.
 */
if (rdp->dynticks_nesting != 1) {
-   trace_rcu_dyntick(TPS("--="), rdp->dynticks_nesting,
- rdp->dynticks_nesting - 2, rdp->dynticks);
+   trace_rcu_dyntick(TPS("StillNonIdle"), TPS("IRQ"),
+ rdp->dynticks_nesting, rdp->dynticks_nesting - 2,
+ rdp->dynticks);
WRITE_ONCE(rdp->dynticks_nesting, /* No store tearing. */
   rdp->dynticks_nesting - 2);
return;
}
 
/* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
-   trace_rcu_dyntick(TPS("Startirq"), rdp->dynticks_nesting, 0, rdp->dynticks);
+   trace_rcu_dyntick(TPS("Start"), TPS("IRQ"), rdp->dynticks_nesting, 0,
+ rdp->dynticks);
WRITE_ONCE(rdp->dynticks_nesting, 0); /* Avoid store tearing. */
 
if (irq)
@@ -733,7 +736,8 @@ static void rcu_eqs_exit(bool user)
rcu_dynticks_task_exit();
rcu_dynticks_eqs_exit();
rcu_cleanup_after_idle();
-   trace_rcu_dyntick(TPS("End"), rdp->dynticks_nesting, 1, rdp->dynticks);
+   trace_rcu_dyntick(TPS("End"), (user ? TPS("USER") : TPS("IDLE")),
+ rdp->dynticks_nesting, 1, rdp->dynticks);
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
WRITE_ONCE(rdp->dynticks_nesting, 1);
 
@@ -825,7 +829,8 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
}
 
-   trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="),
+   trace_rcu_dyntick(incb

Re: [PATCH v2] doc: kselftest: update for clarity on running kselftests in CI rings

2019-08-28 Thread shuah

On 8/26/19 6:37 PM, Shuah Khan wrote:

Update to add clarity and recommendations on running newer kselftests
on older kernels vs. matching the kernel and kselftest revisions.

The recommendation is "Match kernel revision and kselftest."

Signed-off-by: Shuah Khan 
---
Changes since v1: Fixed "WARNING: Title underline too short."


I have a few more changes and would like to make and send a v3 after
the LPC's Testing and Fuzzing kselftest discussion.

Holding off on this patch for now.

thanks,
-- Shuah


Re: [PATCH v2] vsprintf: introduce %dE for error constants

2019-08-28 Thread Uwe Kleine-König
Hello Petr,

On 8/28/19 1:32 PM, Petr Mladek wrote:
> On Tue 2019-08-27 23:12:44, Uwe Kleine-König  wrote:
>> Petr Mladek had some concerns:
>>> There are ideas to make the code even more tricky to reduce
>>> the size, keep it fast.
>>
>> I think Enrico Weigelt's suggestion to use a case is the best
>> performance-wise so that's what I picked up. Also I hope that
>> performance isn't that important because the need to print an error
>> should not be so common that it really hurts in production.
> 
> I personally do not like switch/case. It is a lot of code.
> I wonder if it even saved some space.

I guess we have to die either way. Either it is quick or it is space
efficient. With the big case I trust the compiler to pick something
sensible expecting that it adapts for example to -Osize.

> If you want to save space, I would use u16 to store the numbers.
> Or I would use array of strings. There will be only few holes.
> 
> You might also consider handling only the most commonly
> used codes from errno.h and errno-base.h (1..133). There will
> be no holes and the codes are stable.

I'd like to postpone the discussion about "how" until we agreed about
the "if at all".
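For what it's worth, the "array of strings" variant Petr mentions could look roughly like this userspace sketch. Only a handful of codes are shown for illustration; a real table would cover errno-base.h and errno.h (1..133) with NULL holes, and the function name errname() is just a placeholder here, not an existing kernel API:

```c
#include <string.h>
#include <stddef.h>
#include <assert.h>

/* Designated-initializer table indexed by error number; unlisted
 * indices are implicitly NULL holes.
 */
static const char *const errnames[] = {
	[1]  = "EPERM",
	[2]  = "ENOENT",
	[5]  = "EIO",
	[11] = "EAGAIN",
	[12] = "ENOMEM",
};

static const char *errname(int err)
{
	unsigned int e = err < 0 ? -err : err;	/* accept -ENOENT or ENOENT */

	if (e < sizeof(errnames) / sizeof(errnames[0]) && errnames[e])
		return errnames[e];
	return NULL;				/* hole or out of range */
}
```

A printk implementation of %dE could then fall back to printing the plain decimal value whenever the lookup returns NULL.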

>>> Both, %dE modifier and the output format (ECODE) is non-standard.
>>
>> Yeah, obviously right. The problem is that the new modifier does
>> something that wasn't implemented before, so it cannot match any
>> standard. %pI is likewise only known on Linux, so I think being
>> non-standard is a weak argument.
> 
> I am not completely sure that %p modifiers were a good idea.
> They came before I started maintaining printk(). They add more
> complex algorithms into paths where we could not report problems
> easily (printk recursion). Also they are causing problems with
> unit testing that might be done in userspace. These non-standard
> formats cause that printk() can't be simply substituted by printf().

In my eyes the wish to have printk() and userspace's printf
feature-identical isn't helpful because they have similar but not equal
purposes. And if a driver author cares about being able to use their
code 1:1 in userspace they could just not use %dE, %pI and whatever
there is additionally.

> I am not keen to spread these problems over more formats.
> Also %d format is more complicated. It is often used with
> already existing modifiers.

I don't understand what you want to tell me here.

>>> Upper letters gain a lot of attention. But the error code is
>>> only helper information. Also many error codes are misleading because
>>> they are used either wrongly or there was no better available.
>>
>> This isn't really an argument against the patch I think. Sure, if a
>> function returned (say) EIO while ETIMEOUT would be better, my patch
>> doesn't improve that detail. Still
>>
>> mydev: Failed to initialize blablub: EIO
>>
>> is more expressive than
>>
>> mydev: Failed to initialize blablub: -5
> 
> OK, upper letters probably are not a problem.
> 
> But what about EWOULDBLOCK and EDEADLOCK? They have the same
> error codes as EAGAIN and EDEADLK. It might cause a lot of confusion.
>
> People might spend a lot of time searching for EAGAIN before they
> notice that EWOULDBLOCK was used in the code instead.

It already does now. If you start to debug an error message that tells
you the error is -11, you might well come to the conclusion you have to
look for EWOULDBLOCK. IMHO the right approach here should be to declare one
of the duplicates as the "right" one and replace the wrong one in the
complete tree (apart from the definition for user space of course).
After that the problem with these duplicates is completely orthogonal
(it is already now mostly orthogonal in my eyes) and also solved for
those who picked the wrong one by hand.

> Also you still did not answer the question where the idea came from.
> Did it just look nice? Anyone asked for it? Who? Why?

It was my idea, and I didn't talk about it before creating a patch. You
asked in reply to v1: "Did it look like a good idea?
Did anyone got tired by searching for the error codes many
times a day? Did the idea came from a developer, support, or user, please?"

So yes, having to lookup the right error code from messages is something
that annoys me, especially because there are at least two files you have
to check for the definition. I consider myself to have all three hats
(developer, supporter and user), and, your feedback set apart, my
impression is that all the people replying to this thread consider it a good
idea, too.

>>> There is no proof that this approach would be widely acceptable for
>>> subsystem maintainers. Some might not like mass and "blind" code
>>> changes. Some might not like the output at all.
>>
>> I don't intend to mass convert existing code. I would restrict myself to
>> updating the documentation and then maybe send a patch per subsystem as an
>> example to let maintainers know and judge for themselves if they like it or
>> not. And if it doesn't get picked up, we c

Re: [PATCH v2] vsprintf: introduce %dE for error constants

2019-08-28 Thread Uwe Kleine-König
On 8/28/19 1:32 PM, Petr Mladek wrote:
> On Tue 2019-08-27 23:12:44, Uwe Kleine-König  wrote:
>> I dropped the example conversion, I think the idea should be clear now
>> even without an explicit example.
> 
> Please, do the opposite. Add conversion of few subsystems into the
> patchset and add more people into CC. We will see immediately whether
> it makes sense to spend time on this.

For now I asked in the arm linux irc channel and got two people replying
(both added to Cc:):

Mark Brown (maintainer of SPI, regmap, ASoC and regulator) said:

1567019926 < broonie> ukleinek: I think that's a great idea and have
thought about trying to implement it in the past.
1567019937 < broonie> ukleinek: Making the logs more directly readable
is enormously helpful.

and Alexandre Belloni (arm/at91, mips/microsemi, rtc) said:

1567021451 < abelloni> ukleinek: seems good to me but it would probably
be better to be able to generate the list

(I fully agree with the wish to generate the list; as I already wrote
before, I don't have a good idea how to do that without generating
C-Code by some means which is ugly and also complicated by the fact that
there are several locations (at least for now) that have definitions for
error codes.)

Best regards
Uwe



signature.asc
Description: OpenPGP digital signature


Re: [RFC v1 1/2] rcu/tree: Clean up dynticks counter usage

2019-08-28 Thread Paul E. McKenney
On Mon, Aug 26, 2019 at 09:33:53PM -0400, Joel Fernandes (Google) wrote:
> The dynticks counter are confusing due to crowbar writes of
> DYNTICK_IRQ_NONIDLE whose purpose is to detect half-interrupts (i.e. we
> see rcu_irq_enter() but not rcu_irq_exit() due to a usermode upcall) and
> if so then do a reset of the dyntick_nmi_nesting counters. This patch
> tries to get rid of DYNTICK_IRQ_NONIDLE while still keeping the code
> working, fully functional, and less confusing. The confusion recently
> has even led to patches forgetting that DYNTICK_IRQ_NONIDLE was written
> to, which wasted lots of time.
> 
> The patch has the following changes:
> 
> (1) Use dynticks_nesting instead of dynticks_nmi_nesting for determining
> outer most "EQS exit". This is needed to detect in
> rcu_nmi_enter_common() if we have already EQS-exited, such as because of
> a syscall. Currently we rely on a forced write of DYNTICK_IRQ_NONIDLE
> from rcu_eqs_exit() for this purpose. This is one purpose of the
> DYNTICK_IRQ_NONIDLE write (other than detecting half-interrupts).
> However, we do not need to do that. dyntick_nesting already tells us that
> we have EQS-exited so just use that thus removing the dependence of
> dynticks_nmi_nesting for this purpose.
> 
> (2) Keep dynticks_nmi_nesting around because:
> 
>   (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect first
>   interrupt nesting level.
> 
>   (b) We need to detect half-interrupts till we are sure they're not an
>   issue. However, change the comparison to DYNTICK_IRQ_NONIDLE with 0.
> 
> (3) Since we got rid of DYNTICK_IRQ_NONIDLE, we also do cheaper
> comparisons with zero instead for the code that keeps the tick on in
> rcu_nmi_enter_common().
> 
> In the next patch, both of the concerns of (2) will be addressed and
> then we can get rid of dynticks_nmi_nesting, however one step at a time.

Postponing discussion of the commit log for the moment.

> Signed-off-by: Joel Fernandes (Google) 
> ---
>  kernel/rcu/rcu.h  |  4 
>  kernel/rcu/tree.c | 60 ---
>  2 files changed, 36 insertions(+), 28 deletions(-)
> 
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index aeec70fda82c..046833f3784b 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -12,10 +12,6 @@
>  
>  #include 
>  
> -/* Offset to allow distinguishing irq vs. task-based idle entry/exit. */
> -#define DYNTICK_IRQ_NONIDLE  ((LONG_MAX / 2) + 1)
> -
> -

OK.

>  /*
>   * Grace-period counter management.
>   */
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 68ebf0eb64c8..255cd6835526 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -81,7 +81,7 @@
>  
>  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
>   .dynticks_nesting = 1,
> - .dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE,
> + .dynticks_nmi_nesting = 0,

C initializes to zero by default, so this can simply be deleted.

>   .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
>  };
>  struct rcu_state rcu_state = {
> @@ -558,17 +558,18 @@ EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);
>  /*
>   * Enter an RCU extended quiescent state, which can be either the
>   * idle loop or adaptive-tickless usermode execution.
> - *
> - * We crowbar the ->dynticks_nmi_nesting field to zero to allow for
> - * the possibility of usermode upcalls having messed up our count
> - * of interrupt nesting level during the prior busy period.
>   */
>  static void rcu_eqs_enter(bool user)
>  {
>   struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>  
> - WARN_ON_ONCE(rdp->dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE);
> - WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> + /* Entering usermode/idle from interrupt is not handled. These would
> +  * mean usermode upcalls or idle entry happened from interrupts. But,
> +  * reset the counter if we warn.
> +  */

Please either put the "/*" on its own line or use "//"-style comments.

> + if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> + WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> +

@@@

>   WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
>rdp->dynticks_nesting == 0);
>   if (rdp->dynticks_nesting != 1) {
> @@ -642,23 +643,27 @@ static __always_inline void rcu_nmi_exit_common(bool irq)
>* (We are exiting an NMI handler, so RCU better be paying attention
>* to us!)
>*/
> + WARN_ON_ONCE(rdp->dynticks_nesting <= 0);

This is fine.

>   WARN_ON_ONCE(rdp->dynticks_nmi_nesting <= 0);
>   WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs());
>  
> + WRITE_ONCE(rdp->dynticks_nmi_nesting, /* No store tearing. */
> +rdp->dynticks_nmi_nesting - 1);

This is problematic.  The +/-1 and +/-2 dance is specifically for NMIs, so...

>   /*
>* If the nesting level is not 1, the CPU wasn't RCU-idle, so
>* leave it in non-RCU-idle state.
>*/
> - if (rdp->dynticks_nmi_nesting != 1) {
> -

Re: [RFC v1 2/2] rcu/tree: Remove dynticks_nmi_nesting counter

2019-08-28 Thread Paul E. McKenney
On Mon, Aug 26, 2019 at 09:33:54PM -0400, Joel Fernandes (Google) wrote:
> The dynticks_nmi_nesting counter serves 4 purposes:
> 
>   (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect first
>   interrupt nesting level.
> 
>   (b) We need to detect half-interrupts till we are sure they're not an
>   issue. However, change the comparison to DYNTICK_IRQ_NONIDLE with 0.
> 
>   (c) When a quiescent state report is needed from a nohz_full CPU.
>   The nesting counter detects we are a first level interrupt.
> 
> For (a) we can just use dyntick_nesting == 1 to determine this. Only the
> outermost interrupt that interrupted an RCU-idle state can set it to 1.
> 
> For (b), this warning condition has not occurred for several kernel
> releases.  But we still keep the warning but change it to use
> in_interrupt() instead of the nesting counter. In a later year, we can
> remove the warning.
> 
> For (c), the nest check is not really necessary since forced_tick would
> have been set to true in the outermost interrupt, so the nested/NMI
> interrupts will check forced_tick anyway, and bail.

Skipping the commit log and documentation for this pass.

> Signed-off-by: Joel Fernandes (Google) 
> ---
>  .../Data-Structures/Data-Structures.rst   | 31 +++--
>  Documentation/RCU/stallwarn.txt   |  6 +-
>  kernel/rcu/tree.c | 64 +++
>  kernel/rcu/tree.h |  4 +-
>  kernel/rcu/tree_stall.h   |  4 +-
>  5 files changed, 41 insertions(+), 68 deletions(-)

[ . . . ]

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 255cd6835526..1465a3e406f8 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -81,7 +81,6 @@
>  
>  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
>   .dynticks_nesting = 1,
> - .dynticks_nmi_nesting = 0,

This should be in the previous patch, give or take naming.

>   .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
>  };
>  struct rcu_state rcu_state = {
> @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
>   /* Check for counter underflows */
>   RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
>"RCU dynticks_nesting counter underflow!");
> - RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
> -  "RCU dynticks_nmi_nesting counter underflow/zero!");
>  
> - /* Are we at first interrupt nesting level? */
> - if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> - return false;
> -
> - /* Does CPU appear to be idle from an RCU standpoint? */
> - return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> + /* Are we the outermost interrupt that arrived when RCU was idle? */
> + return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
>  }
>  
>  #define DEFAULT_RCU_BLIMIT 10 /* Maximum callbacks per rcu_do_batch ... */
> @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
>   struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>  
>   /* Entering usermode/idle from interrupt is not handled. These would
> -  * mean usermode upcalls or idle entry happened from interrupts. But,
> -  * reset the counter if we warn.
> +  * mean usermode upcalls or idle exit happened from interrupts. Remove
> +  * the warning by 2020.
>*/
> - if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> - WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> + WARN_ON_ONCE(in_interrupt());

And this is a red flag.  Bad things happen should some common code
that disables BH be invoked from the idle loop.  This might not be
happening now, but we need to avoid this sort of constraint.

How about instead merging ->dyntick_nesting into the low-order bits
of ->dyntick_nmi_nesting?

Yes, this assumes that we don't enter process level twice, but it should
be easy to add a WARN_ON() to test for that.  Except that we don't have
to because there is already this near the end of rcu_eqs_exit():

WARN_ON_ONCE(rdp->dynticks_nmi_nesting);

So the low-order bit of the combined counter could indicate process-level
non-idle, the next three bits could be unused to make interpretation
of hex printouts easier, and then the rest of the bits could be used in
the same way as currently.

This would allow a single read to see the full state, so that 0x1 means
at process level in the kernel, 0x11 is interrupt (or NMI) from process
level, 0x10 is interrupt/NMI from idle/user, and so on.
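A userspace sketch of this combined-counter encoding (the macro and helper names below are invented for illustration, not proposed kernel names) reproduces exactly those example values:

```c
#include <assert.h>

/* Hypothetical single-counter encoding, per the proposal above:
 * bit 0   : set while at process level (non-idle) in the kernel
 * bits 1-3: left unused so hex printouts stay easy to read
 * bits 4+ : irq/NMI nesting count
 */
#define DYNTICK_PROCESS_BIT 0x1UL
#define DYNTICK_IRQ_INCR    0x10UL

static unsigned long eqs_exit_ctr(unsigned long ctr)	/* idle/user -> process */
{
	return ctr | DYNTICK_PROCESS_BIT;
}

static unsigned long irq_enter_ctr(unsigned long ctr)
{
	return ctr + DYNTICK_IRQ_INCR;
}

static unsigned long irq_exit_ctr(unsigned long ctr)
{
	return ctr - DYNTICK_IRQ_INCR;
}
```

Starting from 0 (fully RCU-idle): entering process level gives 0x1, an interrupt from process level gives 0x11, and an interrupt straight from idle/user gives 0x10 — so one read of the counter shows the full state.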

What am I missing here?  Why wouldn't this work, and without adding yet
another RCU-imposed constraint on some other subsystem?

Thanx, Paul


Re: [PATCH 0/5] kfree_rcu() additions for -rcu

2019-08-28 Thread Paul E. McKenney
On Tue, Aug 27, 2019 at 03:01:54PM -0400, Joel Fernandes (Google) wrote:
> Hi,
> 
> This is a series on top of the patch "rcu/tree: Add basic support for 
> kfree_rcu() batching".
> 
> Link: http://lore.kernel.org/r/20190814160411.58591-1-j...@joelfernandes.org
> 
> It adds performance tests, some clean ups and removal of "lazy" RCU callbacks.
> 
> Now that kfree_rcu() is handled separately from call_rcu(), we also get rid of
> kfree "lazy" handling from tree RCU as suggested by Paul which will be unused.
> This also results in a nice negative delta as well.
> 
> Joel Fernandes (Google) (5):
> rcu/rcuperf: Add kfree_rcu() performance Tests
> rcu/tree: Add multiple in-flight batches of kfree_rcu work
> rcu/tree: Add support for debug_objects debugging for kfree_rcu()
> rcu: Remove kfree_rcu() special casing and lazy handling
> rcu: Remove kfree_call_rcu_nobatch()
> 
> Documentation/RCU/stallwarn.txt   |  13 +-
> .../admin-guide/kernel-parameters.txt |  13 ++
> include/linux/rcu_segcblist.h |   2 -
> include/linux/rcutiny.h   |   5 -
> include/linux/rcutree.h   |   1 -
> include/trace/events/rcu.h|  32 ++--
> kernel/rcu/rcu.h  |  27 ---
> kernel/rcu/rcu_segcblist.c|  25 +--
> kernel/rcu/rcu_segcblist.h|  25 +--
> kernel/rcu/rcuperf.c  | 173 +-
> kernel/rcu/srcutree.c |   4 +-
> kernel/rcu/tiny.c |  29 ++-
> kernel/rcu/tree.c | 145 ++-
> kernel/rcu/tree.h |   1 -
> kernel/rcu/tree_plugin.h  |  42 +
> kernel/rcu/tree_stall.h   |   6 +-
> 16 files changed, 337 insertions(+), 206 deletions(-)

Looks like a 131-line positive delta to me.  ;-)

Thanx, Paul


Re: [PATCH 0/5] kfree_rcu() additions for -rcu

2019-08-28 Thread Joel Fernandes
On Wed, Aug 28, 2019 at 01:28:08PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 27, 2019 at 03:01:54PM -0400, Joel Fernandes (Google) wrote:
> > Hi,
> > 
> > This is a series on top of the patch "rcu/tree: Add basic support for 
> > kfree_rcu() batching".
> > 
> > Link: http://lore.kernel.org/r/20190814160411.58591-1-j...@joelfernandes.org
> > 
> > It adds performance tests, some clean ups and removal of "lazy" RCU 
> > callbacks.
> > 
> > Now that kfree_rcu() is handled separately from call_rcu(), we also get rid 
> > of
> > kfree "lazy" handling from tree RCU as suggested by Paul which will be 
> > unused.
> > This also results in a nice negative delta as well.
> > 
> > Joel Fernandes (Google) (5):
> > rcu/rcuperf: Add kfree_rcu() performance Tests
> > rcu/tree: Add multiple in-flight batches of kfree_rcu work
> > rcu/tree: Add support for debug_objects debugging for kfree_rcu()
> > rcu: Remove kfree_rcu() special casing and lazy handling
> > rcu: Remove kfree_call_rcu_nobatch()
> > 
> > Documentation/RCU/stallwarn.txt   |  13 +-
> > .../admin-guide/kernel-parameters.txt |  13 ++
> > include/linux/rcu_segcblist.h |   2 -
> > include/linux/rcutiny.h   |   5 -
> > include/linux/rcutree.h   |   1 -
> > include/trace/events/rcu.h|  32 ++--
> > kernel/rcu/rcu.h  |  27 ---
> > kernel/rcu/rcu_segcblist.c|  25 +--
> > kernel/rcu/rcu_segcblist.h|  25 +--
> > kernel/rcu/rcuperf.c  | 173 +-
> > kernel/rcu/srcutree.c |   4 +-
> > kernel/rcu/tiny.c |  29 ++-
> > kernel/rcu/tree.c | 145 ++-
> > kernel/rcu/tree.h |   1 -
> > kernel/rcu/tree_plugin.h  |  42 +
> > kernel/rcu/tree_stall.h   |   6 +-
> > 16 files changed, 337 insertions(+), 206 deletions(-)
> 
> Looks like a 131-line positive delta to me.  ;-)

Not if you overlook the rcuperf changes which is just test code. :-D ;-)

thanks,

 - Joel



Re: [PATCH v2] rcu/tree: Add multiple in-flight batches of kfree_rcu work

2019-08-28 Thread Paul E. McKenney
On Wed, Aug 28, 2019 at 10:09:52AM -0400, Joel Fernandes (Google) wrote:
> During testing, it was observed that amount of memory consumed due
> kfree_rcu() batching is 300-400MB. Previously we had only a single
> head_free pointer pointing to the list of rcu_head(s) that are to be
> freed after a grace period. Until this list is drained, we cannot queue
> any more objects on it since such objects may not be ready to be
> reclaimed when the worker thread eventually gets to draining the
> head_free list.
> 
> We can do better by maintaining multiple lists as done by this patch.
> Testing shows that memory consumption came down by around 100-150MB with
> just adding another list. Adding more than 1 additional list did not
> show any improvement.

Nice!  A few comments below.  Please address them and post a full v3.
(I am off the next two days, and I guarantee you that upon return I will
mix and match the wrong patches otherwise!)

> Suggested-by: Paul E. McKenney 
> Signed-off-by: Joel Fernandes (Google) 
> ---
>  kernel/rcu/tree.c | 61 ---
>  1 file changed, 42 insertions(+), 19 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 4f7c3096d786..5bf8f7e793ea 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2688,28 +2688,37 @@ EXPORT_SYMBOL_GPL(call_rcu);
>  
>  /* Maximum number of jiffies to wait before draining a batch. */
>  #define KFREE_DRAIN_JIFFIES (HZ / 50)
> +#define KFREE_N_BATCHES 2
> +
> +struct kfree_rcu_work {
> + /* The rcu_work node for queuing work with queue_rcu_work(). The work
> +  * is done after a grace period.
> +  */
> + struct rcu_work rcu_work;
> +
> + /* The list of objects that have now left ->head and are queued for
> +  * freeing after a grace period.
> +  */
> + struct rcu_head *head_free;
> +
> + struct kfree_rcu_cpu *krcp;
> +};
>  
>  /*
>   * Maximum number of kfree(s) to batch, if this limit is hit then the batch 
> of
>   * kfree(s) is queued for freeing after a grace period, right away.
>   */
>  struct kfree_rcu_cpu {
> - /* The rcu_work node for queuing work with queue_rcu_work(). The work
> -  * is done after a grace period.
> -  */
> - struct rcu_work rcu_work;
>  
>   /* The list of objects being queued in a batch but are not yet
>* scheduled to be freed.
>*/
>   struct rcu_head *head;
>  
> - /* The list of objects that have now left ->head and are queued for
> -  * freeing after a grace period.
> -  */
> - struct rcu_head *head_free;
> + /* Pointer to the per-cpu array of kfree_rcu_work structures */
> + struct kfree_rcu_work krw_arr[KFREE_N_BATCHES];
>  
> - /* Protect concurrent access to this structure. */
> + /* Protect concurrent access to this structure and kfree_rcu_work. */
>   spinlock_t lock;
>  
>   /* The delayed work that flushes ->head to ->head_free in case ->head
> @@ -2730,12 +2739,14 @@ static void kfree_rcu_work(struct work_struct *work)
>  {
>   unsigned long flags;
>   struct rcu_head *head, *next;
> - struct kfree_rcu_cpu *krcp = container_of(to_rcu_work(work),
> - struct kfree_rcu_cpu, rcu_work);
> + struct kfree_rcu_work *krwp = container_of(to_rcu_work(work),
> + struct kfree_rcu_work, rcu_work);
> + struct kfree_rcu_cpu *krcp;
> +
> + krcp = krwp->krcp;
>  
>   spin_lock_irqsave(&krcp->lock, flags);
> - head = krcp->head_free;
> - krcp->head_free = NULL;
> + head = xchg(&krwp->head_free, NULL);

Given that we hold the lock, why the xchg()?  Alternatively, why not
just acquire the lock in the other places you use xchg()?  This is a
per-CPU lock, so contention should not be a problem, should it?

>   spin_unlock_irqrestore(&krcp->lock, flags);
>  
>   /*
> @@ -2758,19 +2769,28 @@ static void kfree_rcu_work(struct work_struct *work)
>   */
>  static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
>  {
> + int i = 0;
> + struct kfree_rcu_work *krwp = NULL;
> +
>   lockdep_assert_held(&krcp->lock);
> + while (i < KFREE_N_BATCHES) {
> + if (!krcp->krw_arr[i].head_free) {
> + krwp = &(krcp->krw_arr[i]);
> + break;
> + }
> + i++;
> + }
>  
> - /* If a previous RCU batch work is already in progress, we cannot queue
> + /* If both RCU batches are already in progress, we cannot queue
>* another one, just refuse the optimization and it will be retried
>* again in KFREE_DRAIN_JIFFIES time.
>*/

If you are going to remove the traditional first "/*" line of a comment,
why not go all the way and cut the last one as well?  "//".

> - if (krcp->head_free)
> + if (!krwp)
>   return false;
>  
> - krcp->head_free = krcp->head;
> - krcp->head = NULL;
> - INIT_RCU_WORK(&krcp->r

Re: [PATCH 0/5] kfree_rcu() additions for -rcu

2019-08-28 Thread Paul E. McKenney
On Wed, Aug 28, 2019 at 04:34:58PM -0400, Joel Fernandes wrote:
> On Wed, Aug 28, 2019 at 01:28:08PM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 27, 2019 at 03:01:54PM -0400, Joel Fernandes (Google) wrote:
> > > Hi,
> > > 
> > > This is a series on top of the patch "rcu/tree: Add basic support for 
> > > kfree_rcu() batching".
> > > 
> > > Link: 
> > > http://lore.kernel.org/r/20190814160411.58591-1-j...@joelfernandes.org
> > > 
> > > It adds performance tests, some clean ups and removal of "lazy" RCU 
> > > callbacks.
> > > 
> > > Now that kfree_rcu() is handled separately from call_rcu(), we also get 
> > > rid of
> > > kfree "lazy" handling from tree RCU as suggested by Paul which will be 
> > > unused.
> > > This also results in a nice negative delta as well.
> > > 
> > > Joel Fernandes (Google) (5):
> > > rcu/rcuperf: Add kfree_rcu() performance Tests
> > > rcu/tree: Add multiple in-flight batches of kfree_rcu work
> > > rcu/tree: Add support for debug_objects debugging for kfree_rcu()
> > > rcu: Remove kfree_rcu() special casing and lazy handling
> > > rcu: Remove kfree_call_rcu_nobatch()
> > > 
> > > Documentation/RCU/stallwarn.txt   |  13 +-
> > > .../admin-guide/kernel-parameters.txt |  13 ++
> > > include/linux/rcu_segcblist.h |   2 -
> > > include/linux/rcutiny.h   |   5 -
> > > include/linux/rcutree.h   |   1 -
> > > include/trace/events/rcu.h|  32 ++--
> > > kernel/rcu/rcu.h  |  27 ---
> > > kernel/rcu/rcu_segcblist.c|  25 +--
> > > kernel/rcu/rcu_segcblist.h|  25 +--
> > > kernel/rcu/rcuperf.c  | 173 +-
> > > kernel/rcu/srcutree.c |   4 +-
> > > kernel/rcu/tiny.c |  29 ++-
> > > kernel/rcu/tree.c | 145 ++-
> > > kernel/rcu/tree.h |   1 -
> > > kernel/rcu/tree_plugin.h  |  42 +
> > > kernel/rcu/tree_stall.h   |   6 +-
> > > 16 files changed, 337 insertions(+), 206 deletions(-)
> > 
> > Looks like a 131-line positive delta to me.  ;-)
> 
> Not if you overlook the rcuperf changes which is just test code. :-D ;-)

Which suggests that you should move the "nice negative delta" comment
to the commits that actually have nice negative deltas.  ;-)

Thanx, Paul


Re: [PATCH v2] vsprintf: introduce %dE for error constants

2019-08-28 Thread Mark Brown
On Wed, Aug 28, 2019 at 09:59:16PM +0200, Uwe Kleine-König wrote:
> On 8/28/19 1:32 PM, Petr Mladek wrote:

> > Please, do the opposite. Add conversion of few subsystems into the
> > patchset and add more people into CC. We will see immediately whether
> > it makes sense to spend time on this.

> For now I asked in the arm linux irc channel and got two people replying
> (both added to Cc:):

> Mark Brown (maintainer of SPI, regmap, ASoC and regulator) said:

> 1567019926 < broonie> ukleinek: I think that's a great idea and have
>   thought about trying to implement it in the past.
> 1567019937 < broonie> ukleinek: Making the logs more directly readable
>   is enormously helpful.

In the past I've actually implemented error code decoders like this for
some other systems I've worked on due to annoyance with having to look
up codes, as far as I can tell they went over quite well with other
developers and I certainly found them helpful myself.  They're also
useful to end users looking at logs, they're not always aware of how to
find the relevant error code definitions so a slightly more descriptive
messages can help people figure things out.

It does also occur to me that this might be useful to the people who
want to work on making probe deferral quieter since it gives them an
annotation that the number going in is an error code.  There's a bunch
more work to do there though.




Re: [RFC v1 2/2] rcu/tree: Remove dynticks_nmi_nesting counter

2019-08-28 Thread Joel Fernandes
On Wed, Aug 28, 2019 at 01:23:30PM -0700, Paul E. McKenney wrote:
> On Mon, Aug 26, 2019 at 09:33:54PM -0400, Joel Fernandes (Google) wrote:
> > The dynticks_nmi_nesting counter serves 4 purposes:
> > 
> >   (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect first
> >   interrupt nesting level.
> > 
> >   (b) We need to detect half-interrupts till we are sure they're not an
> >   issue. However, change the comparison to DYNTICK_IRQ_NONIDLE with 
> > 0.
> > 
> >   (c) When a quiescent state report is needed from a nohz_full CPU.
> >   The nesting counter detects we are a first level interrupt.
> > 
> > For (a) we can just use dyntick_nesting == 1 to determine this. Only the
> > outermost interrupt that interrupted an RCU-idle state can set it to 1.
> > 
> > For (b), this warning condition has not occurred for several kernel
> > releases.  But we still keep the warning but change it to use
> > in_interrupt() instead of the nesting counter. In a later year, we can
> > remove the warning.
> > 
> > For (c), the nest check is not really necessary since forced_tick would
> > have been set to true in the outermost interrupt, so the nested/NMI
> > interrupts will check forced_tick anyway, and bail.
> 
> Skipping the commit log and documentation for this pass.
[snip] 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 255cd6835526..1465a3e406f8 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -81,7 +81,6 @@
> >  
> >  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
> > .dynticks_nesting = 1,
> > -   .dynticks_nmi_nesting = 0,
> 
> This should be in the previous patch, give or take naming.

Done.

> > .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
> >  };
> >  struct rcu_state rcu_state = {
> > @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > /* Check for counter underflows */
> > RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> >  "RCU dynticks_nesting counter underflow!");
> > -   RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
> > -"RCU dynticks_nmi_nesting counter underflow/zero!");
> >  
> > -   /* Are we at first interrupt nesting level? */
> > -   if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > -   return false;
> > -
> > -   /* Does CPU appear to be idle from an RCU standpoint? */
> > -   return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> > +   /* Are we the outermost interrupt that arrived when RCU was idle? */
> > +   return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
> >  }
> >  
> >  #define DEFAULT_RCU_BLIMIT 10 /* Maximum callbacks per rcu_do_batch 
> > ... */
> > @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
> > struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> >  
> > /* Entering usermode/idle from interrupt is not handled. These would
> > -* mean usermode upcalls or idle entry happened from interrupts. But,
> > -* reset the counter if we warn.
> > +* mean usermode upcalls or idle exit happened from interrupts. Remove
> > +* the warning by 2020.
> >  */
> > -   if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> > -   WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> > +   WARN_ON_ONCE(in_interrupt());
> 
> And this is a red flag.  Bad things happen should some common code
> that disables BH be invoked from the idle loop.  This might not be
> happening now, but we need to avoid this sort of constraint.
> How about instead merging ->dyntick_nesting into the low-order bits
> of ->dyntick_nmi_nesting?
> 
> Yes, this assumes that we don't enter process level twice, but it should
> be easy to add a WARN_ON() to test for that.  Except that we don't have
> to because there is already this near the end of rcu_eqs_exit():
> 
>   WARN_ON_ONCE(rdp->dynticks_nmi_nesting);
> 
> So the low-order bit of the combined counter could indicate process-level
> non-idle, the next three bits could be unused to make interpretation
> of hex printouts easier, and then the rest of the bits could be used in
> the same way as currently.
> 
> This would allow a single read to see the full state, so that 0x1 means
> at process level in the kernel, 0x11 is interrupt (or NMI) from process
> level, 0x10 is interrupt/NMI from idle/user, and so on.
> 
> What am I missing here?  Why wouldn't this work, and without adding yet
> another RCU-imposed constraint on some other subsystem?

What about replacing the warning with a WARN_ON_ONCE(in_irq()), would that
address your concern?

Also, considering this warning condition is most likely never occurring as we
know it, and we are considering deleting it soon enough, is it really worth
reimplementing the whole mechanism with a complex bit-sharing scheme just
because of the BH-disable condition you mentioned, which likely doesn't
happen today? In my implementation, this is just a simple counter. I fe

Re: [PATCH 1/5] rcu/rcuperf: Add kfree_rcu() performance Tests

2019-08-28 Thread Paul E. McKenney
On Tue, Aug 27, 2019 at 03:01:55PM -0400, Joel Fernandes (Google) wrote:
> This test runs kfree_rcu() in a loop to measure performance of the new
> kfree_rcu() batching functionality.
> 
> The following table shows results when booting with arguments:
> rcuperf.kfree_loops=2 rcuperf.kfree_alloc_num=8000 
> rcuperf.kfree_rcu_test=1
> 
> In addition, rcuperf.kfree_no_batch is used to toggle the batching of
> kfree_rcu()s for a test run.
> 
> patch applied GPs time (seconds)
>  yes  173214.5
>  no   913311.5

This is really "rcuperf.kfree_no_batch" rather than "patch applied", right?
(Yes, we did discuss this last time around, but this table combined with
the prior paragraph is still ambiguous.)  Please make it unambiguous.
One way to do that is as follows:



The following table shows results when booting with arguments:
rcuperf.kfree_loops=2 rcuperf.kfree_alloc_num=8000 rcuperf.kfree_rcu_test=1 
 rcuperf.kfree_no_batch=X

rcuperf.kfree_no_batch=X# Grace Periods Test Duration (s)
 X=1 (old behavior)  9133 11.5
 X=0 (new behavior)  1732 14.5



> On a 16 CPU system with the above boot parameters, we see that the total
> number of grace periods that elapse during the test drops from 9133 when
> not batching to 1732 when batching (a 5X improvement). The kfree_rcu()
> flood itself slows down a bit when batching, though, as shown.

This last sentence would be more clear as something like: "However,
use of batching increases the duration of the kfree_rcu()-flood test."

> Note that the active memory consumption during the kfree_rcu() flood
> does increase to around 200-250MB due to the batching (from around 50MB
> without batching). However, this memory consumption is relatively
> constant. In other words, the system is able to keep up with the
> kfree_rcu() load. The memory consumption comes down considerably if
> KFREE_DRAIN_JIFFIES is increased from HZ/50 to HZ/80.

That would be a decrease rather than an increase in KFREE_DRAIN_JIFFIES,
correct?

This would also be a good place to mention that a later patch will
decrease consumption, but that is strictly optional.  However, you did
introduce the topic of changing KFREE_DRAIN_JIFFIES, so if a later patch
changes this value, this would be an excellent place to mention this.

> Also, when running the test, please disable CONFIG_DEBUG_PREEMPT and
> CONFIG_PROVE_RCU for realistic comparisons with/without batching.
> 
> Signed-off-by: Joel Fernandes (Google) 

And the code is much better!  Just one duplication-avoidance nit below.
Plus a thought for a future patch.

Thanx, Paul

> ---
>  .../admin-guide/kernel-parameters.txt |  17 ++
>  kernel/rcu/rcuperf.c  | 181 +-
>  2 files changed, 190 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 79b983bedcaa..24fe8aefb12c 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3896,6 +3896,23 @@
>   test until boot completes in order to avoid
>   interference.
>  
> + rcuperf.kfree_rcu_test= [KNL]
> + Set to measure performance of kfree_rcu() flooding.
> +
> + rcuperf.kfree_nthreads= [KNL]
> + The number of threads running loops of kfree_rcu().
> +
> + rcuperf.kfree_alloc_num= [KNL]
> + Number of allocations and frees done in an iteration.
> +
> + rcuperf.kfree_loops= [KNL]
> + Number of loops doing rcuperf.kfree_alloc_num number
> + of allocations and frees.
> +
> + rcuperf.kfree_no_batch= [KNL]
> + Use the non-batching (less efficient) version of 
> kfree_rcu().
> + This is useful for comparing with the batched version.
> +
>   rcuperf.nreaders= [KNL]
>   Set number of RCU readers.  The value -1 selects
>   N, where N is the number of CPUs.  A value
> diff --git a/kernel/rcu/rcuperf.c b/kernel/rcu/rcuperf.c
> index 5f884d560384..c1e25fd10f2a 100644
> --- a/kernel/rcu/rcuperf.c
> +++ b/kernel/rcu/rcuperf.c
> @@ -86,6 +86,7 @@ torture_param(bool, shutdown, RCUPERF_SHUTDOWN,
> "Shutdown at end of performance tests.");
>  torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
>  torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to 
> disable");
> +torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() perf test?");
>  
>  static char *perf_type = "rcu";
>  module_param(perf_type, charp, 0444);

Re: [RFC v1 2/2] rcu/tree: Remove dynticks_nmi_nesting counter

2019-08-28 Thread Paul E. McKenney
On Wed, Aug 28, 2019 at 05:05:25PM -0400, Joel Fernandes wrote:
> On Wed, Aug 28, 2019 at 01:23:30PM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 26, 2019 at 09:33:54PM -0400, Joel Fernandes (Google) wrote:
> > > The dynticks_nmi_nesting counter serves 4 purposes:
> > > 
> > >   (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect first
> > >   interrupt nesting level.
> > > 
> > >   (b) We need to detect half-interrupts till we are sure they're not 
> > > an
> > >   issue. However, change the comparison to DYNTICK_IRQ_NONIDLE 
> > > with 0.
> > > 
> > >   (c) When a quiescent state report is needed from a nohz_full CPU.
> > >   The nesting counter detects we are a first level interrupt.
> > > 
> > > For (a) we can just use dyntick_nesting == 1 to determine this. Only the
> > > outermost interrupt that interrupted an RCU-idle state can set it to 1.
> > > 
> > > For (b), this warning condition has not occurred for several kernel
> > > releases.  But we still keep the warning but change it to use
> > > in_interrupt() instead of the nesting counter. In a later year, we can
> > > remove the warning.
> > > 
> > > For (c), the nest check is not really necessary since forced_tick would
> > > have been set to true in the outermost interrupt, so the nested/NMI
> > > interrupts will check forced_tick anyway, and bail.
> > 
> > Skipping the commit log and documentation for this pass.
> [snip] 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 255cd6835526..1465a3e406f8 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -81,7 +81,6 @@
> > >  
> > >  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
> > >   .dynticks_nesting = 1,
> > > - .dynticks_nmi_nesting = 0,
> > 
> > This should be in the previous patch, give or take naming.
> 
> Done.
> 
> > >   .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
> > >  };
> > >  struct rcu_state rcu_state = {
> > > @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > >   /* Check for counter underflows */
> > >   RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> > >"RCU dynticks_nesting counter underflow!");
> > > - RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
> > > -  "RCU dynticks_nmi_nesting counter underflow/zero!");
> > >  
> > > - /* Are we at first interrupt nesting level? */
> > > - if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > > - return false;
> > > -
> > > - /* Does CPU appear to be idle from an RCU standpoint? */
> > > - return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> > > + /* Are we the outermost interrupt that arrived when RCU was idle? */
> > > + return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
> > >  }
> > >  
> > >  #define DEFAULT_RCU_BLIMIT 10 /* Maximum callbacks per rcu_do_batch 
> > > ... */
> > > @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
> > >   struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > >  
> > >   /* Entering usermode/idle from interrupt is not handled. These would
> > > -  * mean usermode upcalls or idle entry happened from interrupts. But,
> > > -  * reset the counter if we warn.
> > > +  * mean usermode upcalls or idle exit happened from interrupts. Remove
> > > +  * the warning by 2020.
> > >*/
> > > - if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> > > - WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> > > + WARN_ON_ONCE(in_interrupt());
> > 
> > And this is a red flag.  Bad things happen should some common code
> > that disables BH be invoked from the idle loop.  This might not be
> > happening now, but we need to avoid this sort of constraint.
> > How about instead merging ->dyntick_nesting into the low-order bits
> > of ->dyntick_nmi_nesting?
> > 
> > Yes, this assumes that we don't enter process level twice, but it should
> > be easy to add a WARN_ON() to test for that.  Except that we don't have
> > to because there is already this near the end of rcu_eqs_exit():
> > 
> > WARN_ON_ONCE(rdp->dynticks_nmi_nesting);
> > 
> > So the low-order bit of the combined counter could indicate process-level
> > non-idle, the next three bits could be unused to make interpretation
> > of hex printouts easier, and then the rest of the bits could be used in
> > the same way as currently.
> > 
> > This would allow a single read to see the full state, so that 0x1 means
> > at process level in the kernel, 0x11 is interrupt (or NMI) from process
> > level, 0x10 is interrupt/NMI from idle/user, and so on.
> > 
> > What am I missing here?  Why wouldn't this work, and without adding yet
> > another RCU-imposed constraint on some other subsystem?
> 
> What about replacing the warning with a WARN_ON_ONCE(in_irq()), would that
> address your concern?
> 
> Also, considering this warning condition is most likely never occurring as we
> know it, and we are considering deleting it soon enough, is it

Re: [PATCH 0/5] kfree_rcu() additions for -rcu

2019-08-28 Thread Joel Fernandes
On Wed, Aug 28, 2019 at 01:46:24PM -0700, Paul E. McKenney wrote:
> On Wed, Aug 28, 2019 at 04:34:58PM -0400, Joel Fernandes wrote:
> > On Wed, Aug 28, 2019 at 01:28:08PM -0700, Paul E. McKenney wrote:
> > > On Tue, Aug 27, 2019 at 03:01:54PM -0400, Joel Fernandes (Google) wrote:
> > > > Hi,
> > > > 
> > > > This is a series on top of the patch "rcu/tree: Add basic support for 
> > > > kfree_rcu() batching".
> > > > 
> > > > Link: 
> > > > http://lore.kernel.org/r/20190814160411.58591-1-j...@joelfernandes.org
> > > > 
> > > > It adds performance tests, some clean ups and removal of "lazy" RCU 
> > > > callbacks.
> > > > 
> > > > Now that kfree_rcu() is handled separately from call_rcu(), we also get 
> > > > rid of
> > > > kfree "lazy" handling from tree RCU as suggested by Paul which will be 
> > > > unused.
> > > > This also results in a nice negative delta as well.
> > > > 
> > > > Joel Fernandes (Google) (5):
> > > > rcu/rcuperf: Add kfree_rcu() performance Tests
> > > > rcu/tree: Add multiple in-flight batches of kfree_rcu work
> > > > rcu/tree: Add support for debug_objects debugging for kfree_rcu()
> > > > rcu: Remove kfree_rcu() special casing and lazy handling
> > > > rcu: Remove kfree_call_rcu_nobatch()
> > > > 
> > > > Documentation/RCU/stallwarn.txt   |  13 +-
> > > > .../admin-guide/kernel-parameters.txt |  13 ++
> > > > include/linux/rcu_segcblist.h |   2 -
> > > > include/linux/rcutiny.h   |   5 -
> > > > include/linux/rcutree.h   |   1 -
> > > > include/trace/events/rcu.h|  32 ++--
> > > > kernel/rcu/rcu.h  |  27 ---
> > > > kernel/rcu/rcu_segcblist.c|  25 +--
> > > > kernel/rcu/rcu_segcblist.h|  25 +--
> > > > kernel/rcu/rcuperf.c  | 173 +-
> > > > kernel/rcu/srcutree.c |   4 +-
> > > > kernel/rcu/tiny.c |  29 ++-
> > > > kernel/rcu/tree.c | 145 ++-
> > > > kernel/rcu/tree.h |   1 -
> > > > kernel/rcu/tree_plugin.h  |  42 +
> > > > kernel/rcu/tree_stall.h   |   6 +-
> > > > 16 files changed, 337 insertions(+), 206 deletions(-)
> > > 
> > > Looks like a 131-line positive delta to me.  ;-)
> > 
> > Not if you overlook the rcuperf changes which is just test code. :-D ;-)
> 
> Which suggests that you should move the "nice negative delta" comment
> to the commits that actually have nice negative deltas.  ;-)

Will do!

thanks,

 - Joel



Re: [PATCH 3/5] rcu/tree: Add support for debug_objects debugging for kfree_rcu()

2019-08-28 Thread Paul E. McKenney
On Tue, Aug 27, 2019 at 03:01:57PM -0400, Joel Fernandes (Google) wrote:
> Make use of RCU's debug_objects debugging support
> (CONFIG_DEBUG_OBJECTS_RCU_HEAD) similar to call_rcu() and other flavors.

Other flavors?  Ah, call_srcu(), rcu_barrier(), and srcu_barrier(),
right?

> We queue the object during the kfree_rcu() call and dequeue it during
> reclaim.
> 
> Tested that enabling CONFIG_DEBUG_OBJECTS_RCU_HEAD successfully detects
> double kfree_rcu() calls.
> 
> Signed-off-by: Joel Fernandes (Google) 

The code looks good!

Thanx, Paul

> ---
>  kernel/rcu/tree.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 9b9ae4db1c2d..64568f12641d 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2757,6 +2757,7 @@ static void kfree_rcu_work(struct work_struct *work)
>   for (; head; head = next) {
>   next = head->next;
>   /* Could be possible to optimize with kfree_bulk in future */
> + debug_rcu_head_unqueue(head);
>   __rcu_reclaim(rcu_state.name, head);
>   cond_resched_tasks_rcu_qs();
>   }
> @@ -2868,6 +2869,13 @@ void kfree_call_rcu(struct rcu_head *head, 
> rcu_callback_t func)
>   if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING)
>   return kfree_call_rcu_nobatch(head, func);
>  
> + if (debug_rcu_head_queue(head)) {
> + /* Probable double kfree_rcu() */
> + WARN_ONCE(1, "kfree_call_rcu(): Double-freed call. rcu_head 
> %p\n",
> +   head);
> + return;
> + }
> +
>   head->func = func;
>  
>   local_irq_save(flags);  /* For safely calling this_cpu_ptr(). */
> -- 
> 2.23.0.187.g17f5b7556c-goog
> 


Re: [RFC v1 2/2] rcu/tree: Remove dynticks_nmi_nesting counter

2019-08-28 Thread Joel Fernandes
On Wed, Aug 28, 2019 at 02:19:04PM -0700, Paul E. McKenney wrote:
> On Wed, Aug 28, 2019 at 05:05:25PM -0400, Joel Fernandes wrote:
> > On Wed, Aug 28, 2019 at 01:23:30PM -0700, Paul E. McKenney wrote:
> > > On Mon, Aug 26, 2019 at 09:33:54PM -0400, Joel Fernandes (Google) wrote:
> > > > The dynticks_nmi_nesting counter serves 4 purposes:
> > > > 
> > > >   (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect first
> > > >   interrupt nesting level.
> > > > 
> > > >   (b) We need to detect half-interrupts till we are sure they're 
> > > > not an
> > > >   issue. However, change the comparison to DYNTICK_IRQ_NONIDLE 
> > > > with 0.
> > > > 
> > > >   (c) When a quiescent state report is needed from a nohz_full CPU.
> > > >   The nesting counter detects we are a first level interrupt.
> > > > 
> > > > For (a) we can just use dyntick_nesting == 1 to determine this. Only the
> > > > outermost interrupt that interrupted an RCU-idle state can set it to 1.
> > > > 
> > > > For (b), this warning condition has not occurred for several kernel
> > > > releases.  But we still keep the warning but change it to use
> > > > in_interrupt() instead of the nesting counter. In a later year, we can
> > > > remove the warning.
> > > > 
> > > > For (c), the nest check is not really necessary since forced_tick would
> > > > have been set to true in the outermost interrupt, so the nested/NMI
> > > > interrupts will check forced_tick anyway, and bail.
> > > 
> > > Skipping the commit log and documentation for this pass.
> > [snip] 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 255cd6835526..1465a3e406f8 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -81,7 +81,6 @@
> > > >  
> > > >  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
> > > > .dynticks_nesting = 1,
> > > > -   .dynticks_nmi_nesting = 0,
> > > 
> > > This should be in the previous patch, give or take naming.
> > 
> > Done.
> > 
> > > > .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
> > > >  };
> > > >  struct rcu_state rcu_state = {
> > > > @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > > > /* Check for counter underflows */
> > > > RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> > > >  "RCU dynticks_nesting counter underflow!");
> > > > -   RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) 
> > > > <= 0,
> > > > -"RCU dynticks_nmi_nesting counter 
> > > > underflow/zero!");
> > > >  
> > > > -   /* Are we at first interrupt nesting level? */
> > > > -   if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > > > -   return false;
> > > > -
> > > > -   /* Does CPU appear to be idle from an RCU standpoint? */
> > > > -   return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> > > > +   /* Are we the outermost interrupt that arrived when RCU was 
> > > > idle? */
> > > > +   return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
> > > >  }
> > > >  
> > > >  #define DEFAULT_RCU_BLIMIT 10 /* Maximum callbacks per 
> > > > rcu_do_batch ... */
> > > > @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
> > > > struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > >  
> > > > /* Entering usermode/idle from interrupt is not handled. These 
> > > > would
> > > > -* mean usermode upcalls or idle entry happened from 
> > > > interrupts. But,
> > > > -* reset the counter if we warn.
> > > > +* mean usermode upcalls or idle exit happened from interrupts. 
> > > > Remove
> > > > +* the warning by 2020.
> > > >  */
> > > > -   if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> > > > -   WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> > > > +   WARN_ON_ONCE(in_interrupt());
> > > 
> > > And this is a red flag.  Bad things happen should some common code
> > > that disables BH be invoked from the idle loop.  This might not be
> > > happening now, but we need to avoid this sort of constraint.
> > > How about instead merging ->dyntick_nesting into the low-order bits
> > > of ->dyntick_nmi_nesting?
> > > 
> > > Yes, this assumes that we don't enter process level twice, but it should
> > > be easy to add a WARN_ON() to test for that.  Except that we don't have
> > > to because there is already this near the end of rcu_eqs_exit():
> > > 
> > >   WARN_ON_ONCE(rdp->dynticks_nmi_nesting);
> > > 
> > > So the low-order bit of the combined counter could indicate process-level
> > > non-idle, the next three bits could be unused to make interpretation
> > > of hex printouts easier, and then the rest of the bits could be used in
> > > the same way as currently.
> > > 
> > > This would allow a single read to see the full state, so that 0x1 means
> > > at process level in the kernel, 0x11 is interrupt (or NMI) f

Re: [PATCH 3/5] rcu/tree: Add support for debug_objects debugging for kfree_rcu()

2019-08-28 Thread Joel Fernandes
On Wed, Aug 28, 2019 at 02:31:19PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 27, 2019 at 03:01:57PM -0400, Joel Fernandes (Google) wrote:
> > Make use of RCU's debug_objects debugging support
> > (CONFIG_DEBUG_OBJECTS_RCU_HEAD) similar to call_rcu() and other flavors.
> 
> Other flavors?  Ah, call_srcu(), rcu_barrier(), and srcu_barrier(),
> right?

Yes.

> > We queue the object during the kfree_rcu() call and dequeue it during
> > reclaim.
> > 
> > Tested that enabling CONFIG_DEBUG_OBJECTS_RCU_HEAD successfully detects
> > double kfree_rcu() calls.
> > 
> > Signed-off-by: Joel Fernandes (Google) 
> 
> The code looks good!

thanks, does that mean you'll ack/apply it? :-P

 - Joel

> 
>   Thanx, Paul
> 
> > ---
> >  kernel/rcu/tree.c | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 9b9ae4db1c2d..64568f12641d 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2757,6 +2757,7 @@ static void kfree_rcu_work(struct work_struct *work)
> > for (; head; head = next) {
> > next = head->next;
> > /* Could be possible to optimize with kfree_bulk in future */
> > +   debug_rcu_head_unqueue(head);
> > __rcu_reclaim(rcu_state.name, head);
> > cond_resched_tasks_rcu_qs();
> > }
> > @@ -2868,6 +2869,13 @@ void kfree_call_rcu(struct rcu_head *head, 
> > rcu_callback_t func)
> > if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING)
> > return kfree_call_rcu_nobatch(head, func);
> >  
> > +   if (debug_rcu_head_queue(head)) {
> > +   /* Probable double kfree_rcu() */
> > +   WARN_ONCE(1, "kfree_call_rcu(): Double-freed call. rcu_head 
> > %p\n",
> > + head);
> > +   return;
> > +   }
> > +
> > head->func = func;
> >  
> > local_irq_save(flags);  /* For safely calling this_cpu_ptr(). */
> > -- 
> > 2.23.0.187.g17f5b7556c-goog
> > 


Re: [PATCH 4/5] rcu: Remove kfree_rcu() special casing and lazy handling

2019-08-28 Thread Paul E. McKenney
On Tue, Aug 27, 2019 at 03:01:58PM -0400, Joel Fernandes (Google) wrote:
> Remove kfree_rcu() special casing and lazy handling from RCU.
> For Tiny RCU we fold the special handling into just Tiny RCU code.
> 
> Suggested-by: Paul E. McKenney 
> Signed-off-by: Joel Fernandes (Google) 

Nice!  Several comments inline below.

Thanx, Paul

> ---
>  Documentation/RCU/stallwarn.txt | 13 +++--
>  include/linux/rcu_segcblist.h   |  2 --
>  include/trace/events/rcu.h  | 32 +-
>  kernel/rcu/rcu.h| 27 ---
>  kernel/rcu/rcu_segcblist.c  | 25 +++--
>  kernel/rcu/rcu_segcblist.h  | 25 ++---
>  kernel/rcu/srcutree.c   |  4 +--
>  kernel/rcu/tiny.c   | 29 +++-
>  kernel/rcu/tree.c   | 48 -
>  kernel/rcu/tree.h   |  1 -
>  kernel/rcu/tree_plugin.h| 42 +++--
>  kernel/rcu/tree_stall.h |  6 ++---
>  12 files changed, 97 insertions(+), 157 deletions(-)
> 
> diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
> index f48f4621ccbc..2c02b8de01e7 100644
> --- a/Documentation/RCU/stallwarn.txt
> +++ b/Documentation/RCU/stallwarn.txt
> @@ -225,18 +225,12 @@ an estimate of the total number of RCU callbacks queued 
> across all CPUs
>  In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed
>  for each CPU:
>  
> - 0: (64628 ticks this GP) idle=dd5/3fff/0 softirq=82/543 
> last_accelerate: a345/d342 Nonlazy posted: ..D
> + 0: (64628 ticks this GP) idle=dd5/3fff/0 softirq=82/543 
> last_accelerate: a345/d342 dyntick_enabled: 1
>  
>  The "last_accelerate:" prints the low-order 16 bits (in hex) of the
>  jiffies counter when this CPU last invoked rcu_try_advance_all_cbs()
>  from rcu_needs_cpu() or last invoked rcu_accelerate_cbs() from
> -rcu_prepare_for_idle().  The "Nonlazy posted:" indicates lazy-callback
> -status, so that an "l" indicates that all callbacks were lazy at the start
> -of the last idle period and an "L" indicates that there are currently
> -no non-lazy callbacks (in both cases, "." is printed otherwise, as
> -shown above) and "D" indicates that dyntick-idle processing is enabled
> -("." is printed otherwise, for example, if disabled via the "nohz="
> -kernel boot parameter).
> +rcu_prepare_for_idle().
> 
>  If the grace period ends just as the stall warning starts printing,
>  there will be a spurious stall-warning message, which will include
> @@ -249,7 +243,8 @@ possible for a zero-jiffy stall to be flagged in this 
> case, depending
>  on how the stall warning and the grace-period initialization happen to
>  interact.  Please note that it is not possible to entirely eliminate this
>  sort of false positive without resorting to things like stop_machine(),
> -which is overkill for this sort of problem.
> +which is overkill for this sort of problem. "dyntick_enabled: 1" indicates 
> that
> +dyntick-idle processing is enabled.

OK, I'll bite...  Why not add the sentence to the earlier paragraph that
used to discuss its "D" counterpart?

>  If all CPUs and tasks have passed through quiescent states, but the
>  grace period has nevertheless failed to end, the stall-warning splat
> diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
> index 646759042333..b36afe7b22c9 100644
> --- a/include/linux/rcu_segcblist.h
> +++ b/include/linux/rcu_segcblist.h
> @@ -22,7 +22,6 @@ struct rcu_cblist {
>   struct rcu_head *head;
>   struct rcu_head **tail;
>   long len;
> - long len_lazy;
>  };
>  
>  #define RCU_CBLIST_INITIALIZER(n) { .head = NULL, .tail = &n.head }
> @@ -73,7 +72,6 @@ struct rcu_segcblist {
>  #else
>   long len;
>  #endif
> - long len_lazy;
>   u8 enabled;
>   u8 offloaded;
>  };
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index 694bd040cf51..0dd3478597ee 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -474,16 +474,14 @@ TRACE_EVENT_RCU(rcu_dyntick,
>   */
>  TRACE_EVENT_RCU(rcu_callback,
>  
> - TP_PROTO(const char *rcuname, struct rcu_head *rhp, long qlen_lazy,
> -  long qlen),
> + TP_PROTO(const char *rcuname, struct rcu_head *rhp, long qlen),
>  
> - TP_ARGS(rcuname, rhp, qlen_lazy, qlen),
> + TP_ARGS(rcuname, rhp, qlen),
>  
>   TP_STRUCT__entry(
>   __field(const char *, rcuname)
>   __field(void *, rhp)
>   __field(void *, func)
> - __field(long, qlen_lazy)
>   __field(long, qlen)
>   ),
>  
> @@ -491,13 +489,12 @@ TRACE_EVENT_RCU(rcu_callback,
>   __entry->rcuname = rcuname;
>   __entry->rhp = rhp;
>   __entry->func = rhp->func;
> - __entry->qlen_lazy = qlen_lazy;
>   __entry->qlen = qlen;
>   

Re: [RFC v1 1/2] rcu/tree: Clean up dynticks counter usage

2019-08-28 Thread Joel Fernandes
On Wed, Aug 28, 2019 at 01:13:44PM -0700, Paul E. McKenney wrote:
> On Mon, Aug 26, 2019 at 09:33:53PM -0400, Joel Fernandes (Google) wrote:
> > The dynticks counters are confusing due to crowbar writes of
> > DYNTICK_IRQ_NONIDLE whose purpose is to detect half-interrupts (i.e. we
> > see rcu_irq_enter() but not rcu_irq_exit() due to a usermode upcall) and
> > if so then do a reset of the dyntick_nmi_nesting counters. This patch
> > tries to get rid of DYNTICK_IRQ_NONIDLE while still keeping the code
> > working, fully functional, and less confusing. The confusion recently
> > has even led to patches forgetting that DYNTICK_IRQ_NONIDLE was written
> > to, which wasted lots of time.
> > 
> > The patch has the following changes:
[snip]
> >  /*
> >   * Grace-period counter management.
> >   */
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 68ebf0eb64c8..255cd6835526 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -81,7 +81,7 @@
> >  
> >  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
> > .dynticks_nesting = 1,
> > -   .dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE,
> > +   .dynticks_nmi_nesting = 0,
> 
> C initializes to zero by default, so this can simply be deleted.

Fixed.

> > .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
> >  };
> >  struct rcu_state rcu_state = {
> > @@ -558,17 +558,18 @@ EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);
> >  /*
> >   * Enter an RCU extended quiescent state, which can be either the
> >   * idle loop or adaptive-tickless usermode execution.
> > - *
> > - * We crowbar the ->dynticks_nmi_nesting field to zero to allow for
> > - * the possibility of usermode upcalls having messed up our count
> > - * of interrupt nesting level during the prior busy period.
> >   */
> >  static void rcu_eqs_enter(bool user)
> >  {
> > struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> >  
> > -   WARN_ON_ONCE(rdp->dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE);
> > -   WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> > +   /* Entering usermode/idle from interrupt is not handled. These would
> > +* mean usermode upcalls or idle entry happened from interrupts. But,
> > +* reset the counter if we warn.
> > +*/
> 
> Please either put the "/*" on its own line or use "//"-style comments.

I'll put "/*" on its own line.

> > WARN_ON_ONCE(rdp->dynticks_nmi_nesting <= 0);
> > WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs());
> >  
> > +   WRITE_ONCE(rdp->dynticks_nmi_nesting, /* No store tearing. */
> > +  rdp->dynticks_nmi_nesting - 1);
> 
> This is problematic.  The +/-1 and +/-2 dance is specifically for NMIs, so...

This counter is deleted in the following patch, so I hope it's OK to leave it
here for this one. I kept it split into a different patch to make
testing/review/development easier.

> > if (irq)
> > rcu_prepare_for_idle();
> > @@ -723,10 +728,6 @@ void rcu_irq_exit_irqson(void)
> >  /*
> >   * Exit an RCU extended quiescent state, which can be either the
> >   * idle loop or adaptive-tickless usermode execution.
> > - *
> > - * We crowbar the ->dynticks_nmi_nesting field to DYNTICK_IRQ_NONIDLE to
> > - * allow for the possibility of usermode upcalls messing up our count of
> > - * interrupt nesting level during the busy period that is just now 
> > starting.
> >   */
> >  static void rcu_eqs_exit(bool user)
> >  {
> > @@ -747,8 +748,13 @@ static void rcu_eqs_exit(bool user)
> > trace_rcu_dyntick(TPS("End"), rdp->dynticks_nesting, 1, rdp->dynticks);
> > WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && 
> > !is_idle_task(current));
> > WRITE_ONCE(rdp->dynticks_nesting, 1);
> > -   WARN_ON_ONCE(rdp->dynticks_nmi_nesting);
> > -   WRITE_ONCE(rdp->dynticks_nmi_nesting, DYNTICK_IRQ_NONIDLE);
> > +
> > +   /* Exiting usermode/idle from interrupt is not handled. These would
> > +* mean usermode upcalls or idle exit happened from interrupts. But,
> > +* reset the counter if we warn.
> > +*/
> > +   if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> > +   WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> 
> And here.  Plus this is adding a test and branch in the common case.
> Given that the location being written to should be hot in the cache,
> it is not clear that this is a win.

The next patch removes the branch itself and just has the warning.

> > WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0);
> >  
> > /*
> > @@ -826,16 +833,21 @@ static __always_inline void rcu_nmi_enter_common(bool 
> > irq)
> >  
> > incby = 1;
> > } else if (tick_nohz_full_cpu(rdp->cpu) &&
> > -  rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE &&
> > -  rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
> > +  !rdp->dynticks_nmi_nesting && rdp->rcu_urgent_qs &&
> > +  !rdp->rcu_forced_tick) {
> 
> OK.  Though you should be able to save a line by pulling the
> "rdp->rcu_urgent_qs &&" onto the first line.

Fixed.

> >

Re: [PATCH 5/5] rcu: Remove kfree_call_rcu_nobatch()

2019-08-28 Thread Paul E. McKenney
On Tue, Aug 27, 2019 at 03:01:59PM -0400, Joel Fernandes (Google) wrote:
> Now that kfree_rcu() special casing has been removed from tree RCU,
> remove kfree_call_rcu_nobatch() since it is not needed.
> 
> Signed-off-by: Joel Fernandes (Google) 

Now -this- one qualifies as a nice negative delta!  ;-)

A few things below, please fix in next version.

Thanx, Paul

> ---
>  .../admin-guide/kernel-parameters.txt |  4 ---
>  include/linux/rcutiny.h   |  5 ---
>  include/linux/rcutree.h   |  1 -
>  kernel/rcu/rcuperf.c  | 10 +-
>  kernel/rcu/tree.c | 33 ---
>  5 files changed, 14 insertions(+), 39 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 24fe8aefb12c..56be0e30100b 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3909,10 +3909,6 @@
>   Number of loops doing rcuperf.kfree_alloc_num number
>   of allocations and frees.
>  
> - rcuperf.kfree_no_batch= [KNL]
> - Use the non-batching (less efficient) version of 
> kfree_rcu().
> - This is useful for comparing with the batched version.
> -
>   rcuperf.nreaders= [KNL]
>   Set number of RCU readers.  The value -1 selects
>   N, where N is the number of CPUs.  A value
> diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> index 949841f52ec5..7aa93afa5d8d 100644
> --- a/include/linux/rcutiny.h
> +++ b/include/linux/rcutiny.h
> @@ -39,11 +39,6 @@ static inline void kfree_call_rcu(struct rcu_head *head, 
> rcu_callback_t func)
>   call_rcu(head, func);
>  }
>  
> -static inline void kfree_call_rcu_nobatch(struct rcu_head *head, 
> rcu_callback_t func)
> -{
> - call_rcu(head, func);
> -}
> -
>  void rcu_qs(void);
>  
>  static inline void rcu_softirq_qs(void)
> diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
> index 961b7e05d141..0b68aa952f8b 100644
> --- a/include/linux/rcutree.h
> +++ b/include/linux/rcutree.h
> @@ -34,7 +34,6 @@ static inline void rcu_virt_note_context_switch(int cpu)
>  
>  void synchronize_rcu_expedited(void);
>  void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
> -void kfree_call_rcu_nobatch(struct rcu_head *head, rcu_callback_t func);
>  
>  void rcu_barrier(void);
>  bool rcu_eqs_special_set(int cpu);
> diff --git a/kernel/rcu/rcuperf.c b/kernel/rcu/rcuperf.c
> index c1e25fd10f2a..da94b89cd531 100644
> --- a/kernel/rcu/rcuperf.c
> +++ b/kernel/rcu/rcuperf.c
> @@ -593,7 +593,6 @@ rcu_perf_shutdown(void *arg)
>  torture_param(int, kfree_nthreads, -1, "Number of threads running loops of 
> kfree_rcu().");
>  torture_param(int, kfree_alloc_num, 8000, "Number of allocations and frees 
> done in an iteration.");
>  torture_param(int, kfree_loops, 10, "Number of loops doing kfree_alloc_num 
> allocations and frees.");
> -torture_param(int, kfree_no_batch, 0, "Use the non-batching (slower) version 
> of kfree_rcu().");
>  
>  static struct task_struct **kfree_reader_tasks;
>  static int kfree_nrealthreads;
> @@ -632,14 +631,7 @@ kfree_perf_thread(void *arg)
>   if (!alloc_ptr)
>   return -ENOMEM;
>  
> - if (!kfree_no_batch) {
> - kfree_rcu(alloc_ptr, rh);
> - } else {
> - rcu_callback_t cb;
> -
> - cb = (rcu_callback_t)(unsigned 
> long)offsetof(struct kfree_obj, rh);
> - kfree_call_rcu_nobatch(&(alloc_ptr->rh), cb);
> - }
> + kfree_rcu(alloc_ptr, rh);
>   }
>  
>   cond_resched();
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 12c17e10f2b4..c767973d62ac 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2777,8 +2777,10 @@ static void kfree_rcu_work(struct work_struct *work)
>   rcu_lock_acquire(&rcu_callback_map);
>   trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset);
>  
> - /* Could be possible to optimize with kfree_bulk in future */
> - kfree((void *)head - offset);
> + if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset))) {
> + /* Could be optimized with kfree_bulk() in future. */
> + kfree((void *)head - offset);
> + }

This really needs to be in the previous patch until such time as Tiny RCU
no longer needs the restriction.

>   rcu_lock_release(&rcu_callback_map);
>   cond_resched_tasks_rcu_qs();
> @@ -2856,16 +2858,6 @@ static void kfree_rcu_monitor(struct work_struct *work)
>   spin_unl

Re: [PATCH 3/5] rcu/tree: Add support for debug_objects debugging for kfree_rcu()

2019-08-28 Thread Paul E. McKenney
On Wed, Aug 28, 2019 at 05:43:20PM -0400, Joel Fernandes wrote:
> On Wed, Aug 28, 2019 at 02:31:19PM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 27, 2019 at 03:01:57PM -0400, Joel Fernandes (Google) wrote:
> > > Make use of RCU's debug_objects debugging support
> > > (CONFIG_DEBUG_OBJECTS_RCU_HEAD) similar to call_rcu() and other flavors.
> > 
> > Other flavors?  Ah, call_srcu(), rcu_barrier(), and srcu_barrier(),
> > right?
> 
> Yes.
> 
> > > We queue the object during the kfree_rcu() call and dequeue it during
> > > reclaim.
> > > 
> > > Tested that enabling CONFIG_DEBUG_OBJECTS_RCU_HEAD successfully detects
> > > double kfree_rcu() calls.
> > > 
> > > Signed-off-by: Joel Fernandes (Google) 
> > 
> > The code looks good!
> 
> thanks, does that mean you'll ack/apply it? :-P

Is it independent of 1/5 and 2/5?

Thanx, Paul

>  - Joel
> 
> > 
> > Thanx, Paul
> > 
> > > ---
> > >  kernel/rcu/tree.c | 8 
> > >  1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 9b9ae4db1c2d..64568f12641d 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2757,6 +2757,7 @@ static void kfree_rcu_work(struct work_struct *work)
> > >   for (; head; head = next) {
> > >   next = head->next;
> > >   /* Could be possible to optimize with kfree_bulk in future */
> > > + debug_rcu_head_unqueue(head);
> > >   __rcu_reclaim(rcu_state.name, head);
> > >   cond_resched_tasks_rcu_qs();
> > >   }
> > > @@ -2868,6 +2869,13 @@ void kfree_call_rcu(struct rcu_head *head, 
> > > rcu_callback_t func)
> > >   if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING)
> > >   return kfree_call_rcu_nobatch(head, func);
> > >  
> > > + if (debug_rcu_head_queue(head)) {
> > > + /* Probable double kfree_rcu() */
> > > + WARN_ONCE(1, "kfree_call_rcu(): Double-freed call. rcu_head 
> > > %p\n",
> > > +   head);
> > > + return;
> > > + }
> > > +
> > >   head->func = func;
> > >  
> > >   local_irq_save(flags);  /* For safely calling this_cpu_ptr(). */
> > > -- 
> > > 2.23.0.187.g17f5b7556c-goog
> > > 


Re: [RFC v1 2/2] rcu/tree: Remove dynticks_nmi_nesting counter

2019-08-28 Thread Paul E. McKenney
On Wed, Aug 28, 2019 at 05:42:41PM -0400, Joel Fernandes wrote:
> On Wed, Aug 28, 2019 at 02:19:04PM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 28, 2019 at 05:05:25PM -0400, Joel Fernandes wrote:
> > > On Wed, Aug 28, 2019 at 01:23:30PM -0700, Paul E. McKenney wrote:
> > > > On Mon, Aug 26, 2019 at 09:33:54PM -0400, Joel Fernandes (Google) wrote:
> > > > > The dynticks_nmi_nesting counter serves three purposes:
> > > > > 
> > > > >   (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect 
> > > > > first
> > > > >   interrupt nesting level.
> > > > > 
> > > > >   (b) We need to detect half-interrupts until we are sure they're
> > > > >   not an issue. However, replace the comparison against
> > > > >   DYNTICK_IRQ_NONIDLE with a comparison against 0.
> > > > > 
> > > > >   (c) When a quiescent state report is needed from a nohz_full 
> > > > > CPU.
> > > > >   The nesting counter detects we are a first level interrupt.
> > > > > 
> > > > > For (a) we can just use dyntick_nesting == 1 to determine this. Only 
> > > > > the
> > > > > outermost interrupt that interrupted an RCU-idle state can set it to 
> > > > > 1.
> > > > > 
> > > > > For (b), this warning condition has not occurred for several kernel
> > > > > releases.  But we still keep the warning but change it to use
> > > > > in_interrupt() instead of the nesting counter. In a later year, we can
> > > > > remove the warning.
> > > > > 
> > > > > For (c), the nest check is not really necessary since forced_tick 
> > > > > would
> > > > > have been set to true in the outermost interrupt, so the nested/NMI
> > > > > interrupts will check forced_tick anyway, and bail.
> > > > 
> > > > Skipping the commit log and documentation for this pass.
> > > [snip] 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 255cd6835526..1465a3e406f8 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -81,7 +81,6 @@
> > > > >  
> > > > >  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
> > > > >   .dynticks_nesting = 1,
> > > > > - .dynticks_nmi_nesting = 0,
> > > > 
> > > > This should be in the previous patch, give or take naming.
> > > 
> > > Done.
> > > 
> > > > >   .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
> > > > >  };
> > > > >  struct rcu_state rcu_state = {
> > > > > @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > > > >   /* Check for counter underflows */
> > > > >   RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> > > > >"RCU dynticks_nesting counter underflow!");
> > > > > - RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) 
> > > > > <= 0,
> > > > > -  "RCU dynticks_nmi_nesting counter 
> > > > > underflow/zero!");
> > > > >  
> > > > > - /* Are we at first interrupt nesting level? */
> > > > > - if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > > > > - return false;
> > > > > -
> > > > > - /* Does CPU appear to be idle from an RCU standpoint? */
> > > > > - return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> > > > > + /* Are we the outermost interrupt that arrived when RCU was 
> > > > > idle? */
> > > > > + return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
> > > > >  }
> > > > >  
> > > > >  #define DEFAULT_RCU_BLIMIT 10 /* Maximum callbacks per 
> > > > > rcu_do_batch ... */
> > > > > @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
> > > > >   struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > > >  
> > > > >   /* Entering usermode/idle from interrupt is not handled. These 
> > > > > would
> > > > > -  * mean usermode upcalls or idle entry happened from 
> > > > > interrupts. But,
> > > > > -  * reset the counter if we warn.
> > > > > +  * mean usermode upcalls or idle exit happened from interrupts. 
> > > > > Remove
> > > > > +  * the warning by 2020.
> > > > >*/
> > > > > - if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> > > > > - WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> > > > > + WARN_ON_ONCE(in_interrupt());
> > > > 
> > > > And this is a red flag.  Bad things happen should some common code
> > > > that disables BH be invoked from the idle loop.  This might not be
> > > > happening now, but we need to avoid this sort of constraint.
> > > > How about instead merging ->dyntick_nesting into the low-order bits
> > > > of ->dyntick_nmi_nesting?
> > > > 
> > > > Yes, this assumes that we don't enter process level twice, but it should
> > > > be easy to add a WARN_ON() to test for that.  Except that we don't have
> > > > to because there is already this near the end of rcu_eqs_exit():
> > > > 
> > > > WARN_ON_ONCE(rdp->dynticks_nmi_nesting);
> > > > 
> > > > So the low-order bit of the combined counter could indicate 
> > > > process-level
> > > > non-idle, the next three bits could be unused t

Re: [PATCH 4.19 72/98] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-28 Thread Pavel Machek
On Wed 2019-08-28 16:15:06, Thomas Gleixner wrote:
> On Wed, 28 Aug 2019, Pavel Machek wrote:
> > On Wed 2019-08-28 14:46:21, Borislav Petkov wrote:
> > > On Wed, Aug 28, 2019 at 02:29:13PM +0200, Pavel Machek wrote:
> > > > This is not a way to have an intelligent conversation.
> > > 
> > > No, this *is* the way to keep the conversation sane, without veering
> > > off into some absurd claims.
> > > 
> > > So, to cut to the chase: you can simply add "rdrand=force" to your
> > > cmdline parameters and get back to using RDRAND.
> > > 
> > > And yet if you still feel this fix does not meet your expectations,
> > > you were told already to either produce patches or who to contact. I'm
> > > afraid complaining on this thread won't get you anywhere but that's your
> > > call.
> > 
> > No, this does not meet my expectations, it violates stable kernel
> > rules, and will cause regression to some users, while better solution
> > is known to be available.
> 
> Your unqualified ranting does not meet my expectations either, and it
> violates any rule of common sense.
> 
> For the record:
> 
>   Neither AMD nor we have any idea which particular machines have a fixed
>   BIOS and which have not. There is no technical indicator either at boot
>   time as the wreckage manifests itself only after resume.
> 
>   So in the interest of users the only sensible decision is to disable
>   RDRAND for this class of CPUs.

No.

Obviously the best solution would be a microcode update to fix the problem,
or enabling the kernel to fix it.

>   If you have a list of machines which have a fixed BIOS, then provide it
>   in form of patches. If not then stop claiming that there is a better
>   solution available.

And yes, a whitelist would already be better than the present solution. It is
not my duty to submit fixes to your proposed patch.

> Anyway, I'm done with that and further rants of yours go directly to
> /dev/null.
> 
> Thanks for wasting everyones time

Thanks for your professional attitude.

Pavel
-- 
DENX Software Engineering GmbH,  Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany




Re: [RFC v1 2/2] rcu/tree: Remove dynticks_nmi_nesting counter

2019-08-28 Thread Joel Fernandes
On Wed, Aug 28, 2019 at 03:01:08PM -0700, Paul E. McKenney wrote:
> On Wed, Aug 28, 2019 at 05:42:41PM -0400, Joel Fernandes wrote:
> > On Wed, Aug 28, 2019 at 02:19:04PM -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 28, 2019 at 05:05:25PM -0400, Joel Fernandes wrote:
> > > > On Wed, Aug 28, 2019 at 01:23:30PM -0700, Paul E. McKenney wrote:
> > > > > On Mon, Aug 26, 2019 at 09:33:54PM -0400, Joel Fernandes (Google) 
> > > > > wrote:
> > > > > > The dynticks_nmi_nesting counter serves three purposes:
> > > > > > 
> > > > > >   (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect 
> > > > > > first
> > > > > >   interrupt nesting level.
> > > > > > 
> > > > > >   (b) We need to detect half-interrupts until we are sure
> > > > > >   they're not an issue. However, replace the comparison
> > > > > >   against DYNTICK_IRQ_NONIDLE with a comparison against 0.
> > > > > > 
> > > > > >   (c) When a quiescent state report is needed from a nohz_full 
> > > > > > CPU.
> > > > > >   The nesting counter detects we are a first level 
> > > > > > interrupt.
> > > > > > 
> > > > > > For (a) we can just use dyntick_nesting == 1 to determine this. 
> > > > > > Only the
> > > > > > outermost interrupt that interrupted an RCU-idle state can set it 
> > > > > > to 1.
> > > > > > 
> > > > > > For (b), this warning condition has not occurred for several kernel
> > > > > > releases.  But we still keep the warning but change it to use
> > > > > > in_interrupt() instead of the nesting counter. In a later year, we 
> > > > > > can
> > > > > > remove the warning.
> > > > > > 
> > > > > > For (c), the nest check is not really necessary since forced_tick 
> > > > > > would
> > > > > > have been set to true in the outermost interrupt, so the nested/NMI
> > > > > > interrupts will check forced_tick anyway, and bail.
> > > > > 
> > > > > Skipping the commit log and documentation for this pass.
> > > > [snip] 
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 255cd6835526..1465a3e406f8 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -81,7 +81,6 @@
> > > > > >  
> > > > > >  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
> > > > > > .dynticks_nesting = 1,
> > > > > > -   .dynticks_nmi_nesting = 0,
> > > > > 
> > > > > This should be in the previous patch, give or take naming.
> > > > 
> > > > Done.
> > > > 
> > > > > > .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
> > > > > >  };
> > > > > >  struct rcu_state rcu_state = {
> > > > > > @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > > > > > /* Check for counter underflows */
> > > > > > RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> > > > > >  "RCU dynticks_nesting counter underflow!");
> > > > > > -   RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) 
> > > > > > <= 0,
> > > > > > -"RCU dynticks_nmi_nesting counter 
> > > > > > underflow/zero!");
> > > > > >  
> > > > > > -   /* Are we at first interrupt nesting level? */
> > > > > > -   if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > > > > > -   return false;
> > > > > > -
> > > > > > -   /* Does CPU appear to be idle from an RCU standpoint? */
> > > > > > -   return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> > > > > > +   /* Are we the outermost interrupt that arrived when RCU was 
> > > > > > idle? */
> > > > > > +   return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
> > > > > >  }
> > > > > >  
> > > > > >  #define DEFAULT_RCU_BLIMIT 10 /* Maximum callbacks per 
> > > > > > rcu_do_batch ... */
> > > > > > @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
> > > > > > struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > > > >  
> > > > > > /* Entering usermode/idle from interrupt is not handled. These 
> > > > > > would
> > > > > > -* mean usermode upcalls or idle entry happened from 
> > > > > > interrupts. But,
> > > > > > -* reset the counter if we warn.
> > > > > > +* mean usermode upcalls or idle exit happened from interrupts. 
> > > > > > Remove
> > > > > > +* the warning by 2020.
> > > > > >  */
> > > > > > -   if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> > > > > > -   WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> > > > > > +   WARN_ON_ONCE(in_interrupt());
> > > > > 
> > > > > And this is a red flag.  Bad things happen should some common code
> > > > > that disables BH be invoked from the idle loop.  This might not be
> > > > > happening now, but we need to avoid this sort of constraint.
> > > > > How about instead merging ->dyntick_nesting into the low-order bits
> > > > > of ->dyntick_nmi_nesting?
> > > > > 
> > > > > Yes, this assumes that we don't enter process level twice, but it 
> > > > > should
> > > > > be easy to add a WARN_ON() to test for that.  Except that we don't 
> > > > > have
> > > > > to becaus

Re: [patch] Fix up l1ft documentation was Re: Taking a break - time to look back

2019-08-28 Thread Pavel Machek
Hi!

> On Tue, 12 Mar 2019, Pavel Machek wrote:
> > On Mon 2019-03-11 23:31:08, Thomas Gleixner wrote:
> > > Calling this a lie is a completly unjustified personal attack on those who
> > 
> > So how should it be called? I initially used less strong words, only to
> > get "Care to tell what's a lie instead of making bold statements?"
> > back. Also look at the timing of the thread.
> 
> You called it a lie from the very beginning or what do you think made me
> tell you that? Here is what you said:

Actually, I still call it a lie. The document clearly says that the bug is
fixed in non-virtualized cases, when in fact the fix depends on PAE and
limited memory.

> If you want to provide more accurate documentation then you better come up
> with something which is helpful instead of completely useless blurb like
> the below:

At this point I want you to fix it yourself. Lying about security bugs
being fixed when they are not is not cool. I tried to be helpful and
submit a patch, but I don't feel like you are cooperating on getting
the patch applied.

> > +   Mitigation is present in kernels v4.19 and newer, and in
> > +   recent -stable kernels. PAE needs to be enabled for mitigation to
> > +   work.
> 
> No. The mitigation is available when the kernel provides it. Numbers are
> irrelevant because that documentation has to be applicable for stable
> kernels as well. And what is a recent -stable kernel?
> 
> Also the PAE part needs to go to a completely different section.

Best regards,
Pavel


-- 
DENX Software Engineering GmbH,  Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany




Re: [PATCH] rcu/dyntick-idle: Add better tracing

2019-08-28 Thread Paul E. McKenney
On Wed, Aug 28, 2019 at 02:26:13PM -0400, Joel Fernandes (Google) wrote:
> The dyntick-idle traces are a bit confusing. This patch makes them simpler
> and adds some missing cases, such as EQS-enter due to user vs. idle mode.
> 
> Following are the changes:
> (1) Add a new context field to trace_rcu_dyntick tracepoint. This
> context field can be "USER", "IDLE" or "IRQ".
> 
> (2) Remove the "++=" and "--=" strings and replace them with
>"StillNonIdle". This is much easier on the eyes, and the -- and ++
>are easily apparent in the dynticks_nesting counters we are printing
>anyway.
> 
> This patch is based on the previous patches to simplify rcu_dyntick
> counters [1] and with these traces, I have verified the counters are
> working properly.
> 
> [1]
> Link: https://lore.kernel.org/patchwork/patch/1120021/
> Link: https://lore.kernel.org/patchwork/patch/1120022/
> 
> Signed-off-by: Joel Fernandes (Google) 

Looks plausible to me.

Thanx, Paul

> ---
>  include/trace/events/rcu.h | 13 -
>  kernel/rcu/tree.c  | 17 +++--
>  2 files changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index 66122602bd08..474c1f7e7104 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -449,12 +449,14 @@ TRACE_EVENT_RCU(rcu_fqs,
>   */
>  TRACE_EVENT_RCU(rcu_dyntick,
>  
> - TP_PROTO(const char *polarity, long oldnesting, long newnesting, 
> atomic_t dynticks),
> + TP_PROTO(const char *polarity, const char *context, long oldnesting,
> +  long newnesting, atomic_t dynticks),
>  
> - TP_ARGS(polarity, oldnesting, newnesting, dynticks),
> + TP_ARGS(polarity, context, oldnesting, newnesting, dynticks),
>  
>   TP_STRUCT__entry(
>   __field(const char *, polarity)
> + __field(const char *, context)
>   __field(long, oldnesting)
>   __field(long, newnesting)
>   __field(int, dynticks)
> @@ -462,14 +464,15 @@ TRACE_EVENT_RCU(rcu_dyntick,
>  
>   TP_fast_assign(
>   __entry->polarity = polarity;
> + __entry->context = context;
>   __entry->oldnesting = oldnesting;
>   __entry->newnesting = newnesting;
>   __entry->dynticks = atomic_read(&dynticks);
>   ),
>  
> - TP_printk("%s %lx %lx %#3x", __entry->polarity,
> -   __entry->oldnesting, __entry->newnesting,
> -   __entry->dynticks & 0xfff)
> + TP_printk("%s %s %lx %lx %#3x", __entry->polarity,
> + __entry->context, __entry->oldnesting, __entry->newnesting,
> + __entry->dynticks & 0xfff)
>  );
>  
>  /*
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 1465a3e406f8..1a65919ec800 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -570,7 +570,8 @@ static void rcu_eqs_enter(bool user)
>   }
>  
>   lockdep_assert_irqs_disabled();
> - trace_rcu_dyntick(TPS("Start"), rdp->dynticks_nesting, 0, 
> rdp->dynticks);
> + trace_rcu_dyntick(TPS("Start"), (user ? TPS("USER") : TPS("IDLE")),
> +   rdp->dynticks_nesting, 0, rdp->dynticks);
>   WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && 
> !is_idle_task(current));
>   rdp = this_cpu_ptr(&rcu_data);
>   do_nocb_deferred_wakeup(rdp);
> @@ -642,15 +643,17 @@ static __always_inline void rcu_nmi_exit_common(bool 
> irq)
>* leave it in non-RCU-idle state.
>*/
>   if (rdp->dynticks_nesting != 1) {
> - trace_rcu_dyntick(TPS("--="), rdp->dynticks_nesting,
> -   rdp->dynticks_nesting - 2, rdp->dynticks);
> + trace_rcu_dyntick(TPS("StillNonIdle"), TPS("IRQ"),
> +   rdp->dynticks_nesting, rdp->dynticks_nesting 
> - 2,
> +   rdp->dynticks);
>   WRITE_ONCE(rdp->dynticks_nesting, /* No store tearing. */
>  rdp->dynticks_nesting - 2);
>   return;
>   }
>  
>   /* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
> - trace_rcu_dyntick(TPS("Startirq"), rdp->dynticks_nesting, 0, 
> rdp->dynticks);
> + trace_rcu_dyntick(TPS("Start"), TPS("IRQ"), rdp->dynticks_nesting, 0,
> +   rdp->dynticks);
>   WRITE_ONCE(rdp->dynticks_nesting, 0); /* Avoid store tearing. */
>  
>   if (irq)
> @@ -733,7 +736,8 @@ static void rcu_eqs_exit(bool user)
>   rcu_dynticks_task_exit();
>   rcu_dynticks_eqs_exit();
>   rcu_cleanup_after_idle();
> - trace_rcu_dyntick(TPS("End"), rdp->dynticks_nesting, 1, rdp->dynticks);
> + trace_rcu_dyntick(TPS("End"), (user ? TPS("USER") : TPS("IDLE")),
> +   rdp->dynticks_nesting, 1, rdp->dynticks);
>   WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && 
> !is_idle_task(current));
>   WRIT

Re: [RFC v1 2/2] rcu/tree: Remove dynticks_nmi_nesting counter

2019-08-28 Thread Paul E. McKenney
On Wed, Aug 28, 2019 at 06:14:44PM -0400, Joel Fernandes wrote:
> On Wed, Aug 28, 2019 at 03:01:08PM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 28, 2019 at 05:42:41PM -0400, Joel Fernandes wrote:
> > > On Wed, Aug 28, 2019 at 02:19:04PM -0700, Paul E. McKenney wrote:
> > > > On Wed, Aug 28, 2019 at 05:05:25PM -0400, Joel Fernandes wrote:
> > > > > On Wed, Aug 28, 2019 at 01:23:30PM -0700, Paul E. McKenney wrote:
> > > > > > On Mon, Aug 26, 2019 at 09:33:54PM -0400, Joel Fernandes (Google) 
> > > > > > wrote:
> > > > > > > The dynticks_nmi_nesting counter serves 3 purposes:
> > > > > > > 
> > > > > > >   (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect 
> > > > > > > first
> > > > > > >   interrupt nesting level.
> > > > > > > 
> > > > > > >   (b) We need to detect half-interrupts till we are sure 
> > > > > > > they're not an
> > > > > > >   issue. However, change the comparison to 
> > > > > > > DYNTICK_IRQ_NONIDLE with 0.
> > > > > > > 
> > > > > > >   (c) When a quiescent state report is needed from a 
> > > > > > > nohz_full CPU.
> > > > > > >   The nesting counter detects we are a first level 
> > > > > > > interrupt.
> > > > > > > 
> > > > > > > For (a) we can just use dyntick_nesting == 1 to determine this. 
> > > > > > > Only the
> > > > > > > outermost interrupt that interrupted an RCU-idle state can set it 
> > > > > > > to 1.
> > > > > > > 
> > > > > > > For (b), this warning condition has not occurred for several 
> > > > > > > kernel
> > > > > > > releases.  But we still keep the warning but change it to use
> > > > > > > in_interrupt() instead of the nesting counter. In a later year, 
> > > > > > > we can
> > > > > > > remove the warning.
> > > > > > > 
> > > > > > > For (c), the nest check is not really necessary since forced_tick 
> > > > > > > would
> > > > > > > have been set to true in the outermost interrupt, so the 
> > > > > > > nested/NMI
> > > > > > > interrupts will check forced_tick anyway, and bail.
> > > > > > 
> > > > > > Skipping the commit log and documentation for this pass.
> > > > > [snip] 
> > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > index 255cd6835526..1465a3e406f8 100644
> > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > @@ -81,7 +81,6 @@
> > > > > > >  
> > > > > > >  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) 
> > > > > > > = {
> > > > > > >   .dynticks_nesting = 1,
> > > > > > > - .dynticks_nmi_nesting = 0,
> > > > > > 
> > > > > > This should be in the previous patch, give or take naming.
> > > > > 
> > > > > Done.
> > > > > 
> > > > > > >   .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
> > > > > > >  };
> > > > > > >  struct rcu_state rcu_state = {
> > > > > > > @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > > > > > >   /* Check for counter underflows */
> > > > > > >   RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> > > > > > >"RCU dynticks_nesting counter underflow!");
> > > > > > > - RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) 
> > > > > > > <= 0,
> > > > > > > -  "RCU dynticks_nmi_nesting counter 
> > > > > > > underflow/zero!");
> > > > > > >  
> > > > > > > - /* Are we at first interrupt nesting level? */
> > > > > > > - if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > > > > > > - return false;
> > > > > > > -
> > > > > > > - /* Does CPU appear to be idle from an RCU standpoint? */
> > > > > > > - return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> > > > > > > + /* Are we the outermost interrupt that arrived when RCU was 
> > > > > > > idle? */
> > > > > > > + return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
> > > > > > >  }
> > > > > > >  
> > > > > > >  #define DEFAULT_RCU_BLIMIT 10 /* Maximum callbacks per 
> > > > > > > rcu_do_batch ... */
> > > > > > > @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
> > > > > > >   struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > > > > >  
> > > > > > >   /* Entering usermode/idle from interrupt is not handled. These 
> > > > > > > would
> > > > > > > -  * mean usermode upcalls or idle entry happened from 
> > > > > > > interrupts. But,
> > > > > > > -  * reset the counter if we warn.
> > > > > > > +  * mean usermode upcalls or idle exit happened from interrupts. 
> > > > > > > Remove
> > > > > > > +  * the warning by 2020.
> > > > > > >*/
> > > > > > > - if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> > > > > > > - WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> > > > > > > + WARN_ON_ONCE(in_interrupt());
> > > > > > 
> > > > > > And this is a red flag.  Bad things happen should some common code
> > > > > > that disables BH be invoked from the idle loop.  This might not be
> > > > > > happening now, but we need to avoid this sort of constraint.
> > > > > > How about instead merging ->dyntick_nesting into the low-order

Re: [PATCH 01/11] ftrace: move recordmcount tools to scripts/ftrace

2019-08-28 Thread Changbin Du
On Mon, Aug 26, 2019 at 06:44:44PM -0400, Steven Rostedt wrote:
> On Sun, 25 Aug 2019 21:23:20 +0800
> Changbin Du  wrote:
> 
> > Move ftrace tools to its own directory. We will add another tool later.
> > 
> > Cc: John F. Reiser 
> > Signed-off-by: Changbin Du 
> > ---
> >  scripts/.gitignore   |  1 -
> >  scripts/Makefile |  2 +-
> >  scripts/Makefile.build   | 10 +-
> >  scripts/ftrace/.gitignore|  4 
> >  scripts/ftrace/Makefile  |  4 
> >  scripts/{ => ftrace}/recordmcount.c  |  0
> >  scripts/{ => ftrace}/recordmcount.h  |  0
> >  scripts/{ => ftrace}/recordmcount.pl |  0
> >  8 files changed, 14 insertions(+), 7 deletions(-)
> >  create mode 100644 scripts/ftrace/.gitignore
> >  create mode 100644 scripts/ftrace/Makefile
> >  rename scripts/{ => ftrace}/recordmcount.c (100%)
> >  rename scripts/{ => ftrace}/recordmcount.h (100%)
> >  rename scripts/{ => ftrace}/recordmcount.pl (100%)
> >  mode change 100755 => 100644
> 
> Note, we are in the process of merging recordmcount with objtool. It
> would be better to continue from that work.
> 
>  
> http://lkml.kernel.org/r/2767f55f4a5fbf30ba0635aed7a9c5ee92ac07dd.1563992889.git.mhels...@vmware.com
> 
> -- Steve
Thanks for the reminder. Let me check whether the prototype tool can be
merged into objtool easily after the above work.

-- 
Cheers,
Changbin Du


Re: [PATCH v2] vsprintf: introduce %dE for error constants

2019-08-28 Thread Sergey Senozhatsky
On (08/28/19 18:22), Uwe Kleine-König wrote:
> That is wrong. When you do
> 
>   pr_err("There are no round tuits to give out: %dE\n", -ENOENT);
> 
> in a kernel that doesn't support %dE you get:
> 
>   There are no round tuits to give out: -2E

OK. Good point.

-ss


Re: [RFC v1 2/2] rcu/tree: Remove dynticks_nmi_nesting counter

2019-08-28 Thread Joel Fernandes
On Wed, Aug 28, 2019 at 04:12:47PM -0700, Paul E. McKenney wrote:
> On Wed, Aug 28, 2019 at 06:14:44PM -0400, Joel Fernandes wrote:
> > On Wed, Aug 28, 2019 at 03:01:08PM -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 28, 2019 at 05:42:41PM -0400, Joel Fernandes wrote:
> > > > On Wed, Aug 28, 2019 at 02:19:04PM -0700, Paul E. McKenney wrote:
> > > > > On Wed, Aug 28, 2019 at 05:05:25PM -0400, Joel Fernandes wrote:
> > > > > > On Wed, Aug 28, 2019 at 01:23:30PM -0700, Paul E. McKenney wrote:
> > > > > > > On Mon, Aug 26, 2019 at 09:33:54PM -0400, Joel Fernandes (Google) 
> > > > > > > wrote:
> > > > > > > > The dynticks_nmi_nesting counter serves 3 purposes:
> > > > > > > > 
> > > > > > > >   (a) rcu_is_cpu_rrupt_from_idle() needs to be able to 
> > > > > > > > detect first
> > > > > > > >   interrupt nesting level.
> > > > > > > > 
> > > > > > > >   (b) We need to detect half-interrupts till we are sure 
> > > > > > > > they're not an
> > > > > > > >   issue. However, change the comparison to 
> > > > > > > > DYNTICK_IRQ_NONIDLE with 0.
> > > > > > > > 
> > > > > > > >   (c) When a quiescent state report is needed from a 
> > > > > > > > nohz_full CPU.
> > > > > > > >   The nesting counter detects we are a first level 
> > > > > > > > interrupt.
> > > > > > > > 
> > > > > > > > For (a) we can just use dyntick_nesting == 1 to determine this. 
> > > > > > > > Only the
> > > > > > > > outermost interrupt that interrupted an RCU-idle state can set 
> > > > > > > > it to 1.
> > > > > > > > 
> > > > > > > > For (b), this warning condition has not occurred for several 
> > > > > > > > kernel
> > > > > > > > releases.  But we still keep the warning but change it to use
> > > > > > > > in_interrupt() instead of the nesting counter. In a later year, 
> > > > > > > > we can
> > > > > > > > remove the warning.
> > > > > > > > 
> > > > > > > > For (c), the nest check is not really necessary since 
> > > > > > > > forced_tick would
> > > > > > > > have been set to true in the outermost interrupt, so the 
> > > > > > > > nested/NMI
> > > > > > > > interrupts will check forced_tick anyway, and bail.
> > > > > > > 
> > > > > > > Skipping the commit log and documentation for this pass.
> > > > > > [snip] 
> > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > > index 255cd6835526..1465a3e406f8 100644
> > > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > > @@ -81,7 +81,6 @@
> > > > > > > >  
> > > > > > > >  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, 
> > > > > > > > rcu_data) = {
> > > > > > > > .dynticks_nesting = 1,
> > > > > > > > -   .dynticks_nmi_nesting = 0,
> > > > > > > 
> > > > > > > This should be in the previous patch, give or take naming.
> > > > > > 
> > > > > > Done.
> > > > > > 
> > > > > > > > .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
> > > > > > > >  };
> > > > > > > >  struct rcu_state rcu_state = {
> > > > > > > > @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > > > > > > > /* Check for counter underflows */
> > > > > > > > 
> > > > > > > > RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> > > > > > > >  "RCU dynticks_nesting counter 
> > > > > > > > underflow!");
> > > > > > > > -   
> > > > > > > > RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) 
> > > > > > > > <= 0,
> > > > > > > > -"RCU dynticks_nmi_nesting counter 
> > > > > > > > underflow/zero!");
> > > > > > > >  
> > > > > > > > -   /* Are we at first interrupt nesting level? */
> > > > > > > > -   if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > > > > > > > -   return false;
> > > > > > > > -
> > > > > > > > -   /* Does CPU appear to be idle from an RCU standpoint? */
> > > > > > > > -   return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> > > > > > > > +   /* Are we the outermost interrupt that arrived when RCU 
> > > > > > > > was idle? */
> > > > > > > > +   return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  #define DEFAULT_RCU_BLIMIT 10 /* Maximum callbacks per 
> > > > > > > > rcu_do_batch ... */
> > > > > > > > @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
> > > > > > > > struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > > > > > >  
> > > > > > > > /* Entering usermode/idle from interrupt is not 
> > > > > > > > handled. These would
> > > > > > > > -* mean usermode upcalls or idle entry happened from 
> > > > > > > > interrupts. But,
> > > > > > > > -* reset the counter if we warn.
> > > > > > > > +* mean usermode upcalls or idle exit happened from 
> > > > > > > > interrupts. Remove
> > > > > > > > +* the warning by 2020.
> > > > > > > >  */
> > > > > > > > -   if (WARN_ON_ONCE(rdp->dynticks_

Re: [RFC v1 2/2] rcu/tree: Remove dynticks_nmi_nesting counter

2019-08-28 Thread Paul E. McKenney
On Wed, Aug 28, 2019 at 09:51:55PM -0400, Joel Fernandes wrote:
> On Wed, Aug 28, 2019 at 04:12:47PM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 28, 2019 at 06:14:44PM -0400, Joel Fernandes wrote:
> > > On Wed, Aug 28, 2019 at 03:01:08PM -0700, Paul E. McKenney wrote:
> > > > On Wed, Aug 28, 2019 at 05:42:41PM -0400, Joel Fernandes wrote:
> > > > > On Wed, Aug 28, 2019 at 02:19:04PM -0700, Paul E. McKenney wrote:
> > > > > > On Wed, Aug 28, 2019 at 05:05:25PM -0400, Joel Fernandes wrote:
> > > > > > > On Wed, Aug 28, 2019 at 01:23:30PM -0700, Paul E. McKenney wrote:
> > > > > > > > On Mon, Aug 26, 2019 at 09:33:54PM -0400, Joel Fernandes 
> > > > > > > > (Google) wrote:
> > > > > > > > > The dynticks_nmi_nesting counter serves 3 purposes:
> > > > > > > > > 
> > > > > > > > >   (a) rcu_is_cpu_rrupt_from_idle() needs to be able to 
> > > > > > > > > detect first
> > > > > > > > >   interrupt nesting level.
> > > > > > > > > 
> > > > > > > > >   (b) We need to detect half-interrupts till we are sure 
> > > > > > > > > they're not an
> > > > > > > > >   issue. However, change the comparison to 
> > > > > > > > > DYNTICK_IRQ_NONIDLE with 0.
> > > > > > > > > 
> > > > > > > > >   (c) When a quiescent state report is needed from a 
> > > > > > > > > nohz_full CPU.
> > > > > > > > >   The nesting counter detects we are a first level 
> > > > > > > > > interrupt.
> > > > > > > > > 
> > > > > > > > > For (a) we can just use dyntick_nesting == 1 to determine 
> > > > > > > > > this. Only the
> > > > > > > > > outermost interrupt that interrupted an RCU-idle state can 
> > > > > > > > > set it to 1.
> > > > > > > > > 
> > > > > > > > > For (b), this warning condition has not occurred for several 
> > > > > > > > > kernel
> > > > > > > > > releases.  But we still keep the warning but change it to use
> > > > > > > > > in_interrupt() instead of the nesting counter. In a later 
> > > > > > > > > year, we can
> > > > > > > > > remove the warning.
> > > > > > > > > 
> > > > > > > > > For (c), the nest check is not really necessary since 
> > > > > > > > > forced_tick would
> > > > > > > > > have been set to true in the outermost interrupt, so the 
> > > > > > > > > nested/NMI
> > > > > > > > > interrupts will check forced_tick anyway, and bail.
> > > > > > > > 
> > > > > > > > Skipping the commit log and documentation for this pass.
> > > > > > > [snip] 
> > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > > > index 255cd6835526..1465a3e406f8 100644
> > > > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > > > @@ -81,7 +81,6 @@
> > > > > > > > >  
> > > > > > > > >  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, 
> > > > > > > > > rcu_data) = {
> > > > > > > > >   .dynticks_nesting = 1,
> > > > > > > > > - .dynticks_nmi_nesting = 0,
> > > > > > > > 
> > > > > > > > This should be in the previous patch, give or take naming.
> > > > > > > 
> > > > > > > Done.
> > > > > > > 
> > > > > > > > >   .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
> > > > > > > > >  };
> > > > > > > > >  struct rcu_state rcu_state = {
> > > > > > > > > @@ -392,15 +391,9 @@ static int 
> > > > > > > > > rcu_is_cpu_rrupt_from_idle(void)
> > > > > > > > >   /* Check for counter underflows */
> > > > > > > > >   
> > > > > > > > > RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 
> > > > > > > > > 0,
> > > > > > > > >"RCU dynticks_nesting counter 
> > > > > > > > > underflow!");
> > > > > > > > > - 
> > > > > > > > > RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting)
> > > > > > > > >  <= 0,
> > > > > > > > > -  "RCU dynticks_nmi_nesting counter 
> > > > > > > > > underflow/zero!");
> > > > > > > > >  
> > > > > > > > > - /* Are we at first interrupt nesting level? */
> > > > > > > > > - if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > > > > > > > > - return false;
> > > > > > > > > -
> > > > > > > > > - /* Does CPU appear to be idle from an RCU standpoint? */
> > > > > > > > > - return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> > > > > > > > > + /* Are we the outermost interrupt that arrived when RCU 
> > > > > > > > > was idle? */
> > > > > > > > > + return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > >  #define DEFAULT_RCU_BLIMIT 10 /* Maximum callbacks per 
> > > > > > > > > rcu_do_batch ... */
> > > > > > > > > @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
> > > > > > > > >   struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > > > > > > >  
> > > > > > > > >   /* Entering usermode/idle from interrupt is not 
> > > > > > > > > handled. These would
> > > > > > > > > -  * mean usermode upcalls or idle entry happened from 
> > > > > > > > > interrupts. But,
> > > > > > > > > -  * reset the counter

[PATCH] [RFC] i2c: imx: make use of format specifier %dE

2019-08-28 Thread Uwe Kleine-König
I created a patch that teaches printk et al. to emit a symbolic error
name for an error-valued integer[1]. With that applied

dev_err(&pdev->dev, "can't enable I2C clock, ret=%dE\n", ret);

emits

... can't enable I2C clock, ret=EIO

if ret is -EIO. Petr Mladek (i.e. one of the printk maintainers) had
concerns if this would be well received and worth the effort. He asked
to present it to a few subsystems. So for now, this patch converting the
i2c-imx driver shouldn't be applied yet but it would be great to get
some feedback about if you think that being able to easily printk (for
example) "EIO" instead of "-5" is a good idea. Would it help you? Do you
think it helps your users?

Thanks
Uwe

[1] https://lkml.org/lkml/2019/8/27/1456
---
 drivers/i2c/busses/i2c-imx.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index 15f6cde6452f..359e911cb891 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -289,7 +289,7 @@ static void i2c_imx_dma_request(struct imx_i2c_struct 
*i2c_imx,
if (IS_ERR(dma->chan_tx)) {
ret = PTR_ERR(dma->chan_tx);
if (ret != -ENODEV && ret != -EPROBE_DEFER)
-   dev_err(dev, "can't request DMA tx channel (%d)\n", 
ret);
+   dev_err(dev, "can't request DMA tx channel (%dE)\n", 
ret);
goto fail_al;
}
 
@@ -300,7 +300,7 @@ static void i2c_imx_dma_request(struct imx_i2c_struct 
*i2c_imx,
dma_sconfig.direction = DMA_MEM_TO_DEV;
ret = dmaengine_slave_config(dma->chan_tx, &dma_sconfig);
if (ret < 0) {
-   dev_err(dev, "can't configure tx channel (%d)\n", ret);
+   dev_err(dev, "can't configure tx channel (%dE)\n", ret);
goto fail_tx;
}
 
@@ -308,7 +308,7 @@ static void i2c_imx_dma_request(struct imx_i2c_struct 
*i2c_imx,
if (IS_ERR(dma->chan_rx)) {
ret = PTR_ERR(dma->chan_rx);
if (ret != -ENODEV && ret != -EPROBE_DEFER)
-   dev_err(dev, "can't request DMA rx channel (%d)\n", 
ret);
+   dev_err(dev, "can't request DMA rx channel (%dE)\n", 
ret);
goto fail_tx;
}
 
@@ -319,7 +319,7 @@ static void i2c_imx_dma_request(struct imx_i2c_struct 
*i2c_imx,
dma_sconfig.direction = DMA_DEV_TO_MEM;
ret = dmaengine_slave_config(dma->chan_rx, &dma_sconfig);
if (ret < 0) {
-   dev_err(dev, "can't configure rx channel (%d)\n", ret);
+   dev_err(dev, "can't configure rx channel (%dE)\n", ret);
goto fail_rx;
}
 
@@ -964,7 +964,7 @@ static int i2c_imx_xfer(struct i2c_adapter *adapter,
pm_runtime_put_autosuspend(i2c_imx->adapter.dev.parent);
 
 out:
-   dev_dbg(&i2c_imx->adapter.dev, "<%s> exit with: %s: %d\n", __func__,
+   dev_dbg(&i2c_imx->adapter.dev, "<%s> exit with: %s: %dE\n", __func__,
(result < 0) ? "error" : "success msg",
(result < 0) ? result : num);
return (result < 0) ? result : num;
@@ -1100,7 +1100,7 @@ static int i2c_imx_probe(struct platform_device *pdev)
 
ret = clk_prepare_enable(i2c_imx->clk);
if (ret) {
-   dev_err(&pdev->dev, "can't enable I2C clock, ret=%d\n", ret);
+   dev_err(&pdev->dev, "can't enable I2C clock, ret=%dE\n", ret);
return ret;
}
 
@@ -1108,7 +1108,7 @@ static int i2c_imx_probe(struct platform_device *pdev)
ret = devm_request_irq(&pdev->dev, irq, i2c_imx_isr, IRQF_SHARED,
pdev->name, i2c_imx);
if (ret) {
-   dev_err(&pdev->dev, "can't claim irq %d\n", irq);
+   dev_err(&pdev->dev, "can't claim irq %dE\n", irq);
goto clk_disable;
}
 
@@ -1230,7 +1230,7 @@ static int __maybe_unused i2c_imx_runtime_resume(struct 
device *dev)
 
ret = clk_enable(i2c_imx->clk);
if (ret)
-   dev_err(dev, "can't enable I2C clock, ret=%d\n", ret);
+   dev_err(dev, "can't enable I2C clock, ret=%dE\n", ret);
 
return ret;
 }
-- 
2.23.0



[PATCH] [RFC] tty/serial: imx: make use of format specifier %dE

2019-08-28 Thread Uwe Kleine-König
I created a patch that teaches printk et al. to emit a symbolic error
name for an error-valued integer[1]. With that applied

dev_err(&pdev->dev, "failed to get ipg clk: %dE\n", ret);

emits

... failed to get ipg clk: EPROBE_DEFER

if ret is -EPROBE_DEFER. Petr Mladek (i.e. one of the printk
maintainers) had concerns if this would be well received and worth the
effort. He asked to present it to a few subsystems. So for now, this
patch converting the imx UART driver shouldn't be applied yet but it
would be great to get some feedback about if you think that being able
to easily printk (for example) "EIO" instead of "-5" is a good idea.
Would it help you? Do you think it helps your users?

Thanks
Uwe

[1] https://lkml.org/lkml/2019/8/27/1456
---
 drivers/tty/serial/imx.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
index 57d6e6ba556e..a3dbb9378e8b 100644
--- a/drivers/tty/serial/imx.c
+++ b/drivers/tty/serial/imx.c
@@ -2143,7 +2143,7 @@ static int imx_uart_probe_dt(struct imx_port *sport,
 
ret = of_alias_get_id(np, "serial");
if (ret < 0) {
-   dev_err(&pdev->dev, "failed to get alias id, errno %d\n", ret);
+   dev_err(&pdev->dev, "failed to get alias id, error %dE\n", ret);
return ret;
}
sport->port.line = ret;
@@ -2236,14 +2236,14 @@ static int imx_uart_probe(struct platform_device *pdev)
sport->clk_ipg = devm_clk_get(&pdev->dev, "ipg");
if (IS_ERR(sport->clk_ipg)) {
ret = PTR_ERR(sport->clk_ipg);
-   dev_err(&pdev->dev, "failed to get ipg clk: %d\n", ret);
+   dev_err(&pdev->dev, "failed to get ipg clk: %dE\n", ret);
return ret;
}
 
sport->clk_per = devm_clk_get(&pdev->dev, "per");
if (IS_ERR(sport->clk_per)) {
ret = PTR_ERR(sport->clk_per);
-   dev_err(&pdev->dev, "failed to get per clk: %d\n", ret);
+   dev_err(&pdev->dev, "failed to get per clk: %dE\n", ret);
return ret;
}
 
@@ -2252,7 +2252,7 @@ static int imx_uart_probe(struct platform_device *pdev)
/* For register access, we only need to enable the ipg clock. */
ret = clk_prepare_enable(sport->clk_ipg);
if (ret) {
-   dev_err(&pdev->dev, "failed to enable per clk: %d\n", ret);
+   dev_err(&pdev->dev, "failed to enable per clk: %dE\n", ret);
return ret;
}
 
@@ -2330,7 +2330,7 @@ static int imx_uart_probe(struct platform_device *pdev)
ret = devm_request_irq(&pdev->dev, rxirq, imx_uart_rxint, 0,
   dev_name(&pdev->dev), sport);
if (ret) {
-   dev_err(&pdev->dev, "failed to request rx irq: %d\n",
+   dev_err(&pdev->dev, "failed to request rx irq: %dE\n",
ret);
return ret;
}
@@ -2338,7 +2338,7 @@ static int imx_uart_probe(struct platform_device *pdev)
ret = devm_request_irq(&pdev->dev, txirq, imx_uart_txint, 0,
   dev_name(&pdev->dev), sport);
if (ret) {
-   dev_err(&pdev->dev, "failed to request tx irq: %d\n",
+   dev_err(&pdev->dev, "failed to request tx irq: %dE\n",
ret);
return ret;
}
@@ -2346,7 +2346,7 @@ static int imx_uart_probe(struct platform_device *pdev)
ret = devm_request_irq(&pdev->dev, rtsirq, imx_uart_rtsint, 0,
   dev_name(&pdev->dev), sport);
if (ret) {
-   dev_err(&pdev->dev, "failed to request rts irq: %d\n",
+   dev_err(&pdev->dev, "failed to request rts irq: %dE\n",
ret);
return ret;
}
@@ -2354,7 +2354,7 @@ static int imx_uart_probe(struct platform_device *pdev)
ret = devm_request_irq(&pdev->dev, rxirq, imx_uart_int, 0,
   dev_name(&pdev->dev), sport);
if (ret) {
-   dev_err(&pdev->dev, "failed to request irq: %d\n", ret);
+   dev_err(&pdev->dev, "failed to request irq: %dE\n", 
ret);
return ret;
}
}
-- 
2.23.0