On Mon, Jul 06, 2020 at 05:20:57PM -0400, Dave Jones wrote:
> On Mon, Jul 06, 2020 at 04:59:52PM +0200, Peter Zijlstra wrote:
>  > On Fri, Jul 03, 2020 at 04:51:53PM -0400, Dave Jones wrote:
>  > > On Fri, Jul 03, 2020 at 12:40:33PM +0200, Peter Zijlstra wrote:
>  > >  
>  > > looked promising the first few hours, but as soon as it hit four hours
>  > > of uptime, loadavg spiked and is now pinned to at least 1.00
>  > 
>  > OK, lots of cursing later, I now have the below...
>  > 
>  > The TL;DR is that while schedule() doesn't change p->state once it
>  > starts, it does read it quite a bit, and ttwu() will actually change it
>  > to TASK_WAKING. So if ttwu() changes it to WAKING before schedule()
>  > reads it to do loadavg accounting, things go sideways.
>  > 
>  > The below is extra complicated by the fact that I've had to scrounge up
>  > a bunch of load-store ordering without actually adding barriers. It adds
>  > yet another control dependency to ttwu(), so take that C standard :-)
> 
> Man this stuff is subtle. I could've read this a hundred times and not
> even come close to approaching this.
> 
> Basically me reading scheduler code:
> http://www.quickmeme.com/img/96/9642ed212bbced00885592b39880ec55218e922245e0637cf94db2e41857d558.jpg

Heh, that one made me nearly spill my tea, much funnies :-)

But yes, Dave Chinner also complained about this for the previous fix.
I've written this:

  
https://lore.kernel.org/lkml/20200703133259.ge4...@hirez.programming.kicks-ass.net/

to help with that. But clearly I'll need to update that patch again
after this little adventure.
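
For anyone following along, here is a toy model of the race from the
TL;DR quoted above. It is a minimal sketch, not the actual patch and not
real scheduler code: the struct, the counter, and the helpers
(wake_begin, wake_finish, sleep_racy, sleep_snapshot, run_cycle) are all
hypothetical names, and the concurrent waker is simulated by calling it
at a fixed point so the interleaving is deterministic. It only shows why
re-reading a state that another party may flip to TASK_WAKING breaks
paired accounting, and why working from a single snapshot avoids that.

```c
/* Toy state values; they only need to be distinct in this model. */
enum { TASK_RUNNING, TASK_UNINTERRUPTIBLE, TASK_WAKING };

/* Hypothetical, heavily simplified stand-in for a task. */
struct task {
    int state;
    int contributes_to_load;
};

/* Hypothetical load counter: should be back at 0 after sleep + wake. */
static long nr_uninterruptible;

/* Waker, step 1: remember whether the sleeper counted, then mark WAKING. */
static void wake_begin(struct task *p)
{
    p->contributes_to_load = (p->state == TASK_UNINTERRUPTIBLE);
    p->state = TASK_WAKING;
}

/* Waker, step 2: undo the sleeper's accounting and make it runnable. */
static void wake_finish(struct task *p)
{
    if (p->contributes_to_load)
        nr_uninterruptible--;
    p->state = TASK_RUNNING;
}

/* Racy sleeper: reads p->state a second time for the accounting. */
static void sleep_racy(struct task *p, int waker_races)
{
    if (p->state == TASK_RUNNING)       /* first read: nothing to do */
        return;
    if (waker_races)                    /* simulated concurrent waker */
        wake_begin(p);
    if (p->state == TASK_UNINTERRUPTIBLE)   /* second read: may see WAKING */
        nr_uninterruptible++;
}

/* Snapshot sleeper: reads p->state once and only uses that value. */
static void sleep_snapshot(struct task *p, int waker_races)
{
    int prev_state = p->state;          /* the only read of p->state */

    if (prev_state == TASK_RUNNING)
        return;
    if (waker_races)                    /* simulated concurrent waker */
        wake_begin(p);
    if (prev_state == TASK_UNINTERRUPTIBLE)
        nr_uninterruptible++;
}

/* Run one sleep/wake cycle and return the counter afterwards. */
static long run_cycle(void (*sleep_fn)(struct task *, int), int waker_races)
{
    struct task t = { TASK_UNINTERRUPTIBLE, 0 };

    nr_uninterruptible = 0;
    sleep_fn(&t, waker_races);
    if (!waker_races)                   /* waker arrives after the sleeper */
        wake_begin(&t);
    wake_finish(&t);
    return nr_uninterruptible;
}
```

In this model run_cycle(sleep_racy, 1) leaves the counter at -1, while
run_cycle(sleep_snapshot, 1) and both non-racing cycles return 0. The
sign of the drift is a property of the toy only; the point is that the
two sides stop agreeing once the accounting read can observe
TASK_WAKING.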

>  > I've booted it, and built a few kernels with it and checked loadavg
>  > drops to 0 after each build, so from that pov all is well, but since
>  > I'm not confident I can reproduce the issue, I can't tell this actually
>  > fixes anything, except maybe phantoms of my imagination.
> 
> Five hours in, looking good so far.  I think you nailed it.

\o/ hooray! Thanks for testing Dave!
