Hello,

On Tue, May 09, 2017 at 08:04:24PM -0700, Hugh Dickins wrote:
> On Mon, 8 May 2017, Joonas Lahtinen wrote:
> > On pe, 2017-05-05 at 14:57 -0700, Hugh Dickins wrote:
> > > On Fri, 5 May 2017, Joonas Lahtinen wrote:
> > > > On ma, 2017-05-01 at 11:05 +0900, J. R. Okajima wrote:
> > > > > Thanx for the reply.
> > > > >
> > > > > Andrea Arcangeli:
> > > > > >
> > > > > > Yes I already reported this, my original fix was way more efficient
> > > > > > (and also safer considering the above) than what landed upstream. My
> > > > > > feedback was ignored though.
> > > > > >
> > > > > > https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html
> > > > >
> > > > > I see.
> > > > > Actually on my test system for v4.11-rc8, kthreadd, kworker, kswapd and
> > > > > others all stopped working due to the synchronize_rcu_expedited call
> > > > > from i915_gem_shrinker_count. It is definitely a show stopper for me as
> > > > > an i915 user.
> > > >
> > > > Filing a bug in freedesktop.org with all the details is the fastest way
> > > > of getting help. Without the bug (and with such little information as
> > > > the previous e-mail) it's hard to estimate the extent and nature of the
> > > > bug.
> > > >
> > > > I've anyway gone and prepared a patch to drop the RCU sync completely
> > > > from shrinker phase, as discussed originally with Chris.
> > >
> > > Is that a patch that will be suitable for 4.11-stable? Please do post
> > > it here. I had not experienced this i915-induced hang at all when
> > > Andrea first mentioned it, nor even on 4.11-rc8; but now with 4.11
> > > final I can get it fairly easily (I haven't tried Andrea's fix yet).
> >
> > Please try:
> >
> > https://patchwork.freedesktop.org/patch/154713/
> >
> > If it works, a Tested-by: would be appreciated.
>
> Yes, that works for me, thank you.
>
> Tested-by: Hugh Dickins <hu...@google.com>
>
> But the linked patch seems to be lacking a Reported-by (not me) tag,
> a Fixes tag, a Cc stable tag, and any indication in the Subject or
> commit message that this patch is something needed to fix hangs
> observed by several people - it just sounds like a minor cleanup.

It works for me too.

I'm running my workstation also with synchronize_rcu removed from
i915_gem_shrink_all in addition to the above. Isn't the oom method
invoked from reclaim context too? As far as I can tell synchronize_rcu
can end up throttling on a background synchronize_rcu_expedited(), so
it might end up in the same issue unless removed too.

Tested-by: Andrea Arcangeli <aarca...@redhat.com>

(I can't reproduce the lockups 100% of the time, but they never
happened again with this patch and I happened to run the load that
reproduces them a couple of times already with v4.11 and this patch
applied)

It's also certainly improving performance by removing the
synchronize_rcu_expedited from the _count methods where it was useless
(in addition to unsafe).

Thanks,
Andrea