On Tue 27-11-18 09:08:50, Linus Torvalds wrote:
> On Mon, Nov 26, 2018 at 10:24 PM kernel test robot
> <rong.a.c...@intel.com> wrote:
> >
> > FYI, we noticed a -61.3% regression of vm-scalability.throughput due
> > to commit ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for
> > MADV_HUGEPAGE mappings")
>
> Well, that's certainly noticeable and not good.
>
> Andrea, I suspect it might be causing fights with auto numa migration..
>
> Lots more system time, but also look at this:
>
> >    1122389 ±  9%     +17.2%    1315380 ±  4%  proc-vmstat.numa_hit
> >     214722 ±  5%     +21.6%     261076 ±  3%  proc-vmstat.numa_huge_pte_updates
> >    1108142 ±  9%     +17.4%    1300857 ±  4%  proc-vmstat.numa_local
> >     145368 ± 48%     +63.1%     237050 ± 17%  proc-vmstat.numa_miss
> >     159615 ± 44%     +57.6%     251573 ± 16%  proc-vmstat.numa_other
> >     185.50 ± 81%   +8278.6%      15542 ± 40%  proc-vmstat.numa_pages_migrated
>
> Should the commit be reverted? Or perhaps at least modified?
Well, the commit is trying to restore the behavior we had before 5265047ac301, because there are real use cases that suffered from that change, and bug reports as a result of it. The vm-scalability regression is certainly worth considering, but it is an artificial testcase. A higher NUMA miss rate is an expected side effect of the patch, because a fallback to a different NUMA node is now more likely. The __GFP_THISNODE side effect essentially introduces node-reclaim behavior for THP allocations.

Another thing is that there is no single behavior that is good for everybody. Whether to reclaim locally or to place the THP on a remote node is hard to decide by default. We have discussed this at length and there were some conclusions. One of them is that we need a NUMA policy to say whether an expensive local allocation is preferred over a remote one. We also definitely need better pro-active defragmentation to make larger pages available on the local node. That is a work in progress, and this patch is a stopgap fix.

-- 
Michal Hocko
SUSE Labs
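
[Editorial illustration, not part of the original thread] The mappings the commit affects are those that explicitly opt in to THP via madvise(MADV_HUGEPAGE). A minimal userspace sketch of such a mapping (the size and access pattern here are arbitrary, not taken from the vm-scalability workload) looks roughly like this:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	/* 1 GiB anonymous mapping; the size is arbitrary for illustration. */
	size_t len = 1UL << 30;

	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/*
	 * Opt this mapping in for THP. The allocations backing faults in
	 * this range are the ones whose gfp mask (__GFP_THISNODE or not)
	 * the commit in question changes.
	 */
	if (madvise(p, len, MADV_HUGEPAGE))
		perror("madvise(MADV_HUGEPAGE)");

	/* Fault the range in so huge pages actually get allocated. */
	memset(p, 1, len);

	munmap(p, len);
	return 0;
}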