Since this spinlock will only serialize migrate rate limiting, convert the spinlock to a trylock. If another task races ahead of this task then this task can simply move on.
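The "try, and move on if busy" idea can be sketched in userspace as follows. This is an illustration only, not the kernel code: it uses a POSIX mutex in place of the kernel spinlock, and struct ratelimit_state and try_reset_window() are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-node rate-limit state. */
struct ratelimit_state {
	pthread_mutex_t lock;
	unsigned long nr_pages;	/* pages counted in the current window */
	unsigned long resets;	/* how many times the window was reset */
};

/*
 * Reset the counter if the lock is free. Returns true if this caller did
 * the reset, false if the lock was already held, in which case the caller
 * simply carries on without waiting.
 */
static bool try_reset_window(struct ratelimit_state *rs)
{
	if (pthread_mutex_trylock(&rs->lock) != 0)
		return false;	/* someone else got there first */

	rs->nr_pages = 0;
	rs->resets++;
	pthread_mutex_unlock(&rs->lock);
	return true;
}
```

A caller that loses the race skips the reset and goes straight on to the rate-limit check instead of spinning on the lock.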
While here, correct two abnormalities.
- Avoid time being stretched for every interval.
- Use READ/WRITE_ONCE with next window.

specjbb2005 / bops/JVM / higher bops are better
on 2 Socket/2 Node Intel
JVMS  Prev    Current  %Change
4     206350  200892   -2.64502
1     319963  325766   1.81365

on 2 Socket/2 Node Power9 (PowerNV)
JVMS  Prev    Current  %Change
4     186539  190261   1.99529
1     220344  195305   -11.3636

on 4 Socket/4 Node Power7
JVMS  Prev    Current  %Change
8     56836   57651.1  1.43413
1     112970  111351   -1.43312

dbench / transactions / higher numbers are better
on 2 Socket/2 Node Intel
count  Min      Max      Avg      Variance  %Change
5      13136.1  13170.2  13150.2  14.7482
5      12254.7  12331.9  12297.8  28.1846   -6.48203

on 2 Socket/4 Node Power8 (PowerNV)
count  Min      Max      Avg      Variance  %Change
5      4319.79  4998.19  4836.53  261.109
5      4997.83  5030.14  5015.54  12.947    3.70121

on 2 Socket/2 Node Power9 (PowerNV)
count  Min      Max      Avg      Variance  %Change
5      9325.56  9402.7   9362.49  25.9638
5      9331.84  9375.11  9352.04  16.0703   -0.111616

on 4 Socket/4 Node Power7
count  Min      Max      Avg      Variance  %Change
5      132.581  191.072  170.554  21.6444
5      147.55   181.605  168.963  11.3513   -0.932842

Signed-off-by: Srikar Dronamraju <sri...@linux.vnet.ibm.com>
Suggested-by: Peter Zijlstra <pet...@infradead.org>
---
Changelog v1->v2:
Fix the per-interval stretch pointed out by Peter Zijlstra.
Verified that some of the regression is due to fixing the interval stretch.

 mm/migrate.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 8c0af0f..dbc2cb7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1868,16 +1868,24 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
 static bool numamigrate_update_ratelimit(pg_data_t *pgdat,
 					unsigned long nr_pages)
 {
+	unsigned long next_window, interval;
+
+	next_window = READ_ONCE(pgdat->numabalancing_migrate_next_window);
+	interval = msecs_to_jiffies(migrate_interval_millisecs);
+
 	/*
 	 * Rate-limit the amount of data that is being migrated to a node.
 	 * Optimal placement is no good if the memory bus is saturated and
 	 * all the time is being spent migrating!
 	 */
-	if (time_after(jiffies, next_window)) {
-		spin_lock(&pgdat->numabalancing_migrate_lock);
+	if (time_after(jiffies, next_window) &&
+			spin_trylock(&pgdat->numabalancing_migrate_lock)) {
 		pgdat->numabalancing_migrate_nr_pages = 0;
-		pgdat->numabalancing_migrate_next_window = jiffies +
-			msecs_to_jiffies(migrate_interval_millisecs);
+		do {
+			next_window += interval;
+		} while (unlikely(time_after(jiffies, next_window)));
+
+		WRITE_ONCE(pgdat->numabalancing_migrate_next_window, next_window);
 		spin_unlock(&pgdat->numabalancing_migrate_lock);
 	}
 	if (pgdat->numabalancing_migrate_nr_pages > ratelimit_pages) {
-- 
1.8.3.1
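
For illustration only (not part of the patch): a small standalone sketch of the interval-stretch fix. Both helper names are hypothetical, `now` stands in for jiffies, counter wraparound is ignored (the kernel uses time_after() for that), and `interval` is assumed to be non-zero.

```c
/* Old behaviour: the new deadline is measured from whenever the reset runs,
 * so every late reset pushes all later windows back by the lateness. */
static unsigned long next_window_stretched(unsigned long now,
					   unsigned long interval)
{
	return now + interval;
}

/* New behaviour: step the old deadline forward in whole intervals until it
 * lies at or beyond 'now', so window boundaries stay on the original grid. */
static unsigned long next_window_aligned(unsigned long now,
					 unsigned long next_window,
					 unsigned long interval)
{
	do {
		next_window += interval;
	} while (now > next_window);	/* simplified time_after(now, next_window) */

	return next_window;
}
```

With an interval of 100 and a window that expired at 1000, a reset running at 1030 gives 1130 under the old calculation but 1100 under the new one; a reset running as late as 1250 gives 1300 rather than 1350.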