Haren Myneni <ha...@linux.ibm.com> writes:
>> Haren Myneni <ha...@linux.ibm.com> writes:
>>> The hypervisor returns migration failure if all VAS windows are not
>>> closed. During the pre-migration stage, vas_migration_handler() sets
>>> the migration_in_progress flag and closes all windows from the list.
>>> The allocate VAS window routine checks the migration flag, sets up
>>> the window and then adds it to the list. So there is a possibility of
>>> the migration handler missing a window that is still in the
>>> process of setup.
>>>
>>> t1: Allocate and open VAS	t2: Migration event
>>>     window
>>>
>>> lock vas_pseries_mutex
>>> If migration_in_progress set
>>>   unlock vas_pseries_mutex
>>>   return
>>> open window HCALL
>>> unlock vas_pseries_mutex
>>> Modify window HCALL		lock vas_pseries_mutex
>>> setup window			migration_in_progress=true
>>>				Closes all windows from
>>>				the list
>>>				unlock vas_pseries_mutex
>>> lock vas_pseries_mutex	return
>>> if nr_closed_windows == 0
>>>   // No DLPAR CPU or migration
>>>   add to the list
>>>   unlock vas_pseries_mutex
>>>   return
>>> unlock vas_pseries_mutex
>>> Close VAS window
>>> // due to DLPAR CPU or migration
>>> return -EBUSY
>>
>> Could the path t1 takes simply hold the mutex for the duration of
>> its execution instead of dropping and reacquiring it in the middle?
>>
>> Here's the relevant code from vas_allocate_window():
>>
>>	mutex_lock(&vas_pseries_mutex);
>>	if (migration_in_progress)
>>		rc = -EBUSY;
>>	else
>>		rc = allocate_setup_window(txwin, (u64 *)&domain[0],
>>					   cop_feat_caps->win_type);
>>	mutex_unlock(&vas_pseries_mutex);
>>	if (rc)
>>		goto out;
>>
>>	rc = h_modify_vas_window(txwin);
>>	if (!rc)
>>		rc = get_vas_user_win_ref(&txwin->vas_win.task_ref);
>>	if (rc)
>>		goto out_free;
>>
>>	txwin->win_type = cop_feat_caps->win_type;
>>	mutex_lock(&vas_pseries_mutex);
>>	if (!caps->nr_close_wins) {
>>		list_add(&txwin->win_list, &caps->list);
>>		caps->nr_open_windows++;
>>		mutex_unlock(&vas_pseries_mutex);
>>		vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
>>		return &txwin->vas_win;
>>	}
>>	mutex_unlock(&vas_pseries_mutex);
>>
>> Is there something about h_modify_vas_window() or get_vas_user_win_ref()
>> that requires temporarily dropping the lock?
>>
>
> Thanks Nathan for your comments.
>
> vas_pseries_mutex protects window ID and IRQ allocation between the
> alloc and free window HCALLs, and the window list. Generally we try
> not to hold a mutex across HCALLs, but we need this mutex around
> these HCALLs.
>
> We could call h_modify_vas_window() or get_vas_user_win_ref() within
> the mutex context, but it is not needed.
Hmm. I contend that it would fix your bug in a simpler way that
eliminates the race instead of coping with it by adding more state and
complicating the locking model. With your change, readers of the
migration_in_progress flag check it under the mutex, but the writer
updates it outside of the mutex, which seems strange and unlikely to
be correct.

> Also, the free HCALL can take longer depending on pending NX
> requests, since the hypervisor waits for all requests to be completed
> before closing the window.
>
> Applications can issue window open / free calls at the same time,
> which can lead to mutex contention, especially on large systems where
> hundreds of credits are available. Another example: the migration
> event can wait longer (or time out) to get this mutex if many threads
> issue open/free window calls. Hence h_modify_vas_window() (the modify
> HCALL) and get_vas_user_win_ref() are called outside of the mutex.

OK. I believe you're referring to this code, which can run under the
lock:

static long hcall_return_busy_check(long rc)
{
	/* Check if we are stalled for some time */
	if (H_IS_LONG_BUSY(rc)) {
		msleep(get_longbusy_msecs(rc));
		rc = H_BUSY;
	} else if (rc == H_BUSY) {
		cond_resched();
	}

	return rc;
}

...

static int h_deallocate_vas_window(u64 winid)
{
	long rc;

	do {
		rc = plpar_hcall_norets(H_DEALLOCATE_VAS_WINDOW, winid);

		rc = hcall_return_busy_check(rc);
	} while (rc == H_BUSY);

...

[ with similar loops in the window allocate and modify functions ]

If get_longbusy_msecs() typically returns low values (<20), then you
should prefer usleep_range() over msleep() to avoid over-sleeping. For
example, msleep(1) can schedule away for much longer than 1ms -- often
more like 20ms.

If mutex hold times have been a problem in this code, then it's
probably worth improving the way it handles the busy/retry statuses
under the lock.