Haren Myneni <ha...@linux.ibm.com> writes:
>> Haren Myneni <ha...@linux.ibm.com> writes:
>>> The hypervisor returns migration failure if all VAS windows are not
>>> closed. During the pre-migration stage, vas_migration_handler() sets
>>> the migration_in_progress flag and closes all windows from the list.
>>> The allocate VAS window routine checks the migration flag, sets up
>>> the window and then adds it to the list. So there is a possibility of
>>> the migration handler missing a window that is still in the
>>> process of setup.
>>>
>>> t1: Allocate and open VAS	t2: Migration event
>>>     window
>>>
>>> lock vas_pseries_mutex
>>> If migration_in_progress set
>>>   unlock vas_pseries_mutex
>>>   return
>>> open window HCALL
>>> unlock vas_pseries_mutex
>>> Modify window HCALL		lock vas_pseries_mutex
>>> setup window			migration_in_progress=true
>>>				Closes all windows from
>>>				the list
>>>				unlock vas_pseries_mutex
>>> lock vas_pseries_mutex	return
>>> if nr_closed_windows == 0
>>>   // No DLPAR CPU or migration
>>>   add to the list
>>>   unlock vas_pseries_mutex
>>>   return
>>> unlock vas_pseries_mutex
>>> Close VAS window
>>> // due to DLPAR CPU or migration
>>> return -EBUSY
>>
>> Could the path t1 takes simply hold the mutex for the duration of
>> its execution instead of dropping and reacquiring it in the middle?
>>
>> Here's the relevant code from vas_allocate_window():
>>
>>	mutex_lock(&vas_pseries_mutex);
>>	if (migration_in_progress)
>>		rc = -EBUSY;
>>	else
>>		rc = allocate_setup_window(txwin, (u64 *)&domain[0],
>>					   cop_feat_caps->win_type);
>>	mutex_unlock(&vas_pseries_mutex);
>>	if (rc)
>>		goto out;
>>
>>	rc = h_modify_vas_window(txwin);
>>	if (!rc)
>>		rc = get_vas_user_win_ref(&txwin->vas_win.task_ref);
>>	if (rc)
>>		goto out_free;
>>
>>	txwin->win_type = cop_feat_caps->win_type;
>>	mutex_lock(&vas_pseries_mutex);
>>	if (!caps->nr_close_wins) {
>>		list_add(&txwin->win_list, &caps->list);
>>		caps->nr_open_windows++;
>>		mutex_unlock(&vas_pseries_mutex);
>>		vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
>>		return &txwin->vas_win;
>>	}
>>	mutex_unlock(&vas_pseries_mutex);
>>
>> Is there something about h_modify_vas_window() or get_vas_user_win_ref()
>> that requires temporarily dropping the lock?
>>
>
> Thanks Nathan for your comments.
>
> vas_pseries_mutex protects window ID and IRQ allocation between the
> alloc and free window HCALLs, and the window list. Generally we try
> not to hold a mutex across HCALLs, but we need this mutex around
> these HCALLs.
>
> We could call h_modify_vas_window() or get_vas_user_win_ref() within
> the mutex context, but it is not needed.
Hmm. I contend that it would fix your bug in a simpler way that
eliminates the race instead of coping with it by adding more state and
complicating the locking model. With your change, readers of the
migration_in_progress flag check it under the mutex, but the writer
updates it outside of the mutex, which seems strange and unlikely to
be correct.

> Also, the free HCALL can take longer depending on pending NX
> requests, since the hypervisor waits for all requests to be completed
> before closing the window.
>
> Applications can issue window open / free calls at the same time,
> which can lead to mutex contention, especially on large systems where
> hundreds of credits are available. Another example: the migration
> event can wait longer (or time out) to get this mutex if many threads
> issue open/free window calls. Hence h_modify_vas_window() (the modify
> HCALL) and get_vas_user_win_ref() are called outside of the mutex.

OK. I believe you're referring to this code, which can run under the
lock:

static long hcall_return_busy_check(long rc)
{
	/* Check if we are stalled for some time */
	if (H_IS_LONG_BUSY(rc)) {
		msleep(get_longbusy_msecs(rc));
		rc = H_BUSY;
	} else if (rc == H_BUSY) {
		cond_resched();
	}

	return rc;
}

...

static int h_deallocate_vas_window(u64 winid)
{
	long rc;

	do {
		rc = plpar_hcall_norets(H_DEALLOCATE_VAS_WINDOW, winid);

		rc = hcall_return_busy_check(rc);
	} while (rc == H_BUSY);

...

[ with similar loops in the window allocate and modify functions ]

If get_longbusy_msecs() typically returns low values (<20), then you
should prefer usleep_range() over msleep() to avoid over-sleeping. For
example, msleep(1) can schedule away for much longer than 1ms -- often
more like 20ms.

If mutex hold times have been a problem in this code, then it's
probably worth improving the way it handles the busy/retry statuses
under the lock.