On 10/16/23 1:30 PM, Nathan Lynch wrote:
Nathan Lynch <nath...@linux.ibm.com> writes:
Haren Myneni <ha...@linux.ibm.com> writes:
Haren Myneni <ha...@linux.ibm.com> writes:
The hypervisor returns a migration failure unless all VAS windows are
closed. During the pre-migration stage, vas_migration_handler() sets
the migration_in_progress flag and closes all windows on the list.
The VAS window allocation routine checks the migration flag, sets up
the window, and then adds it to the list. So it is possible for the
migration handler to miss a window that is still in the process of
being set up.

t1: Allocate and open VAS       t2: Migration event
      window

lock vas_pseries_mutex
If migration_in_progress set
    unlock vas_pseries_mutex
    return
open window HCALL
unlock vas_pseries_mutex
Modify window HCALL             lock vas_pseries_mutex
setup window                    migration_in_progress=true
                                Closes all windows from
                                the list
                                unlock vas_pseries_mutex
lock vas_pseries_mutex          return
if nr_closed_windows == 0
    // No DLPAR CPU or migration
    add to the list
    unlock vas_pseries_mutex
    return
unlock vas_pseries_mutex
Close VAS window
// due to DLPAR CPU or migration
return -EBUSY

Could the path t1 takes simply hold the mutex for the duration of
its execution instead of dropping and reacquiring it in the middle?

Here's the relevant code from vas_allocate_window():

        mutex_lock(&vas_pseries_mutex);
        if (migration_in_progress)
                rc = -EBUSY;
        else
                rc = allocate_setup_window(txwin, (u64 *)&domain[0],
                                   cop_feat_caps->win_type);
        mutex_unlock(&vas_pseries_mutex);
        if (rc)
                goto out;

        rc = h_modify_vas_window(txwin);
        if (!rc)
                rc = get_vas_user_win_ref(&txwin->vas_win.task_ref);
        if (rc)
                goto out_free;

        txwin->win_type = cop_feat_caps->win_type;
        mutex_lock(&vas_pseries_mutex);
        if (!caps->nr_close_wins) {
                list_add(&txwin->win_list, &caps->list);
                caps->nr_open_windows++;
                mutex_unlock(&vas_pseries_mutex);
                vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
                return &txwin->vas_win;
        }
        mutex_unlock(&vas_pseries_mutex);

Is there something about h_modify_vas_window() or get_vas_user_win_ref()
that requires temporarily dropping the lock?
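
In other words, something along these lines. This is only a rough sketch
of the single-critical-section shape I have in mind, assuming both HCALLs
may be issued while holding the mutex; the out_unlock/out_free_unlock
labels are hypothetical and the DLPAR nr_close_wins handling is elided:

        mutex_lock(&vas_pseries_mutex);
        if (migration_in_progress) {
                rc = -EBUSY;
                goto out_unlock;
        }

        rc = allocate_setup_window(txwin, (u64 *)&domain[0],
                                   cop_feat_caps->win_type);
        if (rc)
                goto out_unlock;

        rc = h_modify_vas_window(txwin);
        if (!rc)
                rc = get_vas_user_win_ref(&txwin->vas_win.task_ref);
        if (rc)
                goto out_free_unlock;

        txwin->win_type = cop_feat_caps->win_type;
        list_add(&txwin->win_list, &caps->list);
        caps->nr_open_windows++;
        mutex_unlock(&vas_pseries_mutex);

        vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
        return &txwin->vas_win;

out_free_unlock:
        /* window teardown elided in this sketch */
out_unlock:
        mutex_unlock(&vas_pseries_mutex);
        goto out;

With that shape, the migration handler either sees the window on the list
or sees the allocation fail, and no window can be mid-setup while the
handler holds the mutex.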


Thanks Nathan for your comments.

vas_pseries_mutex protects the window ID and IRQ allocation between the
allocate and free window HCALLs, and it also protects the window list.
Generally we try not to hold a mutex across HCALLs, but we need this
mutex around these HCALLs.

We could call h_modify_vas_window() and get_vas_user_win_ref() within
the mutex context, but it is not needed.

Hmm. I contend that it would fix your bug in a simpler way that
eliminates the race instead of coping with it by adding more state and
complicating the locking model. With your change, readers of the
migration_in_progress flag check it under the mutex, but the writer
updates it outside of the mutex, which seems strange and unlikely to be
correct.

Expanding on this, with your change, migration_in_progress becomes a
boolean atomic_t flag accessed only with atomic_set() and
atomic_read(). These are non-RMW operations. Documentation/atomic_t.txt
says:

   Non-RMW ops:

   The non-RMW ops are (typically) regular LOADs and STOREs and are
   canonically implemented using READ_ONCE(), WRITE_ONCE(),
   smp_load_acquire() and smp_store_release() respectively. Therefore, if
   you find yourself only using the Non-RMW operations of atomic_t, you
   do not in fact need atomic_t at all and are doing it wrong.

So making migration_in_progress an atomic_t does not confer any
advantageous properties to it that it lacks as a plain boolean.
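
Concretely, the generic definitions are (roughly, modulo instrumentation)
just annotated plain accesses, per the documentation quoted above; neither
implies any memory barrier:

        static inline int atomic_read(const atomic_t *v)
        {
                return READ_ONCE(v->counter);
        }

        static inline void atomic_set(atomic_t *v, int i)
        {
                WRITE_ONCE(v->counter, i);
        }

So atomic_set(&migration_in_progress, 1) is no stronger than a
WRITE_ONCE() of a plain flag; any ordering against the window list
manipulation has to come from vas_pseries_mutex or explicit barriers.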

Considering also (from the same document):

  - non-RMW operations are unordered;

  - RMW operations that have no return value are unordered;

I am concerned that threads executing these segments of code will not
always observe each others' effects in the intended order:

// vas_allocate_window()

         mutex_lock(&vas_pseries_mutex);
         if (!caps->nr_close_wins && !atomic_read(&migration_in_progress)) {
                 list_add(&txwin->win_list, &caps->list);
                 caps->nr_open_windows++;
                 atomic_dec(&caps->nr_open_wins_progress);
                 mutex_unlock(&vas_pseries_mutex);
                 vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
                 return &txwin->vas_win;
         }
         mutex_unlock(&vas_pseries_mutex);
         ...
         atomic_dec(&caps->nr_open_wins_progress);
         wake_up(&open_win_progress_wq);

// vas_migration_handler()

                 atomic_set(&migration_in_progress, 1);
                 ...
                         mutex_lock(&vas_pseries_mutex);
                         rc = reconfig_close_windows(vcaps, vcaps->nr_open_windows,
                                                     true);
                         mutex_unlock(&vas_pseries_mutex);
                         /*
                          * Windows are included in the list after successful
                          * open. So wait for closing these in-progress open
                          * windows in vas_allocate_window() which will be
                          * done if the migration_in_progress is set.
                          */
                         rc = wait_event_interruptible(open_win_progress_wq,
                                 !atomic_read(&vcaps->nr_open_wins_progress));

Maybe it's OK in practice for some reason? I'm finding it difficult to
reason about :-)


Thanks for the review.

It should be OK in this case since the mutex is held before reading the flag. But as you pointed out, the migration_in_progress flag should be just a boolean. I will repost the patch with this change.
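
To be explicit about the direction I have in mind, here is a rough sketch
only, not the actual repost; it assumes the nr_open_wins_progress
accounting from this patch stays unchanged and the flag is written and
read only under vas_pseries_mutex:

        /* Plain flag; written and read only while holding vas_pseries_mutex. */
        static bool migration_in_progress;

        /* vas_migration_handler(), suspend path: */
        mutex_lock(&vas_pseries_mutex);
        migration_in_progress = true;
        mutex_unlock(&vas_pseries_mutex);

        /* vas_allocate_window(), after h_modify_vas_window() succeeds: */
        mutex_lock(&vas_pseries_mutex);
        if (!caps->nr_close_wins && !migration_in_progress) {
                list_add(&txwin->win_list, &caps->list);
                caps->nr_open_windows++;
                atomic_dec(&caps->nr_open_wins_progress);
                mutex_unlock(&vas_pseries_mutex);
                vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
                return &txwin->vas_win;
        }
        mutex_unlock(&vas_pseries_mutex);
        atomic_dec(&caps->nr_open_wins_progress);
        wake_up(&open_win_progress_wq);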

Thanks
Haren
