在 2020/8/5 5:56, Richard Weinberger 写道:
On Tue, Aug 4, 2020 at 4:58 AM Zhihao Cheng <chengzhih...@huawei.com> wrote:
Oh, you're thinking about influence by schedule(), I get it. But I think
it still works. Because the ubi_thread is still on runqueue, it will be
scheduled to execute later anyway.
It will not get woken. This is the problem.

op                                                    state of
ubi_thread           on runqueue
set_current_state(TASK_INTERRUPTIBLE) TASK_INTERRUPTIBLE              Yes
if (kthread_should_stop()) // not satisfy
TASK_INTERRUPTIBLE              Yes
kthread_stop:
    wake_up_process
      ttwu_queue
        ttwu_do_activate
          ttwu_do_wakeup TASK_RUNNING                       Yes
schedule
    __schedule(false)

   // prev->state is TASK_RUNNING, so we cannot move it from runqueue by
deactivate_task(). So just pick next task to execute, ubi_thread is
still on runqueue and will be scheduled to execute later.
It will be in state TASK_RUNNING only if your check is reached.

If kthread_stop() is called *before* your code:
+                       if (kthread_should_stop()) {
+                               set_current_state(TASK_RUNNING);
+                               break;
+                       }

...everything is fine.
But there is still a race window between your if
(kthread_should_stop()) and schedule() in the next line.
So if kthread_stop() is called right *after* the if and *before*
schedule(), the task state is still TASK_INTERRUPTIBLE
--> schedule() will not return unless the task is explicitly woken,
which does not happen.
Er, I can't get the point. I can list two possible situations, did I miss other situations?

P1:ubi_thread
  set_current_state(TASK_INTERRUPTIBLE)
  if (kthread_should_stop()) {
    set_current_state(TASK_RUNNING)
    break
  }
  schedule()                            -> don't *remove* task from runqueue if *TASK_RUNNING*, removing operation is protected by rq_lock

P2:kthread_stop
  set_bit(KTHREAD_SHOULD_STOP, &kthread->flags)
  wake_up_process(k)             -> enqueue task & set *TASK_RUNNING*, these two operations are protected by rq_lock
  wait_for_completion(&kthread->exited)


Situation 1:
P1_set_current_state               on-rq, TASK_RUNNING -> TASK_INTERRUPTIBLE
P1_kthread_should_stop        on-rq, TASK_INTERRUPTIBLE
P2_set_bit                               on-rq, TASK_INTERRUPTIBLE , KTHREAD_SHOULD_STOP P2_wake_up_process             on-rq, TASK_INTERRUPTIBLE -> TASK_RUNNING , KTHREAD_SHOULD_STOP P1_schedule                           on-rq, TASK_RUNNING , KTHREAD_SHOULD_STOP
P2_wait_for_completion        // wait for P1 exit

Situation 2:
P1_set_current_state             on-rq, TASK_RUNNING -> TASK_INTERRUPTIBLE
P1_kthread_should_stop       on-rq, TASK_INTERRUPTIBLE
P2_set_bit                             on-rq, TASK_INTERRUPTIBLE , KTHREAD_SHOULD_STOP P1_schedule                          off-rq, TASK_INTERRUPTIBLE , KTHREAD_SHOULD_STOP P2_wake_up_process             on-rq, TASK_INTERRUPTIBLE -> TASK_RUNNING , KTHREAD_SHOULD_STOP
P2_wait_for_completion       // wait for P1 exit
Before your patch, the race window was much larger, I fully agree, but
your patch does not cure the problem
it just makes it harder to hit.

And using mdelay() to verify such a thing is also tricky because
mdelay() will influence the task state.



Reply via email to