On Sun, Dec 24, 2000 at 01:28:59PM +1100, Andrew Morton wrote:
> This could happen with the old scheme where exclusiveness
> was stored in the task, not the waitqueue.
> 
> >From test4:
> 
>         for (;;) {
>                 __set_current_state(TASK_UNINTERRUPTIBLE | TASK_EXCLUSIVE);
>       /* WINDOW */
>                 spin_lock_irq(&io_request_lock);
>                 rq = get_request(q, rw);
>                 spin_unlock_irq(&io_request_lock);
>                 if (rq)
>                         break;
>                 generic_unplug_device(q);
>                 schedule();
>         }
> 
> As soon as we set TASK_UNINTERRUPTIBLE|TASK_EXCLUSIVE, this
> task becomes visible to __wake_up_common and can soak
> up a second wakeup. [..]

I disagree: no wakeup can be lost in test4 in the above section of code
(aside from the __set_current_state that should be set_current_state ;), and
even that is not an issue on either x86 or alpha, where spin_lock is a full
memory barrier).

Let's examine what happens in test4:

1) before the `for' loop, the task is merely visible to __wake_up_common: it
   is registered in the waitqueue but not yet asking for an exclusive wakeup

2) as soon as we set the task state to TASK_UNINTERRUPTIBLE|TASK_EXCLUSIVE
   the task _starts_ asking for an exclusive wakeup from __wake_up_common

At this point consider two different cases:

3a) assume we get a wakeup in the /* WINDOW */: task->state becomes
    runnable from under us (no problem, that's expected). And if we really
    got a wakeup in /* WINDOW */, then get_request will also return a valid
    rq, so we'll break out of the loop and use the resource. No wakeup is
    lost in this case. So far so good. This was the one-wakeup case.

3b) Assume _two_ wakeups happen in /* WINDOW */: the first one just makes
    us RUNNABLE as in the 3a) case; the second one will wake up any _other_
    wake-one task _after_ us in the wait queue (normal FIFO order), because the
    first wakeup implicitly made us RUNNABLE (no longer TASK_EXCLUSIVE, so we
    were no longer asking for an exclusive wakeup).

    The task after us may itself be running in /* WINDOW */, in which case
    it too will get an `rq' from get_request (as in case 3a) without having
    to sleep. All right.

    No wakeup can be lost in any case. Two wakeups happened and two
    tasks got their I/O requests, even though both wakeups happened in
    /* WINDOW */. The example with N wakeups and N tasks is equivalent.

This race was introduced in test1X, because there the task keeps asking for
an exclusive wakeup even after getting one, until it has had the time to
clean up and unregister itself explicitly.

And btw, 2.2.19pre3 has the same race, while the alternate wake-one
implementation in 2.2.18aa2 doesn't have it (for the same reasons).

And /* WINDOW */ is not the only window for such a race: the real window is the
whole section where the task is registered in the waitqueue in exclusive mode:

        add_wait_queue_exclusive
        /* all is WINDOW here */
        remove_wait_queue

Until remove_wait_queue runs explicitly in the task's context, any second
wakeup will keep waking up only that task (so in turn it will be lost). This
should trigger very easily under high load, since two wakeups can happen much
faster than the task can be rescheduled to run remove_wait_queue and clean up.

Any deadlocks in test1X could very well be explained by this race condition.

> Still, changing the wakeup code in the way we've discussed
> seems the way to go. [..]

I agree. I'm quite convinced it's the right way too, and of course I see why
we moved the exclusive information into the waitqueue instead of keeping it in
the task struct: to provide sane semantics in the long run ;).

Andrea