On 08/08/16 03:22, Peter Zijlstra wrote:
> That would be the exact scenario I drew a picture of, no? I'm still
> failing to see the hole there.
>
> Please draw a picture like that and illustrate the hole.

Hi Peter,

This is the sequence that I think leads to the missed wakeup:

Task 1:
    lock_page()
    ...

Task 2:
    lock_page_killable()
    __lock_page_killable()
    __wait_on_bit_lock()
    bit_wait_io()
    io_schedule()
    ...

Task 4:
    lock_page()
    __lock_page()
    __wait_on_bit_lock()
    bit_wait_io()
    io_schedule()
    ...

Task 3:
    (signal delivery to task 2)
    try_to_wake_up(task2, ..., ...)
    (try_to_wake_up() returns 1)

Task 1:
    unlock_page()
    wake_up_page()
    __wake_up_bit()
    __wake_up(wq, TASK_NORMAL, 1, &key)
    __wake_up_common(wq, mode=TASK_NORMAL, nr_exclusive=1, 0, key)
    wake_bit_function()
    autoremove_wake_function()
    default_wake_function()
    try_to_wake_up() <- skips task 2 because task 3 already changed
                        the task state of task 2
    (autoremove_wake_function() does not do
     list_del_init(&wait->task_list))

Task 2:
    bit_wait_io() returns -EINTR
    abort_exclusive_wait() is called by __wait_on_bit_lock()

In the above sequence task 1 does not remove task 2 from the waitqueue
because task 3 had already woken up task 2. The result is that task 2 is
still on the waitqueue when it calls abort_exclusive_wait(). With the
current implementation of abort_exclusive_wait(), task 4 is not woken up
in the above scenario although it should be. Hence the patch that removes
the "else" keyword from abort_exclusive_wait().

Bart.