On Wed, 21 Mar 2001, Linus Torvalds wrote:

> On Wed, 21 Mar 2001, Linus Torvalds wrote:
> >
> > The deadlock implies that somebody scheduled with page_table_lock held.
> > Which would be really bad.
>
> ..and it is probably do_swap_page().
>
> Despite the name, "lookup_swap_cache()" does more than a lookup - it will
> wait for the page that it looked up. And we call it with the
> page_table_lock held in do_swap_page().

Darn, you're too quick.  (I had just figured it out and was about to report:)

Trace; c012785b <__lock_page+83/ac>
Trace; c012789b <lock_page+17/1c>
Trace; c01279a1 <__find_lock_page+81/f0>
Trace; c013008b <lookup_swap_cache+4b/164>
Trace; c0125362 <do_swap_page+12/1cc>
Trace; c012571f <handle_mm_fault+77/c4>
Trace; c01148b4 <do_page_fault+0/426>
Trace; c0114a17 <do_page_fault+163/426>
Trace; c01148b4 <do_page_fault+0/426>
Trace; c011581e <schedule+3e6/5ec>
Trace; c010908c <error_code+34/3c>
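
To spell out why that trace is a deadlock candidate: page_table_lock is a
spinlock, and lock_page() sleeps until the page it found is unlocked, so the
path above can go to sleep with the spinlock held - exactly the "scheduled
with page_table_lock held" case.  Roughly the same shape as a userspace toy
(pthread spinlock standing in for page_table_lock, a polling wait standing in
for lock_page(); my own names, purely illustrative, not the mm/memory.c code):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Stand-in for mm->page_table_lock. */
static pthread_spinlock_t page_table_lock;

/* Stand-in for "the page is unlocked / the swap I/O has completed". */
static volatile int page_ready;

/*
 * Stand-in for lookup_swap_cache()/lock_page(): blocks until the page
 * is ready.  In the kernel this is where we end up in schedule().
 */
static void wait_for_page(void)
{
        while (!page_ready)
                usleep(1000);
}

/* The broken shape: a blocking call made while the spinlock is held. */
static void buggy_do_swap_page(void)
{
        pthread_spin_lock(&page_table_lock);
        wait_for_page();                /* may block with the lock held */
        pthread_spin_unlock(&page_table_lock);
}

int main(void)
{
        pthread_spin_init(&page_table_lock, PTHREAD_PROCESS_PRIVATE);
        page_ready = 1;                 /* let the toy terminate */
        buggy_do_swap_page();
        puts("done");
        return 0;
}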

> Ho humm. Does the appended patch fix it for you? Looks obvious enough, but
> this bug is actually hidden on true SMP, and I'm too lazy to test with
> "num_cpus=1" or something..

I'm sure it will, but I'll be back in a few minutes with confirmation.

>
>               Linus
>
> -----
> diff -u --recursive --new-file pre6/linux/mm/memory.c linux/mm/memory.c
> --- pre6/linux/mm/memory.c    Tue Mar 20 23:13:03 2001
> +++ linux/mm/memory.c Wed Mar 21 22:21:27 2001
> @@ -1031,18 +1031,20 @@
>       struct vm_area_struct * vma, unsigned long address,
>       pte_t * page_table, swp_entry_t entry, int write_access)
>  {
> -     struct page *page = lookup_swap_cache(entry);
> +     struct page *page;
>       pte_t pte;
>
> +     spin_unlock(&mm->page_table_lock);
> +     page = lookup_swap_cache(entry);
>       if (!page) {
> -             spin_unlock(&mm->page_table_lock);
>               lock_kernel();
>               swapin_readahead(entry);
>               page = read_swap_cache(entry);
>               unlock_kernel();
> -             spin_lock(&mm->page_table_lock);
> -             if (!page)
> +             if (!page) {
> +                     spin_lock(&mm->page_table_lock);
>                       return -1;
> +             }
>
>               flush_page_to_ram(page);
>               flush_icache_page(vma, page);
> @@ -1053,13 +1055,13 @@
>        * Must lock page before transferring our swap count to already
>        * obtained page count.
>        */
> -     spin_unlock(&mm->page_table_lock);
>       lock_page(page);
> -     spin_lock(&mm->page_table_lock);
>
>       /*
> -      * Back out if somebody else faulted in this pte while we slept.
> +      * Back out if somebody else faulted in this pte while we
> +      * released the page table lock.
>        */
> +     spin_lock(&mm->page_table_lock);
>       if (pte_present(*page_table)) {
>               UnlockPage(page);
>               page_cache_release(page);

(oh well, I had the right idea at least)
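
In case it helps anyone reading along, the pattern in the patch is: drop
page_table_lock before anything that can sleep (the swap cache lookup, the
readahead/read, lock_page()), take it back afterwards, and then re-check the
pte, because somebody else may have faulted it in while the lock was down.
The same shape as a userspace toy (again my own names, pthread spinlock for
page_table_lock, a plain int for the pte; illustrative only, not the real
code):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Stand-in for mm->page_table_lock. */
static pthread_spinlock_t page_table_lock;

/* Stand-in for the pte: nonzero once somebody has faulted it in. */
static int pte_present;

/*
 * Stand-in for lookup_swap_cache()/read_swap_cache()/lock_page():
 * may block, so it must not run under page_table_lock.
 */
static void blocking_lookup(void)
{
        usleep(1000);
}

static int fixed_do_swap_page(void)
{
        /*
         * Model the caller entering with the lock held, then drop it
         * before anything that can block (the patch's first hunk).
         */
        pthread_spin_lock(&page_table_lock);
        pthread_spin_unlock(&page_table_lock);

        blocking_lookup();      /* safe: no spinlock held while we block */

        /*
         * Back out if somebody else faulted in this pte while we
         * released the page table lock.
         */
        pthread_spin_lock(&page_table_lock);
        if (pte_present) {
                pthread_spin_unlock(&page_table_lock);
                return 1;       /* lost the race, nothing left to do */
        }
        pte_present = 1;        /* install it ourselves */
        pthread_spin_unlock(&page_table_lock);
        return 0;
}

int main(void)
{
        pthread_spin_init(&page_table_lock, PTHREAD_PROCESS_PRIVATE);
        printf("first fault:  %s\n",
               fixed_do_swap_page() ? "raced" : "installed");
        printf("second fault: %s\n",
               fixed_do_swap_page() ? "raced" : "installed");
        return 0;
}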
